Channel: Community : All Content - OpenCL

Silent __private memory size limit?


Hi,

our application launches a few OpenCL kernels in a loop, with each iteration waiting for the previous one to complete (clFinish). One of the kernels is quite complex and uses nearly 18 kB of private memory per work-item. We had a very hard time making it work on the AMD platform (no significant problems with NVIDIA or Intel). The application ran OK for a few iterations of the loop, and then enqueuing the complex kernel suddenly started returning an "out of resources" error. Compilation and the first enqueue calls were all OK. Finally, we tried replacing the __private memory buffers with per-work-item slices of a __global buffer (reducing __private usage to about 3 kB per work-item), and it started working even on AMD.

 

My question: Is there a private memory size limit? I'd like to know whether we have fixed the issue in our code (by reducing private memory usage) or only fixed one of the side effects of some bug that is still there.

 

All of this happened on Ubuntu Linux (12.04) with the following driver:

[6.750882] <6>[fglrx] module loaded - fglrx 13.35.5 [Mar 12 2014] with 1 minors

 

When we tried with Windows 7, the graphics driver always crashed.

 

Thanks,

Martin Jirman
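
A rough way to see why 18 kB versus 3 kB per work-item might matter is to multiply the per-work-item footprint by the number of work-items that can be resident at once. The sketch below is only back-of-the-envelope arithmetic in C: it assumes (this is not stated in the post) that the runtime backs private data that does not fit in registers with one region per resident work-item. The function name and the resident work-item count used in the note are hypothetical. The footprint a built kernel actually reports can be read with clGetKernelWorkGroupInfo and CL_KERNEL_PRIVATE_MEM_SIZE.

```c
#include <assert.h>
#include <stdint.h>

/* Rough upper bound on the memory needed to back private data,
 * assuming (hypothetically) one private region per work-item that
 * can be resident on the device at the same time. */
uint64_t private_backing_bytes(uint64_t bytes_per_work_item,
                               uint64_t resident_work_items)
{
    /* Guard against overflow of the 64-bit product. */
    assert(bytes_per_work_item == 0 ||
           resident_work_items <= UINT64_MAX / bytes_per_work_item);
    return bytes_per_work_item * resident_work_items;
}
```

With a hypothetical 81,920 resident work-items, 18 KiB each comes to 1,509,949,440 bytes (about 1.4 GiB), while 3 KiB each comes to 251,658,240 bytes (240 MiB). Whether a given driver really allocates this way is an assumption, so the numbers only indicate scale.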


Is it possible to compile OpenCL code for all current video cards?


I need to distribute my program, which uses OpenCL code, without shipping the OpenCL sources. So I need to compile the kernels to a binary and load the program from that binary. This is simple, but each video card compiles a different binary for itself. How can I avoid this and compile my source code for different video cards without having those cards? Maybe a compiler exists where I can select a specific video card and get binary code for it? Thanks!

OpenCL Image2D Format


Hello Guys,

 

I am starting to use OpenCL for image processing, so I am learning the basics of image objects and how to upload image data to an OpenCL device.

 

I read the specification, even bought a book ( OpenCL in Action ) and I am doing fine until now.

 

But I am facing a problem (I think it's a problem) with the image data format.

 

To keep my project reasonably portable, I am using ANSI C and OpenCL (for processing) plus SDL for the GUI and image loading functions.

 

The SDL image loading functions give me a byte array of pixel data. The format is the same as a simple BMP: 3 bytes per pixel (1 byte each for BLUE, GREEN and RED).

 

To create an image2d (clCreateImage2D) I have the following function declaration:

 

cl_mem clCreateImage2D(cl_context context,
                       cl_mem_flags flags,
                       const cl_image_format *image_format,
                       size_t image_width,
                       size_t image_height,
                       size_t image_row_pitch,
                       void *host_ptr,
                       cl_int *errcode_ret)

 

The image_format argument is intriguing me (cl_image_format). The docs say I can use a "CL_BGRA" format, so I must add an extra byte for each pixel in my buffer, right? (Today my buffer is [b,g,r][b,g,r]...; I would add an extra byte so it becomes [b,g,r,1][b,g,r,1].) Is that the right approach?

 

What really intrigues me is the second field of cl_image_format (cl_channel_type image_channel_data_type). The cl_channel_type enumeration says I can use CL_UNSIGNED_INT8 as the data type, and this is confusing me.

 

The docs say (about CL_UNSIGNED_INT8): Each channel component is an unnormalized unsigned 8-bit integer value.

 

In my host program, my byte array is made of "unsigned int" (4 bytes each), so if I pass it to clCreateImage2D using CL_UNSIGNED_INT8 as the cl_channel_type in my cl_image_format parameter, will it work? Will OpenCL convert my 4-byte info to 8-byte info?

Or should I convert my byte buffer from integer to long/double values?

 

What am I missing? I think it may be simpler than this, but there is something I can't see. Could someone give me a hand?
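
The padding step described above can be sketched in plain C. This is a minimal sketch, not SDL or OpenCL API: the function name bgr_to_bgra and its parameters are hypothetical, and it assumes the source rows are 24-bit BGR with a row pitch that may include padding. The channel data type describes each channel, so with CL_UNSIGNED_INT8 one BGRA pixel is four 8-bit channels (one 32-bit word), and no widening to long or double is involved. The alpha byte is written as 255 here, which reads back as 1.0 under CL_UNORM_INT8; which order/type pairs a device accepts can be listed with clGetSupportedImageFormats.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Expand 24-bit BGR rows into tightly packed 32-bit BGRA pixels.
 * src_pitch is the number of bytes per source row (may include padding).
 * dst must hold width * height * 4 bytes. Alpha is set to 255 (opaque). */
void bgr_to_bgra(const uint8_t *src, size_t src_pitch,
                 uint8_t *dst, size_t width, size_t height)
{
    assert(src != NULL && dst != NULL);
    assert(src_pitch >= width * 3);
    for (size_t y = 0; y < height; ++y) {
        const uint8_t *in = src + y * src_pitch;
        uint8_t *out = dst + y * width * 4;
        for (size_t x = 0; x < width; ++x) {
            out[4 * x + 0] = in[3 * x + 0]; /* blue  */
            out[4 * x + 1] = in[3 * x + 1]; /* green */
            out[4 * x + 2] = in[3 * x + 2]; /* red   */
            out[4 * x + 3] = 255;           /* alpha */
        }
    }
}
```

The resulting buffer can then be passed as host_ptr with image_row_pitch set to width * 4 (or 0 for tightly packed rows).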

Calculation error on GPU only


Hi,

 

as updates to older threads don't seem to receive a lot of attention, I'm creating this new one. Please head over to the original thread and help me find a workaround or solution to this "behavior", which I'd call an AMD OpenCL compiler bug (until proven otherwise ;-).

 

Thanks

Kernel Compilation "LLVM ERROR"


Here is the OpenCL code (I've marked the statements that seem to cause the issue with "Problem statement" comments):

(If I change tempint in those statements to any literal uint, the kernel compiles fine. Madness.)

uint wide_add_vector(uint* res, const uint* a, const uint* b)
{
    ulong carry=0;
    #pragma unroll
    for(uint i=0;i<4;i++){
        ulong tmp=(ulong)(a[i])+b[i]+carry;
        uint tempint = (uint)(tmp&0xFFFFFFFF);
        res[i] = tempint; // <---- Problem statement
        carry=tmp>>32;
    }
    return carry;
}


uint wide_add_scalar(uint* res, const uint* a, uint b)
{
    ulong carry=b;
    #pragma unroll
    for(uint i=0;i<4;i++){
        ulong tmp=a[i]+carry;
        uint tempint = (uint)(tmp&0xFFFFFFFF);
        res[i] = tempint; // <---- Problem statement
        carry=tmp>>32;
    }
    return carry;
}


void wide_mul(uint* res_hi, uint* res_lo, const uint* a, const uint* b)
{
    ulong carry=0, acc=0;
    #pragma unroll
    for(uint i=0; i<4; i++){
        #pragma unroll
        for(uint j=0; j<=i; j++){
            ulong tmp=(ulong)(a[j])*b[i-j];
            acc+=tmp;
            carry+=(acc < tmp);
        }
        res_lo[i]=(uint)(acc&0xFFFFFFFF);
        acc= (carry<<32) | (acc>>32);
        carry=carry>>32;
    }
    #pragma unroll
    for(uint i=1; i<4; i++){
        #pragma unroll
        for(uint j=i; j<4; j++){
            ulong tmp=(ulong)(a[j])*b[4-j+i-1];
            acc+=tmp;
            carry+=(acc < tmp);
        }
        res_hi[i-1]=(uint)(acc&0xFFFFFFFF);
        acc= (carry<<32) | (acc>>32);
        carry=carry>>32;
    }
    res_hi[3]=acc;
}


void wide_copy_global(__global uint *res, const uint *a)
{
    #pragma unroll
    for(uint i=0;i<8;i++){
        res[i]=a[i];
    }
}


__kernel void bitecoin_miner(ulong roundId,ulong roundSalt,ulong chainHash, uint4 c, uint hashSteps, __global uint* proofBuffer)
{
    uint workerID = get_global_id(0);

    uint cArray[4] = {c.x,c.y,c.z,c.w};

    uint x[8] = {workerID,0,(uint)roundId,(uint)roundId,(uint)roundSalt,(uint)roundSalt,(uint)chainHash,(uint)chainHash};

    for(uint j=0;j<hashSteps;j++)
    {
        uint tmp[8];

        wide_mul(tmp+4, tmp, x, cArray); // cArray; not to be confused with carry.

        uint carry=wide_add_vector(x, tmp, x+4);

        wide_add_scalar(x+4, tmp+4, carry);
    }

    wide_copy_global(proofBuffer+8*workerID,x);
}

 

When run I get:

LogLevel = 2 -> 2
[MyClient], 1395075385.62, 2, Created log.
Will try to connect to address Minty at port 4000
Found 1 platforms
  Platform 0 : Advanced Micro Devices, Inc.
Choosing platform 0
Found 2 devices
  Device 0 : Tahiti
  Device 1 : Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz
Choosing device 0
LLVM ERROR: Cannot select: 0x855acbc3a0: i32 = setcc 0x855acbcca0, 0x855ac3a080, 0x855ac3a480 [ORD=52] [ID=30]
  0x855acbcca0: i64 = add 0x855ac3a080, 0x855ac3aa80 [ORD=49] [ID=28]
    0x855ac3a080: i64,ch = CopyFromReg 0x855ac2b1d0, 0x855ac3a680 [ORD=49] [ID=19]
      0x855ac3a680: i64 = Register %vreg33 [ORD=49] [ID=7]
    0x855ac3aa80: i64 = mul 0x855acbcda0, 0x855ac37450 [ORD=48] [ID=27]
      0x855acbcda0: i64,ch = load 0x855ac2b1d0, 0x855ac37250, 0x855ac3a380<LD4[%scevgep106], zext from i32> [ORD=47] [ID=26]
        0x855ac37250: i32 = add 0x855ac36640, 0x855ac38960 [ORD=45] [ID=25]
          0x855ac36640: i32 = sub 0x855ac37850, 0x855ac37050 [ORD=44] [ID=24]
            0x855ac37850: i32 = FrameIndex<0> [ORD=41] [ID=1]
            0x855ac37050: i32 = shl 0x855acbbc90, 0x855ac3a980 [ORD=44] [ID=23]
              0x855acbbc90: i32,ch = CopyFromReg 0x855ac2b1d0, 0x855ac36940 [ORD=43] [ID=18]
                0x855ac36940: i32 = Register %vreg30 [ORD=43] [ID=3]
              0x855ac3a980: i32 = Constant<2> [ORD=44] [ID=4]
          0x855ac38960: i32 = Constant<8> [ORD=45] [ID=5]
        0x855ac3a380: i32 = undef [ORD=46] [ID=6]
      0x855ac37450: i64 = zero_extend 0x855acbbd90 [ORD=42] [ID=21]
        0x855acbbd90: i32,ch = CopyFromReg 0x855ac2b1d0, 0x855acbba90 [ORD=42] [ID=17]
          0x855acbba90: i32 = Register %vreg31 [ORD=42] [ID=2]
  0x855ac3a080: i64,ch = CopyFromReg 0x855ac2b1d0, 0x855ac3a680 [ORD=49] [ID=19]
    0x855ac3a680: i64 = Register %vreg33 [ORD=49] [ID=7]
In function: __OpenCL_bitecoin_miner_kernel
Press any key to continue . . .

 

If I put it into Kernel Analyzer it just freezes.

 

Any ideas?

 

The system is:

Windows 8.1 64-bit, Visual Studio 2013

HD7970 Driver Version 13.350.1005.0

Catalyst 14.2

AMD APP SDK 2.9

 

Many Thanks

Henry
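
While the compiler error is being tracked down, a host-side reference for the limb arithmetic may help check results once some variant of the kernel builds. The block below is a sketch that ports wide_add_vector from the post to standard C with fixed-width types. The name wide_add_vector_ref is hypothetical, and it says nothing about why the AMD compiler rejects the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side port of wide_add_vector: res = a + b over four 32-bit limbs,
 * least significant limb first. Returns the carry out of the top limb. */
uint32_t wide_add_vector_ref(uint32_t *res, const uint32_t *a,
                             const uint32_t *b)
{
    assert(res != 0 && a != 0 && b != 0);
    uint64_t carry = 0;
    for (unsigned i = 0; i < 4; ++i) {
        /* Widen before adding so the carry into bit 32 is preserved. */
        uint64_t tmp = (uint64_t)a[i] + b[i] + carry;
        res[i] = (uint32_t)(tmp & 0xFFFFFFFFu);
        carry = tmp >> 32;
    }
    return (uint32_t)carry;
}
```

Comparing this against the values the kernel writes for the same inputs gives a quick check that any workaround (for example, a rewritten store to res[i]) has not changed the arithmetic.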

OpenCL FFT implementation


Hi all,

I'm trying to understand the following article related to the subject:

 

http://developer.amd.com/resources/documentation-articles/articles-whitepapers/opencl-optimization-case-study-fast-fourier-transform-part-ii/

 

namely FFT_64 kernel. Author says:

 

"The above shown listing begins with a function map_id() that computes the relative memory offsets within each workgroup."

 

But what exactly should this function look like? I guess it should map work-items so as to avoid bank conflicts, but I have no idea how exactly it should be implemented. Can someone help me with that?

 

Thanks,

-Sergio

x264 demo not working on AMD platform


Hi,

 

I have downloaded an OpenCL-enabled x264 encoder from Universität Heidelberg. Compilation was fine on the AMD platform, but it fails to run on Ubuntu 13.10 + HD7970 with a segmentation fault.

Sharing the compiled binaries here:

 

Command to run the demo: # ./x264 --threads 8 -A none --no-cabac --no-deblock --subme 0 --me dia --qp 16 --output out.264 result.y4m

 

Change the x264 executable's permissions if required.

 

Can someone help me with this?


Different behaviors when device has reached its maximum global memory limit


Hello,

 

I have an HD 7970 with 6 GB, and I have an OpenCL program (.exe) that takes up about 3.3 GB of global memory.

 

Two odd behaviors:

First, I'll call the OpenCL program I want to start myOpenCLApp.exe.

 

Behavior 1:

I am able to start two instances of myOpenCLApp.exe and successfully get output by spawning each as a process via CreateProcessA from the Windows API. This is odd, since running two instances of myOpenCLApp.exe exceeds my global memory limit by ~600 MB. I observe the same behavior with smaller global memory on other GPU devices, e.g. NVIDIA. Using my tools, I see throttling of GPU activity, and the execution time slows down considerably.

I create the process like this:

CreateProcessA("myOpenCLApp.exe", NULL, NULL, NULL,false, 0, NULL, NULL, &sinfo, &pinfo);

CreateProcessA("myOpenCLApp.exe", NULL, NULL, NULL,false, 0, NULL, NULL, &sinfo, &pinfo);

Total memory for both processes: 6.63 GB. This is puzzling, as I expected the second call to CreateProcessA to fail. However, what seems to happen is that both processes run just fine, just really slowly.

 

What is this behavior I'm seeing, and where can I find more info on it? I have not seen much material online about this.

 

Behavior 2:

I used the command line for this.

On the command line I use:

start /b myOpenCLApp.exe

 

and again

start /b myOpenCLApp.exe

 

This gives me a CL_MEM_OBJECT_ALLOCATION_FAILURE error when I try to start the second process.

This is what I expected to see when starting the processes the way I did for Behavior 1.

What's going on here?

 

Is it that Behavior 1 has one common parent while Behavior 2 is somehow different?

 

I also observe Behavior 1's pattern when using multiple threads with one common parent.

 

Please let me know if I missed something obvious.

 

Regards.

4x r9-280x/sky700 6GB cards memory allocation issue


Hi,

 

When we run more than two 6 GB cards (either 4x R9-280x or 4x Sky700) in a 4-card machine, the AMD driver will not allocate more than 3 GB per card (with 4 cards; for 3 cards, see below), even with GPU_MAX_ALLOC_PERCENT and GPU_FORCE_64BIT_PTR set.

 

No problem with 2 cards.

 

Any pointers?

 

Thanks,

 

Lionel

 

PS: Here is the relevant clinfo output when 3 Sky 700 cards are installed. The abnormal memory availability is on the first device listed:

  Max memory allocation: 631242752

  Global memory size: 1173356544

  Unified memory for Host and Device: 0

  Max memory allocation: 6144655360

  Global memory size: 6315573248

  Unified memory for Host and Device: 0

  Max memory allocation: 6144655360

  Global memory size: 6315573248

Cannot access both AMD and Intel GPU devices


My DELL E6540 laptop has an Intel HD4600 GPU as well as an AMD HD8790M.

 

Before installing the HD8790M driver, OpenCL programs could see both the CPU and the GPU device under the Intel platform.

After installing the AMD driver, the Intel GPU OpenCL device is missing, and the Intel platform only shows the CPU.

Yet both GPUs can be seen by OpenGL/Cg programs.

 

Current driver versions are:

AMD 13.350.1005.0 (Catalyst 14_2_beta_1.3)

Intel 10.18.10.34496

 

I tried re-installing the Intel driver, including after first uninstalling it with Display Driver Uninstaller.

The laptop has no BIOS options regarding video.

 

Any help getting both GPU devices accessible through OpenCL is appreciated.


Issues testing HSA driver


Hello,

 

I am trying to make the CPU and GPU synchronize via HSA using a shared variable on an A10-7850, so I tried to modify the examples provided with the HSA driver. This is my kernel code:

 

#pragma OPENCL EXTENSION cl_amd_c11_atomics : enable
#define NULL 0

__kernel void consumer(global volatile atomic_int * data)
{
     int id = get_global_id(0);
     int counter = 2;
     while(data[0] == 0)
     {
         counter = (counter + 1) ;
     }
     data[0] = counter;
}

The host simply sets data[0] = 0, sleeps for 2 ms, and then sets data[0] = 1.

 

First issue:

This example works fine, but whenever I update counter with a different operation (I tested counter = (counter + 1);  counter = (counter + 1)%10; counter = (counter * 3); ) the computer freezes for a moment and then restarts.

Before restarting, Windows 8.1 reports an unhandled exception (sometimes SYSTEM_THREAD_EXCEPTION_NOT_HANDLED and sometimes SYSTEM_SERVICE_EXCEPTION) in the file amdkfd.sys.

Is there any restriction with the beta driver, or am I doing something wrong?

 

Second issue:

I might want to launch this kernel on the CPU, but when building the program, it does not recognize the #pragma. Is there any way to force the CPU device to recognize said pragma?

 

Thank you.
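
The waiting pattern in the consumer kernel can be illustrated on the host with standard C11 atomics. This is only a host-side sketch using <stdatomic.h>, not the cl_amd_c11_atomics device API (whose exact function names are not shown in the post), and the function name spin_until_set is hypothetical. It shows two ideas that may be worth trying in the kernel as well: reading the flag through an explicit atomic load, and bounding the number of polls so a missed update cannot spin forever.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Poll *flag until it becomes non-zero or max_spins polls have been made.
 * Returns the number of polls that saw zero before the flag was set,
 * or -1 if the flag was still zero after max_spins polls. */
int spin_until_set(atomic_int *flag, int max_spins)
{
    assert(flag != NULL && max_spins >= 0);
    for (int spins = 0; spins < max_spins; ++spins) {
        /* Acquire ordering: writes made before the producer's release
         * store become visible once the non-zero value is observed. */
        if (atomic_load_explicit(flag, memory_order_acquire) != 0)
            return spins;
    }
    return -1;
}
```

A producer thread would publish with atomic_store_explicit(flag, 1, memory_order_release). Whether an unbounded spin is what triggers the amdkfd.sys exception is not established here; the bound is simply a defensive measure.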

OpenCL Samples Visual Studio Compile Errors


I just downloaded the AMD OpenCL SDK and tried compiling one of the samples in Visual Studio 2010. These are the errors I get:

Error 1 error C1083: Cannot open include file: 'CL/cl.hpp': No such file or directory C:\Users\watkinsp\AMD APP SDK\2.9\samples\opencl\cl\DynamicOpenCLDetection\VectorAddition\VectorAddition.cpp 20 1 VectorAddition

Error 2 error C1083: Cannot open include file: 'CL/cl.h': No such file or directory c:\users\watkinsp\amd app sdk\2.9\samples\opencl\cl\template\Template.hpp 24 1 Template

Error 3 error C1083: Cannot open include file: 'CL/opencl.h': No such file or directory C:\Users\watkinsp\AMD APP SDK\2.9\include\SDKUtil\CLUtil.hpp 24 1 BufferImageInterop

Error 4 error LNK1104: cannot open file 'OpenCL.lib' C:\Users\watkinsp\AMD APP SDK\2.9\samples\opencl\cl\DynamicOpenCLDetection\LINK DynamicOpenCLDetection

Error 5 error C1083: Cannot open include file: 'CL/opencl.h': No such file or directory C:\Users\watkinsp\AMD APP SDK\2.9\include\SDKUtil\CLUtil.hpp 24 1 TransferOverlap

Error 6 error C1083: Cannot open include file: 'CL/opencl.h': No such file or directory C:\Users\watkinsp\AMD APP SDK\2.9\include\SDKUtil\CLUtil.hpp 24 1 URNG

Error 7 error C1083: Cannot open include file: 'CL/opencl.h': No such file or directory C:\Users\watkinsp\AMD APP SDK\2.9\include\SDKUtil\CLUtil.hpp 24 1 SobelFilter

Error 8 error C1083: Cannot open include file: 'CL/opencl.h': No such file or directory C:\Users\watkinsp\AMD APP SDK\2.9\include\SDKUtil\CLUtil.hpp 24 1 SimpleImage

 

I haven't modified anything. I guess I just assumed I could open the provided Visual Studio 2010 solution and compile one of the samples out of the box. Did I miss an installation step?

Blender Cycles (OpenCL on AMD GPUs)


Dear OpenCL Developer,

Why doesn't the AMD OpenCL compiler work with Blender Cycles?

Whenever I compile the Blender Cycles kernel, the system either crashes due to lack of memory or takes too long to compile the kernel (after which it comes up with the following error:

opencl build failed:errors in console

calclcompile failederror: creating kernel_ocl_path_trace failed!

can't open file c:\tmp\5688.blend@ for writing:no file or directory

 

).

When is the AMD OpenCL compiler going to work properly with Blender Cycles?

Why can't the AMD OpenCL compiler developers test their compiler against Blender Cycles?

 

Seasons Greetings,

 

npm1,

 

PS: I, as well as others (I assume), am considering making a switch from AMD GPUs to Nvidia.

How branches affect the work-items in one wavefront


In OpenCL, a wavefront containing 64 work-items is scheduled as a unit. Since all work-items execute in lock-step, if even one work-item is delayed (by a cache miss or something else), all the other work-items have to wait for it. What confuses me is this: in the actual scheduling process, a quarter of the wavefront (i.e. 16 work-items) is issued to the GPU cores in one cycle, so the whole wavefront is executed over 4 consecutive cycles.

1) If one work-item in the first quarter is delayed, will all the other three quarters be delayed?

2) If only one work-item from the second quarter is delayed, will the first quarter not be delayed, while the 3rd and 4th quarters are?

 

 

Is that true on AMD GPUs?
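
The arithmetic in the question (64 work-items issued 16 at a time, so 4 cycles per instruction) and the effect of a divergent branch can be put into a toy cost model. The C sketch below is a deliberate simplification, not a description of how any particular AMD GPU schedules work. It assumes every instruction costs the same, ignores memory latency, and assumes a wavefront executes both sides of an if/else whenever its work-items disagree. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum { WAVEFRONT_SIZE = 64, SIMD_LANES = 16 };

/* Cycles to issue one instruction for a whole wavefront in this model. */
int cycles_per_instruction(void)
{
    return WAVEFRONT_SIZE / SIMD_LANES;
}

/* Cycles one wavefront spends on an if/else in this model.
 * taken[i] != 0 means work-item i takes the "if" side. A side is
 * executed by the whole wavefront if any work-item needs it. */
int branch_cycles(const int taken[WAVEFRONT_SIZE],
                  int if_instrs, int else_instrs)
{
    assert(taken != NULL && if_instrs >= 0 && else_instrs >= 0);
    int any_if = 0, any_else = 0;
    for (int i = 0; i < WAVEFRONT_SIZE; ++i) {
        if (taken[i])
            any_if = 1;
        else
            any_else = 1;
    }
    int instrs = (any_if ? if_instrs : 0) + (any_else ? else_instrs : 0);
    return cycles_per_instruction() * instrs;
}
```

In this model a single work-item that goes the other way makes the whole wavefront pay for both sides, regardless of which quarter that work-item sits in.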


About the OpenCL 1.2 beta driver for Kaveri APU


Hi all,

 

I have an A10-7850K APU and want to try the OpenCL 1.2 beta driver for developers, but when I tried to install it, I found that the documentation says it only supports the Asus A88X-PRO motherboard. Unfortunately, mine is an A88XM-A, and the A88X-PRO isn't available in my region. I have checked that the A88XM-A supports IOMMU as well, so I can't figure out why it's not supported by the driver. I really want to know why this beta driver supports only one motherboard, and whether it is possible that my motherboard is actually supported.

 

Thanks.

Finding minimum of square of difference between two arrays


Hi,

I have been trying to execute a simple kernel, but it returns garbage values and I am unable to figure out why. I want to find the closest set of planes to a given plane set using the angles between the planes. So the criterion is to find the minimum of the square of the difference of the corresponding angles. In this case, the correct answer should be the planes that have nearly the same orientation. I get the desired answer on the CPU. But when I run it as a kernel, it produces a different answer that is not consistent with my calculations.

 

__kernel void getTransformation( __global uint* permut1, __global int4* combo1, __global int4* combo2, , int size1, int size2, __global float4* trans)
{
    int gid = get_global_id(0);
    float2 temp_dot;
    float min_dot = FLT_MAX;
    int ind = 0;
    for(int i=0;i<size2;i++)
    {
        temp_dot = (dot2[i].x - dot1[permut1[gid]].x,dot2[i].y - dot1[permut1[gid]].y);
        if((temp_dot.x*temp_dot.x + temp_dot.y*temp_dot.y) < min_dot)
        {
            min_dot = temp_dot.x*temp_dot.x + temp_dot.y*temp_dot.y;
            ind = i;
        }
    }
    float4 num_pl2 = combo2[ind];
    trans[gid] = convert_float4_rtp(num_pl2);
}
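
For comparing against the kernel's output, the search it is meant to perform can be written as a small host-side reference in C. This is a sketch with hypothetical names (vec2, closest_index), assuming each entry holds two angle values as the kernel's .x and .y accesses suggest. One detail in the kernel that may be worth checking: in OpenCL C a vector literal needs its type in front, as in (float2)(a, b). Without that prefix, (a, b) is an ordinary comma expression that evaluates to b alone, so both components of temp_dot would receive the y difference.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

typedef struct { float x, y; } vec2;

/* Index of the entry in candidates[0..n) closest to target, measured by
 * squared Euclidean distance over the two components. */
size_t closest_index(vec2 target, const vec2 *candidates, size_t n)
{
    assert(candidates != NULL && n > 0);
    size_t best = 0;
    float best_d2 = FLT_MAX;
    for (size_t i = 0; i < n; ++i) {
        float dx = candidates[i].x - target.x;
        float dy = candidates[i].y - target.y;
        float d2 = dx * dx + dy * dy;
        if (d2 < best_d2) {
            best_d2 = d2;
            best = i;
        }
    }
    return best;
}
```

Running this over the same inputs for one value of gid and comparing the index with what the kernel selects should show quickly whether the difference comes from the distance computation or from elsewhere.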

Linux 290x OpenCL ?


Hello,

 

I am trying to get a 290X to be visible under Ubuntu 12.04 with the latest 13.11 beta6 64-bit Linux drivers. It looks like the fglrx module is loaded and aticonfig is functioning:

 

# aticonfig --adapter=0 --od-getclocks

 

Adapter 0 - AMD Radeon R9 290 Series
                            Core (MHz)    Memory (MHz)
           Current Clocks :    300           150

 

        Performance Level :    0
        Current Bus Speed :    2500
         Current Bus Lane :    1
                 GPU load :    0%

 

 

However, clinfo returns no GPU devices. I have tried logging in to X from the console and also setting the COMPUTE environment variable, but nothing helped.

 

Is it a known problem that the 290X series does not support OpenCL on Linux, or does anyone have an idea what the reason may be?

 

Thanks,

Evren

OpenCV-CL buffers and kernel launch


Hi,

 

I have some questions about the difficulties I'm facing while using OpenCV-CL:

1. Where can I find the complete documentation of the features of OpenCV-CL (or at least an API list)?

2. Is there a cv::ocl API to launch a kernel? Something like cv::ocl::create_kernel<>(),  cv::ocl::enqueuendrange<>()

3. How do I transfer a video stored in a vector of cv::Mat (std::vector<Mat>) to a cv::ocl buffer? I know how to copy a cv::Mat to an oclMat, but how do I pass a vector?

 

Thanks,

Liu
