Channel: Community : All Content - OpenCL

Why does this code run faster on the CPU than on the GPU?


Hello to everyone,

 

I am currently trying to get familiar with JOCL and learn the basics.

 

For that I tried a basic sample in which I fill an array representing an image with shades of blue. Every work-item fills one line of the image with its own intensity value for the blue component.

Here's the example:

 

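__kernel void sampleKernel(__global float *intensitys, __global float *picture)
{
    // One work-item per image line.
    int gid = get_global_id(0);
    int width = 1800;
    int height = 1000;
    // Repeat the same work 2000 times, only to lengthen the run time.
    for (int j = 0; j < 2000; j++) {
        // Start index of this work-item's line (lines are filled bottom-up).
        int position = (height - gid - 1) * width;
        for (int i = 0; i < width; i++) {
            picture[position + i] = 255 * intensitys[gid];
        }
    }
}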

 

I added the 2000-iteration loop only to increase the computation time, so that I can benchmark it better. It has no influence on the final image.
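To make explicit what each work-item does, and why the extra loop leaves the image unchanged, here is a minimal host-side sketch in plain C. It is not part of the attached code: the function names `fill_row` and `fill_picture` and the explicit `width`, `height` and `repeats` parameters are assumptions made for illustration (the kernel hard-codes 1800, 1000 and 2000).

```c
#include <assert.h>
#include <stddef.h>

/* Host-side equivalent of ONE work-item of sampleKernel.
   gid stands in for get_global_id(0); repeats stands in for the
   2000-iteration benchmark loop. Hypothetical helper, for illustration. */
void fill_row(const float *intensitys, float *picture,
              int width, int height, int gid, int repeats)
{
    assert(gid >= 0 && gid < height);
    /* Same index computation as the kernel: lines are filled bottom-up. */
    size_t position = (size_t)(height - gid - 1) * (size_t)width;
    for (int j = 0; j < repeats; j++) {
        for (int i = 0; i < width; i++) {
            /* Every repetition writes the same value to the same element. */
            picture[position + i] = 255 * intensitys[gid];
        }
    }
}

/* Runs all work-items one after another, as a sequential reference. */
void fill_picture(const float *intensitys, float *picture,
                  int width, int height, int repeats)
{
    for (int gid = 0; gid < height; gid++) {
        fill_row(intensitys, picture, width, height, gid, repeats);
    }
}
```

Calling `fill_picture` with `repeats` set to 1 or to 2000 produces the same picture, because each repetition overwrites the same elements with the same values. The loop only multiplies the number of global memory writes per work-item.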

 

My problem is that the execution time on the GPU is longer than on the CPU.

I use a global_work_size of 1000, one work-item for every line of the image:

local_work_size 64 for GPU     execution time: 540 ms

local_work_size 4 for CPU       execution time: 387 ms
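A side note on these sizes, assuming an OpenCL 1.x runtime: there, the global_work_size has to be a multiple of the local_work_size, and 1000 is not a multiple of 64. A common workaround is to round the global size up and let the kernel skip work-items whose gid is past the last line. The helper below is a sketch in plain C; the name `round_up_global_size` is hypothetical and not part of OpenCL or JOCL.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest multiple of local_size that is >= global_size.
   Hypothetical helper, not an OpenCL or JOCL function. */
size_t round_up_global_size(size_t global_size, size_t local_size)
{
    assert(local_size > 0);
    size_t remainder = global_size % local_size;
    if (remainder == 0) {
        return global_size;
    }
    return global_size + (local_size - remainder);
}
```

With the values above, 1000 work-items and a local size of 64 round up to 1024, while a local size of 4 leaves 1000 unchanged. If the global size is padded this way, the kernel would need a guard such as `if (gid >= height) return;` so the extra work-items do not write outside the picture.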

 

I tried several local_work_size values, but the GPU was always slower.

 

I thought it could be the I/O between GPU and CPU, but removing the 2000-iteration loop results in computation times of nearly 0 ms for both GPU and CPU.

 

Doubling the loop to 4000 iterations doubles the computation time, so the I/O has no big influence on the computation time.

 

I really don't know why. The GPU, with its 1000 shaders, should perform much better than the CPU with its 4 cores.

 

I appreciate every hint. Thanks for your help in advance!

 

The code is in the attachment.

