I have a simple test kernel that I compile in KernelAnalyzer2. It reports that on Tahiti the kernel will use 110 VGPRs. I tried adding #pragma unroll 1, and it has no effect at all. Is there a known way to prevent the compiler from using so many registers? (Keep in mind this is a dummy test kernel, but the same issue seems to affect an actual kernel and reduce its occupancy.)
__kernel void test(__global double *distrValues, __global double *distrValuesOut)
{
    __private const int id = get_global_id(0);
    __private double den2 = 0.0;  /* initialized so the sum is well defined */
    __private int i;

    for (i = 0; i < 53; i++) {
        den2 += distrValues[i];
    }
    distrValuesOut[id] = den2;
}
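To put the occupancy concern in numbers, here is a rough sketch in plain C of how the VGPR count limits the number of wavefronts in flight. It assumes the usual GCN figures for Tahiti (256 VGPRs per lane per SIMD, at most 10 wavefronts per SIMD, VGPRs allocated in groups of 4). The helper name waves_per_simd is hypothetical, and the sketch ignores other limits such as SGPRs and LDS.

```c
/* Assumed GCN (Tahiti) limits; these are not taken from the post itself. */
enum {
    VGPR_BUDGET        = 256, /* VGPRs available per lane on one SIMD */
    MAX_WAVES_PER_SIMD = 10,  /* hardware cap on resident wavefronts  */
    VGPR_GRANULARITY   = 4    /* VGPRs are allocated in groups of 4   */
};

/* Hypothetical helper: wavefronts per SIMD that fit for a given VGPR count. */
static int waves_per_simd(int vgprs)
{
    if (vgprs <= 0) {
        return MAX_WAVES_PER_SIMD;
    }
    /* Round the request up to the allocation granularity. */
    int allocated = (vgprs + VGPR_GRANULARITY - 1) / VGPR_GRANULARITY
                    * VGPR_GRANULARITY;
    int waves = VGPR_BUDGET / allocated;
    return waves < MAX_WAVES_PER_SIMD ? waves : MAX_WAVES_PER_SIMD;
}
```

Under these assumptions, waves_per_simd(110) gives 2 wavefronts per SIMD, while a kernel needs 24 VGPRs or fewer to reach the maximum of 10.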
Thanks!