The following seems to be a bug in the gcn compiler, starting with catalyst 12.9, and does not occur on other architectures like Cayman and Barts. The test program fills an array[64] with some data and then reads it back in reverse order. To do this, the compiler computes the offset (63 - gx) to pointer pB where gx >= 0 && < 64. The problem occurs because the new compiler uses the constant value 63 as an offset in the ds_read_b32 instruction, which then causes the register address of ds_read_b32 to be negative!. The old compiler does not use the ds_read_b32 offset option. Several similar expressions cause the same problem. Unfortunately this is a pretty common 'C' expression.....
result = *(pB + (63 - gx));
Just a comment, gcn is still relatively new and being optimized so bugs can happen however, the compiler is often beyond amazing in the code it produces and getting faster. Go compiler team.
__kernel void localpointer_rw(global int *in, global int *out)
{
int i, gx = get_global_id(0); //gx goes from 0 to 63
local int buf[64],*pB;
buf[gx] = gx; //assign 64 numbers to all buf[n] locations
pB = &buf[0]; //make a pointer
i = *(pB+(63-gx)); //ERROR! reads only when gx == 0 !!
out[gx] = i;
}
Dissassembly from (good) old compiler
v_sub_i32 v1, vcc, 0, v0 <------ start:gx in v0; gx = -gx;
v_lshlrev_b32 v1, 2, v1 mmultiply by 4
v_add_i32 v1, vcc, 0x000000fc, v1 add constant offset 252
v_lshlrev_b32 v2, 2, v0
ds_write_b32 v2, v0
s_waitcnt lgkmcnt(0)
ds_read_b32 v0, v1 read
Dissassembly of newer code
v_lshlrev_b32 v1, 2, v0
v_sub_i32 v2, vcc, 0, v0 <------ start:gx in v0; gx = -gx;
ds_write_b32 v1, v0
v_lshlrev_b32 v0, 2, v2 multiply by 4
s_waitcnt lgmkcnt(0)
ds_read_b32 v0, v0 offset:252 read from offset 252 with negative address register