Hello,
I am hitting an odd bug running one of my OpenCL kernels on a new HD 7790. I've verified the same kernel on an HD 7770 and on several Fermi and Kepler cards.
After narrowing it down a lot, I strongly suspect some kind of compiler bug. Unfortunately, CodeXL doesn't seem to disassemble Bonaire ISA yet, so I can't check whether the generated code is doing something odd, and I can't debug the kernel either.
Are there any known issues with register clobbering or similar? I have 'AMD APP SDK Runtime 10.0.1124.2'.
I'll try to make a standalone test, but this is the gist of the problem code:
__global struct MyStruct *m = (__global struct MyStruct *)(basePtr + offset);
if(m->magic != 123)
{
    ... dump debug diagnostics to global memory // This never happens
    return;
}
if(...)
{
    // loads + arithmetic
    // no stores, and no touching 'm'
}
else
{
    // loads + arithmetic
    // no stores, and no touching 'm'
}
if(m->magic != 123)
{
    ... dump debug diagnostics to global memory // This always happens
    return;
}
The result is that I get the dump at the second m->magic check, not the first. Nothing should be modifying global memory here: only this one kernel is running, with clFinish before and after, and it's 100% reproducible.
I dumped 'basePtr', 'offset' and 'm' and I can see m is corrupt (m != basePtr + offset).
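
A minimal sketch of what a standalone test along these lines could look like. It snapshots 'm' before and after the branch and recomputes basePtr + offset from scratch for comparison. Everything except 'basePtr', 'offset', 'm' and 'magic' is an assumption made for illustration: the kernel name ptr_repro, the takeFirstBranch, data and dbg arguments, the DBG_* slot names, the dbg_u64 typedef, the single-field MyStruct layout, and basePtr being a byte pointer. The shims at the top are only there so the same source also builds as plain C for a CPU sanity run; they are not part of the original kernel.

```c
/* Builds as OpenCL C for the device, or as plain C for a CPU sanity run.
   An OpenCL compiler predefines __OPENCL_VERSION__ and skips the shims. */
#ifdef __OPENCL_VERSION__
typedef ulong dbg_u64;
#else
#include <assert.h>   /* host only: for CPU-side sanity checks */
#include <stdint.h>   /* uintptr_t (built in on the device side) */
#define __kernel
#define __global
typedef unsigned long long dbg_u64;
#endif

/* Hypothetical layout: only the 'magic' field appears in the post. */
struct MyStruct { int magic; };

/* Slots in the debug buffer (names are illustrative). */
enum {
    DBG_BASE, DBG_OFFSET, DBG_EXPECTED,
    DBG_M_BEFORE, DBG_M_AFTER,
    DBG_MAGIC_BEFORE, DBG_MAGIC_AFTER,
    DBG_ACC, DBG_MISMATCH, DBG_WORDS
};

/* Launch with a global size of 1; every work-item writes the same slots. */
__kernel void ptr_repro(__global unsigned char *basePtr,
                        unsigned int offset,
                        int takeFirstBranch,
                        __global const int *data,
                        __global dbg_u64 *dbg)
{
    __global struct MyStruct *m = (__global struct MyStruct *)(basePtr + offset);

    /* Snapshot the pointer and the field before the branch. */
    dbg_u64 mBefore = (dbg_u64)(uintptr_t)m;
    int magicBefore = m->magic;

    /* Stand-in for the real work: loads + arithmetic, no stores, no use of 'm'. */
    int acc;
    if (takeFirstBranch) {
        acc = data[0] * 3 + 1;
    } else {
        acc = data[0] * 5 - 1;
    }

    /* Snapshot again and recompute the expected address from scratch. */
    dbg_u64 mAfter = (dbg_u64)(uintptr_t)m;
    dbg_u64 expected = (dbg_u64)(uintptr_t)(basePtr + offset);
    int magicAfter = m->magic;

    dbg[DBG_BASE] = (dbg_u64)(uintptr_t)basePtr;
    dbg[DBG_OFFSET] = offset;
    dbg[DBG_EXPECTED] = expected;
    dbg[DBG_M_BEFORE] = mBefore;
    dbg[DBG_M_AFTER] = mAfter;
    dbg[DBG_MAGIC_BEFORE] = (dbg_u64)(unsigned int)magicBefore;
    dbg[DBG_MAGIC_AFTER] = (dbg_u64)(unsigned int)magicAfter;
    dbg[DBG_ACC] = (dbg_u64)(unsigned int)acc;  /* keeps the branch result live */
    /* Bit 0: 'm' was already wrong before the branch; bit 1: wrong after it. */
    dbg[DBG_MISMATCH] = (dbg_u64)((mBefore != expected) | ((mAfter != expected) << 1));
}
```

On the host, dbg would be read back after clFinish and DBG_M_BEFORE / DBG_M_AFTER compared against DBG_EXPECTED; DBG_MISMATCH with only bit 1 set would match the symptom described above. An optimizer is free to fold the two snapshots into one value, so a clean result from this sketch would not rule the bug out.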