Hi, I'm running Win7/64 with an HD5770 and Catalyst 13.1. When my program compiles its OpenCL kernels for the GPU, I get the output below (the same program runs fine on the CPU):
Select device - OpenCL Platform 1/1: Advanced Micro Devices, Inc., Version: OpenCL 1.2 AMD-APP (1084.4)
Get device info - Device 1/1: Juniper (Advanced Micro Devices, Inc.),
device version: OpenCL 1.2 AMD-APP (1084.4), driver version: 1084.4 (VM)
Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing
Global memory:1073741824, Global memory cache: 0, local memory: 32768, workgroup size: 256, Work dimensions: 3[256, 256, 256, 0, 0] , Max clock speed:960, compute units:10
Compiling kernels (build options: "-I. -DVECTOR_SIZE=2 -g -DMORE_CLASSES -DCL_GPU_SIEVE").LLVM ERROR: Cannot select: 0x8660700: i8 = setcc 0x8655250, 0x77c6140, 0x8659990 [ID=58] dbg:barrett.cl:5169:39
0x8655250: i32 = AMDILISD::ADD 0x77c6140, 0x77b5530 [ID=55] dbg:barrett.cl:5169:39
0x77c6140: i32,ch = llvm.AMDIL.mulhi.u32 0x5e14070, 0x864b1b0, 0x8660500, 0x77c4220 [ORD=221913] [ID=48]
0x864b1b0: i32 = TargetConstant<2674> [ORD=221906] [ID=17]
0x8660500: i32,ch = llvm.AMDIL.mad24.u32 0x5e14070, 0x865e6e0, 0x865e1e0, 0x8656460, 0x8664440 [ORD=221904] [ID=44]
0x865e6e0: i32 = TargetConstant<2623> [ORD=221904] [ID=13]
0x865e1e0: i32 = Constant<4620> [ORD=221904] [ID=14]
0x8656460: i32,ch = llvm.AMDIL.mul24.u32 0x5e14070, 0x77be2c0, 0x77b4c20, 0x8665850 [ORD=221900] [ID=39]
0x77be2c0: i32 = TargetConstant<2666> [ORD=221900] [ID=9]
0x77b4c20: i32,ch = CopyFromReg 0x5e14070, 0x77c2d00 [ORD=221900] [ID=32]
0x77c2d00: i32 = Register %vreg1776 [ORD=221900] [ID=10]
0x8665850: i32 = AMDILISD::VEXTRACT 0x77c07e0, 0x865d4d0 [ORD=221899] [ID=36]
0x77c07e0: v4i32,ch = llvm.AMDIL.get.group.id 0x5e14070, 0x865d9d0 [ORD=221898] [ID=31]
0x865d9d0: i32 = TargetConstant<2564> [ORD=221898] [ID=8]
0x865d4d0: i32 = TargetConstant<1> [ORD=221903] [ID=26]
0x8664440: i32 = AMDILISD::VEXTRACT 0x8660e00, 0x865d4d0 [ORD=221903] [ID=41]
0x8660e00: v2i32,ch = load 0x5e14070, 0x864ece0, 0x8661110<LD8[%arrayidx_v4397]> [ORD=221902] [ID=37]
0x864ece0: i32,ch = CopyFromReg 0x5e14070, 0x865f1f0 [ORD=221901] [ID=33]
0x865f1f0: i32 = Register %vreg1774 [ORD=221901] [ID=11]
0x8661110: i32 = undef [ORD=221902] [ID=12]
0x865d4d0: i32 = TargetConstant<1> [ORD=221903] [ID=26]
0x77c4220: i32,ch = CopyFromReg 0x5e14070, 0x8664240 [ORD=221896] [ID=30] dbg:barrett.cl:5156:51
0x8664240: i32 = Register %vreg1773 [ORD=221896] [ID=6]
0x77b5530: i32 = and 0x8658a80, 0x77c4220 [ORD=221916] [ID=53] dbg:barrett.cl:5169:39
0x8658a80: i32 = setcc 0x77c5a30, 0x77c5430, 0x62debd0 [ID=51] dbg:barrett.cl:5162:128
0x77c5a30: i32 = AMDILISD::ADD 0x77bfad0, 0x8661e10 [ORD=221907] [ID=46] dbg:barrett.cl:5162:128
0x77bfad0: i32,ch = llvm.AMDIL.mulhi.u32 0x5e14070, 0x864b1b0, 0x865e1e0, 0x8656460 [ORD=221906] [ID=43]
0x864b1b0: i32 = TargetConstant<2674> [ORD=221906] [ID=17]
0x865e1e0: i32 = Constant<4620> [ORD=221904] [ID=14]
0x8656460: i32,ch = llvm.AMDIL.mul24.u32 0x5e14070, 0x77be2c0, 0x77b4c20, 0x8665850 [ORD=221900] [ID=39]
0x77be2c0: i32 = TargetConstant<2666> [ORD=221900] [ID=9]
0x77b4c20: i32,ch = CopyFromReg 0x5e14070, 0x77c2d00 [ORD=221900] [ID=32]
0x77c2d00: i32 = Register %vreg1776 [ORD=221900] [ID=10]
0x8665850: i32 = AMDILISD::VEXTRACT 0x77c07e0, 0x865d4d0 [ORD=221899] [ID=36]
0x77c07e0: v4i32,ch = llvm.AMDIL.get.group.id 0x5e14070, 0x865d9d0 [ORD=221898] [ID=31]
0x865d9d0: i32 = TargetConstant<2564> [ORD=221898] [ID=8]
0x865d4d0: i32 = TargetConstant<1> [ORD=221903] [ID=26]
0x8661e10: i32 = AMDILISD::VEXTRACT 0x8660e00, 0x77c4a20 [ORD=221905] [ID=40]
0x8660e00: v2i32,ch = load 0x5e14070, 0x864ece0, 0x8661110<LD8[%arrayidx_v4397]> [ORD=221902] [ID=37]
0x864ece0: i32,ch = CopyFromReg 0x5e14070, 0x865f1f0 [ORD=221901] [ID=33]
0x865f1f0: i32 = Register %vreg1774 [ORD=221901] [ID=11]
0x8661110: i32 = undef [ORD=221902] [ID=12]
0x77c4a20: i32 = TargetConstant<2> [ORD=221905] [ID=27]
0x77c5430: i32 = setcc 0x8660500, 0x8664440, 0x8659990 [ID=47] dbg:barrett.cl:5162:128
0x8660500: i32,ch = llvm.AMDIL.mad24.u32 0x5e14070, 0x865e6e0, 0x865e1e0, 0x8656460, 0x8664440 [ORD=221904] [ID=44]
0x865e6e0: i32 = TargetConstant<2623> [ORD=221904] [ID=13]
0x865e1e0: i32 = Constant<4620> [ORD=221904] [ID=14]
0x8656460: i32,ch = llvm.AMDIL.mul24.u32 0x5e14070, 0x77be2c0, 0x77b4c20, 0x8665850 [ORD=221900] [ID=39]
0x77be2c0: i32 = TargetConstant<2666> [ORD=221900] [ID=9]
0x77b4c20: i32,ch = CopyFromReg 0x5e14070, 0x77c2d00 [ORD=221900] [ID=32]
0x77c2d00: i32 = Register %vreg1776 [ORD=221900] [ID=10]
0x8665850: i32 = AMDILISD::VEXTRACT 0x77c07e0, 0x865d4d0 [ORD=221899] [ID=36]
0x77c07e0: v4i32,ch = llvm.AMDIL.get.group.id 0x5e14070, 0x865d9d0 [ORD=221898] [ID=31]
0x865d9d0: i32 = TargetConstant<2564> [ORD=221898] [ID=8]
0x865d4d0: i32 = TargetConstant<1> [ORD=221903] [ID=26]
0x8664440: i32 = AMDILISD::VEXTRACT 0x8660e00, 0x865d4d0 [ORD=221903] [ID=41]
0x8660e00: v2i32,ch = load 0x5e14070, 0x864ece0, 0x8661110<LD8[%arrayidx_v4397]> [ORD=221902] [ID=37]
0x864ece0: i32,ch = CopyFromReg 0x5e14070, 0x865f1f0 [ORD=221901] [ID=33]
0x865f1f0: i32 = Register %vreg1774 [ORD=221901] [ID=11]
0x8661110: i32 = undef [ORD=221902] [ID=12]
0x865d4d0: i32 = TargetConstant<1> [ORD=221903] [ID=26]
0x8664440: i32 = AMDILISD::VEXTRACT 0x8660e00, 0x865d4d0 [ORD=221903] [ID=41]
0x8660e00: v2i32,ch = load 0x5e14070, 0x864ece0, 0x8661110<LD8[%arrayidx_v4397]> [ORD=221902] [ID=37]
0x864ece0: i32,ch = CopyFromReg 0x5e14070, 0x865f1f0 [ORD=221901] [ID=33]
0x865f1f0: i32 = Register %vreg1774 [ORD=221901] [ID=11]
0x8661110: i32 = undef [ORD=221902] [ID=12]
0x865d4d0: i32 = TargetConstant<1> [ORD=221903] [ID=26]
0x77c4220: i32,ch = CopyFromReg 0x5e14070, 0x8664240 [ORD=221896] [ID=30] dbg:barrett.cl:5156:51
0x8664240: i32 = Register %vreg1773 [ORD=221896] [ID=6]
0x77c6140: i32,ch = llvm.AMDIL.mulhi.u32 0x5e14070, 0x864b1b0, 0x8660500, 0x77c4220 [ORD=221913] [ID=48]
0x864b1b0: i32 = TargetConstant<2674> [ORD=221906] [ID=17]
0x8660500: i32,ch = llvm.AMDIL.mad24.u32 0x5e14070, 0x865e6e0, 0x865e1e0, 0x8656460, 0x8664440 [ORD=221904] [ID=44]
0x865e6e0: i32 = TargetConstant<2623> [ORD=221904] [ID=13]
0x865e1e0: i32 = Constant<4620> [ORD=221904] [ID=14]
0x8656460: i32,ch = llvm.AMDIL.mul24.u32 0x5e14070, 0x77be2c0, 0x77b4c20, 0x8665850 [ORD=221900] [ID=39]
0x77be2c0: i32 = TargetConstant<2666> [ORD=221900] [ID=9]
0x77b4c20: i32,ch = CopyFromReg 0x5e14070, 0x77c2d00 [ORD=221900] [ID=32]
0x77c2d00: i32 = Register %vreg1776 [ORD=221900] [ID=10]
0x8665850: i32 = AMDILISD::VEXTRACT 0x77c07e0, 0x865d4d0 [ORD=221899] [ID=36]
0x77c07e0: v4i32,ch = llvm.AMDIL.get.group.id 0x5e14070, 0x865d9d0 [ORD=221898] [ID=31]
0x865d9d0: i32 = TargetConstant<2564> [ORD=221898] [ID=8]
0x865d4d0: i32 = TargetConstant<1> [ORD=221903] [ID=26]
0x8664440: i32 = AMDILISD::VEXTRACT 0x8660e00, 0x865d4d0 [ORD=221903] [ID=41]
0x8660e00: v2i32,ch = load 0x5e14070, 0x864ece0, 0x8661110<LD8[%arrayidx_v4397]> [ORD=221902] [ID=37]
0x864ece0: i32,ch = CopyFromReg 0x5e14070, 0x865f1f0 [ORD=221901] [ID=33]
0x865f1f0: i32 = Register %vreg1774 [ORD=221901] [ID=11]
0x8661110: i32 = undef [ORD=221902] [ID=12]
0x865d4d0: i32 = TargetConstant<1> [ORD=221903] [ID=26]
0x77c4220: i32,ch = CopyFromReg 0x5e14070, 0x8664240 [ORD=221896] [ID=30] dbg:barrett.cl:5156:51
0x8664240: i32 = Register %vreg1773 [ORD=221896] [ID=6]
What does that mean, and how do I avoid it?
Hmm, when I just tried it again with the packed-up zip, it failed on the CPU as well (run 'mfakto -d c' to let it choose the CPU), but with a different error:
Select device - (CPU) - OpenCL Platform 1/1: Advanced Micro Devices, Inc., Version: OpenCL 1.2 AMD-APP (1084.4)
Get device info - Device 1/1: AMD Phenom(tm) II X4 955 Processor (AuthenticAMD),
device version: OpenCL 1.2 AMD-APP (1084.4), driver version: 1084.4 (sse2)
Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing
Global memory:4293038080, Global memory cache: 65536, local memory: 32768, workgroup size: 1024, Work dimensions: 3[1024, 1024, 1024, 0, 0] , Max clock speed:3208, compute units:4
Compiling kernels (build options: "-I. -DVECTOR_SIZE=2 -g -DMORE_CLASSES -DCL_GPU_SIEVE").
BUILD OUTPUT
".\barrett.cl", line 665: warning: statement is unreachable
nn.d0 = n.d0 * qi;
^
".\barrett.cl", line 987: warning: statement is unreachable
nn.d0 = n.d0 * qi;
^
".\barrett.cl", line 4975: warning: variable "exp96" was declared but never
referenced
__private int96_t exp96, my_k_base, f_base;
^
".\barrett.cl", line 4976: warning: variable "f" was declared but never
referenced
__private int96_v a, u, f;
^
"C:\Users\Bertram\AppData\Local\Temp\OCL6DE4.tmp.cl", line 536: warning:
variable "as" was declared but never referenced
int90_v a, as, b, r, m;
^
"C:\Users\Bertram\AppData\Local\Temp\OCL6DE4.tmp.cl", line 536: warning:
variable "m" was declared but never referenced
int90_v a, as, b, r, m;
^
Internal Error: ld failed
END OF BUILD OUTPUT
Error -11: clBuildProgram
init_CL(3, -1) failed
I'm sure it worked on the CPU before, but I may have changed one of the kernels a bit since then. Just "ld failed" is not a lot to work with.
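For anyone reading the log: the -11 in "Error -11: clBuildProgram" is the standard OpenCL code CL_BUILD_PROGRAM_FAILURE, so it only says that the build failed, not why. Below is a minimal sketch of decoding such codes into names. The helpers `cl_error_name` and `describe_cl_error` are hypothetical names made up for this illustration, not functions from mfakto or the OpenCL API, and the table covers only a handful of codes.

```c
#include <stdio.h>

/* Hypothetical helper (not part of mfakto or OpenCL): maps a few of the
 * standard OpenCL status codes defined in CL/cl.h to their names. */
static const char *cl_error_name(int err)
{
    switch (err) {
    case 0:   return "CL_SUCCESS";
    case -1:  return "CL_DEVICE_NOT_FOUND";
    case -2:  return "CL_DEVICE_NOT_AVAILABLE";
    case -3:  return "CL_COMPILER_NOT_AVAILABLE";
    case -5:  return "CL_OUT_OF_RESOURCES";
    case -6:  return "CL_OUT_OF_HOST_MEMORY";
    case -11: return "CL_BUILD_PROGRAM_FAILURE";
    case -43: return "CL_INVALID_BUILD_OPTIONS";
    default:  return "UNKNOWN_CL_ERROR";
    }
}

/* Hypothetical helper: formats a message in the style of the log line
 * above, with the symbolic name added. Returns snprintf's result. */
static int describe_cl_error(char *buf, size_t n, int err, const char *call)
{
    return snprintf(buf, n, "Error %d (%s): %s",
                    err, cl_error_name(err), call);
}
```

Calling `describe_cl_error(buf, sizeof buf, -11, "clBuildProgram")` fills `buf` with a message that names the failure instead of showing only the number. The actual cause still has to come from the build output, which in this case is just "ld failed".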
Anyway, I'm attaching the zip.