I am running my OpenCL program on a cluster. The cluster software prefers each job to use a single CPU core so that it can utilize the nodes effectively. I tried using device fission to restrict my code to a single core, but I was dismayed to discover that when multiple instances of my program run, they all use the same CPU core, which is obviously sub-optimal. Is there any way to restrict an OpenCL process to a single core while still allowing the process to migrate automatically to other cores (I assume this is handled by the kernel)? Here is my current code (approximately):

    cl_device_partition_property[3] props = [CL_DEVICE_PARTITION_EQUALLY, compute_units, 0];

    // First call: query how many sub-devices this partitioning yields.
    cl_uint num_sub_devices;
    err = clCreateSubDevices(Device, props.ptr, 0, null, &num_sub_devices);
    assert(err == 0, "Failed to create sub-devices:" ~ GetCLErrorString(err));

    if(sub_device_idx >= num_sub_devices)
        throw new Exception("Invalid sub-device index " ~ Format("{}", sub_device_idx).idup ~ ".");

    // Second call: actually create the sub-devices.
    scope sub_devices = new cl_device_id[](num_sub_devices);
    err = clCreateSubDevices(Device, props.ptr, num_sub_devices, sub_devices.ptr, null);
    assert(err == 0, "Failed to create sub-devices:" ~ GetCLErrorString(err));

    // Replace the full device with the selected sub-device.
    Device = sub_devices[sub_device_idx];

Note that I always use the same sub_device_idx (which is obviously where this issue comes from). I cannot vary it, because I do not control which nodes the cluster software places my programs on.