Predefined macros in OpenCL (standard and proprietary)
The following predefined macro names are available.
__FILE__
The presumed name of the current source file (a character string literal).
__LINE__
The presumed line number (within the current source file) of the current source line (an integer constant).
__OPENCL_VERSION__
substitutes an integer reflecting the version number of the OpenCL supported by the OpenCL device. OpenCL 1.0 will substitute the integer 100. Note that this macro may not be defined at all on host side.
__ENDIAN_LITTLE__
is used to determine if the OpenCL device is a little endian architecture or a big endian architecture (an integer constant of 1 if device is little endian and is undefined otherwise). Also refer to CL_DEVICE_ENDIAN_LITTLE specified in the table of OpenCL Device Queries for clGetDeviceInfo.
__ROUNDING_MODE__
is used to determine the current rounding mode and is set to rte. The __ROUNDING_MODE__
only affects the rounding mode of conversions to a float type.
__kernel_exec(X, typen)
(and kernel_exec(X, typen)
) is defined as follows:
__kernel __attribute__((work_group_size_hint(X, 1, 1))) \
__attribute__((vec_type_hint(typen)))
__IMAGE_SUPPORT__
is used to determine if the OpenCL device supports images. This is an integer constant of 1 if images are supported and is undefined otherwise. Also refer to CL_DEVICE_IMAGE_SUPPORT specified in the table of OpenCL Device Queries for clGetDeviceInfo.
__FAST_RELAXED_MATH__
is used to determine if the cl-fast-relaxed-math optimization option is specified in build options given to clBuildProgram. This is an integer constant of 1 if the cl-fast-relaxed-math build option is specified and is undefined otherwise.
The macro names defined by the C99 specification but not currently supported by OpenCL are reserved for future use.
You can test whether an extension (and any pragma it provides) is supported by checking whether its name is defined:
#ifdef cl_nv_pragma_unroll
#define NVIDIA
#endif
or
#ifdef cl_khr_byte_addressable_store
#pragma OPENCL EXTENSION cl_khr_byte_addressable_store : disable
#endif
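The same pattern can select between a feature path and a fallback. The following is only a sketch: cl_khr_fp64 is a real Khronos extension name, but real_t and HAVE_FP64 are names made up for this illustration. When the snippet is compiled by a plain host C compiler, the extension macro is undefined and the fallback branch is taken.

```c
/*
 * Sketch: pick a floating-point type depending on whether the device
 * compiler advertises cl_khr_fp64. real_t and HAVE_FP64 are hypothetical
 * names used only in this example.
 */
#ifdef cl_khr_fp64
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
typedef double real_t;   /* doubles are available on this device */
#define HAVE_FP64 1
#else
typedef float real_t;    /* fall back to single precision */
#define HAVE_FP64 0
#endif
```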
The CL_VERSION_x_y
macros listed below are visible on the host side, but note that they only reflect the state of your headers (and probably your libs/drivers). The device may support a lower version than the driver.
Note that even if we see e.g. 1.2 support when building, the binary may be run on a system with 1.1 drivers and/or a device that only supports 1.0! Only device-side macros (or using functions instead of macros) can tell the real situation.
The device code can simply use the macros since the kernels are built at run-time, but the host code needs to adapt. Here's how to deal with it: First, test at run-time for device support. Second, wrap THAT in macros in order to be able to build at all on legacy systems with a lower OpenCL version.
#if CL_VERSION_1_2
/* get_device_version() is our helper function in common-opencl.c */
if (get_device_version(gpu_id) >= 120) {
    // 1.2 specific code
} else
#endif
{
    // fallback code for < 1.2 (the only code built with pre-1.2 headers)
}
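For the run-time half of that test, the device reports its version as a string of the form "OpenCL <major>.<minor> <vendor-specific information>" (the CL_DEVICE_VERSION query). A minimal sketch of turning such a string into the 100/110/120 style integer follows. It is not the actual get_device_version() from common-opencl.c; parse_cl_device_version is a hypothetical name, and the string is assumed to have been fetched already.

```c
#include <stdio.h>

/*
 * Hypothetical helper: convert a CL_DEVICE_VERSION string such as
 * "OpenCL 1.2 <vendor-specific>" into an integer such as 120.
 * Returns 0 if the string does not have the expected shape.
 */
static int parse_cl_device_version(const char *s)
{
    int major = 0, minor = 0;

    /* Both numbers must be present after the literal "OpenCL " prefix. */
    if (sscanf(s, "OpenCL %d.%d", &major, &minor) != 2)
        return 0;

    return major * 100 + minor * 10;
}
```

Returning 0 on failure means an unparsable string is treated as "older than anything", so callers comparing with >= 120 end up on the fallback path.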
CL_VERSION_1_0
is defined reflecting the OpenCL 1.0 version. Please note that this macro is probably useless since it wasn't defined in the 1.0 standard...
CL_VERSION_1_1
is defined reflecting the OpenCL 1.1 version.
CL_VERSION_1_2
is defined reflecting the OpenCL 1.2 version.
__OPENCL_C_VERSION__
substitutes an integer reflecting the OpenCL C version specified by the -cl-std build option to clBuildProgram or clCompileProgram. If the -cl-std build option is not specified, the OpenCL C version supported by the compiler for this OpenCL device will be used. For OpenCL C 1.2, __OPENCL_C_VERSION__
will substitute the integer 120.
__EMBEDDED_PROFILE__
will be the integer constant 1 for OpenCL devices that implement the embedded profile and is undefined otherwise. CL_PLATFORM_PROFILE defined in table 4.1 (see clGetPlatformInfo) will return the string EMBEDDED_PROFILE if the OpenCL implementation supports the embedded profile only.
CL_VERSION_2_0
substitutes the integer 200 in OpenCL C (device side), reflecting the OpenCL 2.0 version. In the host headers it is merely defined as 1.
CL_DEVICE_MAX_GLOBAL_VARIABLE_SIZE
expands to a positive integer specifying the maximum size in bytes for a program scope variable or static function variable. This is the same value as CL_DEVICE_MAX_GLOBAL_VARIABLE_SIZE returned by clGetDeviceInfo in table 4.3.
The NULL
macro expands to a null pointer constant. An integer constant expression with the value 0, or such an expression cast to type void * is called a null pointer constant.
The predefined identifier __func__
is available.
__WinterPark__
__BeaverCreek__
__Turks__
__Caicos__
__Tahiti__
__Pitcairn__
__Capeverde__
__Cayman__
__Barts__
__Cypress__
__Juniper__
__Redwood__
__Cedar__
__ATI_RV770__
__ATI_RV730__
__ATI_RV710__
__Loveland__
__GPU__
Note: Do not use __GPU__
. It's AMD-specific. Although nowadays we define it ourselves for non-AMD.
__Hawaii__
is confirmed too. Note to self: __HAWAII__
is not it. The names are capitalized, with the remaining letters in lower case.
Note: Do not use these, they are AMD-specific. Although nowadays we define __CPU__
ourselves for non-AMD.
__CPU__
__X86__
__X86_64__
None known unfortunately
Here are some things our shared code always defines:
DEV_VER_MAJOR=xx, DEV_VER_MINOR=yy
are the first and second groups of digits from CL_DRIVER_VERSION. For AMD, they will be e.g. 1729 and 3 for Catalyst 15.5 (see this table). For NVIDIA, they will be e.g. 352 and 21 for version 352.21. OS X can have version strings like "10.6.42 310.42.05f01", which just give DEV_VER_MAJOR=10 and DEV_VER_MINOR=6; the rest is lost.
SM_MAJOR=x, SM_MINOR=y
are defined for NVIDIA only. For e.g. Compute Capability 5.2 (sm_52) they will be 5 and 2 respectively. These are not defined under OS X since it doesn't support any proprietary queries.
OPENCL_COMPILER
is always defined to 1 for run-time OpenCL compilation. It can be used in header files that are also sourced from host side (this will be undefined then).
__GPU__
is defined to 1 for GPU devices.
__CPU__
is defined to 1 for CPU devices.
__OS_X__
is defined to 1 on OS X platform.
__MESA__
is defined to 1 for MESA platforms.
__POCL__
is defined to 1 for POCL platforms.
__BEIGNET__
is defined to 1 for, you guessed it, Beignet platforms.
__SIZEOF_HOST_SIZE_T__
will be sizeof(size_t)
passed from host side (needed when a struct passed from host includes a size_t member). Beware that this will normally be 8 nowadays while the GPU side likely has it as 4, so structs passed from host that contain size_t members may need some TLC.
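As a rough illustration of the DEV_VER_MAJOR / DEV_VER_MINOR entry in the list above, the "first and second groups of digits" rule could be implemented along these lines. This is a sketch, not the project's actual code: first_two_digit_groups is a hypothetical name, and the driver version string is assumed to have been obtained already.

```c
#include <ctype.h>
#include <stdlib.h>

/*
 * Hypothetical helper: store the first and second groups of decimal
 * digits found in s into *major and *minor. Anything after the second
 * group is ignored; a missing group is reported as 0.
 */
static void first_two_digit_groups(const char *s, int *major, int *minor)
{
    int *out[2] = { major, minor };
    char *end;
    int i;

    *major = *minor = 0;
    for (i = 0; i < 2; i++) {
        /* Skip separators such as '.', ' ' or letters. */
        while (*s && !isdigit((unsigned char)*s))
            s++;
        if (!*s)
            return;
        /* strtol consumes one whole run of digits. */
        *out[i] = (int)strtol(s, &end, 10);
        s = end;
    }
}
```

With the OS X style string quoted above, "10.6.42 310.42.05f01", this yields 10 and 6, matching the behaviour described in the list.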
We also define DEVICE_INFO
which is a bit field, see run/opencl/opencl_device_info.h for definitions and helper macros. For example, we can do things like #if amd_vliw5(DEVICE_INFO)
after sourcing that header.
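Headers that are sourced from both sides can combine OPENCL_COMPILER and __SIZEOF_HOST_SIZE_T__ so that a struct has the same layout on host and device. The following is a sketch under assumptions: the header, the type names (shared_u32, host_size_t, shared_params) and the struct members are all invented for illustration. It compiles as plain host C, where OPENCL_COMPILER is undefined.

```c
/* Hypothetical header included by both the host code and the kernels. */
#ifdef OPENCL_COMPILER
/* Device side: use OpenCL C built-in types. */
typedef uint shared_u32;
#if __SIZEOF_HOST_SIZE_T__ == 8
typedef ulong host_size_t;   /* match a 64-bit host size_t */
#else
typedef uint host_size_t;    /* match a 32-bit host size_t */
#endif
#else
/* Host side: OPENCL_COMPILER is undefined here. */
#include <stddef.h>
#include <stdint.h>
typedef uint32_t shared_u32;
typedef size_t host_size_t;
#endif

/* Same member widths on both sides, whatever the device's own size_t is. */
typedef struct {
    host_size_t length;   /* plain size_t could differ between host and device */
    shared_u32 rounds;
    shared_u32 flags;
} shared_params;
```

Where the host does not really need size_t, using a fixed-width member in the first place avoids the problem altogether.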