
Predefined macros in OpenCL (standard and proprietary)

magnum edited this page Dec 25, 2024 · 18 revisions

OpenCL 1.0

The following predefined macro names are available.

__FILE__ The presumed name of the current source file (a character string literal).

__LINE__ The presumed line number (within the current source file) of the current source line (an integer constant).

__OPENCL_VERSION__ substitutes an integer reflecting the version number of the OpenCL supported by the OpenCL device. OpenCL 1.0 will substitute the integer 100. Note that this macro may not be defined at all on host side.
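A minimal sketch of a device-side guard on this macro, plus decoding its integer value. The helper names and DEVICE_HAS_CL_1_2 are made up for illustration. The decoding assumes the major * 100 + minor * 10 pattern of the values mentioned on this page (100, 120, 200).

```c
/* Hypothetical helpers: split the integer encoding used by
 * __OPENCL_VERSION__ (assumed major * 100 + minor * 10) into parts. */
static int cl_version_major(int v) { return v / 100; }
static int cl_version_minor(int v) { return (v % 100) / 10; }

/* The defined() test lets a host compiler, where __OPENCL_VERSION__ is
 * normally undefined, take the fallback branch cleanly. */
#if defined(__OPENCL_VERSION__) && __OPENCL_VERSION__ >= 120
#define DEVICE_HAS_CL_1_2 1
#else
#define DEVICE_HAS_CL_1_2 0
#endif
```

In kernel source the same #if works unchanged; only the device compiler supplies a real value.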

__ENDIAN_LITTLE__ is used to determine if the OpenCL device is a little endian architecture or a big endian architecture (an integer constant of 1 if device is little endian and is undefined otherwise). Also refer to CL_DEVICE_ENDIAN_LITTLE specified in the table of OpenCL Device Queries for clGetDeviceInfo.

__ROUNDING_MODE__ is used to determine the current rounding mode and is set to rte. The __ROUNDING_MODE__ only affects the rounding mode of conversions to a float type.

__kernel_exec(X, typen) (and kernel_exec(X, typen)) is defined as follows:

     __kernel __attribute__((work_group_size_hint(X, 1, 1))) \
              __attribute__((vec_type_hint(typen)))

__IMAGE_SUPPORT__ is used to determine if the OpenCL device supports images. This is an integer constant of 1 if images are supported and is undefined otherwise. Also refer to CL_DEVICE_IMAGE_SUPPORT specified in the table of OpenCL Device Queries for clGetDeviceInfo.

__FAST_RELAXED_MATH__ is used to determine if the -cl-fast-relaxed-math optimization option is specified in build options given to clBuildProgram. This is an integer constant of 1 if the -cl-fast-relaxed-math build option is specified and is undefined otherwise.

The macro names defined by the C99 specification but not currently supported by OpenCL are reserved for future use.

You can test whether an extension (including vendor pragma extensions) is supported by checking whether a macro with its name is defined:

#ifdef cl_nv_pragma_unroll
#define NVIDIA
#endif

or

#ifdef cl_khr_byte_addressable_store
#pragma OPENCL EXTENSION cl_khr_byte_addressable_store : disable
#endif
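The same test is commonly used to pick a type. A minimal sketch, assuming a hypothetical real_t typedef: cl_khr_fp64 is defined by the device compiler when double precision is supported, while a host C compiler never defines it, so there the float branch is taken.

```c
/* Use double when the device compiler announces cl_khr_fp64,
 * otherwise fall back to float (always the case on a host compiler). */
#ifdef cl_khr_fp64
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
typedef double real_t;
#else
typedef float real_t;
#endif
```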

The CL_VERSION_x_y macros listed below are visible on the host side, but note that they only reflect the version of your headers (and probably your libs/drivers). The device may support a lower version than the driver.

Note that even if we see e.g. 1.2 support when building, the binary may be run on a system with 1.1 drivers and/or a device that only supports 1.0! Only device-side macros (or run-time queries instead of macros) can tell the real situation.

The device code can simply use the macros, since the kernels are built at run time, but the host code needs to adapt. Here's how to deal with it: first, test at run time for device support; second, wrap THAT in macros so the code still builds on legacy systems with a lower OpenCL version.

#if CL_VERSION_1_2
        /* get_device_version() is our helper function in common-opencl.c */
        if (get_device_version(gpu_id) >= 120) {
                // 1.2 specific code
        } else
#endif
        {
                // fallback code for < 1.2 devices, and the only path
                // when built against pre-1.2 headers
        }
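The actual get_device_version() in common-opencl.c isn't shown here. A minimal sketch of how such a helper could turn a CL_DEVICE_VERSION string into the same integer scheme, assuming the spec's "OpenCL <major>.<minor> <vendor-specific information>" format. The function name is hypothetical.

```c
#include <stdio.h>

/* Hypothetical stand-in for a get_device_version()-style helper:
 * "OpenCL 1.2 CUDA" -> 120. Returns 0 if the string doesn't match. */
static int parse_cl_device_version(const char *s)
{
	int major, minor;

	if (sscanf(s, "OpenCL %d.%d", &major, &minor) != 2)
		return 0;
	return major * 100 + minor * 10;
}
```

On the host, the string would come from clGetDeviceInfo(device, CL_DEVICE_VERSION, sizeof(buf), buf, NULL), and the result can be compared against 120 as in the snippet above.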

OpenCL 1.1

CL_VERSION_1_0 is defined reflecting the OpenCL 1.0 version. Note that this macro is of limited use, since it was not defined in the 1.0 standard itself.

CL_VERSION_1_1 is defined reflecting the OpenCL 1.1 version.

OpenCL 1.2

CL_VERSION_1_2 is defined reflecting the OpenCL 1.2 version.

__OPENCL_C_VERSION__ substitutes an integer reflecting the OpenCL C version specified by the -cl-std build option to clBuildProgram or clCompileProgram. If the -cl-std build option is not specified, the OpenCL C version supported by the compiler for this OpenCL device will be used. For version 1.2, __OPENCL_C_VERSION__ will substitute the integer 120.
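Since __OPENCL_C_VERSION__ only appeared with 1.2, kernels meant to also build on older devices need a fallback. A minimal sketch, assuming that on 1.0/1.1 compilers __OPENCL_VERSION__ is a good enough stand-in for the C language version. OCL_C_VERSION and HAVE_OCL_C_2_0 are made-up names.

```c
/* Effective OpenCL C version as an integer (e.g. 120 for 1.2). */
#if defined(__OPENCL_C_VERSION__)
#define OCL_C_VERSION __OPENCL_C_VERSION__
#elif defined(__OPENCL_VERSION__)
#define OCL_C_VERSION __OPENCL_VERSION__ /* pre-1.2 device compiler */
#else
#define OCL_C_VERSION 0                  /* plain host compiler */
#endif

#if OCL_C_VERSION >= 200
/* OpenCL C 2.0 features (e.g. the generic address space) go here. */
#define HAVE_OCL_C_2_0 1
#else
#define HAVE_OCL_C_2_0 0
#endif
```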

__EMBEDDED_PROFILE__ will be the integer constant 1 for OpenCL devices that implement the embedded profile and is undefined otherwise. CL_PLATFORM_PROFILE defined in table 4.1 (see clGetPlatformInfo) will return the string EMBEDDED_PROFILE if the OpenCL implementation supports the embedded profile only.

OpenCL 2.0

CL_VERSION_2_0 substitutes the integer 200 in device code, reflecting the OpenCL 2.0 version. The host headers, however, define it as merely 1.

CL_DEVICE_MAX_GLOBAL_VARIABLE_SIZE expands to a positive integer specifying the maximum size in bytes for a program scope variable or static function variable. This is the same value as CL_DEVICE_MAX_GLOBAL_VARIABLE_SIZE returned by clGetDeviceInfo in table 4.3.

The NULL macro expands to a null pointer constant. An integer constant expression with the value 0, or such an expression cast to type void * is called a null pointer constant.

The predefined identifier __func__ is available.

AMD

GPU devices:

__WinterPark__
__BeaverCreek__
__Turks__
__Caicos__
__Tahiti__
__Pitcairn__
__Capeverde__
__Cayman__
__Barts__
__Cypress__
__Juniper__
__Redwood__
__Cedar__
__ATI_RV770__
__ATI_RV730__
__ATI_RV710__
__Loveland__
__GPU__

Note: __GPU__ is AMD-specific, so other vendors' compilers won't define it. Nowadays, though, we define it ourselves for non-AMD devices.

__Hawaii__ is confirmed too. Note that __HAWAII__ does not work: the names are capitalized, with the rest in lower case.
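A minimal sketch of grouping some of these names with defined(). Tahiti, Pitcairn, Capeverde and Hawaii are all GCN-generation chips; the macro name AMD_GCN_GUESS is made up, and the grouping is only illustrative. Mind the capitalization noted above.

```c
/* Illustrative grouping of a few AMD device-name macros. */
#if defined(__Tahiti__) || defined(__Pitcairn__) || \
    defined(__Capeverde__) || defined(__Hawaii__)
#define AMD_GCN_GUESS 1
#else
#define AMD_GCN_GUESS 0 /* other devices, or a host compiler */
#endif
```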

CPU devices:

Note: These are AMD-specific, so don't rely on them. Nowadays, though, we define __CPU__ ourselves for non-AMD devices.

__CPU__
__X86__
__X86_64__

nvidia

None known, unfortunately.

JtR Jumbo

Here are some things our shared code always defines:

DEV_VER_MAJOR=xx, DEV_VER_MINOR=yy are the first and second groups of digits from CL_DRIVER_VERSION. For AMD, they will be e.g. 1729 and 3 for Catalyst 15.5 (see this table). For nvidia, they will be e.g. 352 and 21 for version 352.21. OS X can have version strings like "10.6.42 310.42.05f01", which will just give DEV_VER_MAJOR=10 and DEV_VER_MINOR=6; the rest is lost.
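A minimal sketch of the idea, not JtR's actual code: take the first two digit groups from a CL_DRIVER_VERSION string and turn them into -D options for clBuildProgram(). Both function names are hypothetical.

```c
#include <stdio.h>

/* Hypothetical: "352.21" -> 352, 21. Returns 0 on success, -1 if the
 * string doesn't start with "<digits>.<digits>". */
static int parse_driver_version(const char *ver, int *major, int *minor)
{
	return sscanf(ver, "%d.%d", major, minor) == 2 ? 0 : -1;
}

/* Hypothetical: build "-DDEV_VER_MAJOR=.. -DDEV_VER_MINOR=.." options. */
static int driver_version_defines(const char *ver, char *opts, size_t len)
{
	int major, minor;

	if (parse_driver_version(ver, &major, &minor))
		return -1;
	snprintf(opts, len, "-DDEV_VER_MAJOR=%d -DDEV_VER_MINOR=%d",
	         major, minor);
	return 0;
}
```

Note how "%d.%d" naturally stops after the second group, which matches the OS X behavior described above.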

SM_MAJOR=x, SM_MINOR=y are defined for nvidia only. For example, for Compute Capability 5.2 (sm_52) they will be 5 and 2 respectively. These are not defined under OS X, since it doesn't support any proprietary queries.

OPENCL_COMPILER is always defined to 1 for run-time OpenCL compilation. It can be used in header files that are also included from host code, where it will be undefined.
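A minimal sketch of such a shared header. The type names sh_u32 and shared_salt_t are hypothetical; distinct names are used rather than redefining uint on the host, since some host system headers already typedef it.

```c
/* Hypothetical header included by both host code and kernels. */
#ifdef OPENCL_COMPILER
typedef uint sh_u32;            /* device side: built-in OpenCL C type */
#else
#include <stdint.h>
typedef uint32_t sh_u32;        /* host side */
#endif

/* One struct definition for both sides, using fixed-width members only. */
typedef struct {
	sh_u32 length;
	sh_u32 salt[4];
	sh_u32 iterations;
} shared_salt_t;
```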

__GPU__ is defined to 1 for GPU devices.

__CPU__ is defined to 1 for CPU devices.

__OS_X__ is defined to 1 on OS X platform.

__MESA__ is defined to 1 for MESA platforms.

__POCL__ is defined to 1 for POCL platforms.

__BEIGNET__ is defined to 1 for, you guessed it, Beignet platforms.

__SIZEOF_HOST_SIZE_T__ will be sizeof(size_t) passed from the host side (needed when a struct passed from the host includes a size_t member). Beware that this will normally be 8 nowadays, while the GPU side likely has it as 4, so structs passed from the host that contain size_t members may need some TLC.
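A minimal sketch of one way to give such a member the host's width on both sides. The host_size_t name is hypothetical; the kernel branch relies on __SIZEOF_HOST_SIZE_T__ and OPENCL_COMPILER as described above.

```c
#include <stddef.h>

#ifdef OPENCL_COMPILER
/* Kernel side: match the width of the host's size_t, not the device's. */
#if __SIZEOF_HOST_SIZE_T__ == 8
typedef ulong host_size_t;
#else
typedef uint host_size_t;
#endif
#else
typedef size_t host_size_t;     /* host side: just size_t */
#endif
```

Declaring the struct member as host_size_t instead of size_t then keeps its size identical on both sides.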

We also define DEVICE_INFO, which is a bit field; see run/opencl/opencl_device_info.h for definitions and helper macros. For example, we can do things like #if amd_vliw5(DEVICE_INFO) after including that header.
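To show how such a bit field works in #if, here is a minimal sketch with made-up bit values and helper names (HYP_*, hyp_amd_vliw5); the real definitions live in run/opencl/opencl_device_info.h. The DEVICE_INFO value below is only an example, as if the host had passed one for an AMD VLIW5 GPU.

```c
/* Made-up bit assignments, for illustration only. */
#define HYP_DEV_GPU       (1 << 0)
#define HYP_DEV_AMD       (1 << 1)
#define HYP_DEV_NVIDIA    (1 << 2)
#define HYP_DEV_AMD_VLIW5 (1 << 3)

/* Function-like helper in the style of amd_vliw5(): plain integer
 * arithmetic, so the preprocessor can evaluate it inside #if. */
#define hyp_amd_vliw5(info) \
	(((info) & (HYP_DEV_AMD | HYP_DEV_AMD_VLIW5)) == \
	 (HYP_DEV_AMD | HYP_DEV_AMD_VLIW5))

#ifndef DEVICE_INFO /* example value; normally passed as a -D option */
#define DEVICE_INFO (HYP_DEV_GPU | HYP_DEV_AMD | HYP_DEV_AMD_VLIW5)
#endif

#if hyp_amd_vliw5(DEVICE_INFO)
#define USE_VLIW5_PATH 1
#else
#define USE_VLIW5_PATH 0
#endif
```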