Tensile Tuning Guide
First, set up the development environment as described in the Tensile wiki. However, the wiki is somewhat outdated, so some steps need updating to get everything working.
Set up Ubuntu 22.04.3. (Windows cannot be used for tuning because the required environment is not available there. Instead, we tune on the GPU host machine under Linux and then use the resulting files to build on both Windows and Linux.)
1. Set up ROCm
sudo apt update && sudo apt install -y curl git vim ffmpeg gfortran libstdc++-12-dev cockpit openssh-server
sudo apt install "linux-headers-$(uname -r)" "linux-modules-extra-$(uname -r)"
sudo usermod -a -G render,video $LOGNAME # Adding current user to Video, Render groups. See prerequisites.
wget https://repo.radeon.com/amdgpu-install/6.1.1/ubuntu/jammy/amdgpu-install_6.1.60101-1_all.deb
sudo apt install ./amdgpu-install_6.1.60101-1_all.deb
sudo apt update
sudo apt install amdgpu-dkms
sudo apt install rocm
sudo reboot
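After the reboot, you can confirm that the usermod step took effect. Below is a minimal sketch that uses only id, tr and grep. The function names are my own and are not part of ROCm.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the current user belongs to group "$1".
# `id -nG` prints all group names of the current user, separated by spaces.
in_group() {
  id -nG | tr ' ' '\n' | grep -qx -- "$1"
}

# Hypothetical helper: check the two groups the usermod command above adds.
# Returns non-zero if either group is missing.
check_rocm_groups() {
  local g missing=0
  for g in render video; do
    if in_group "$g"; then
      echo "ok: $(id -un) is in $g"
    else
      echo "missing: $(id -un) is not in $g (rerun usermod, then log in again)" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Run check_rocm_groups. If it reports a missing group, repeat the usermod command, then log out and back in.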
Then check that ROCm is installed correctly:
rocminfo
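Later steps need your GPU's architecture name and Device ID. Rather than scrolling through the whole rocminfo output, you can filter it. This sketch makes two assumptions, so check both against your own output:
- GPU agent lines look like "Name: gfx1102".
- The Device ID is the hex value in brackets on the "Chip ID:" line, e.g. "Chip ID: 29824(0x7480)".

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the unique gfx target names in rocminfo output (stdin).
# CPU agents and the ISA lines (amdgcn-amd-amdhsa--gfx...) do not start with "gfx",
# so only GPU agent names are printed.
gfx_targets() {
  awk '$1 == "Name:" && $2 ~ /^gfx[0-9a-f]+$/ { print $2 }' | sort -u
}

# Hypothetical helper: print the hex value from "Chip ID: <dec>(0x<hex>)" lines,
# skipping the 0x0 reported for CPU agents. The line format is an assumption.
chip_ids() {
  awk '$1 == "Chip" && $2 == "ID:" && match($3, /0x[0-9a-fA-F]+/) {
         id = substr($3, RSTART + 2, RLENGTH - 2)
         if (id != "0") print id
       }' | sort -u
}
```

Usage: rocminfo | gfx_targets and rocminfo | chip_ids.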
2. Next, use the rocBLAS installer to set up the dependencies:
git clone -b release/rocm-rel-6.1.1 https://github.com/ROCm/rocBLAS.git
cd rocBLAS
Then run the installer (-d installs the build dependencies):
./install.sh -d
Once this finishes, the basic environment is set up. Now we move on to the steps from the wiki guide.
Ubuntu: sudo apt install python3 python3-yaml libomp-dev libboost-program-options-dev libboost-filesystem-dev libtinfo-dev
- joblib library: pip install joblib (or possibly pip3 install joblib)
- Ubuntu: sudo apt install libmsgpack-dev if you're using the msgpack backend
- Ubuntu: sudo apt install libtinfo-dev if you're using the YAML backend
It's better to install both. This may not change anything, since ./install.sh -d has probably already installed them.
sudo apt install cmake
Then install LLVM. We use LLVM 17, which has been compatible with ROCm so far:
wget -O - https://apt.llvm.org/llvm-snapshot.gpg.key | sudo apt-key add -
sudo add-apt-repository "deb http://apt.llvm.org/jammy/ llvm-toolchain-jammy-17 main"
sudo apt-get update
sudo apt-get install -y clang-17 lldb-17 lld-17
If you get an error saying that a link for msgpack is missing, run:
pip install msgpack
sudo apt-get install libzstd-dev
The environment is now set up.
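Before moving on, check that the tools and Python modules from the steps above are actually reachable. This can save a failed Tensile run later. The helpers below are hypothetical; the names in the usage line are the ones installed above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which of the given commands are not on PATH.
# Returns the number of missing commands (0 means everything was found).
check_cmds() {
  local missing=0 c
  for c in "$@"; do
    if command -v "$c" >/dev/null 2>&1; then
      echo "found:   $c"
    else
      echo "missing: $c" >&2
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

# Hypothetical helper: same idea for Python modules, tried with python3.
check_pymods() {
  local missing=0 m
  for m in "$@"; do
    if python3 -c "import $m" >/dev/null 2>&1; then
      echo "found:   python module $m"
    else
      echo "missing: python module $m" >&2
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}
```

Usage: check_cmds cmake python3 clang-17 rocminfo && check_pymods yaml joblib msgpack. The yaml module comes from the python3-yaml package installed above.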
3. Test the environment and generate the library files
git clone https://github.com/ROCmSoftwarePlatform/Tensile
cd Tensile
**If your GPU is not officially supported, you can try editing the architecture definitions in Tensile (for example, in VS Code).**
- Locate Supported Architectures: Look for the officially supported architectures that are listed (e.g., "gfx1102").
- Add Your GPU Information: Beside each existing architecture, add your GPU's architecture information using a similar format.
- Find 'ISA' Lines: Search for lines containing 11, 0, 2 (or similar). This indicates the supported instruction set architecture.
- Modify 'ISA' Code: Add your GPU's ISA code alongside the existing one (e.g., ISA: 11, 0, 2, 11, 0, 3). Remember to replace 11, 0, 3 with the correct ISA code for your specific GPU.
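The ISA triple follows from the gfx name. The last two characters are the minor version and stepping (as hex digits), and the rest is the major version. So gfx1102 becomes 11, 0, 2 and gfx90a becomes 9, 0, 10.

The sketch below does that conversion. It also greps a Tensile checkout for both forms, so you can find where an existing architecture is defined. The function names are hypothetical. Limiting the search to *.py files is an assumption about where the definitions live; drop --include to search everything.

```shell
#!/usr/bin/env bash
# Hypothetical helper: gfx name -> "major, minor, stepping", e.g. gfx1102 -> 11, 0, 2.
gfx_to_isa() {
  local name=${1#gfx}               # strip the "gfx" prefix, e.g. 1102
  local major=${name%??}            # everything except the last two characters
  local tail=${name#"$major"}       # last two characters, e.g. 02
  local minor=${tail%?} stepping=${tail#?}
  # minor and stepping are single hex digits (gfx90a -> stepping a = 10)
  printf '%d, %d, %d\n' "$major" "$((0x$minor))" "$((0x$stepping))"
}

# Hypothetical helper: list lines that mention an architecture by name or by
# its ISA triple. Searches the given directory (default: current directory).
find_arch_refs() {
  local arch=$1 dir=${2:-.}
  grep -rnF -e "$arch" -e "$(gfx_to_isa "$arch")" --include='*.py' "$dir"
}
```

Usage: from the Tensile checkout, run find_arch_refs gfx1102 to see where gfx1102 is defined. Then run gfx_to_isa on your own gfx name to get the triple to add next to it.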
A sample tuning file can be found in ./Tensile/Configs/rocblas_sgemm_example.yaml.
This file generates a library for gfx1030. If you are running on a different architecture, you first need to edit the line that says ArchitectureName: "gfx1030" at the bottom of the file. Also change the Device ID to your GPU's Device ID (get it from rocminfo; more information is available here). Then you can run the benchmark as follows:
mkdir build
cd build
../Tensile/bin/Tensile ../Tensile/Configs/rocblas_sgemm_example.yaml ./
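Instead of editing the example file in place, you can make the ArchitectureName change on a copy with sed. This sketch assumes the line has the form ArchitectureName: "gfx1030", as in the example file. The helper name and my_example.yaml are hypothetical. The Device ID still has to be edited by hand.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a Tensile config to $2 with ArchitectureName set to $3.
# Only the ArchitectureName line is rewritten; Device IDs are left untouched.
retarget_config() {
  local src=$1 dst=$2 arch=$3
  sed -E "s/(ArchitectureName:[[:space:]]*\")gfx[0-9a-f]+\"/\1${arch}\"/" "$src" > "$dst"
}
```

Usage, from the build directory: retarget_config ../Tensile/Configs/rocblas_sgemm_example.yaml ./my_example.yaml gfx1102. Then pass ./my_example.yaml to Tensile instead of the original.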
After about 1 minute of benchmarking, Tensile writes a YAML file with the winning kernels to the 3_LibraryLogic directory. Spreadsheets and YAML files with the benchmark data for all kernels are available in the 2_BenchmarkData directory.
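To see which logic files a run produced, a small helper is enough. The name is hypothetical; it relies only on the 3_LibraryLogic directory mentioned above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the winning-kernel logic files in a build folder.
list_logic_files() {
  local build=${1:-.}
  find "$build/3_LibraryLogic" -type f -name '*.yaml' | sort
}
```

Usage: run list_logic_files . from inside build.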
If there are any errors, try to fix them by searching online or asking an AI assistant for help.
Next, rename the navi21 folder under Tensile/Configs to navi34 (use any name you like, or simply keep navi21) and update the data in its files. Change gfx1030 to your GPU's architecture, and change the Device ID to your GPU's Device ID (get it by running rocminfo).
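The renaming above can be scripted. This sketch copies the folder instead of renaming it, so the navi21 originals stay available for reference. It then replaces the old library name and architecture in every YAML file.

The function and its argument order are my own. The Device ID entries are left for you to edit by hand. A plain text substitution also changes any other occurrence of those strings, so review the result.

```shell
#!/usr/bin/env bash
# Hypothetical helper: clone_config_dir CONFIGS OLD NEW OLD_ARCH NEW_ARCH
# Copies CONFIGS/OLD to CONFIGS/NEW, then substitutes OLD->NEW and
# OLD_ARCH->NEW_ARCH in every YAML file of the copy (GNU sed -i).
clone_config_dir() {
  local configs=$1 old=$2 new=$3 old_arch=$4 new_arch=$5
  if [ -e "$configs/$new" ]; then
    echo "refusing to overwrite $configs/$new" >&2
    return 1
  fi
  cp -r "$configs/$old" "$configs/$new"
  find "$configs/$new" -type f -name '*.yaml' -exec \
    sed -i -e "s/${old}/${new}/g" -e "s/${old_arch}/${new_arch}/g" {} +
}
```

Usage, from the build directory: clone_config_dir ../Tensile/Configs navi21 navi34 gfx1030 gfx1102.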
Then run
cd build
../Tensile/bin/Tensile ../Tensile/Configs/navi34/rocblas_hgemm_gb_nn_asm_full.yaml ./
This takes much longer, possibly hours, and it is best not to use the PC while the benchmark is running. Then run the remaining configs one by one. Make sure to back up your build folder after each benchmark finishes, naming the copies something like build1, build2, build3, and so on. We probably only need the files in the 3_LibraryLogic directory.
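Backing up after every run is easy to forget during a long session. Below is a minimal sketch that copies build to the first free name (build1, build2, and so on). Run it from the directory that contains build. The helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a build folder (default: build) to the first
# unused name build1, build2, ... and print the name that was used.
backup_build() {
  local src=${1:-build} n=1
  while [ -e "${src}${n}" ]; do
    n=$((n + 1))
  done
  cp -r "$src" "${src}${n}"
  echo "${src}${n}"
}
```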
Then run the commands below one by one. Make sure to change navi34 in the commands to the name you chose for your GPU library. Alternatively, change it back to navi21 and keep using the files in navi21 with the information replaced by yours. Run the commands as in the example above.
Here is the list I am using, for reference only. Run each line one by one:
../Tensile/bin/Tensile ../Tensile/Configs/rocblas_sgemm_example.yaml ./
../Tensile/bin/Tensile ../Tensile/Configs/navi34/rocblas_hgemm_gb_nn_asm_full.yaml ./
../Tensile/bin/Tensile ../Tensile/Configs/navi34/rocblas_hgemm_gb_nt_asm_full.yaml ./
../Tensile/bin/Tensile ../Tensile/Configs/navi34/rocblas_hgemm_gb_tn_asm_full.yaml ./
../Tensile/bin/Tensile ../Tensile/Configs/navi34/rocblas_hgemm_gb_tt_asm_full.yaml ./
These generate 4 different files with names like navi34_Cijk_Alik_Bjlk_HB_GB.yaml.
../Tensile/bin/Tensile ../Tensile/Configs/navi34/rocblas_hgemm_sb_nn_asm_full.yaml ./
../Tensile/bin/Tensile ../Tensile/Configs/navi34/rocblas_hgemm_sb_nt_asm_full.yaml ./
../Tensile/bin/Tensile ../Tensile/Configs/navi34/rocblas_hgemm_sb_tn_asm_full.yaml ./
../Tensile/bin/Tensile ../Tensile/Configs/navi34/rocblas_hgemm_sb_tt_asm_full.yaml ./
These generate 4 different files with names like navi34_Cijk_Alik_Bjlk_HB.yaml.
../Tensile/bin/Tensile ../Tensile/Configs/navi34/rocblas_hpa_hgemm_gb_nn_asm_full.yaml ./
../Tensile/bin/Tensile ../Tensile/Configs/navi34/rocblas_hpa_hgemm_gb_nt_asm_full.yaml ./
../Tensile/bin/Tensile ../Tensile/Configs/navi34/rocblas_hpa_hgemm_gb_tn_asm_full.yaml ./
../Tensile/bin/Tensile ../Tensile/Configs/navi34/rocblas_hpa_hgemm_gb_tt_asm_full.yaml ./
These generate 4 different files with names like navi34_Cijk_Alik_Bjlk_HHS_BH_GB.yaml.
../Tensile/bin/Tensile ../Tensile/Configs/navi34/rocblas_hpa_hgemm_sb_nn_asm_full.yaml ./
../Tensile/bin/Tensile ../Tensile/Configs/navi34/rocblas_hpa_hgemm_sb_nt_asm_full.yaml ./
../Tensile/bin/Tensile ../Tensile/Configs/navi34/rocblas_hpa_hgemm_sb_tn_asm_full.yaml ./
../Tensile/bin/Tensile ../Tensile/Configs/navi34/rocblas_hpa_hgemm_sb_tt_asm_full.yaml ./
These generate 4 different files with names like navi34_Cijk_Alik_Bjlk_HHS_BH.yaml.
../Tensile/bin/Tensile ../Tensile/Configs/navi34/rocblas_sgemm_gb_nn_asm_full.yaml ./
../Tensile/bin/Tensile ../Tensile/Configs/navi34/rocblas_sgemm_gb_nt_asm_full.yaml ./
../Tensile/bin/Tensile ../Tensile/Configs/navi34/rocblas_sgemm_gb_tn_asm_full.yaml ./
../Tensile/bin/Tensile ../Tensile/Configs/navi34/rocblas_sgemm_gb_tt_asm_full.yaml ./
These should generate 4 different files with names like navi34_cijk_Bljk_SB_GB.yaml (in my case, logic generation failed for these four configs except the tt one).
../Tensile/bin/Tensile ../Tensile/Configs/navi34/rocblas_sgemm_sb_nn_asm_full.yaml ./
../Tensile/bin/Tensile ../Tensile/Configs/navi34/rocblas_sgemm_sb_nt_asm_full.yaml ./
../Tensile/bin/Tensile ../Tensile/Configs/navi34/rocblas_sgemm_sb_tn_asm_full.yaml ./
../Tensile/bin/Tensile ../Tensile/Configs/navi34/rocblas_sgemm_sb_tt_asm_full.yaml ./
These should generate 4 different files with names like navi34_cijk_Bljk_SB.yaml (in my case, logic generation failed for these four configs except the tt one).
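The list follows a fixed pattern: six kernel families, each in nn, nt, tn and tt. So it can be generated and run in a loop instead of pasting each line. A few notes on this sketch:
- config_list and run_configs are hypothetical names.
- TENSILE is an optional override I added for the launcher path.
- The loop keeps going after a failure, since some of these configs failed to produce logic.
- It leaves out the rocblas_sgemm_example.yaml line.
- It does not back up build between runs; add a cp -r step inside the loop if you want that.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the 24 config paths listed above for a library name.
config_list() {
  local lib=${1:-navi34} kind t
  for kind in hgemm_gb hgemm_sb hpa_hgemm_gb hpa_hgemm_sb sgemm_gb sgemm_sb; do
    for t in nn nt tn tt; do
      echo "../Tensile/Configs/${lib}/rocblas_${kind}_${t}_asm_full.yaml"
    done
  done
}

# Hypothetical helper: run Tensile on each config path read from stdin, one at a
# time, writing output to ./ (run it from the build directory). Failures are
# reported and counted, and the function returns the number of failed configs.
run_configs() {
  local tensile=${TENSILE:-../Tensile/bin/Tensile} cfg failed=0
  while IFS= read -r cfg; do
    echo "=== $cfg"
    # </dev/null keeps Tensile from consuming the remaining config list on stdin
    if ! "$tensile" "$cfg" ./ </dev/null; then
      echo "FAILED: $cfg" >&2
      failed=$((failed + 1))
    fi
  done
  return "$failed"
}
```

Usage, from the build directory: config_list navi34 | run_configs 2>&1 | tee tuning.log.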
Some config files are not present in navi21. You can edit files like the ones below (named ..._asm_full.yaml), replacing the information with yours, or ignore them.
Missing files:
- dgemm: rocblas_dgemm_..._asm_full.yaml
- rocblas_hpa_bfloat16_..._asm_full.yaml (generates navi34_Cijk_Alik_Bjlk_I8II_BH.yaml)
- rocblas_igemm_asm_full_nn.yaml (generates 4xi8 files; if your machine does not initially support 4xi8, make sure the corresponding asm cap value is set to true in Common.py or AsmCaps.py)
- mfma_igemm_nn_asm_full.yaml (generates navi34_Cijk_Ailk_Bljk_BBS_BH.yaml)
"Important: The default Tensile configuration file might not include support for all GPUs. If you encounter errors when running Tensile, carefully review the terminal output. It will often provide clues about missing or incorrect architecture information.
Troubleshooting Steps:
- Identify Missing Information: The error messages should point to specific architectures that are not properly defined in the config file.
- Edit the Config File: Add the necessary information for your GPU, following the format used for existing entries (refer to previous instructions on adding architecture details and ISA codes).
Note:
- Editing configuration files can be risky, so proceed with caution! Back up your original file before making any changes.
- For newer architectures like RDNA3, there may be a lack of pre-defined configurations. You might need to consult official documentation, explore community forums, or experiment carefully to determine the correct settings.
Future Updates: Official updates to Tensile are often necessary to provide complete support for new GPUs and architectures."
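The note above makes two practical points: save the terminal output, and back up a file before editing it. Here is a small sketch for both. The helper names are hypothetical. The grep patterns are only a rough filter, not a list of Tensile's actual error messages.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep a timestamped copy of a file before editing it,
# and print the name of the copy.
backup_file() {
  local f=$1 copy
  copy="$f.bak.$(date +%Y%m%d-%H%M%S)"
  cp -p "$f" "$copy" && echo "$copy"
}

# Hypothetical helper: show numbered lines from a saved log that look like
# errors or mention a gfx target (a rough filter, case-insensitive).
scan_log() {
  grep -n -i -E 'error|exception|traceback|gfx[0-9a-f]+' "$1"
}
```

Usage:
- Save a run's output: ../Tensile/bin/Tensile ../Tensile/Configs/navi34/rocblas_hgemm_gb_nn_asm_full.yaml ./ 2>&1 | tee tensile.log
- Check it afterwards: scan_log tensile.log
- Before changing Common.py, run backup_file on it.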
More information on tuning is available here, and the Tensile tuning .tex file and a PDF version are available here.
To build rocBLAS on Linux, follow this guide.