Troubleshooting
While working on the federated learning project on Raspberry Pis, you will run into plenty of challenges, especially frustrating exceptions. Not to worry: we have compiled all the errors that each of us encountered along the way, together with the lessons learnt.
- Cause: Compiling PyTorch on a Raspberry Pi takes a long time.
- Fix: As long as no errors are showing in your terminal and the green light is on your Raspberry Pi you will just have to wait.
- Cause 1: This exception occurs if you try to run PyTorch from the same location as your build location. The compiled pytorch folder already contains a folder named torch, and the interpreter tries to import the package from that folder instead of the installed one.
- Cause 2: This exception can also occur if the installed PyTorch library was not compiled with the right version of gcc.
- Fix 1: cd (change directory) to a different location, launch Python, and then import torch.
- Fix 2: Compile PyTorch with gcc version 8.2, which ships with Raspbian Buster, and then install PyTorch.
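A quick way to tell whether Cause 1 applies is to check whether the directory you launch Python from contains a folder named torch, since an interactive session searches the current directory before the installed packages. The following is a minimal sketch; the helper name `shadowing_dir` is made up for illustration and is not part of PyTorch.

```python
import os


def shadowing_dir(package_name, directory=None):
    """Return the path of a local folder that would shadow `package_name`, or None.

    `shadowing_dir` is a hypothetical helper, not a PyTorch function.
    """
    directory = directory or os.getcwd()
    candidate = os.path.join(directory, package_name)
    return candidate if os.path.isdir(candidate) else None


if __name__ == "__main__":
    hit = shadowing_dir("torch")
    if hit:
        print("A local folder shadows the installed package:", hit)
    else:
        print("No local 'torch' folder found in this directory.")
```

If the helper reports a local folder, change to another directory before importing torch.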
Original error was: libf77blas.so.3: cannot open shared object file, while importing torch in Python
- Cause: You probably missed some of the PyTorch dependencies listed in the project tutorial.
- Fix: Run the following to install PyTorch’s dependencies:
sudo apt install libopenblas-dev libblas-dev m4 cmake cython python3-dev python3-yaml python3-setuptools
Failed to run 'bash tools/build_pytorch_libs.sh --use-cuda --use-nnpack --use mkldnn --use qnnpack caffe2' while trying to build PyTorch
- Cause: You probably did not set the environment variables before starting the build. Note that they need to be set again every time your Raspberry Pi is restarted.
- Fix: Type in the following to set the environment variables:
export NO_CUDA=1
export NO_DISTRIBUTED=1
export NO_MKLDNN=1
export NO_NNPACK=1
export NO_QNNPACK=1
- Note: You can also add these lines to your ~/.bashrc file for the duration of the project, so that they are set in every new shell.
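Because the variables are lost on every restart, it can help to confirm they are set before kicking off a long build. This is a small sketch that only inspects the environment; the names `REQUIRED_FLAGS` and `missing_flags` are illustrative and not part of the PyTorch build scripts.

```python
import os

# The five variables listed above; each is expected to be set to "1".
REQUIRED_FLAGS = ["NO_CUDA", "NO_DISTRIBUTED", "NO_MKLDNN", "NO_NNPACK", "NO_QNNPACK"]


def missing_flags(env=None):
    """Return the flags from REQUIRED_FLAGS that are not set to "1" in `env`."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_FLAGS if env.get(name) != "1"]


if __name__ == "__main__":
    missing = missing_flags()
    if missing:
        print("Set these before building:", ", ".join(missing))
    else:
        print("All build flags are set.")
```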
- Cause: You may be using the latest Raspbian version (Buster) on an RPi3 or RPi3+. Raspbian Buster comes with GCC 8.x and Stretch with GCC 6.x; GCC 8.x will fail to build PyTorch on an RPi3+ or older.
- Fix: Downgrade to Raspbian Stretch. You can find older Raspbian images here.
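To see which compiler your image ships with, you can read the major version reported by `gcc -dumpversion`. The sketch below assumes gcc is on the PATH; the function names are hypothetical, and the code avoids newer Python features so that it should also run on the older Python 3 that comes with Stretch.

```python
import re
import shutil
import subprocess


def gcc_major(version_text):
    """Extract the leading major version from text such as '8.3.0' or '6.3.0\n'."""
    match = re.search(r"(\d+)", version_text)
    return int(match.group(1)) if match else None


def installed_gcc_major():
    """Return the major version of the gcc on PATH, or None if gcc is not found."""
    if shutil.which("gcc") is None:
        return None
    output = subprocess.check_output(["gcc", "-dumpversion"], universal_newlines=True)
    return gcc_major(output)


if __name__ == "__main__":
    print("gcc major version:", installed_gcc_major())
```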
- Cause: This exception occurs when you try to install syft via pip, because the latest version of PySyft requires PyTorch v1.1 to be installed.
- Fix 1: Upgrade torch to v1.1. Type in the following:
pip install --upgrade torch
- Fix 2: If you are using PyTorch v1.0 and need to stick to it, install an older version of PySyft without dependencies. If you install directly via pip with dependencies, you will hit the same error again. Hence, type in the following instead:
pip install syft==0.1.13a1 --no-dependencies
Syft version 0.1.13a1 seems to be compatible with torch v1.0. Now you need to install the dependencies separately. Type in:
pip3 install flask-socketio lz4 msgpack websockets zstd
Try importing PySyft using: import syft. This should execute successfully without any further exceptions.
Try:
- $ pip3 install websocket_client
- $ pip3 install Flask flask-socketio lz4 msgpack websockets zstd
- $ python3
>>> import syft
- # If there are no errors, syft is successfully installed!
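If import syft still fails, it can be useful to see which dependency is actually missing before reinstalling everything. The sketch below uses the standard library's `importlib.util.find_spec`; the import names in `SYFT_DEPENDENCIES` are our assumption about how the pip packages above are imported (for example, flask-socketio as `flask_socketio` and websocket_client as `websocket`), so adjust them if your setup differs.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found on this interpreter."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Assumed import names for the pip packages listed above (not taken from PySyft itself).
SYFT_DEPENDENCIES = [
    "flask", "flask_socketio", "lz4", "msgpack", "websockets", "zstd", "websocket",
]

if __name__ == "__main__":
    missing = missing_modules(SYFT_DEPENDENCIES)
    print("Missing modules:", missing if missing else "none")
```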
- Cause: This error occurs because of the very large timeout value given to the websocket in PySyft on a Windows machine.
- Fix: Edit the websocket_client.py file in ..\Lib\site-packages\syft\workers of your Python location. Look for the TIMEOUT_INTERVAL variable, which is set to 9_999_999. Remove a 9 so that it reads 9_999_99.
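The edited value may look odd, but underscores in Python numeric literals (Python 3.6 and later) are only visual separators and are ignored by the interpreter. A short check of what the two literals evaluate to (the variable names here are just for illustration):

```python
original = 9_999_999  # value found in websocket_client.py
edited = 9_999_99     # value after removing one 9

# Underscores do not affect the value of the literal.
print(original)  # 9999999
print(edited)    # 999999
```

So the fix makes the timeout roughly ten times smaller.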
print("Generating list of batches for the workers...")
list_federated_train_loader = list(federated_train_loader)
- Cause: Well, it does take a looong time to process!
- Fix: Be patient, grab a cup of coffee, and wait! It takes anywhere from 20 minutes to 3 hours or more, depending on the processing power of the machine.
- Cause: The latest package version requires libomp, which may not be installed on some devices.
- Fix: Install libomp via Homebrew on macOS or apt-get on Linux.
- First, check if your RPi connections are running properly.
- Cause: This problem is most likely caused by having different PySyft versions on your devices.
- Fix: Make sure to install the latest version on every device. You can git pull from the PySyft repo and rerun the build and install process (build and install time will be shorter).
- Fix 2: Restart the notebook kernel and clear all outputs. If you changed or updated packages, they may not work properly until the kernel is restarted.
- Note 1: The latest version will only work with PyTorch 1.1.0.
- Note 2: This will also fix a couple of bugs in the tutorial files; worker connection status will now be visible on the RPis.
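One way to spot a version mismatch is to note the PySyft version reported on each device and compare them side by side. The sketch below assumes you have collected the version strings by hand; the device names and version strings are placeholders, and the helper names are hypothetical.

```python
from collections import defaultdict


def group_by_version(device_versions):
    """Group device names by the PySyft version string each one reports."""
    groups = defaultdict(list)
    for device, version in device_versions.items():
        groups[version].append(device)
    return dict(groups)


def versions_match(device_versions):
    """Return True when every device reports the same version string."""
    return len(set(device_versions.values())) <= 1


if __name__ == "__main__":
    # Placeholder values; replace with what each of your devices reports.
    reported = {"laptop": "version-A", "rpi-1": "version-A", "rpi-2": "version-B"}
    if not versions_match(reported):
        print("Version mismatch:", group_by_version(reported))
```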
- Cause: This error occurred for me when I tried to run the start_websocket_servers.py script using sudo
- Fix: The error disappeared when I did not use sudo.
- Cause: This error occurs at step 21 of the Jupyter notebook 'Federated Recurrent Neural Network'.
- Fix: Open ..\syft\frameworks\torch\hook.py and go to line 356. Change self.native_param_data(new_data) to self.native_param_data.set_(new_data)