Updates to docs
SimonWNCAS committed Mar 5, 2021
1 parent 7cc9c05 commit da63258
Showing 3 changed files with 29 additions and 24 deletions.
47 changes: 26 additions & 21 deletions README.md
@@ -6,12 +6,11 @@ It is based on [Fedora](https://getfedora.org/) and includes all of the software

A compiler is **not** required on the build and run machine where the container is deployed. All compilation of LFRic is done via the containerised compilers.

LFRic components are built using a shell within the container.
The shell automatically sets up the build environment when invoked.
LFRic components are built using a shell within the container, which automatically sets up the build environment when invoked.
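
For example, an interactive build shell can be started as follows (a minimal sketch, assuming the image has been pulled or built as `lfric_env.sif` as described below):
```
# Start an interactive shell inside the container; the build environment
# (compilers, MPI, NetCDF, etc.) is configured automatically on entry.
singularity shell lfric_env.sif
```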

The LFRic source code is not containerised, it is retrieved as usual via subversion from within the container shell so there is no need to rebuild the container for LFRic trunk updates.
The LFRic source code is not containerised; it is retrieved as usual via Subversion from within the container shell, so there is no need to rebuild the container for LFRic code updates.
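
As an illustration, a checkout from inside the container shell might look like the following (the repository URL is indicative only; use the path appropriate to your MOSRS access and the LFRic version you need):
```
# Check out the LFRic trunk from MOSRS into ./trunk
svn checkout https://code.metoffice.gov.uk/svn/lfric/LFRic/trunk trunk
```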

The container is compatible with [slurm](https://slurm.schedmd.com/documentation.html) and the compiled executable can be run in batch, using the local MPI libraries, if the host system has an [MPICH ABI](https://www.mpich.org/abi/) compatible MPI.
The container is compatible with [slurm](https://slurm.schedmd.com/documentation.html), and the compiled executable can be run in batch using the local MPI libraries, if the host system has an [MPICH ABI](https://www.mpich.org/abi/) compatible MPI.

A prebuilt container is available from [Sylabs Cloud](https://cloud.sylabs.io/library/simonwncas/default/test).

@@ -23,9 +22,10 @@ lfric.sub is an example ARCHER2 submission script.

# Requirements
## Base requirement
Linux host to build and run.

[Singularity](https://sylabs.io/) 3.0+ (3.7 preferred); Access to [Met Office Science Repository Service](https://code.metoffice.gov.uk)
[Singularity](https://sylabs.io/) 3.0+ (3.7 preferred)

Access to [Met Office Science Repository Service](https://code.metoffice.gov.uk)
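
The Singularity version on the build/run host can be checked with:
```
# Should report 3.0 or later (ideally 3.7+)
singularity --version
```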

## Optional requirements

@@ -44,7 +44,9 @@ singularity pull [--disable-cache] lfric_env.sif library://simonwncas/default/lf
```
Note: `--disable-cache` is required if using ARCHER2.

* Build container using `lfric_env.sif`.
or:

* Build container using `lfric_env.def`.
```
sudo singularity build lfric_env.sif lfric_env.def
```
@@ -99,11 +101,10 @@ cd example
mpiexec -np 6 ../bin/gungho configuration.nml
```
Note: This uses the MPI runtime libraries built into the container. If the host machine has an MPICH-based MPI (MPICH, Intel MPI, Cray MPT, MVAPICH2), then see below on how to use [MPICH ABI](https://www.mpich.org/abi/) to access the local MPI, and therefore the fast interconnects, when running the executable via the container.
OpenMPI will not work with this method.

# Using MPICH ABI

This approach is a variation on the [Singularity MPI Bind model](https://sylabs.io/guides/3.7/user-guide/mpi.html#bind-model). The compiled model executable is run within the container with suitable options to allow access to the local MPI installation. At runtime, containerised libraries are used by the executable apart from the local MPI libraries.
This approach is a variation on the [Singularity MPI Bind model](https://sylabs.io/guides/3.7/user-guide/mpi.html#bind-model). The compiled model executable is run within the container with suitable options to allow access to the local MPI installation. At runtime, containerised libraries are used by the executable apart from the local MPI libraries. OpenMPI will not work with this method.

Note: this only applies when a model is run; the executable is compiled using the method above, without any reference to local libraries.

@@ -113,11 +114,11 @@ An MPICH ABI compatible MPI is required. These have MPI libraries named `libmpifo
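
A quick way to check whether the local MPI is MPICH ABI compatible is to look for the ABI library names (an illustrative check; adjust the path to your MPI installation):
```
# MPICH-ABI MPIs provide libraries with these soversions
ls /opt/mpich/lib/libmpi.so.12 /opt/mpich/lib/libmpifort.so.12
```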

## Build bind points and LD_LIBRARY_PATH

The local MPI libraries need to be made available to the container. Bind points are required so that containerised processes can access the local directories. Also the `LD_LIBRARY_PATH` inside the container needs updating to reflect the path to the local libraries.
The local MPI libraries need to be made available to the container. Bind points are required so that containerised processes can access the local directories which contain the MPI libraries. The `LD_LIBRARY_PATH` inside the container also needs updating to reflect the path to the local libraries. This method has been tested with slurm, but should also work with other job control systems.

For example, assuming the system MPI libraries are in `/opt/mpich/lib`, set the bind option with
```
export BIND_DIR=/opt/mpich
export BIND_OPT="-B /opt/mpich"
```
then for Singularity versions <3.7
```
@@ -128,38 +129,42 @@ for Singularity v3.7 and over
export LOCAL_LD_LIBRARY_PATH="/opt/mpich/lib:\$LD_LIBRARY_PATH"
```

The entries in `BIND_OPT` are comma separated, while those in `[SINGULARITYENV_LOCAL_]LD_LIBRARY_PATH` are colon separated.
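
For instance, with two bind points and two library directories the settings might look like this (the `/opt/mpich-deps` path is purely illustrative):
```
# Bind points are comma separated after -B ...
export BIND_OPT="-B /opt/mpich,/opt/mpich-deps"
# ... while library search paths are colon separated
export LOCAL_LD_LIBRARY_PATH="/opt/mpich/lib:/opt/mpich-deps/lib:\$LD_LIBRARY_PATH"
```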

## Construct run command and submit

For Singularity versions <3.7, the command to run gungho is now
For Singularity versions <3.7, the command to run gungho within MPI is now
```
singularity exec $BIND_DIR lfric_env.sif ../bin/gungho configuration.nml
singularity exec $BIND_OPT <sif-dir>/lfric_env.sif ../bin/gungho configuration.nml
```
for Singularity v3.7 and over

```
singularity exec $BIND_DIR --env=LD_LIBRARY_PATH=$LOCAL_LD_LIBRARY_PATH lfric_env.sif ../bin/gungho configuration.nml
singularity exec $BIND_OPT --env=LD_LIBRARY_PATH=$LOCAL_LD_LIBRARY_PATH <sif-dir>/lfric_env.sif ../bin/gungho configuration.nml
```

Running with mpirun/slurm is straightforward: just use the standard command for running MPI jobs, e.g.
```
mpirun -n <NUMBER_OF_RANKS> singularity exec $BIND_DIR lfric_env.sif ../bin/gungho configuration.nml
mpirun -n <NUMBER_OF_RANKS> singularity exec $BIND_OPT lfric_env.sif ../bin/gungho configuration.nml
```
or
```
srun --cpu-bind=cores singularity exec $BIND_DIR lfric_env.sif ../bin/gungho configuration.nml
srun --cpu-bind=cores singularity exec $BIND_OPT lfric_env.sif ../bin/gungho configuration.nml
```
on ARCHER2

If running with slurm, `/var/spool/slurmd` should be appended to `BIND_DIR`, separated with a comma.
If running with slurm, `/var/spool/slurmd` should be appended to `BIND_OPT`, separated with a comma.
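
Continuing the `/opt/mpich` example above, this gives (a sketch):
```
# /var/spool/slurmd allows the containerised processes to reach the local slurmd
export BIND_OPT="-B /opt/mpich,/var/spool/slurmd"
```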

## Update for local MPI dependencies
It could be possible that the local MPI libraries have other dependencies which are in other system directories. In this case `BIND_DIR` and `[SINGULARITYENV_]LOCAL_LD_LIBRARY_PATH` have to be updated to reflect these. For example on ARCHER2 these are
The local MPI libraries may have other dependencies which live in other system directories. In this case `BIND_OPT` and `[SINGULARITYENV_]LOCAL_LD_LIBRARY_PATH` have to be updated to include these. For example, on ARCHER2 these are
```
export BIND_DIR="-B /opt/cray,/usr/lib64:/usr/lib/host,/var/spool/slurmd"
export BIND_OPT="-B /opt/cray,/usr/lib64:/usr/lib/host,/var/spool/slurmd"
```
and
```
export SINGULARITYENV_LOCAL_LD_LIBRARY_PATH=/opt/cray/pe/mpich/8.0.16/ofi/gnu/9.1/lib-abi-mpich:/opt/cray/libfabric/1.11.0.0.233/lib64:/opt/cray/pe/pmi/6.0.7/lib
```
Discovering these is a process of trail and error where the executable is run via the container and any missing libraries included the the above environment variables.
`/usr/lib/host` Is at the end of `LD_LIBRARY_PATH` in the container, so that the bind point can be used to provide any remaining system libraries dependencies in standard locations such as `/usr/lib64`.
Discovering the missing dependencies is a process of trial and error: the executable is run via the container, and any missing libraries will cause an error and be reported. A suitable bind point and library path is then added to the above environment variables, and the process repeated.
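
One way to shortcut some of the trial and error is to list the executable's unresolved dependencies from inside the container (a sketch; assumes `ldd` is present in the image and uses the Singularity 3.7+ form shown above):
```
# Any "not found" lines indicate host libraries that still need a bind point
# and an LD_LIBRARY_PATH entry
singularity exec $BIND_OPT --env=LD_LIBRARY_PATH=$LOCAL_LD_LIBRARY_PATH <sif-dir>/lfric_env.sif \
  ldd ../bin/gungho | grep "not found"
```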

`/usr/lib/host` is at the end of `LD_LIBRARY_PATH` in the container, so this bind point can be used to provide any remaining system library dependencies from standard locations. In the above example there are extra dependencies in `/usr/lib64`, so the `/usr/lib64:/usr/lib/host` entry in `BIND_OPT` mounts the host's `/usr/lib64` as `/usr/lib/host` inside the container, and those libraries are therefore found via the container's `LD_LIBRARY_PATH`.
4 changes: 2 additions & 2 deletions lfric.sub
@@ -19,7 +19,7 @@ cd <base_dir>/trunk/gungho/example

export SINGULARITYENV_LOCAL_LD_LIBRARY_PATH=/opt/cray/pe/mpich/8.0.16/ofi/gnu/9.1/lib-abi-mpich:/opt/cray/libfabric/1.11.0.0.233/lib64:/opt/cray/pe/pmi/6.0.7/lib

export BIND_DIR="-B /opt/cray,/usr/lib64:/usr/lib/host,/var/spool/slurmd"
export BIND_OPT="-B /opt/cray,/usr/lib64:/usr/lib/host,/var/spool/slurmd"

srun --cpu-bind=cores singularity exec $BIND_DIR <base_dir>/lfric_env.sif ../bin/gungho configuration.nml
srun --cpu-bind=cores singularity exec $BIND_OPT <base_dir>/lfric_env.sif ../bin/gungho configuration.nml
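
The script can then be submitted in the usual way (any account or QOS options, which are site specific, are not shown here):
```
sbatch lfric.sub
```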

2 changes: 1 addition & 1 deletion lfric_env.def
@@ -162,7 +162,7 @@ export MPICH_DIR=$BASE_DIR/mpich
export PFUNIT=$INSTALL_DIR
export NETCDF_DIR=$BASE_DIR/netcdf
export CPPFLAGS="-I$INSTALL_DIR/include -I$NETCDF_DIR/include"
export FFLAGS="-I$INSTALL_DIR/include -I$NETCDF_DIR/include -I$MPICH_DIR/include"
export FFLAGS="-I$INSTALL_DIR/include -I$INSTALL_DIR/mod -I$NETCDF_DIR/include -I$MPICH_DIR/include"
export LDFLAGS="-L$INSTALL_DIR/lib -L$NETCDF_DIR/lib"
export PATH=$MPICH_DIR/bin:$NETCDF_DIR/bin:$INSTALL_DIR/bin:/opt/intel/oneapi/compiler/latest/linux/bin/intel64:$PATH
export PSYCLONE_CONFIG=/usr/local/share/psyclone/psyclone.cfg
