
Making Containers Easier with HPC Container Maker

Today’s groundbreaking scientific discoveries are taking place in high performance computing (HPC) data centers. However, installing and upgrading HPC applications on these shared systems comes with a unique set of challenges that decrease accessibility, leave users stuck on old features, and ultimately lower productivity.

Containers simplify application deployment in the data center by wrapping applications in an isolated virtual environment. Because a container includes all of an application’s dependencies, such as binaries and libraries, it runs seamlessly in any data center environment.

The HPC application containers available on NVIDIA GPU Cloud (NGC) dramatically improve ease of application deployment while delivering optimized performance. The containers include HPC applications such as NAMD, GROMACS, and Relion. NGC gives researchers and scientists the flexibility to run HPC application containers on NVIDIA Pascal- and Volta-powered systems, including Quadro-powered workstations, NVIDIA DGX Systems, and HPC clusters.

However, if the desired application is not available on NGC, building HPC containers from scratch trades one set of challenges for another. Parts of the software environment typically provided by the HPC data center must be redeployed inside the container. For those used to just loading the relevant environment modules, installing a compiler, MPI library, CUDA, and other core HPC components from scratch may be daunting.

Introducing HPC Container Maker

HPC Container Maker (HPCCM) is an open-source project that addresses the challenges of creating HPC application containers. HPCCM encapsulates the best practices for deploying core HPC components into modular building blocks that also follow container best practices, reducing container development effort, minimizing image size, and taking advantage of image layering. Available on GitHub, HPCCM makes it easier to create HPC application containers by separating the choice of what should go into a container image from the details of how to configure, build, and install each component. This separation also lets the best practices for HPC component deployment evolve transparently over time.

HPCCM comes with building blocks for the GNU and PGI Community Edition compilers, the OpenMPI and MVAPICH2 MPI libraries, the Mellanox OpenFabrics Enterprise Distribution (OFED) for InfiniBand support, the FFTW and HDF5 libraries, Python, and more.
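
For example, each building block typically adds a component to a recipe with a single line. The following sketch combines the GNU compiler and FFTW building blocks on a plain Ubuntu base image; the building block names match the list above, but the exact keyword arguments each block accepts can vary by HPCCM version, so the options shown here are illustrative assumptions.

# Sketch: composing building blocks in an HPCCM recipe (options are illustrative)
Stage0 += baseimage(image='ubuntu:16.04')
Stage0 += gnu()                   # GNU compilers (gcc, g++, gfortran)
Stage0 += fftw(version='3.3.7')   # FFTW library; version shown is an assumption

[Sketch of a recipe composed from building blocks]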

How It Works

Container frameworks rely on an input specification file to define the contents of the corresponding container image: a Dockerfile for Docker and many other container runtimes, or a Singularity recipe file for Singularity. The specification file contains the precise set of steps to execute when building a container image. For a given component, the steps may include downloading the source, configuring and building it, installing it, and finally cleaning up. Using the HPCCM building blocks, it’s easy to generate optimized container specification files for both Docker and Singularity.

For example, the following Dockerfile and Singularity recipe files are generated from the same two-line, high-level HPCCM recipe using the included OpenMPI building block. The OpenMPI building block has a number of configuration options, only some of which are shown below, to customize the configuration, building, and installation.

 

Stage0 += baseimage(image='nvidia/cuda:9.0-devel', _as='devel')
Stage0 += openmpi(cuda=True, infiniband=True, prefix='/usr/local/openmpi',
                  version='3.0.0')

[Basic HPCCM recipe]

 

FROM nvidia/cuda:9.0-devel AS devel

# OpenMPI version 3.0.0
RUN apt-get update -y && \
    apt-get install -y --no-install-recommends \
        file \
        hwloc \
        openssh-client \
        wget && \
    rm -rf /var/lib/apt/lists/*
RUN mkdir -p /tmp && wget -q --no-check-certificate -P /tmp https://www.open-mpi.org/software/ompi/v3.0/downloads/openmpi-3.0.0.tar.bz2 && \
    tar -x -f /tmp/openmpi-3.0.0.tar.bz2 -C /tmp -j && \
    cd /tmp/openmpi-3.0.0 && ./configure --prefix=/usr/local/openmpi --disable-getpwuid --enable-orterun-prefix-by-default --with-cuda --with-verbs && \
    make -j4 && \
    make -j4 install && \
    rm -rf /tmp/openmpi-3.0.0.tar.bz2 /tmp/openmpi-3.0.0
ENV PATH=/usr/local/openmpi/bin:$PATH \
    LD_LIBRARY_PATH=/usr/local/openmpi/lib:$LD_LIBRARY_PATH

[Dockerfile generated from the basic HPCCM recipe]

 

BootStrap: docker
From: nvidia/cuda:9.0-devel

# OpenMPI version 3.0.0
%post
    apt-get update -y
    apt-get install -y --no-install-recommends \
        file \
        hwloc \
        openssh-client \
        wget
    rm -rf /var/lib/apt/lists/*
%post
    mkdir -p /tmp && wget -q --no-check-certificate -P /tmp https://www.open-mpi.org/software/ompi/v3.0/downloads/openmpi-3.0.0.tar.bz2
    tar -x -f /tmp/openmpi-3.0.0.tar.bz2 -C /tmp -j
    cd /tmp/openmpi-3.0.0 && ./configure --prefix=/usr/local/openmpi --disable-getpwuid --enable-orterun-prefix-by-default --with-cuda --with-verbs
    make -j4
    make -j4 install
    rm -rf /tmp/openmpi-3.0.0.tar.bz2 /tmp/openmpi-3.0.0
%environment
    export PATH=/usr/local/openmpi/bin:$PATH
    export LD_LIBRARY_PATH=/usr/local/openmpi/lib:$LD_LIBRARY_PATH

[Singularity recipe file generated from the basic HPCCM recipe]
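
Because the recipe is just Python, the same building block can be reconfigured simply by changing its options. For instance, the sketch below reuses only the options already shown above, but requests a different OpenMPI version and omits InfiniBand support; the version number is chosen purely for illustration.

# Variant of the basic recipe above; the version number is illustrative
Stage0 += baseimage(image='nvidia/cuda:9.0-devel', _as='devel')
Stage0 += openmpi(cuda=True, infiniband=False, prefix='/usr/local/openmpi',
                  version='3.0.1')

[Variant of the basic HPCCM recipe with different OpenMPI options]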

 

Base Recipes as a Starting Point for HPC Application Containers

Several “base container” recipes are included with HPCCM and are an excellent starting point for HPC containers. These base recipes include fundamental HPC components such as a compiler, an MPI library, CUDA, Mellanox OFED, and other components typically found in an HPC software stack.
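
As a rough guide to their structure, a heavily simplified sketch of a GNU plus OpenMPI base recipe might look like the following. The actual recipes/hpcbase-gnu-openmpi.py recipe in the HPCCM repository includes additional components and options, so treat these lines as an approximation rather than its exact contents.

# Simplified sketch of a base recipe (not the exact contents of
# recipes/hpcbase-gnu-openmpi.py)
Stage0 += baseimage(image='nvidia/cuda:9.0-devel', _as='devel')
Stage0 += python()       # Python
Stage0 += gnu()          # GNU compilers
Stage0 += mlnx_ofed()    # Mellanox OFED for InfiniBand support
Stage0 += openmpi(cuda=True, infiniband=True, version='3.0.0')
Stage0 += fftw()         # FFTW
Stage0 += hdf5()         # HDF5

[Simplified sketch of a base recipe]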

The “MPI Bandwidth” sample program from the Lawrence Livermore National Laboratory (LLNL) will be used as a proxy application to illustrate how one can use HPCCM base recipes to create an application container.

First, generate a Dockerfile from one of the base recipes and use Docker to create a local container image named baseimage. The build may take a while since a number of HPC components need to be downloaded and built. Please note that these steps assume the user is part of the docker group and therefore doesn’t need to prefix each docker command with sudo.

 

$ hpccm.py --recipe recipes/hpcbase-gnu-openmpi.py --single-stage > Dockerfile.baseimage
$ docker build -t baseimage -f Dockerfile.baseimage .

 

Use the newly created base container image as the starting point for the application Dockerfile. In this case, the Dockerfile to build the MPI Bandwidth proxy application is trivial when the starting point is a container image with an MPI library; just copy the source file into the container and build it using the MPI compiler wrapper. Create a file named Dockerfile.baseimage-mpi_bandwidth with the following content.

 

FROM baseimage

# MPI Bandwidth "application"
COPY mpi_bandwidth.c /tmp/mpi_bandwidth.c
RUN mkdir -p /workspace && \
    mpicc -o /workspace/mpi_bandwidth /tmp/mpi_bandwidth.c

 

Build the application container image using Docker.

 

$ docker build -t baseimage-mpi_bandwidth -f Dockerfile.baseimage-mpi_bandwidth .

 

Finally, run the containerized MPI Bandwidth proxy application.

 

$ nvidia-docker run --rm -ti baseimage-mpi_bandwidth mpirun --allow-run-as-root -n 2 /workspace/mpi_bandwidth

******************** MPI Bandwidth Test ********************
Message start size= 100000 bytes
Message finish size= 1000000 bytes
Incremented by 100000 bytes per iteration
Roundtrips per iteration= 100
MPI_Wtick resolution = 1.000000e-09
************************************************************
task    0 is on dbcb15aff446 partner=   1
task    1 is on dbcb15aff446 partner=   0
************************************************************
...
***Message size:  1000000 *** best /  avg / worst (MB/sec)
   task pair:    0 - 1: 8754.23 / 8641.50 / 6640.06
   OVERALL AVERAGES:          8754.23 / 8641.50 / 6640.06

 

Alternatively, the steps to build the MPI Bandwidth proxy application could have been added to the end of Dockerfile.baseimage for a single, self-contained HPC application Dockerfile. Doing this would also allow unused components from the general-purpose base image to be removed to reduce the size of the application container image.

Portable HPC Application Recipes

HPCCM also includes reference HPC application recipes for some popular applications such as GROMACS and MILC. These can be used as-is to generate corresponding HPC application container images. Please note that these recipes are specifically tailored for each application to only include the necessary components, in contrast to the general-purpose base container recipes.

The MPI Bandwidth proxy application will again be used to demonstrate how to create portable HPC application recipes.

First, create a copy of one of the base recipes.

 

$ cp recipes/hpcbase-gnu-openmpi.py mpi_bandwidth.py

 

Add the following content to the end of mpi_bandwidth.py. This performs the same steps as the previous section: copy the source file into the container and build it with the MPI compiler wrapper. The Stage1 line applies only when generating a multi-stage Dockerfile (i.e., without the --single-stage option used below); in that case it copies just the built binary from the first stage into a smaller second-stage image.

 

# MPI bandwidth "application"
Stage0 += copy(src='mpi_bandwidth.c', dest='/tmp/mpi_bandwidth.c')
Stage0 += shell(commands=['mkdir -p /workspace',
                          'mpicc -o /workspace/mpi_bandwidth /tmp/mpi_bandwidth.c'])
Stage1 += copy(src='/workspace/mpi_bandwidth', dest='/workspace/mpi_bandwidth',
               _from=0)

Now generate a Dockerfile for MPI Bandwidth and build the corresponding container image. Note that the Docker syntax generated at the end of Dockerfile.recipe to build MPI Bandwidth is the same as the content of Dockerfile.baseimage-mpi_bandwidth from the previous section.

$ hpccm.py --recipe mpi_bandwidth.py --single-stage > Dockerfile.recipe
$ docker build -t recipe-mpi_bandwidth -f Dockerfile.recipe .

 

From the same HPCCM recipe file, a Singularity recipe file can also be generated by using the --format singularity command line option, and then used to build a Singularity container image.

 

$ hpccm.py --recipe mpi_bandwidth.py --single-stage --format singularity > Singularity.recipe
$ sudo singularity build recipe-mpi_bandwidth.simg Singularity.recipe

$ singularity exec --nv recipe-mpi_bandwidth.simg mpirun -n 2 /workspace/mpi_bandwidth

******************** MPI Bandwidth Test ********************
Message start size= 100000 bytes
Message finish size= 1000000 bytes
Incremented by 100000 bytes per iteration
Roundtrips per iteration= 100
MPI_Wtick resolution = 1.000000e-09
************************************************************
task    0 is on hsw221 partner=   1
task    1 is on hsw221 partner=   0
************************************************************
…
***Message size:  1000000 *** best /  avg / worst (MB/sec)
   task pair:    0 - 1: 8833.26 / 8703.22 / 7982.75
   OVERALL AVERAGES:          8833.26 / 8703.22 / 7982.75

 

Up to this point, MPI has been invoked inside the container and used shared memory to communicate between MPI ranks. Containers may also be used for multi-node runs using a high-performance interconnect such as InfiniBand. Successfully using containers for multi-node runs depends on additional factors such as the host networking and firewall configuration, container privileges, and making the network device accessible to the container. Please consult your container framework documentation for more information.

In the case of Singularity, mpirun should be invoked from outside the container. (It’s recommended to use the same version of MPI outside the container as inside.)

 

$ mpirun -mca btl openib,self,vader -n 2 singularity exec --nv recipe-mpi_bandwidth.simg /workspace/mpi_bandwidth

******************** MPI Bandwidth Test ********************
Message start size= 100000 bytes
Message finish size= 1000000 bytes
Incremented by 100000 bytes per iteration
Roundtrips per iteration= 100
MPI_Wtick resolution = 1.000000e-09
************************************************************
task    0 is on hsw220 partner=   1
task    1 is on hsw221 partner=   0
************************************************************
…
***Message size:  1000000 *** best /  avg / worst (MB/sec)
   task pair:    0 - 1: 6165.93 / 5766.67 / 2798.42
   OVERALL AVERAGES:          6165.93 / 5766.67 / 2798.42

 

Summary

Containers enable HPC applications to be widely deployed regardless of the underlying data center environment. NVIDIA is enabling the usage of containers for HPC by providing access to tuned and tested HPC application containers on the NVIDIA GPU Cloud (NGC). For instances where a container is not available on NGC, HPCCM simplifies creating new HPC application containers.

HPC Container Maker (HPCCM) is an open-source project available on GitHub.
