GPU

Ground-state calculations on a GPU are a new feature with some limitations:

only PW-mode
it has only been implemented in the “New GPAW” code

You use it like this:

>>> from gpaw import GPAW
>>> atoms = ...
>>> atoms.calc = GPAW(..., parallel={'gpu': True})

By default, the environment variable $GPAW_USE_GPUS determines whether the GPU is used (if it is unset, the GPU is not used). The user can override this behavior by specifying parallel={'gpu': False} (or True).

The GPAW CI has a GitLab Runner with a GPU, so the GPU parts of GPAW are also covered by GPAW’s test suite.

GPAW_USE_GPUS: If this environment variable is set to 1 then the default value for gpu in the parallel dictionary will be set to True. Since it only is a default value, the effect of $GPAW_USE_GPUS may be overridden by specifying the gpu key to the parallel dictionary.

GPAW_CPUPY: If this environment variable is set to 1, then users without GPU’s can run the GPU code. CuPy will be emulated by fictitious library cpupy. This option is useful to make sure that developers without GPU do not break the GPU code.

Tip

>>> import numpy as np
>>> from gpaw.gpu import cupy as cp
>>> a_cpu = np.zeros(...)
>>> a_gpu = cp.asarray(a_cpu)  # from CPU to GPU
>>> b_cpu = a_gpu.get()  # from GPU to CPU

Building the GPU code

To build GPAW with GPU support, siteconfig.py needs to be updated. To see how to use siteconfig, see Customizing installation. Five variables control the build:

gpu is a boolean determining whether to build the GPU kernels or not.

gpu_target selects the target architecture; valid values are 'cuda', 'hip-amd' or 'hip-cuda'. For NVIDIA GPUs the target should normally be 'cuda', which requires the nvcc compiler; with the hip- targets, the hipcc compiler is used.

gpu_compiler is optional. The compiler is normally selected by gpu_target, but it can be overridden with this parameter.

gpu_include_dirs is not normally needed, but can be used to provide additional search paths for locating headers.

gpu_compile_args is essential: in most cases the proper target architecture must be supplied here.

If you intend to run really large systems, you need to enable 64-bit array indexing in the kernels. This is done by adding define_macros += [('GPAW_64_BIT_INDEXING', None)] to siteconfig.py. Note: In GPAW master, GPAW’s MPI wrappers are only equipped to send int32 messages (int64 MPI support is coming, but has not yet been merged to master).

In addition, the GPU BLAS and GPU runtime libraries should be appended to the libraries list. The examples below show how to use these settings.

Example piece of siteconfig to build with HIP (AMD MI250X):

gpu = True
gpu_target = 'hip-amd'
gpu_compiler = 'hipcc'
gpu_include_dirs = []
gpu_compile_args = [
    '-g',
    '-O3',
    '--offload-arch=gfx90a',
    ]
libraries += ['amdhip64', 'hipblas']

Example piece of siteconfig to build with CUDA (NVIDIA A100):

gpu = True
gpu_target = 'cuda'
gpu_compiler = 'nvcc'
gpu_compile_args = ['-O3',
                    '-g',
                    '-gencode', 'arch=compute_80,code=sm_80']

libraries += ['cudart', 'cublas']

To see what the siteconfig should look like in practice, see the ../platforms/Cray/siteconfig-lumi-gpu.py (AMD MI250X) or ../platforms/Linux/Niflheim/siteconfig-foss.py (NVIDIA A100) examples.
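The two example fragments differ mainly in the architecture flag passed through gpu_compile_args. The sketch below shows one way to build that flag from an architecture name. arch_compile_args is a hypothetical helper, not part of GPAW, and it covers only the 'cuda' and 'hip-amd' targets used in the examples.

```python
def arch_compile_args(gpu_target, arch):
    """Hypothetical helper: architecture part of gpu_compile_args.

    gpu_target: 'cuda' or 'hip-amd' (the two targets shown in the examples).
    arch: compute capability without the dot for CUDA (e.g. '80' for A100),
          or the AMD architecture name for HIP (e.g. 'gfx90a' for MI250X).
    """
    if gpu_target == 'cuda':
        # nvcc: generate code for one virtual/real architecture pair.
        return ['-gencode', f'arch=compute_{arch},code=sm_{arch}']
    if gpu_target == 'hip-amd':
        # hipcc: select the AMD GPU architecture to compile for.
        return [f'--offload-arch={arch}']
    raise ValueError(f'target not covered by this sketch: {gpu_target!r}')


# Reproduces the compile arguments of the two examples.
cuda_args = ['-O3', '-g', *arch_compile_args('cuda', '80')]
hip_args = ['-g', '-O3', *arch_compile_args('hip-amd', 'gfx90a')]
print(cuda_args)
print(hip_args)
```

In a real siteconfig.py the resulting list would simply be assigned to gpu_compile_args.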

GPU parallelization

The same parallelization options are available as with the CPU version. GPAW will distribute the available GPUs in a round-robin manner. As a rule of thumb, always use 1 CPU per logical GPU. Oversubscribing the GPUs rarely helps, although it might sometimes give a small speed-up.
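To illustrate what a round-robin distribution means, here is a minimal sketch. It assumes that rank number r simply gets GPU number r modulo the number of GPUs. assign_gpus is a hypothetical helper, and GPAW's actual mapping (for example how it treats ranks on different nodes) may differ.

```python
def assign_gpus(n_ranks, n_gpus):
    """Hypothetical helper: GPU index for each rank, in round-robin order."""
    if n_gpus < 1:
        raise ValueError('at least one GPU is required')
    return [rank % n_gpus for rank in range(n_ranks)]


# One rank per GPU, as recommended by the rule of thumb.
print(assign_gpus(4, 4))  # [0, 1, 2, 3]
# Oversubscription: eight ranks share four GPUs, two ranks per GPU.
print(assign_gpus(8, 4))  # [0, 1, 2, 3, 0, 1, 2, 3]
```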

By default, GPAW will use GPU-aware MPI and expects the MPI library to have been compiled with GPU-aware support. If this is not the case (segfaults or bus errors occur at MPI calls), GPU-aware MPI can be disabled by adding the following line to the siteconfig:

undef_macros += ['GPAW_GPU_AWARE_MPI']

If GPU-aware MPI is disabled, GPAW will at each MPI call transfer the data from GPU to CPU, send it via MPI on the CPU, and then transfer it back to the GPU. The normal behavior is to transfer directly from GPU to GPU.

The gpaw.gpu module

gpaw.gpu.cupy: cupy module (or gpaw.gpu.cpupy if cupy is not available)

gpaw.gpu.cupyx: cupyx module (or gpaw.gpu.cpupyx if cupyx is not available)

gpaw.gpu.cupy_is_fake = True

gpaw.gpu.is_hip = False

gpaw.gpu.as_np(array, dtype=None)

Transfer array to CPU (if not already there).

Parameters: array (numpy.ndarray | cupy.ndarray) – NumPy or CuPy array.

gpaw.gpu.as_xp(array, xp)

Transfer array to CPU or GPU (if not already there).

Parameters:

array – Numpy or CuPy array.
xp – numpy or cupy.

Fake cupy library

The implementation uses cupy. In the code, we don’t do import cupy as cp. Instead we use from gpaw.gpu import cupy as cp. This allows us to use a fake cupy implementation so that we can run GPAW’s cupy code without having a physical GPU. To enable the fake cupy module, do:

GPAW_CPUPY=1 python ...

This allows users without a GPU to find out if their code interferes with the GPU implementation, simply by running the tests.
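The same command can also be launched from Python, for example from a helper script. The sketch below uses only the standard library. run_with_fake_cupy is a hypothetical helper, and the child process here is a stand-in that only echoes the variable; in practice you would pass the arguments of your own script or test run instead.

```python
import os
import subprocess
import sys


def run_with_fake_cupy(python_args):
    """Run ``python <python_args>`` with GPAW_CPUPY=1 in its environment."""
    env = dict(os.environ, GPAW_CPUPY='1')
    return subprocess.run([sys.executable, *python_args], env=env,
                          capture_output=True, text=True)


# Stand-in child process that only prints the variable it received.
result = run_with_fake_cupy(
    ['-c', 'import os; print(os.environ["GPAW_CPUPY"])'])
print(result.stdout.strip())  # 1
```

The variable is set only for the child process, so the calling process is left unchanged.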

CuPy enabled container objects

The following objects:

can have their data (.data attribute) stored in a cupy.ndarray array instead of, as normal, a numpy.ndarray array. In addition, these objects now have an xp attribute that can be numpy or cupy.

Also, the AtomCenteredFunctions object can do its operations on the GPU.

Building GPAW with MAGMA support

GPAW provides wrappers to a subset of eigensystem solvers from the MAGMA library, which implements efficient, hybrid CPU-GPU algorithms for common linear algebra tasks. Compiling GPAW with MAGMA support is recommended for performance when running on AMD GPUs. On NVIDIA GPUs there is currently no performance increase.

MAGMA features can be enabled in siteconfig.py:

magma = True
libraries += ['magma']

You may also need to modify library_dirs, runtime_library_dirs and include_dirs with paths to your MAGMA installation (see Customizing installation).

You will also need to ensure the CUDA/HIP compiler standard is set to C++17 or newer (-std=c++17). Modern CUDA/HIP installations do this automatically, and the GPAW installation also adds this flag. In case you still face issues:

1. If your siteconfig.py adds '-std=...' to gpu_compile_args, update the standard there. GPAW will not override a user-defined standard.
2. If using HIP to compile CUDA code (hipcc as a wrapper to nvcc), you may need to set the standard through an environment variable: export HIPCC_COMPILE_FLAGS_APPEND="-std=c++17". However, we generally recommend using nvcc and the CUDA toolkit directly if building for NVIDIA GPUs.

You can use the gpaw.cgpaw.gpu.magma.is_available() function to check if MAGMA is available within your GPAW installation. GPAW eigensystem routines will default to the MAGMA implementation on AMD GPUs, provided the matrix is large enough to benefit from it. You can also call the MAGMA solvers directly from the gpaw.cgpaw.gpu.magma extension module.
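A guarded version of the availability check might look like the sketch below. It assumes that the dotted path gpaw.cgpaw.gpu.magma can be reached by attribute access once gpaw.cgpaw has been imported, and it treats a missing GPAW installation or a build without the GPU extension as "not available". magma_is_available is a hypothetical wrapper, not part of GPAW.

```python
def magma_is_available():
    """Hypothetical wrapper: True only if GPAW reports MAGMA support."""
    try:
        from gpaw import cgpaw
        # Documented check: gpaw.cgpaw.gpu.magma.is_available()
        return bool(cgpaw.gpu.magma.is_available())
    except (ImportError, AttributeError):
        # GPAW is not installed, or this build has no GPU/MAGMA extension.
        return False


print('MAGMA available:', magma_is_available())
```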