Profiling
cProfile
Python has a cProfile module to help you find the places in the code where the time is spent.
Let’s say you have a script script.py that you want to run through the profiler. This is what you do:
>>> import cProfile
>>> cProfile.run('import script', 'prof')
This will run your script and generate a profile in the file prof.
You can also generate the profile by inserting lines like these in your script:
...
import cProfile
cProfile.run('atoms.get_potential_energy()', 'prof')
...
Note
Use:
import cProfile
from gpaw.mpi import rank
cProfile.run('atoms.get_potential_energy()', f'prof-{rank:04}')
if you want to run in parallel.
To analyse the results, you do this:
>>> import pstats
>>> pstats.Stats('prof').strip_dirs().sort_stats('time').print_stats(20)
Tue Oct 14 19:08:54 2008 prof
1093215 function calls (1091618 primitive calls) in 37.430 CPU seconds
Ordered by: internal time
List reduced from 1318 to 20 due to restriction <20>
ncalls tottime percall cumtime percall filename:lineno(function)
37074 10.310 0.000 10.310 0.000 :0(calculate_spinpaired)
1659 4.780 0.003 4.780 0.003 :0(relax)
167331 3.990 0.000 3.990 0.000 :0(dot)
7559 3.440 0.000 3.440 0.000 :0(apply)
370 2.730 0.007 17.090 0.046 xc_correction.py:130(calculate_energy_and_derivatives)
37000 0.780 0.000 9.650 0.000 xc_functional.py:657(get_energy_and_potential_spinpaired)
37074 0.720 0.000 12.990 0.000 xc_functional.py:346(calculate_spinpaired)
...
...
The list shows the 20 functions where the most time is spent. Check the pstats documentation if you want to do more fancy things.
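Beyond the simple listing above, pstats can, for example, sort by cumulative time, show the callers of expensive functions, and merge several profile files. A small sketch (the prof-0000 and prof-0001 names follow the per-rank pattern from the Note above):
>>> import pstats
>>> stats = pstats.Stats('prof').strip_dirs()
>>> stats.sort_stats('cumulative').print_stats(10)
>>> stats.print_callers(10)
>>> # For a parallel run, merge the per-rank files (named as in the Note above):
>>> parallel = pstats.Stats('prof-0000')
>>> parallel.add('prof-0001')
>>> parallel.strip_dirs().sort_stats('time').print_stats(20)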
Tip
Since the cProfile module does not time calls to C code, it is a good idea to run the code in debug mode - this will wrap calls to C code in Python functions:
$ python3 -d script.py
Tip
There is also a quick and simple way to profile a script:
$ python3 -m cProfile script.py
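The command-line form also accepts a sort order or an output file (the latter can then be read with pstats as above); for example:
$ python3 -m cProfile -s cumulative script.py
$ python3 -m cProfile -o prof script.py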
Parallel profiling (GPAW new only)
Profiling of GPAW new code is done with a decorator called trace, which is applied to each function one wants to profile. GPAW already has many trace decorators in place.
To keep the overhead minimal when not tracing, the environment variable GPAW_TRACE has to be set to 1 to enable tracing. If GPAW_TRACE is undefined or 0, the trace decorator is the identity, and no overhead is added to function calls.
In addition to setting the environment variable, one needs to use the global_timer of gpaw.new. Below are two examples of how to profile.
CPU profiling
Here is an example of how to perform CPU profiling for a particular phase of a GPAW calculation.
import os

# This is recommended to be set as an environment variable
os.environ['GPAW_TRACE'] = '1'

from gpaw.new.ase_interface import GPAW
from gpaw.new.timer import global_timer
from gpaw.utilities.timing import Profiler
from ase.build import graphene

atoms = graphene(size=(5, 5, 1), vacuum=5)
atoms.set_pbc((True, True, False))
atoms.calc = GPAW(
    mode={"name": "pw", "ecut": 500},
    parallel={"gpu": False},
    convergence={"density": 3, "eigenstates": 3, "energy": 10},
    kpts=(1, 1, 1),
    random=True,
)

with global_timer.context(Profiler("cpu")) as timer:
    atoms.get_potential_energy()
This will write a .json file for each MPI rank and, at exit, concatenate them into a single file.
To visualize the profiling results, go to https://ui.perfetto.dev/, click on Open trace file and open the resulting json file (the concatenated one without a rank number).
GPU profiling
GPU profiling works the same way as CPU profiling, but additional events are tracked in the GPU stream. This makes it possible to see what the CPU is doing versus what the GPU is doing. For more information, see the gpu=True option of the trace decorator below.
import os

# This is recommended to be set as an environment variable
os.environ['GPAW_TRACE'] = '1'

from gpaw.new.ase_interface import GPAW
from gpaw.new.timer import global_timer
from gpaw.utilities.timing import GPUProfiler
from ase.build import graphene

atoms = graphene(size=(5, 5, 1), vacuum=5)
atoms.set_pbc((True, True, False))
atoms.calc = GPAW(
    mode={"name": "pw", "ecut": 500},
    parallel={"gpu": True},
    convergence={"density": 3, "eigenstates": 3, "energy": 10},
    kpts=(1, 1, 1),
    random=True,
)

with global_timer.context(GPUProfiler("gpu")) as timer:
    atoms.get_potential_energy()
Adding tracing to new functions
The trace decorator can be imported as follows: from gpaw.new import trace.
To profile a particular function, one needs to decorate it as follows:
from gpaw.new import trace
...
@trace
def my_slow_function(...):
    ...
...
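As a concrete sketch (my_matrix_product and its body are made up purely for illustration), a decorated function is used like any other function; the call is recorded when GPAW_TRACE is set to 1 and the global_timer context is active, as in the examples above:
import numpy as np
from gpaw.new import trace

@trace
def my_matrix_product(a, b):
    # Hypothetical example: any ordinary Python function can be decorated.
    return a @ b

c = my_matrix_product(np.eye(3), np.eye(3))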
For GPU tracing with the GPU profiler, it is often advantageous to track when the GPU, rather than the CPU, is working on a particular function. To that end, trace may be given an extra argument gpu=True.
Used together with the GPUProfiler, this emits GPU events to the GPU stream and thereby records the beginning and end of GPU kernel launches (in the default stream). Ideally, the CPU runs ahead of the GPU stream: if there is a CPU-intensive part, the GPU stream catches up, making the CPU part effectively cost free:
from gpaw.new import trace
...
@trace(gpu=True)
def my_slow_gpu_function(...):
    ...
...
Adding tracing inline
If one wants to trace a set of lines that does not make up an entire function, one can use the tracectx context manager:
from gpaw.new import tracectx

with tracectx('slow lines'):
    ...
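As a final sketch (the 'mixing step' label and the NumPy work inside the block are hypothetical, chosen only for illustration), only the lines inside the with-block are timed as one event:
import numpy as np
from gpaw.new import tracectx

def update(density):
    with tracectx('mixing step'):  # hypothetical label
        density *= 0.5
        density += 0.5 * np.roll(density, 1)
    return density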