Debugging
Python debugging
Even though some debugging can done just with the print function, a real debugger offers several advantages. It is possible, for example, to set breakpoints in certain files or functions, execute the code step by step, examine and change values of variables. Python contains a standard debugger pdb. A script can be started under the debugger control as python3 -m pdb script.py. Now before the execution of the script starts one enters the debugger prompt. The most important debugger commands are:
h(elp) [command]
b(reak) [[filename:]lineno|function[, condition]]
Set a breakpoint.
s(tep)
Execute the current line, stop at the first possible occasion (either in a function that is called or on the next line in the current function).
n(ext)
Continue execution until the next line in the current function is reached or it returns.
r(eturn)
Continue execution until the current function returns.
c(ont(inue))
Continue execution, only stop when a breakpoint is encountered.
l(ist) [first[, last]]
List source code for the current file.
p expression
Evaluate the expression in the current context and print its value. Note: “print” can also be used, but is not a debugger command – this executes the Python print statement
Most commands can be invoked with only the first letter. A full list of all the commands and their explanation can be found in the Python debugger (PDB) documentation.
An example session might look like:
corona1 ~/gpaw/trunk/test> python3 -m pdb H.py
> /home/csc/jenkovaa/gpaw/trunk/test/H.py(1)?()
-> from gpaw import GPAW
(Pdb) l 11,5
11 hydrogen.SetCalculator(calc)
12 e1 = hydrogen.GetPotentialEnergy()
13
14 calc.Set(kpts=(1, 1, 1))
15 e2 = hydrogen.GetPotentialEnergy()
16 equal(e1, e2)
(Pdb) break 12
Breakpoint 1 at /home/csc/jenkovaa/gpaw/trunk/test/H.py:12
(Pdb) c
... output from the script...
> /home/csc/jenkovaa/gpaw/trunk/test/H.py(12)?()
-> e1 = hydrogen.GetPotentialEnergy()
(Pdb) s
--Call--
> /v/solaris9/appl/chem/CamposASE/ASE/ListOfAtoms.py(224)GetPotentialEnergy()
-> def GetPotentialEnergy(self):
(Pdb) p self
[Atom('H', (2.0, 2.0, 2.0))]
Emacs has a special mode for Python debugging which can be invoked as M-x pdb. After that one has to give the command to start the debugger (e.g. python3 -m pdb script.py). Emacs opens two windows, one for the debugger command prompt and one which shows the source code and the current point of execution. Breakpoints can be set also on the source-code window.
C debugging
First, the C-extension should be compiled with the -g flag in order to get the debug information into the library. Also, the optimizations should be switched off which could be done in siteconfig.py as:
extra_link_args += ['-g']
extra_compile_args += ['-O0', '-g']
There are several debuggers available, the following example session applies to gdb:
sepeli ~/gpaw/trunk/test> gdb python
GNU gdb Red Hat Linux (6.1post-1.20040607.52rh)
(gdb) break Operator_apply
Function "Operator_apply" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (Operator_apply) pending.
(gdb) run H.py
Starting program: /usr/bin/python2.4 H.py
... output ...
Breakpoint 2, Operator_apply (self=0x2a98f8f670, args=0x2a9af73b78)
at c/operators.c:83
(gdb)
One can also do combined C and Python debugging by starting the input
script as run -m pdb H.py
i.e:
sepeli ~/gpaw/trunk/test> gdb python
GNU gdb Red Hat Linux (6.1post-1.20040607.52rh)
(gdb) break Operator_apply
Function "Operator_apply" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (Operator_apply) pending.
(gdb) run -m pdb H.py
Starting program: /usr/bin/python2.4 -m pdb H.py
[Thread debugging using libthread_db enabled]
[New Thread -1208371520 (LWP 1575)]
> /home/jenkovaa/test/H.py(1)?()
-> from gpaw import GPAW
(Pdb)
The basic gdb commands are the same as in pdb (or vice versa). Full documentation can be found in the GDB user manual. Apart from the commands mentioned earlier, a few are worthy of mention here:
backtrace [n | full]
Print a backtrace of the entire stack: one line per frame for all frames in the stack
full
prints the values of the local variables also.n
specifies the number of frames to print
jump linespec
Resume execution at line
linespec
i.e. at the given location in the corresponding source code. Any location of the typefilename:linenum
will do, but the results may be bizarre iflinespec
is in a different function from the one currently executing.
tbreak [[filename:]lineno|function[, condition]]
Set a breakpoint similar to how
break
operates, but this type of breakpoint is automatically deleted after the first time your program stops there.
p(rint) expr
Inquire about the symbols (names of variables, functions and types) defined in a compiled program.
expr
may include calls to functions in the program being debugged. Can also be used to evaluate more complicated expressions or referring to static variables in other source files as'foo.c'::x
.
Hint
Emacs can be used also with gdb. Start with M-x gdb and then continue as when starting from the command line.
Tracking memory leaks
Although a C-extensions runs fine, or so it seems, reference counting of Python
objects and matching calls to malloc
and free
may not always be up to par.
Frequently, the symptom of such disproportions is all too clear, resulting in
segmentation faults (i.e. SIGSEGV
) e.g. when a memory address is accessed
before it has been allocated or after is has been deallocated. Such situations
can be debugged using gdb as described above.
Note
Please refer to the Python/C API Reference Manual or the unofficial (but helpful) introduction to reference counting in Python.
On the other hand, neglecting the deallocation or forgetting to decrease the reference count of a Python object will lead to a build-up of unreachable memory blocks - a process known as memory leakage. Despite being non-critical bugs, severe memory leaks in C-code will eventually bring all computations to a halt when the program runs out of available memory.
Suppose you have written a Python script called test.py
which appears to
suffer from memory leaks. Having build GPAW with the -g flag as described,
tracking down the source of the memory leak (in this case line 123 of myfile.c
)
can be done using Valgrind as follows:
sepeli ~/gpaw/trunk/test> valgrind --tool=memcheck --leak-check=yes \
--show-reachable=yes --num-callers=20 --track-fds=yes gpaw-python test.py
==16442== 6,587,460 bytes in 29,943 blocks are definitely lost in loss record 85 of 85
==16442== at 0x40053C0: malloc (vg_replace_malloc.c:149)
==16442== by 0x5322831: ???
==16442== by 0x8087BD5: my_leaky_function (myfile.c:123)
Note that Valgrind is more than just a memory profiler for C; it provides an entire instrumentation framework for building dynamic analysis tools and thus includes other debugging tools, e.g. a heap/stack/global array overrun detector.
Parallel debugging
Debugging programs that are run in parallel with MPI is not as straight forward as in serial, but many of the same tools can be used (e.g. GDB and Valgrind). Note that one cannot use the Python debugger as described above because GPAW requires that a custom Python interpreter is built with the necessary MPI bindings.
There are probably numerous ways to debug an MPI application with GDB, and experimentation is strongly encouraged, but the following method is recommended for interactive debugging. This approach builds upon advice in Open MPI’s FAQ Debugging applications in parallel, but is adapted for use with Python on a GNU/Linux development platform. Prepend the following to your script:
import os, sys, time, math
from gpaw.mpi import world
gpaw_python_path = '/your/path/to/gpaw-python'
ndigits = 1 + int(math.log10(world.size))
assert os.system('screen -S gdb.%0*d -dm gdb %s %d' \
% (ndigits, world.rank, gpaw_python_path, os.getpid())) == 0
time.sleep(ndigits)
world.barrier()
This runs gdb /path/to/gpaw-python pid
from within each instance of the custom Python
interpreter and detaches it into a screen session
called gdb.0
for rank 0 etc. You may now resume control of the debugger instances by
running screen -rd gdb.0
, entering \(c\) to continue and so forth for all instances.
Hint
Run screen -ls
to get an overview of running sessions.
Enable logging of an attached session with Ctrl+a H (capital H).
Use Ctrl+a Ctrl+d to detach a session but leave it running.
Note
This approach only works if the problem you’re trying to address occurs after the GPAW executable has been loaded. In the alternate case, it is recommended to debug a single instance of the parallel program with the usual serial methods first.
For details on using Valgrind on parallel programs, please refer to the online manual Debugging MPI Parallel Programs with Valgrind