Author: Armin Rigo <ar...@tunes.org>
Branch: extradoc
Changeset: r5645:09dd2764135d
Date: 2016-07-06 11:50 +0200
http://bitbucket.org/pypy/extradoc/changeset/09dd2764135d/

Log:    in-progress: reverse debugger

diff --git a/blog/draft/revdb.rst b/blog/draft/revdb.rst
new file mode 100644
--- /dev/null
+++ b/blog/draft/revdb.rst
@@ -0,0 +1,143 @@
+Hi all,
+
+If I had to pick the main advantage of PyPy over CPython, it is that
+the RPython translation toolchain gives us a real place for
+experimentation.  Every now and then, we build inside RPython some
+feature that gives us an optionally tweaked version of the PyPy
+interpreter---tweaked in a way that would be hard to do with CPython,
+because it would require systematic changes everywhere.  The most
+obvious and successful examples are the GC and the JIT.  But there
+have been many other experiments along the same lines, from the
+so-called "stackless transformation" in the early days, to the STM
+version of PyPy.
+
+Today I would like to present last month's work (still very
+much in alpha state).  It is an RPython transformation that gives
+support for a *reverse debugger* in PyPy or in any other interpreter
+written in RPython.
+
+
+Reverse debugging
+-----------------
+
+What is `reverse debugging`__?  It is a debugger where you can go
+forward and backward in time.  It is still not a commonly used
+feature, and I have no idea why.  I have used UndoDB's reverse
+debugger for C code, and I can only say that it saved me many, many
+days of poking around blindly in gdb.
+
+.. __: https://en.wikipedia.org/wiki/Debugger#Reverse_debugging
+
+There are already some Python experiments with reverse debugging.
+However, I claim that they are not very useful.  They typically
+work by recording changes to some objects, like lists and
+dictionaries, in addition to recording the history of where your
+program passed through.  The problem with Python is, again,
+that lists and dictionaries are not the end of the story.  There are
+many, many, many types of objects written in C which are mutable---in
+fact, the immutable ones are the exception.  You can try to
+systematically record all changes, but it is a huge task and easy to
+forget a detail.
+
+In other words, it is a typical use case for tweaking the RPython
+translation toolchain rather than the CPython or PyPy interpreter
+directly.
+
+
+RevDB in PyPy
+-------------
+
+Right now, RevDB barely works well enough to start being useful.  I
+have used it to track down one real bug (for those interested, see
+bd220c268bc9).  So here is what it is, what it is not, and how to use
+it.
+
+RevDB is a Python debugger.  It will not help track down issues like
+segfaults or crashes of the interpreter, but it will help with any
+Python-level bug.  Think about bugs that end up as a Python traceback
+or some other wrong answer, but where the problem is really caused by
+something earlier going wrong in your Python logic.
+
+RevDB is a logging system, similar to http://rr-project.org/ .  You
+first run your Python program by using a special version of PyPy.  It
+creates a log file which records the I/O actions.  Sometimes you are
+tracking a rare bug: you may need to run your program many times until
+it shows the crash.  That should still be reasonable: the special
+version of PyPy is very slow (it contains neither a JIT nor one of
+our high-performance GCs), but not unbearably so.  It is a few
+times slower than running the same program on CPython.  The point is
+also that normally, what you need is just one recorded run of the
+program showing the bug.  You may struggle a bit to get that, but once
+you have it, this part is done.
+
+Then you use the debugger on the log file.  The debugger will also
+internally re-run the special version of PyPy in a different mode.
+This feels like a debugger, though it is really a program that
+inspects any point of the history.  As in a normal pdb, you can use
+commands like "next" and "p foo.bar" and even run more complicated
+bits of Python code.  You also have new commands like "bnext" to go
+backwards.  Most importantly, you can set *watchpoints*.  More about
+that later.
+
+What you cannot do is perform any input/output from the debugger.
+Indeed, the log file records all imports that were done and what the
+imported modules contained.  Running the debugger on the log file
+gives an exact replay of what was recorded.
+
+
+
+
+
+
+
+
+
+
+- no thread module for now, and no cpyext module either (the
+  CPython C API compatibility layer), because it depends on threads.
+  No micronumpy for now.
+  These missing modules are probably blockers for large programs.
+
+- does not contain a JIT, and does not use our fast garbage collector.
+
+- for now, the process needs to get the same addresses (of C functions
+  and static data) when recording and when replaying.  On the Linux I
+  tried it with, you get this result by disabling Address Space Layout
+  Randomization (ASLR)::
+
+       echo 0 | sudo tee /proc/sys/kernel/randomize_va_space
+
+- OS X and other POSIX platforms are probably just a few fixes away.
+  Windows support will require some custom logic to replace the
+  forking done when replaying.  This is more involved but should still
+  be possible.
+
+- maybe 15x the memory usage when replaying (adjust the number of
+  forks with MAX_SUBPROCESSES in process.py).
+
+- replaying issues:
+
+  - "Attempted to do I/O or access raw memory": we get this often, and
+    then we need "bstep+step" before we can print anything else
+
+  - id() is globally unique, returning a reproducible 64-bit number,
+    so sometimes using id(x) is a workaround for when using x doesn't
+    work because of "Attempted to do I/O" issues (e.g.
+    ``p [id(x) for x in somelist]``)
+
+  - next/bnext/finish/bfinish might jump around a bit unpredictably.
+
+  - similarly, breaks on watchpoints can stop at apparently unexpected
+    places (when going backward, try to do "step" once).  The issue is
+    that the debugger can only stop at the beginning of a line.  In the
+    extreme example, if a line is ``foo(somelist.pop(getindex()))``,
+    then ``somelist`` is modified in the middle.  Immediately before
+    this modification occurs, we are in ``getindex()``, and
+    immediately afterwards we are in ``foo()``.  The watchpoint will
+    stop the program at the end of ``getindex()`` if running backward,
+    and at the start of ``foo()`` if running forward, but never
+    actually on the line doing the change.
+
+  - the first time you use $NUM to refer to an object, if it was
+    created long ago, then replaying might need to replay again from
+    that long-ago time
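
The watchpoint caveat in the draft, with the line ``foo(somelist.pop(getindex()))``, can be illustrated in plain Python without RevDB. The sketch below is only an illustration of evaluation order: ``foo``, ``getindex`` and ``somelist`` are the stand-in names from the draft, and the ``events`` list is a hypothetical tracing aid that is not part of RevDB.

```python
# Illustrative sketch only, not RevDB code.  `events` is a hypothetical
# helper that records what each function sees when it is called.
events = []
somelist = ["a", "b", "c"]


def getindex():
    # Called first: the list has not been modified yet.
    events.append(("getindex", list(somelist)))
    return 1


def foo(item):
    # Called last: somelist.pop() has already removed one item.
    events.append(("foo", list(somelist)))
    return item


# The modification of somelist happens between the two calls, in the
# middle of this single line.
result = foo(somelist.pop(getindex()))
```

The two recorded snapshots bracket the change: ``getindex()`` still sees the full list and ``foo()`` sees it after the ``pop``. That matches the draft's description of where a watchpoint on ``somelist`` would stop in each direction.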
