Author: fijal
Branch: extradoc
Changeset: r5564:172584f486a0
Date: 2015-10-03 22:00 +0200
http://bitbucket.org/pypy/extradoc/changeset/172584f486a0/
Log:	write a draft

diff --git a/blog/draft/warmup-improvements-2.rst b/blog/draft/warmup-improvements-2.rst
new file mode 100644
--- /dev/null
+++ b/blog/draft/warmup-improvements-2.rst
@@ -0,0 +1,71 @@

Hello everyone!

This is the second part of the series on warmup improvements and
memory consumption. This post covers the recent work on sharing guard
resume data, which was merged to trunk and will be part of the next
official PyPy release. To understand what it does, let's start with
a simple example::

    def f():
        s = 0
        for i in range(100000):
            s += 1

which compiles to the following loop::

    label(p0, p1, p4, p6, p7, i39, i25, p15, p24, i44, i29, descr=TargetToken(4364727712))
    # check the loop exit
    i45 = i44 >= i29
    guard(i45 is false)
    # increase the loop counter
    i46 = i44 + 1
    # store the index into the special W_RangeObject
    ((pypy.objspace.std.iterobject.W_AbstractSeqIterObject)p15).inst_index = i46
    # do s += 1 with overflow checking
    i47 = int_add_ovf(i39, 1)
    guard_no_overflow(descr=<Guard0x104295518>)
    guard_not_invalidated(descr=<Guard0x1042954c0>)
    i49 = getfield_raw_i(4336405536, descr=<FieldS pypysig_long_struct.c_value 0>)
    i50 = i49 < 0
    guard(i50 is false)
    jump(p0, p1, p4, p6, p7, i47, i44, p15, p24, i46, i29, descr=TargetToken(4364727712))

Each ``guard`` here needs a bit of data describing how to exit the compiled
assembler back into the interpreter and, potentially, how to compile a bridge
in the future. Since over 90% of guards never fail, keeping a separate copy of
the resume data for each guard is incredibly wasteful. When two guards are
next to each other, or the operations between them are pure, we can safely
redo those operations, or, to put it simply, resume from the previous guard.
That means every now and again we execute a few extra operations, but not
storing the extra info saves quite a bit of time and a bit of memory.
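The idea can be sketched roughly as follows. This is an illustrative Python sketch, not PyPy's actual data structures: the ``PURE_OPS`` whitelist, the trace representation, and the ``make_snapshot`` helper are all hypothetical stand-ins for the real (much more involved) machinery::

    # Share one resume-data snapshot between consecutive guards when
    # every operation in between is pure, i.e. safe to re-execute
    # after a guard failure.

    PURE_OPS = {"int_add", "int_add_ovf", "int_lt", "int_ge"}  # hypothetical

    def attach_resume_data(trace, make_snapshot):
        """trace: list of (opname, args) tuples; guards are opnames
        starting with 'guard'.  make_snapshot() builds a (costly)
        resume-data object.  Returns a dict mapping the index of each
        guard to the snapshot it resumes from."""
        shared = None    # snapshot of the most recent usable guard
        resume = {}
        for i, (opname, _args) in enumerate(trace):
            if opname.startswith("guard"):
                if shared is None:
                    shared = make_snapshot()
                resume[i] = shared      # reuse the previous snapshot
            elif opname not in PURE_OPS:
                shared = None           # side effect: next guard needs
                                        # a fresh snapshot
        return resume

In the real JIT the bookkeeping is considerably more complex, but the saving comes from the same observation: consecutive guards separated only by re-executable operations can share a single snapshot.

```python
# Share one resume-data snapshot between consecutive guards when
# every operation in between is pure, i.e. safe to re-execute
# after a guard failure.  Illustrative sketch only; PURE_OPS and
# the trace format are hypothetical stand-ins.

PURE_OPS = {"int_add", "int_add_ovf", "int_lt", "int_ge"}  # hypothetical

def attach_resume_data(trace, make_snapshot):
    """trace: list of (opname, args) tuples; guards are opnames
    starting with 'guard'.  make_snapshot() builds a (costly)
    resume-data object.  Returns a dict mapping the index of each
    guard to the snapshot it resumes from."""
    shared = None    # snapshot of the most recent usable guard
    resume = {}
    for i, (opname, _args) in enumerate(trace):
        if opname.startswith("guard"):
            if shared is None:
                shared = make_snapshot()
            resume[i] = shared      # reuse the previous snapshot
        elif opname not in PURE_OPS:
            shared = None           # side effect: next guard needs
                                    # a fresh snapshot
    return resume
```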
I've done some measurements while annotating and rtyping PyPy itself, a pretty
memory-hungry program that compiles a fair bit of code. I measured,
respectively:

* the total time the translation step took (annotating or rtyping)

* the time spent tracing (which excludes the backend time from the total JIT
  time), measured at the end of rtyping

* the memory the GC feels responsible for after the step. The real amount of
  memory consumed will always be larger, with the coefficient of savings
  around the 1.5-2x mark

Here is the table:

+---------+-----------------+--------------+-------------------+----------------+--------------+
| branch  | time annotation | time rtyping | memory annotation | memory rtyping | tracing time |
+---------+-----------------+--------------+-------------------+----------------+--------------+
| default | 317s            | 454s         | 707M              | 1349M          | 60s          |
+---------+-----------------+--------------+-------------------+----------------+--------------+
| sharing | 302s            | 430s         | 595M              | 1070M          | 51s          |
+---------+-----------------+--------------+-------------------+----------------+--------------+
| win     | 4.8%            | 5.5%         | 19%               | 26%            | 17%          |
+---------+-----------------+--------------+-------------------+----------------+--------------+

Obviously, PyPy translation is a bit of an extreme example - the vast majority
of code out there does not involve nearly that much code being JITted.
However, it's at the very least a good win for us :-)

We will continue to improve the warmup performance and keep you posted!

Cheers,
fijal

_______________________________________________
pypy-commit mailing list
pypy-commit@python.org
https://mail.python.org/mailman/listinfo/pypy-commit