Here's another data point. My results are similar to Skip's
(unsurprising since I'm also using a mac). My wild guess is that the
30% vs 10% improvement is an AMD vs. Intel thing? It's not 32-bit vs.
64-bit since both David and Jakob got a 30% speedup, but David had a
32-bit build while Jakob had a
Greg Ewing wrote:
A.M. Kuchling wrote:
A stray thought: does using a generator for the VM make life easier
for the Stackless Python developers in any way? Does it make it
possible for stock CPython to become stackless?
I doubt it. A major barrier to stacklessness is that
a lot of
Phillip J. Eby [EMAIL PROTECTED] writes:
At 10:47 AM 10/24/2008 +0200, J. Sievers wrote:
- Right now, CPython's bytecode is translated to direct threaded code
lazily (when a code object is first evaluated). This would have to
be merged into compile.c in some way plus some assorted minor
Stefan Behnel wrote:
That's obviously a problem, but it only answers the second question, not the
first one. [does using a generator for the VM make life easier
for the Stackless Python developers in any way?]
The Stackless Python developers themselves would have to answer
that one, but my
On Sat, Oct 25, 2008 at 04:33:23PM +1300, Greg Ewing wrote:
Maybe not, but at least you can follow what it's doing
just by knowing C. Introducing vmgen would introduce another
layer for the reader to learn about.
A stray thought: does using a generator for the VM make life easier
for the
At 07:50 AM 10/25/2008 -0400, A.M. Kuchling wrote:
On Sat, Oct 25, 2008 at 04:33:23PM +1300, Greg Ewing wrote:
Maybe not, but at least you can follow what it's doing
just by knowing C. Introducing vmgen would introduce another
layer for the reader to learn about.
A stray thought: does using
A.M. Kuchling wrote:
A stray thought: does using a generator for the VM make life easier
for the Stackless Python developers in any way? Does it make it
possible for stock CPython to become stackless?
I doubt it. A major barrier to stacklessness is that
a lot of extension modules would need
M.-A. Lemburg [EMAIL PROTECTED] writes:
[snip]
BTW: I hope you did not use pybench to get profiles of the opcodes.
That would most certainly result in good results for pybench, but
less good ones for general applications such as Django or Zope/Plone.
Algorithm used for superinstruction
Daniel Stutzbach [EMAIL PROTECTED] writes:
[snip]
I searched around for information on how threaded code interacts with
branch prediction, and here's what I found. The short answer is that
threaded code significantly improves branch prediction.
See ``Optimizing indirect branch
On Fri, Oct 24, 2008 at 7:18 AM, Terry Reedy [EMAIL PROTECTED] wrote:
I have not seen any Windows test yet. The direct threading is gcc-specific,
so there might be degradation with MSVC.
erlang uses gcc to compile a single source file on windows and uses MS
VC++ to compile all others. They
Greg Ewing [EMAIL PROTECTED] writes:
Daniel Stutzbach wrote:
With threaded code, every handler ends with its own dispatcher, so
the processor can make fine-grained predictions.
I'm still wondering whether all this stuff makes a
noticeable difference in real-life Python code, which
spends
On 2008-10-24 09:53, J. Sievers wrote:
M.-A. Lemburg [EMAIL PROTECTED] writes:
[snip]
BTW: I hope you did not use pybench to get profiles of the opcodes.
That would most certainly result in good results for pybench, but
less good ones for general applications such as Django or Zope/Plone.
[EMAIL PROTECTED] writes:
On 23 Oct, 10:42 pm, [EMAIL PROTECTED] wrote:
Guido van Rossum wrote:
there already is something else called VPython
Perhaps it could be called Fython (Python with a Forth-like VM)
or Thython (threaded-code Python).
I feel like I've missed something important, but,
Greg Ewing wrote:
[EMAIL PROTECTED] wrote:
Is there any reason this should be a separate project rather than just
be rolled in to the core?
Always keep in mind that one of the important characteristics
of CPython is that its implementation is very straightforward
and easy to follow.
Guido This is very interesting (at this point I'm just lurking), but
Guido has anyone pointed out yet that there already is something else
Guido called VPython, which has a long standing right to the name?
I believe Jakob has already been notified about this. How about TPython? A
Terry I have not seen any Windows test yet. The direct threading is
Terry gcc-specific, so there might be degradation with MSVC.
Not if a compiler #ifdef selects between two independent choices:
#ifdef __GCC__ /* or whatever the right incantation is */
#include
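A minimal sketch of that idea, not taken from the VPython patch: the toy opcodes and the `run` function below are hypothetical. GCC and compatible compilers define `__GNUC__`, so that macro is used here to choose between labels-as-values dispatch and a plain switch. Both branches assume the program contains only valid opcodes and ends with OP_HALT.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy opcodes, for illustration only. */
enum { OP_INCR, OP_DOUBLE, OP_HALT };

/* Runs `code` on an accumulator that starts at 0 and returns it at OP_HALT. */
long run(const unsigned char *code)
{
    assert(code != NULL);
    long acc = 0;
    const unsigned char *pc = code;

#if defined(__GNUC__)
    /* GCC extension: label addresses stored in a table indexed by opcode.
       Every handler ends with its own indirect jump. */
    static void *const targets[] = { &&op_incr, &&op_double, &&op_halt };
#define DISPATCH() goto *targets[*pc++]
    DISPATCH();
op_incr:
    acc += 1;
    DISPATCH();
op_double:
    acc *= 2;
    DISPATCH();
op_halt:
    return acc;
#undef DISPATCH
#else
    /* Portable fallback: one switch inside a loop. */
    for (;;) {
        switch (*pc++) {
        case OP_INCR:   acc += 1; break;
        case OP_DOUBLE: acc *= 2; break;
        case OP_HALT:   return acc;
        default:        return -1;  /* unknown opcode */
        }
    }
#endif
}
```

Either branch gives the same result for the same program, so a non-GCC build would lose only the dispatch technique, not behaviour.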
At 10:47 AM 10/24/2008 +0200, J. Sievers wrote:
- Right now, CPython's bytecode is translated to direct threaded code
lazily (when a code object is first evaluated). This would have to
be merged into compile.c in some way plus some assorted minor changes.
Don't you mean codeobject.c? I
[EMAIL PROTECTED] writes:
BTW, as to the implementation of individual VM instructions I don't believe
the Vmgen stuff affects that. It's just the way the instructions are
assembled.
Vmgen handles the pushing and popping as well. E.g. ROT_THREE becomes:
rot_three ( a1 a2 a3 -- a3 a1 a2 )
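Read as a Forth-style stack effect, the rightmost name is the top of the stack, so a3 starts on top and ends up third from the top. A hand-written C equivalent might look like the sketch below. The array-based stack and the `rot_three` signature are assumptions made for illustration, not what Vmgen generates.

```c
#include <assert.h>
#include <stddef.h>

/* Stack effect ( a1 a2 a3 -- a3 a1 a2 ), with stack[depth - 1] as the top. */
void rot_three(long *stack, size_t depth)
{
    assert(depth >= 3);
    /* Pop the three topmost items by hand. */
    long a3 = stack[depth - 1];
    long a2 = stack[depth - 2];
    long a1 = stack[depth - 3];
    /* Push them back in the rotated order. */
    stack[depth - 3] = a3;
    stack[depth - 2] = a1;
    stack[depth - 1] = a2;
}
```

The point of the Vmgen notation is that this load and store boilerplate is derived from the one-line stack effect instead of being written out per instruction.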
[EMAIL PROTECTED] writes:
Guido This is very interesting (at this point I'm just lurking), but
Guido has anyone pointed out yet that there already is something else
Guido called VPython, which has a long standing right to the name?
I believe Jakob has already been notified about
Stefan Behnel wrote:
Funny to hear that from the author of a well-known code generator. ;-)
I've never claimed that anything about the implementation
of Pyrex is easy to follow. :-)
Having two switch statements and a couple of separate
special cases for a single eval loop might look pretty
Hey,
I hope you don't mind my replying in digest form.
First off, I guess I should be a little clearer as to what VPython is
and what it does.
VPython is essentially a set of patches for CPython (it touches only
three files, diff -b is about 800 lines IIRC plus the switch statement
in ceval.c's
On Thu, Oct 23, 2008 at 1:08 AM, J. Sievers [EMAIL PROTECTED] wrote:
In particular, direct threaded code leads to less horrible branch
prediction than switch dispatch on many machines (exactly how
pronounced this effect is depends heavily on the specific
architecture).
To clarify: This is
On 2008-10-23 09:08, J. Sievers wrote:
a) It's fairly easy to implement different types of dispatch, simply by
changing a few macros (and while I haven't done this, it shouldn't be a
problem to add some switch dispatch #ifdefs for non-GCC platforms).
In particular, direct threaded code leads
Adam Olsen wrote:
To clarify: This is *NOT* actually a form of threading, is it?
I think the term threaded code is being used here in
the sense of Forth, i.e. instead of a sequence of small
integers that are dispatched using a switch statement,
you use the actual machine addresses of the
On Thu, Oct 23, 2008 at 01:31:48AM -0600, Adam Olsen wrote:
To clarify: This is *NOT* actually a form of threading, is it? It
merely breaks the giant dispatch table into a series of small ones,
while also grouping instructions into larger superinstructions? OS
threads are not touched at any
A.M. Kuchling amk at amk.ca writes:
threaded code: A technique for implementing virtual machine
interpreters, introduced by J.R. Bell in 1973, where each op-code in
the virtual machine instruction set is the address of some (lower
level) code to perform the required
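To make that definition concrete, here is a small portable sketch in which the instruction stream is itself an array of code addresses. It uses C function pointers rather than GCC label addresses, so it is closer to what is sometimes called call threading than to the direct threading in the patch under discussion. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct vm vm;
typedef void (*handler)(vm *);

struct vm {
    const handler *ip;  /* points into an array of handler addresses */
    long acc;
    int running;
};

/* Each "opcode" is simply the address of one of these functions. */
void op_incr(vm *m)   { m->acc += 1; }
void op_double(vm *m) { m->acc *= 2; }
void op_halt(vm *m)   { m->running = 0; }

/* Runs until op_halt; no opcode decoding or switch is involved. */
long run_threaded(const handler *code)
{
    assert(code != NULL);
    vm m = { code, 0, 1 };
    while (m.running) {
        handler h = *m.ip++;  /* fetch the next code address */
        h(&m);                /* and jump straight to it */
    }
    return m.acc;
}
```

A program is then written as `const handler prog[] = { op_incr, op_incr, op_double, op_halt };` and run with `run_threaded(prog)`.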
On 2008.10.23 12:02:12 +0200, M.-A. Lemburg wrote:
BTW: I hope you did not use pybench to get profiles of the opcodes.
That would most certainly result in good results for pybench, but
less good ones for general applications such as Django or Zope/Plone.
I was wondering about Pybench-specific
On 2008-10-23 15:19, David Ripton wrote:
On 2008.10.23 12:02:12 +0200, M.-A. Lemburg wrote:
BTW: I hope you did not use pybench to get profiles of the opcodes.
That would most certainly result in good results for pybench, but
less good ones for general applications such as Django or
Jakob David Gregg (and friends) recently published a paper comparing
Jakob stack based and register based VMs for Java and found that
Jakob register based VMs were substantially faster. The main reason for
Jakob this appears to be the absence of the various LOAD_ instructions
On Thu, Oct 23, 2008 at 8:13 AM, Antoine Pitrou [EMAIL PROTECTED] wrote:
Is this kind of optimization that useful on modern CPUs? It helps remove a
memory access to the switch/case lookup table, which should shave off the 3
CPU cycles of latency of a modern L1 data cache, but it won't remove
Daniel Stutzbach wrote:
With threaded code, every handler ends with its own dispatcher, so the
processor can make fine-grained predictions.
I'm still wondering whether all this stuff makes a
noticeable difference in real-life Python code, which
spends most of its time doing expensive things
On Wed, Oct 22, 2008 at 5:16 AM, J. Sievers [EMAIL PROTECTED] wrote:
I implemented a variant of the CPython VM on top of Gforth's Vmgen; this made
it fairly straightforward to add direct threaded code and superinstructions for
the various permutations of LOAD_CONST, LOAD_FAST, and most of the
Guido van Rossum wrote:
there already is something else called VPython
Perhaps it could be called Fython (Python with a Forth-like VM)
or Thython (threaded-code Python).
--
Greg
___
Python-Dev mailing list
Python-Dev@python.org
On 23 Oct, 10:42 pm, [EMAIL PROTECTED] wrote:
Guido van Rossum wrote:
there already is something else called VPython
Perhaps it could be called Fython (Python with a Forth-like VM)
or Thython (threaded-code Python).
I feel like I've missed something important, but, why not just call it
[EMAIL PROTECTED] wrote:
Is there any reason this should be a separate project rather than just
be rolled in to the core?
Always keep in mind that one of the important characteristics
of CPython is that its implementation is very straightforward
and easy to follow. Replacing the ceval loop
[EMAIL PROTECTED] wrote:
It's a substantial patch, but from what I understand it's a huge
performance improvement and completely compatible, both at the C API and
Python source levels.
I have not seen any Windows test yet. The direct threading is
gcc-specific, so there might be degradation
Hi,
I implemented a variant of the CPython VM on top of Gforth's Vmgen; this made
it fairly straightforward to add direct threaded code and superinstructions for
the various permutations of LOAD_CONST, LOAD_FAST, and most of the two-argument
VM instructions.
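For readers unfamiliar with superinstructions, here is a rough sketch of the idea in a toy switch-based interpreter. The fused opcode name LOAD_FAST_LOAD_CONST, the operand encoding, and `eval_code` are all hypothetical and are not taken from the patch. A fused opcode does the work of two instructions in one dispatch.

```c
#include <assert.h>
#include <stddef.h>

enum {
    LOAD_FAST, LOAD_CONST, BINARY_ADD, RETURN_VALUE,
    LOAD_FAST_LOAD_CONST  /* hypothetical superinstruction */
};

#define STACK_MAX 16

/* Evaluates `code` and counts how many dispatches were needed. */
long eval_code(const unsigned char *code, const long *locals,
               const long *consts, size_t *dispatches)
{
    long stack[STACK_MAX];
    size_t sp = 0;
    const unsigned char *pc = code;

    *dispatches = 0;
    for (;;) {
        ++*dispatches;
        switch (*pc++) {
        case LOAD_FAST:
            assert(sp < STACK_MAX);
            stack[sp++] = locals[*pc++];
            break;
        case LOAD_CONST:
            assert(sp < STACK_MAX);
            stack[sp++] = consts[*pc++];
            break;
        case LOAD_FAST_LOAD_CONST:
            /* Two pushes, two operands, one dispatch. */
            assert(sp + 2 <= STACK_MAX);
            stack[sp++] = locals[*pc++];
            stack[sp++] = consts[*pc++];
            break;
        case BINARY_ADD:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case RETURN_VALUE:
            assert(sp >= 1);
            return stack[sp - 1];
        default:
            return -1;  /* unknown opcode */
        }
    }
}
```

With one local and one constant, the sequence LOAD_FAST 0, LOAD_CONST 0, BINARY_ADD, RETURN_VALUE takes four dispatches, while LOAD_FAST_LOAD_CONST 0 0, BINARY_ADD, RETURN_VALUE computes the same value in three.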
Sources:
2008/10/22 J. Sievers [EMAIL PROTECTED]:
I implemented a variant of the CPython VM on top of Gforth's Vmgen; this made
it fairly straightforward to add direct threaded code and superinstructions for
the various permutations of LOAD_CONST, LOAD_FAST, and most of the two-argument
VM
On 2008-10-22 14:16, J. Sievers wrote:
Hi,
I implemented a variant of the CPython VM on top of Gforth's Vmgen; this made
it fairly straightforward to add direct threaded code and superinstructions for
the various permutations of LOAD_CONST, LOAD_FAST, and most of the two-argument
VM
On Oct 22, 2008, at 10:16 AM, J. Sievers wrote:
Hi,
I implemented a variant of the CPython VM on top of Gforth's Vmgen; this made
it fairly straightforward to add direct threaded code and superinstructions for
the various permutations of LOAD_CONST, LOAD_FAST, and most of the
J I implemented a variant of the CPython VM on top of Gforth's Vmgen; this made
J it fairly straightforward to add direct threaded code and superinstructions for
J the various permutations of LOAD_CONST, LOAD_FAST, and most of the two-argument
J VM instructions.
J Sources:
J I implemented a variant of the CPython VM on top of Gforth's Vmgen;
J this made it fairly straightforward to add direct threaded code and
J superinstructions for the various permutations of LOAD_CONST,
J LOAD_FAST, and most of the two-argument VM instructions.
Skip Trying to
Feedback is, of course, very welcome and it'd be great to have some pybench
results from different machines.
My results are very similar to Jakob's.
Gentoo Linux, 32-bit x86, Athlon 6400+ underclocked to 3.0 GHz.
make test:
282 tests OK.
5 tests failed:
test_doctest test_hotshot
David Ripton wrote:
Feedback is, of course, very welcome and it'd be great to have some pybench
results from different machines.
My results are very similar to Jakob's.
From looking thru the vmgen manual, there are two things it is doing
that CPython is not. 1. gcc-specific threaded code;