[issue1943] improved allocation of PyUnicode objects

2011-09-29 Thread STINNER Victor
STINNER Victor added the comment: PEP 393 is based on the idea proposed in this issue (use only one memory block, not two), but it also reduces memory use further with other techniques: - use a different character type depending on the maximum character, - use a shorter structure
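
The effect of picking the character type from the maximum character can be observed from Python itself. What follows is only a minimal sketch, assuming CPython 3.3 or later (where PEP 393 is implemented); `bytes_per_char` is a helper name introduced here for illustration, and the sizes it reports are implementation details, not language guarantees.

```python
import sys

def bytes_per_char(ch, n=100):
    """Estimate the storage cost of one more character like ch.

    Compares two strings made only of ch, of length n and 2*n,
    so that the fixed header size cancels out.
    """
    return (sys.getsizeof(ch * 2 * n) - sys.getsizeof(ch * n)) // n

# One sample character from each width class (assumed CPython behaviour).
for label, ch in [("ASCII", "a"), ("Latin-1", "\xe9"),
                  ("BMP", "\u20ac"), ("astral", "\U0001F600")]:
    print(label, bytes_per_char(ch))
```

On such a build the per-character cost should come out as 1, 1, 2 and 4 bytes respectively, which is the "different character type" part of the comment above.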

[issue1943] improved allocation of PyUnicode objects

2011-04-25 Thread Amaury Forgeot d'Arc
Amaury Forgeot d'Arc added the comment: I just found that the extension zope.i18nmessageid: http://pypi.python.org/pypi/zope.i18nmessageid subclasses unicode at the C level: http://svn.zope.org/zope.i18nmessageid/trunk/src/zope/i18nmessageid/_zope_i18nmessageid_message.c?rev=120914&view=markup

[issue1943] improved allocation of PyUnicode objects

2010-10-07 Thread Antoine Pitrou
Antoine Pitrou added the comment: Updated patch against current py3k. -- Added file: http://bugs.python.org/file19142/unialloc6.patch

[issue1943] improved allocation of PyUnicode objects

2010-09-20 Thread Benjamin Peterson
Benjamin Peterson added the comment: 2010/9/20 Mark Lawrence:
> No reply to msg110599, I'll close this in a couple of weeks unless anyone objects.
Please don't. This is still a valid issue. -- status: pending -> open

[issue1943] improved allocation of PyUnicode objects

2010-09-20 Thread Mark Lawrence
Mark Lawrence added the comment: No reply to msg110599, I'll close this in a couple of weeks unless anyone objects. -- status: open -> pending

[issue1943] improved allocation of PyUnicode objects

2010-08-21 Thread Guido van Rossum
Changes by Guido van Rossum: -- assignee: gvanrossum ->

[issue1943] improved allocation of PyUnicode objects

2010-07-17 Thread Mark Lawrence
Mark Lawrence added the comment: @Antoine: do you wish to try and take this forward? -- nosy: +BreamoreBoy

[issue1943] improved allocation of PyUnicode objects

2010-02-01 Thread Terry J. Reedy
Terry J. Reedy added the comment: > It is, otherwise I would have documented it. The fact that some developers are not using those APIs correctly doesn't change that. If, as Antoine claimed, 'it' is a documented feature of str strings, and Py3 says str = Unicode, it is a plausible inference.

[issue1943] improved allocation of PyUnicode objects

2010-02-01 Thread Antoine Pitrou
Antoine Pitrou added the comment: On Monday 1 February 2010 at 19:21 +, Marc-Andre Lemburg wrote: > > This is not an implementation detail. > > It is, otherwise I would have documented it. Ok, so the current allocation scheme of unicode objects is an implementation detail as well, right?

[issue1943] improved allocation of PyUnicode objects

2010-02-01 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: modules to py3k. > This is not an implementation detail. It is, otherwise I would have documented it. The fact that some developers are not using those APIs correctly doesn't change that. Note that PyUnicode_AsUnicode() only returns a pointer to the Py_UNI

[issue1943] improved allocation of PyUnicode objects

2010-02-01 Thread Amaury Forgeot d'Arc
Amaury Forgeot d'Arc added the comment: > Note that Python is free to change the meaning of Py_UNICODE > (e.g. to use UCS4 on all platforms) Python-UCS4 has never worked on Windows. Most developers on Windows, following the example of the core Python source code, implicitly assumed that HAVE_USABLE_WCHAR

[issue1943] improved allocation of PyUnicode objects

2010-02-01 Thread Amaury Forgeot d'Arc
Amaury Forgeot d'Arc added the comment: Then there are many places to change, in core python as well as in third-party code. And PyArg_ParseTuple("u") would not work any more.

[issue1943] improved allocation of PyUnicode objects

2010-02-01 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Amaury Forgeot d'Arc wrote: > > Amaury Forgeot d'Arc added the comment: > >> Base type Unicode buffers end with a null-Py_UNICODE termination, >> but this is not used anywhere, AFAIK > On Windows, code like >CreateDirectoryW(PyUnicode_AS_UNICODE(po),

[issue1943] improved allocation of PyUnicode objects

2010-02-01 Thread Amaury Forgeot d'Arc
Amaury Forgeot d'Arc added the comment: Again, on Windows there are many many usages of PyUnicode_AS_UNICODE() that pass the result to various Windows API functions, expecting a nul-terminated array of WCHARs. Please don't change this!

[issue1943] improved allocation of PyUnicode objects

2010-02-01 Thread Antoine Pitrou
Antoine Pitrou added the comment: > I'd consider that a bug. Esp. the IO lib should be 8-bit clean > in the sense that it doesn't add any special meaning to NUL > characters or code points. It doesn't add any special meaning to them. It just relies on a NUL being present after the end of the st

[issue1943] improved allocation of PyUnicode objects

2010-02-01 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Antoine Pitrou wrote: > > Antoine Pitrou added the comment: > >> I find that the null termination for 8-bit strings makes low-level >> parsing operations (e.g., parsing a numeric string) safer and easier: > > Not to mention faster. The new IO library mak

[issue1943] improved allocation of PyUnicode objects

2010-01-31 Thread Brian Harring
Changes by Brian Harring: -- nosy: +ferringb

[issue1943] improved allocation of PyUnicode objects

2010-01-11 Thread Adam Olsen
Adam Olsen added the comment: On Sun, Jan 10, 2010 at 14:59, Marc-Andre Lemburg wrote: > BTW, I'm not aware of any changes to the PyUnicodeObject by some > fastsearch implementation. Could you point me to this ? /* We allocate one more byte to make sure the string is Ux terminated.

[issue1943] improved allocation of PyUnicode objects

2010-01-10 Thread Antoine Pitrou
Antoine Pitrou added the comment: > I find that the null termination for 8-bit strings makes low-level > parsing operations (e.g., parsing a numeric string) safer and easier: Not to mention faster. The new IO library makes use of it (for newline detection), on both bytestrings and unicode strin

[issue1943] improved allocation of PyUnicode objects

2010-01-10 Thread Mark Dickinson
Mark Dickinson added the comment: I find that the null termination for 8-bit strings makes low-level parsing operations (e.g., parsing a numeric string) safer and easier: for example, it makes skipping a series of digits with something like: while (isdigit(*s)) ++s; safe. I'd imagine that
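
The sentinel idea behind `while (isdigit(*s)) ++s;` can be mimicked in Python. This is only an illustrative sketch, not CPython's parsing code: `skip_digits` is a hypothetical helper, and where the C loop would read past the buffer when the terminator is missing, Python raises IndexError instead.

```python
def skip_digits(buf, pos=0):
    """Return the index of the first non-digit at or after pos.

    Relies on buf ending with a NUL byte: the loop has no explicit
    length check, so the sentinel is what stops it.
    """
    while 48 <= buf[pos] <= 57:   # ord('0') .. ord('9')
        pos += 1
    return pos

terminated = b"12345" + b"\0"     # mimics a NUL-terminated C buffer
print(skip_digits(terminated))    # stops at the terminator

try:
    skip_digits(b"12345")         # no sentinel: the scan runs off the end
except IndexError:
    print("unterminated buffer: scan ran past the end")
```

The loop body stays a single comparison per character because the terminator doubles as the end-of-buffer test, which is the safety and speed argument made in this part of the thread.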

[issue1943] improved allocation of PyUnicode objects

2010-01-10 Thread Amaury Forgeot d'Arc
Amaury Forgeot d'Arc added the comment: > Base type Unicode buffers end with a null-Py_UNICODE termination, > but this is not used anywhere, AFAIK On Windows, code like CreateDirectoryW(PyUnicode_AS_UNICODE(po), NULL) is very common, at least in posixmodule.c.
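
Wide-character APIs such as CreateDirectoryW take a pointer with no length, so they depend on the terminator being present. The sketch below only illustrates that convention using the standard `ctypes` module; it does not call any Windows function and is not how posixmodule.c is written. The name `newdir` is an arbitrary example value.

```python
import ctypes

name = "newdir"
# create_unicode_buffer() allocates an array of wchar_t and appends a NUL,
# which is the shape a wide-character C API expects to receive.
buf = ctypes.create_unicode_buffer(name)

print(len(buf))                 # slots in the array, terminator included
print(repr(buf[len(name)]))     # the terminator itself
print(buf.value)                # text up to the first NUL
```

A buffer without that final slot would give the callee no way to know where the name ends, which is why removing the terminator from PyUnicode buffers was seen as a breaking change.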

[issue1943] improved allocation of PyUnicode objects

2010-01-10 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Adam Olsen wrote: > > Adam Olsen added the comment: > > Points against the subclassing argument: > > * We have a null-termination invariant. For byte strings this was part of > the public API, and I'm not sure that's changed for unicode strings; aren't

[issue1943] improved allocation of PyUnicode objects

2010-01-10 Thread Adam Olsen
Adam Olsen added the comment: Points against the subclassing argument: * We have a null-termination invariant. For byte strings this was part of the public API, and I'm not sure that's changed for unicode strings; aren't you arguing that we should maximize how much of our implementation is a

[issue1943] improved allocation of PyUnicode objects

2009-06-08 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Terry J. Reedy wrote: > In the interest of possibly improving the imminent 3.1 release, > I opened #6216 > Raise Unicode KEEPALIVE_SIZE_LIMIT from 9 to 32? Thanks for opening that ticket. > I wonder if it is possible to make it generically easier to subcla

[issue1943] improved allocation of PyUnicode objects

2009-06-05 Thread Terry J. Reedy
Terry J. Reedy added the comment: In the interest of possibly improving the imminent 3.1 release, I opened #6216, "Raise Unicode KEEPALIVE_SIZE_LIMIT from 9 to 32?" I wonder if it is possible to make it generically easier to subclass PyVarObjects (but my C knowledge is getting too faded to have an

[issue1943] improved allocation of PyUnicode objects

2009-06-05 Thread Antoine Pitrou
Antoine Pitrou added the comment: > Note that in Python 2.x you don't have such issues because > there, most tools for text processing will happily work on > any sort of buffer, so you don't need a string sub-type > in order to implement e.g. references into another string > (the buffer type wil

[issue1943] improved allocation of PyUnicode objects

2009-06-05 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Guido van Rossum wrote: > I think it's fine to wait for 3.2. Maybe add something to the docs > about not subclassing unicode in C. We should have a wider discussion about this on python-dev. I'll publish the unicoderef extension and then we can see whether

[issue1943] improved allocation of PyUnicode objects

2009-06-05 Thread Guido van Rossum
Guido van Rossum added the comment: On Fri, Jun 5, 2009 at 4:06 AM, Marc-Andre Lemburg wrote: > > Marc-Andre Lemburg added the comment: > > Antoine Pitrou wrote: >> Antoine Pitrou added the comment: >> >> Raymond suggested the patch be committed in 3.1, so as to minimize >> disruption between

[issue1943] improved allocation of PyUnicode objects

2009-06-05 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Antoine Pitrou wrote:
> Raymond suggested the patch be committed in 3.1, so as to minimize disruption between 3.1 and 3.2. Benjamin, what do you think?
Has Guido pronounced on this already?

[issue1943] improved allocation of PyUnicode objects

2009-06-05 Thread Antoine Pitrou
Antoine Pitrou added the comment: Raymond suggested the patch be committed in 3.1, so as to minimize disruption between 3.1 and 3.2. Benjamin, what do you think? -- nosy: +benjamin.peterson

[issue1943] improved allocation of PyUnicode objects

2009-06-04 Thread Antoine Pitrou
Antoine Pitrou added the comment: > Anything larger than 256 bytes goes straight to the OS malloc(). Under a 64-bit system, a plain dict is more than 256 bytes.

[issue1943] improved allocation of PyUnicode objects

2009-06-04 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Antoine Pitrou wrote: > Antoine Pitrou added the comment: > >> Since pymalloc is being used to manage such objects, there's >> a lot of room for improvements, since the allocation scheme >> is under our control. E.g. we could have pymalloc allocate >> larg

[issue1943] improved allocation of PyUnicode objects

2009-06-04 Thread Antoine Pitrou
Antoine Pitrou added the comment: > Since pymalloc is being used to manage such objects, there's > a lot of room for improvements, since the allocation scheme > is under our control. E.g. we could have pymalloc allocate > larger pools for PyUnicodeObjects. I'm not sure what "larger pools for Py

[issue1943] improved allocation of PyUnicode objects

2009-06-04 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Guido van Rossum wrote: > Guido van Rossum added the comment: > > On Wed, Jun 3, 2009 at 1:41 PM, Antoine Pitrou wrote: >> Apart from the example Marc-André just posted (and which is a 0.0.1 >> proof of concept he apparently just wrote), the number of use

[issue1943] improved allocation of PyUnicode objects

2009-06-04 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Here's a new version of the unicode reference type, extended to run in both Python 2.6 and 3.1: http://downloads.egenix.com/python/unicoderef-0.0.2.tar.gz I've also included a benchmark implemented in C which measures Unicode/Bytes allocation performance a

[issue1943] improved allocation of PyUnicode objects

2009-06-03 Thread Guido van Rossum
Guido van Rossum added the comment: On Wed, Jun 3, 2009 at 1:41 PM, Antoine Pitrou wrote: > Apart from the example Marc-André just posted (and which is a 0.0.1 > proof of concept he apparently just wrote), the number of users is, > AFAICT, zero. IIUC Marc-Andre extracted that from a larger cod

[issue1943] improved allocation of PyUnicode objects

2009-06-03 Thread Antoine Pitrou
Antoine Pitrou added the comment: > Still, I expect that a vanishingly small number of users will actually > use that feature. Apart from the example Marc-André just posted (and which is a 0.0.1 proof of concept he apparently just wrote), the number of users is, AFAICT, zero. Unless there's so

[issue1943] improved allocation of PyUnicode objects

2009-06-03 Thread Collin Winter
Collin Winter added the comment: On Wed, Jun 3, 2009 at 2:36 AM, Marc-Andre Lemburg wrote: > Marc-Andre Lemburg added the comment: >> All this is assuming the speed-up is important enough to bother.  Has >> anyone run a comparison benchmark using the Unladen Swallow benchmarks? >> >>  I trust

[issue1943] improved allocation of PyUnicode objects

2009-06-03 Thread Guido van Rossum
Guido van Rossum added the comment: Hm, so the extra pointer is a feature. I guess a compromise would be to keep the extra indirection but make it point into the same object in the base class. Thinking about how memory caching in modern CPUs works, this would probably be quite fast but it would
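
The compromise described here (keep the pointer, but have it point into the same allocation) can be modelled with `ctypes`. This is a rough sketch under stated assumptions: `Header` is a hypothetical two-field stand-in, not the real PyUnicodeObject layout, and `new_inline` and `read_back` are names invented for the illustration.

```python
import ctypes

class Header(ctypes.Structure):
    # Hypothetical, simplified header; not the real PyUnicodeObject layout.
    _fields_ = [("length", ctypes.c_ssize_t),
                ("str", ctypes.c_void_p)]

def new_inline(text):
    """Allocate header and character data in one block.

    The str pointer is kept, but it points just past the header,
    inside the same allocation.
    """
    data = text.encode("utf-32-le") + b"\0\0\0\0"   # 4-byte units + NUL
    block = ctypes.create_string_buffer(ctypes.sizeof(Header) + len(data))
    hdr = Header.from_buffer(block)                 # view onto the block
    hdr.length = len(text)
    hdr.str = ctypes.addressof(block) + ctypes.sizeof(Header)
    ctypes.memmove(hdr.str, data, len(data))
    return block, hdr

def read_back(hdr):
    """Follow the pointer, as code written for the two-block layout would."""
    return ctypes.string_at(hdr.str, hdr.length * 4).decode("utf-32-le")

block, hdr = new_inline("abc")
print(hdr.str - ctypes.addressof(block) == ctypes.sizeof(Header))
print(read_back(hdr))
```

Code that reads through `str` keeps working unchanged, while the base type needs only one allocation; a subclass could still point `str` at memory it manages itself.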

[issue1943] improved allocation of PyUnicode objects

2009-06-03 Thread STINNER Victor
Changes by STINNER Victor: -- nosy: -haypo

[issue1943] improved allocation of PyUnicode objects

2009-06-03 Thread Amaury Forgeot d'Arc
Amaury Forgeot d'Arc added the comment: Let's apply simple and noncontroversial patches first, and then see if the bigger changes are still worth it. Please open a new ticket.

[issue1943] improved allocation of PyUnicode objects

2009-06-03 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Antoine Pitrou wrote: >> Instead of changing PyUnicodeObject from a PyObject to a PyVarObject, >> making sub-typing a lot harder, I'd much rather apply a single change >> for 3.1: raising the KEEPALIVE_SIZE_LIMIT to 32 as explained and >> motivated here: >

[issue1943] improved allocation of PyUnicode objects

2009-06-03 Thread Antoine Pitrou
Antoine Pitrou added the comment: > Instead of changing PyUnicodeObject from a PyObject to a PyVarObject, > making sub-typing a lot harder, I'd much rather apply a single change > for 3.1: raising the KEEPALIVE_SIZE_LIMIT to 32 as explained and > motivated here: You make it sound like an altern

[issue1943] improved allocation of PyUnicode objects

2009-06-03 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: > That's unfortunate; it would clearly have been easier to change this in 3.1. > > That said, I'm not sure anyone *should* be subclassing PyUnicode. Maybe > Marc-Andre can explain why he is doing this (or point to the message in > this thread where he expla

[issue1943] improved allocation of PyUnicode objects

2009-06-02 Thread Guido van Rossum
Guido van Rossum added the comment: That's unfortunate; it would clearly have been easier to change this in 3.1. That said, I'm not sure anyone *should* be subclassing PyUnicode. Maybe Marc-Andre can explain why he is doing this (or point to the message in this thread where he explained this b

[issue1943] improved allocation of PyUnicode objects

2009-06-02 Thread Raymond Hettinger
Changes by Raymond Hettinger: --

[issue1943] improved allocation of PyUnicode objects

2009-06-02 Thread Raymond Hettinger
Raymond Hettinger added the comment: Correction: It is a proposal for 3.2 that changes the struct used in 3.0 and 3.1.

[issue1943] improved allocation of PyUnicode objects

2009-06-02 Thread Raymond Hettinger
Raymond Hettinger added the comment: It's not in 3.1. It is a proposal for 3.2 that changes the struct from what it is in 3.0 and 3.o.

[issue1943] improved allocation of PyUnicode objects

2009-06-02 Thread Guido van Rossum
Guido van Rossum added the comment: If this is not yet in 3.1, it's clearly too late to add it (now that RC1 was already released). If it was in already (hard to tell from the long bug), I think it should be kept in (removing it would destabilize more than keeping it).

[issue1943] improved allocation of PyUnicode objects

2009-06-02 Thread Raymond Hettinger
Raymond Hettinger added the comment: Mark, I'm inclined to agree that this would be a destabilizing change. Guido, do you care to pronounce on whether it is okay to change the struct? -- assignee: -> gvanrossum nosy: +gvanrossum, rhettinger

[issue1943] improved allocation of PyUnicode objects

2009-06-02 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Antoine Pitrou wrote: > Antoine Pitrou added the comment: > >> You cannot simply recompile your code and have it working. > > Who is "you"? > People doing mundane things with PyUnicodeObjects certainly can, > assuming they use the macros for any member ac

[issue1943] improved allocation of PyUnicode objects

2009-06-02 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Here's an example implementation of a Unicode sub-type that allows referencing other Unicode objects: http://downloads.egenix.com/python/unicoderef-0.0.1.tar.gz As you can see, it's pretty straight-forward to write and I want to keep it that way.

[issue1943] improved allocation of PyUnicode objects

2009-06-02 Thread Antoine Pitrou
Antoine Pitrou added the comment: > You cannot simply recompile your code and have it working. Who is "you"? People doing mundane things with PyUnicodeObjects certainly can, assuming they use the macros for any member access. > Please note that all type objects documented in the header files >

[issue1943] improved allocation of PyUnicode objects

2009-06-02 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Antoine Pitrou wrote: > Antoine Pitrou added the comment: > >> The patch breaks C API + binary compatibility for an essential Python >> type - that's not something you can easily undo. > > I don't see how it breaks C API compatibility. No officially docum

[issue1943] improved allocation of PyUnicode objects

2009-06-02 Thread Eric Smith
Changes by Eric Smith: -- nosy: +eric.smith

[issue1943] improved allocation of PyUnicode objects

2009-06-02 Thread Antoine Pitrou
Antoine Pitrou added the comment: > The patch breaks C API + binary compatibility for an essential Python > type - that's not something you can easily undo. I don't see how it breaks C API compatibility. No officially documented function has changed, and the accessor macros still work. Am I mis

[issue1943] improved allocation of PyUnicode objects

2009-06-02 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Antoine Pitrou wrote: > Antoine Pitrou added the comment: > >> There were a number of patches to support sharing of data between >> unicode objects. (By Larry Hastings?) They were rejected because (a) >> they were complicated, and (b) it was possible

[issue1943] improved allocation of PyUnicode objects

2009-06-02 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Jim Jewett wrote: > Jim Jewett added the comment: > > There were a number of patches to support sharing of data between > unicode objects. (By Larry Hastings?) They were rejected because (a) > they were complicated, and (b) it was possible to provoke

[issue1943] improved allocation of PyUnicode objects

2009-05-30 Thread Antoine Pitrou
Antoine Pitrou added the comment: > There were a number of patches to support sharing of data between > unicode objects. (By Larry Hastings?) They were rejected because (a) > they were complicated, and (b) it was possible to provoke pathological > memory retention. Yes, it's the "lazy st

[issue1943] improved allocation of PyUnicode objects

2009-05-30 Thread Jim Jewett
Jim Jewett added the comment: There were a number of patches to support sharing of data between unicode objects. (By Larry Hastings?) They were rejected because (a) they were complicated, and (b) it was possible to provoke pathological memory retention. -- nosy: +jimjjewett

[issue1943] improved allocation of PyUnicode objects

2009-05-25 Thread Amaury Forgeot d'Arc
Amaury Forgeot d'Arc added the comment: > The OS malloc() is only called... I know this. But pymalloc has its own overhead, and cache locality will certainly be better if string data is close to the string length. The goal is to improve the current usage of strings, and not rely on hypothetical

[issue1943] improved allocation of PyUnicode objects

2009-05-25 Thread Antoine Pitrou
Antoine Pitrou added the comment: Marc-André, the problem is that all your arguments are fallacious at best. Let me see: > Like I said: The current design of the Unicode object implementation > would benefit more from advances in pymalloc tuning, not from making it > next to impossible to exten

[issue1943] improved allocation of PyUnicode objects

2009-05-25 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Amaury Forgeot d'Arc wrote: > Amaury Forgeot d'Arc added the comment: > > Looking at the comments, it seems that the performance gain comes from > the removal of the double allocation which is needed by the current design. > > Was the following implementa

[issue1943] improved allocation of PyUnicode objects

2009-05-25 Thread Amaury Forgeot d'Arc
Amaury Forgeot d'Arc added the comment: Looking at the comments, it seems that the performance gain comes from the removal of the double allocation which is needed by the current design. Was the following implementation considered: - keep the current PyUnicodeObject structure - for small string

[issue1943] improved allocation of PyUnicode objects

2009-05-25 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Antoine, I have explained the reasons for rejecting the patch. In short, it violates a design principle behind the Unicode implementation. If you want to change such a basic aspect of the Unicode implementation, then write a PEP which demonstrates the usefu

[issue1943] improved allocation of PyUnicode objects

2009-05-24 Thread Antoine Pitrou
Antoine Pitrou added the comment: Marc-André, please don't close the issue while you're the only one opposing it, thanks. -- resolution: rejected -> status: closed -> open

[issue1943] improved allocation of PyUnicode objects

2009-05-24 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Ok, then closing the patch as rejected. -- resolution: -> rejected status: open -> closed

[issue1943] improved allocation of PyUnicode objects

2009-05-24 Thread Antoine Pitrou
Antoine Pitrou added the comment: As I already showed, the freelist experiments bring very little improvement.

[issue1943] improved allocation of PyUnicode objects

2009-05-24 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Antoine, I think we have to make a decision here: I'm still -1 on changing PyUnicodeObject to be a PyVarObject, but do like your experiments with the free lists. I also still believe that tuning the existing parameters in the Unicode implementation and pyma

[issue1943] improved allocation of PyUnicode objects

2009-05-23 Thread Antoine Pitrou
Antoine Pitrou added the comment: Updated patch against py3k. On a 64-bit system, each unicode object takes 14 bytes less than without the patch (using sys.getsizeof()). Two to four more bytes could be gained by folding the `state` member in the two lower bits of `defenc`, but I'm not sure it's

[issue1943] improved allocation of PyUnicode objects

2009-05-16 Thread Antoine Pitrou
Antoine Pitrou added the comment: > Daniel, which patch? freelists2.patch or unialloc4.patch? If these are > targeted py3k (judging by the "Versions" selector above), none of > Unladen Swallow's benchmarks work under 3k (we're focusing on 2.x). They target py3k indeed. Also, they need updating

[issue1943] improved allocation of PyUnicode objects

2009-05-16 Thread Collin Winter
Collin Winter added the comment: Daniel, which patch? freelists2.patch or unialloc4.patch? If these are targeted py3k (judging by the "Versions" selector above), none of Unladen Swallow's benchmarks work under 3k (we're focusing on 2.x).

[issue1943] improved allocation of PyUnicode objects

2009-05-16 Thread Daniel Diniz
Daniel Diniz added the comment: Collin, can you test this patch with Unladen Swallow's benchmarks? -- components: +Unicode nosy: +ajaksu2, collinwinter, ezio.melotti, haypo stage: -> test needed versions: +Python 3.2 -Python 3.0

[issue1943] improved allocation of PyUnicode objects

2008-03-22 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: I wasn't clear enough: my point was that your free list patch would probably benefit from some tuning of the cut-off parameters. 15 characters appears to be too small (see the HISTORY file histogram). You'll likely get better results for
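
One way to reason about a cut-off is to measure what share of strings falls at or below it. The sketch below is an assumption-laden illustration, not the analysis behind the histogram mentioned in the thread: the sample data is made up, `length_histogram` and `coverage` are hypothetical helpers, 15 is the cut-off named above and 32 is an arbitrary larger value for comparison.

```python
from collections import Counter

def length_histogram(strings):
    """Count how many strings there are of each length."""
    return Counter(len(s) for s in strings)

def coverage(hist, cutoff):
    """Fraction of strings whose length is at most cutoff."""
    total = sum(hist.values())
    covered = sum(n for length, n in hist.items() if length <= cutoff)
    return covered / total if total else 0.0

# Made-up sample data, standing in for strings seen by a real program.
sample = ["id", "name", "x" * 12, "y" * 18, "z" * 25, "w" * 40]
hist = length_histogram(sample)
for cutoff in (15, 32):
    print(cutoff, coverage(hist, cutoff))
```

With real allocation data in place of `sample`, a cut-off that covers only a small share of strings would suggest the free list is rarely reused.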

[issue1943] improved allocation of PyUnicode objects

2008-03-20 Thread Antoine Pitrou
Antoine Pitrou added the comment: Well, of course most words in most languages are below 20 characters. Hence most strings containing words are also below 20 chars. But strings can also contain whole lines (e.g. decoding of various Internet protocols), which are statistically

[issue1943] improved allocation of PyUnicode objects

2008-03-20 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Thanks for running the tests again. The use of pymalloc for the buffer made a significant difference indeed. I expect that more can be had by additionally tweaking KEEPALIVE_SIZE_LIMIT. It is interesting to see that the free list patch o

[issue1943] improved allocation of PyUnicode objects

2008-03-20 Thread Antoine Pitrou
Changes by Antoine Pitrou: Removed file: http://bugs.python.org/file9419/unialloc2.patch

[issue1943] improved allocation of PyUnicode objects

2008-03-20 Thread Antoine Pitrou
Changes by Antoine Pitrou: Removed file: http://bugs.python.org/file9441/unialloc3.patch

[issue1943] improved allocation of PyUnicode objects

2008-03-20 Thread Antoine Pitrou
Changes by Antoine Pitrou: Removed file: http://bugs.python.org/file9332/freelists.patch

[issue1943] improved allocation of PyUnicode objects

2008-03-20 Thread Antoine Pitrou
Changes by Antoine Pitrou: Removed file: http://bugs.python.org/file9296/unialloc.patch

[issue1943] improved allocation of PyUnicode objects

2008-03-20 Thread Antoine Pitrou
Changes by Antoine Pitrou: Added file: http://bugs.python.org/file9790/freelists2.patch

[issue1943] improved allocation of PyUnicode objects

2008-03-20 Thread Antoine Pitrou
Changes by Antoine Pitrou: Added file: http://bugs.python.org/file9789/unialloc4.patch

[issue1943] improved allocation of PyUnicode objects

2008-03-20 Thread Antoine Pitrou
Antoine Pitrou added the comment: You are right, #2321 made the numbers a bit tighter:
With a small string:
./python -m timeit -s "s=open('INTBENCH', 'r').read()" "s.split()"
-> Unpatched py3k: 23.1 usec per loop
-> Freelist patch: 21.3 usec per loop
-> PyVarObject patch: 20.
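
The same kind of micro-benchmark can be run from a script with the standard `timeit` module. This is a sketch only: the INTBENCH file is not available here, so a synthetic string of integers stands in for it (an assumption), `time_split` is a hypothetical helper, and the figures it prints are not comparable with the ones quoted above.

```python
import timeit

def time_split(text, repeat=5, number=1000):
    """Best per-call time, in microseconds, of text.split()."""
    timer = timeit.Timer("s.split()", globals={"s": text})
    best = min(timer.repeat(repeat=repeat, number=number))
    return best / number * 1e6

# Synthetic stand-in for the INTBENCH file, whose contents are not shown here.
small = " ".join(str(i) for i in range(200))
print("%.1f usec per loop" % time_split(small))
```

Taking the minimum over several repeats follows the usual `timeit` advice, since higher values mostly reflect interference from other processes. Comparing patches would mean running the same script under each interpreter build.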

[issue1943] improved allocation of PyUnicode objects

2008-03-20 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Regarding the benchmark: You can instrument a 2.x version of the interpreter to build the data set and then have the test load this data set in Py3k and have it replay the allocation/deallocation in the same way it was done on the 2.x syst

[issue1943] improved allocation of PyUnicode objects

2008-03-20 Thread Antoine Pitrou
Antoine Pitrou added the comment: Well, I'm not gonna try to defend my patch eternally :) I understand your opinion even if I find it a bit disturbing that we refuse a concrete, actual optimization on the basis of future hypothetical ones. Since all the arguments have been laid

[issue1943] improved allocation of PyUnicode objects

2008-03-20 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: I've read the comments from Guido and Martin, but they don't convince me to change my -1. As you say: it's difficult to get support for optimizations such as slicing and concatenation into the core. And that's exactly why I want to keep

[issue1943] improved allocation of PyUnicode objects

2008-03-20 Thread Antoine Pitrou
Antoine Pitrou added the comment: Well I'm not subscribed to the python-3k list either - too much traffic indeed. You can read and post into it with gmane for example: http://thread.gmane.org/gmane.comp.python.python-3000.devel/11768 (there is probably an NNTP gateway too) As

[issue1943] improved allocation of PyUnicode objects

2008-03-20 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Regarding benchmarks: It's difficult to come up with decent benchmarks for things like this. A possible strategy is to use an instrumented interpreter that records which Unicode objects are created and when they are deleted. If you then ru
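
The record-and-replay strategy could be prototyped along the following lines. Everything here is hypothetical: the thread does not define a trace format, so the `(op, key, length)` tuples, the `replay` function and the sample trace are assumptions made for illustration. Note also that CPython may share very short strings instead of allocating them, so a faithful replay would need more care.

```python
import time

def replay(trace):
    """Replay (op, key, length) events; return (elapsed_seconds, peak_live).

    op is "new" (create a string of that length under key) or
    "del" (drop the string stored under key).
    """
    live = {}
    peak = 0
    start = time.perf_counter()
    for op, key, length in trace:
        if op == "new":
            live[key] = "x" * length      # stand-in for the recorded string
        elif op == "del":
            del live[key]
        peak = max(peak, len(live))
    return time.perf_counter() - start, peak

# Made-up trace; a real one would come from an instrumented interpreter.
trace = [("new", 1, 10), ("new", 2, 200), ("del", 1, 0),
         ("new", 3, 5), ("del", 2, 0), ("del", 3, 0)]
elapsed, peak = replay(trace)
print("peak live strings:", peak)
```

Replaying the same trace under differently patched interpreters would give timings driven by a realistic allocation pattern rather than by a single hot loop.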

[issue1943] improved allocation of PyUnicode objects

2008-03-20 Thread Antoine Pitrou
Antoine Pitrou added the comment: Hi, Marc-André, I'm all for "real-life" benchmarks if someone proposes some. Until then we have to live with micro-benchmarks, which seem to be the method used for other CPython optimizations by the way. You are talking about slicing optimi

[issue1943] improved allocation of PyUnicode objects

2008-03-20 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Yes, all those objections apply to the string type as well. The fact that strings are variable length objects makes it impossible to apply any of the possible optimizations I mentioned. If strings were a fixed length object, it would ha

[issue1943] improved allocation of PyUnicode objects

2008-03-20 Thread Amaury Forgeot d'Arc
Amaury Forgeot d'Arc added the comment: Marc-Andre: don't all your objections also apply to the 8bit string type, which is already a variable-size structure? Is extending unicode more common than extending str? With python 3.0, all strings are unicode. Shouldn't this type be

[issue1943] improved allocation of PyUnicode objects

2008-03-20 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Antoine, as I've already mentioned in my other comments, I'm -1 on changing the Unicode object to a variable size object. I also don't think that the micro-benchmarks you are applying really do test the implementation in a real-life situa

[issue1943] improved allocation of PyUnicode objects

2008-03-19 Thread Antoine Pitrou
Antoine Pitrou added the comment: Thanks for your interest Sean :) By the way, on python-3000 both GvR and Martin von Löwis were ok on the proposed design change, although they did not review the patch itself. http://mail.python.org/pipermail/python-3000/2008-February/012076.h

[issue1943] improved allocation of PyUnicode objects

2008-03-19 Thread Sean Reifschneider
Sean Reifschneider added the comment: Marc-Andre: With the updated patches, is this a set of patches we can accept? -- assignee: -> lemburg keywords: +patch nosy: +jafo priority: -> normal

[issue1943] improved allocation of PyUnicode objects

2008-02-16 Thread Antoine Pitrou
Antoine Pitrou added the comment: Here is an updated patch, to comply with the introduction of the PyUnicode_ClearFreeList() function. Added file: http://bugs.python.org/file9441/unialloc3.patch

[issue1943] improved allocation of PyUnicode objects

2008-02-12 Thread Antoine Pitrou
Antoine Pitrou added the comment: Here is an updated patch against the current py3k branch, and with spaces instead of tabs for indentation. Added file: http://bugs.python.org/file9419/unialloc2.patch

[issue1943] improved allocation of PyUnicode objects

2008-01-30 Thread Antoine Pitrou
Antoine Pitrou added the comment: After some more tests I must qualify what I said. The freelist patch is an improvement in some situations. In others it does not really have any impact. On the other hand, the PyVarObject version handles memory-bound cases dramatically better, see below. With a

[issue1943] improved allocation of PyUnicode objects

2008-01-30 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Yes, definitely. Some comments on style in your first patch:
* please use unicode->length instead of the macro LENGTH you added
* indents in unicodeobject.c are 4 spaces
* line length should stay below 80

[issue1943] improved allocation of PyUnicode objects

2008-01-29 Thread Antoine Pitrou
Antoine Pitrou added the comment: FWIW, I tried using the freelist scheme introduced in my patch without making PyUnicode a PyVarObject and, although it's better than the standard version, it's still not as good as the PyVarObject version. Would you be interested in that patch?

[issue1943] improved allocation of PyUnicode objects

2008-01-27 Thread Antoine Pitrou
Antoine Pitrou added the comment: I know it's not the place to discuss #1629305, but the join() solution is not always faster. Why? Because 1) there's the list construction and method call overhead 2) ceval.c has some bytecode hackery to try and make plain concatenations somewhat less slow. As fo
