Mike Coleman wrote:
I guess if ints are 12 bytes (per Beazley's book, but not sure if that
still holds), then that would correspond to a 1GB reduction.
Python 2.6.1 (r261:67515, Dec 11 2008, 20:28:07)
[GCC 4.2.3] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
Jesus Cea jcea at jcea.es writes:
Mike Coleman wrote:
I guess if ints are 12 bytes (per Beazley's book, but not sure if that
still holds), then that would correspond to a 1GB reduction.
Python 2.6.1 (r261:67515, Dec 11 2008, 20:28:07)
[GCC 4.2.3] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
M.-A. Lemburg wrote:
On 2008-12-22 22:45, Steven D'Aprano wrote:
This behaviour appears to be specific to deleting dicts, not deleting
random objects. I haven't yet confirmed that the problem still exists
in trunk (I hope to have time tonight or tomorrow), but in my previous
tests deleting
On Sat, Dec 20, 2008 at 6:22 PM, Mike Coleman tutu...@gmail.com wrote:
Re held and intern_it: Haha! That's evil and extremely evil,
respectively. :-)
P.S. I tried the held idea out (interning integers in a list), and
unfortunately it didn't make that much difference. In the example I
Pitrou
Cc: python-dev@python.org
Subject: Re: [Python-Dev] extremely slow exit for program having huge (45G)
dict (python 2.5.2)
Allocation of a new pool would have to do a linear search in these
pointers (finding the arena with the least number of pools);
You mean the least number of free
I'd like to suggest here, if you are giving this code a facelift,
that on Windows you use VirtualAlloc and friends to allocate the
arenas. This gives you the most direct access to the VM manager and
makes sure that a released arena is immediately available to the rest
of the system. It also
On Sun, 21 Dec 2008 06:45:11 am Antoine Pitrou wrote:
Steven D'Aprano steve at pearwood.info writes:
In November 2007, a similar problem was reported on the
comp.lang.python newsgroup. 370MB was large enough to demonstrate
the problem. I don't know if a bug was ever reported.
Do you still
On 2008-12-20 23:16, Martin v. Löwis wrote:
I will try next week to see if I can come up with a smaller,
submittable example. Thanks.
These long exit times are usually caused by the garbage collection
of objects. This can be a very time consuming task.
I doubt that. The long exit times are
Thanks for all of the useful suggestions. Here are some preliminary results.
With gc still disabled, at the end of the program I first did a
gc.collect(), which took about five minutes. (So, reason enough not
to gc.enable(), at least without Antoine's patch.)
After that, I did a .clear() on
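A scaled-down sketch (not from the thread) of the workflow described above: disable the cyclic collector during the build phase, then run one explicit gc.collect() at the end and time it. Sizes here are tiny for illustration; the five-minute collect reported above was against the full 45 GB dict.

```python
import gc
import time

# Disable cyclic GC up front, as Mike did, so collection cost is paid
# only once, explicitly, at the end of the run.
gc.disable()

table = {}
for i in range(100_000):          # far smaller than the 45 GB case
    table["K%06d" % i] = [("v", i)]

start = time.time()
gc.collect()                      # the explicit pass timed above
collect_seconds = time.time() - start

table.clear()                     # drop contents before interpreter exit
print("collect took %.3fs" % collect_seconds)
```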
On Mon, Dec 22, 2008 at 6:20 AM, M.-A. Lemburg m...@egenix.com wrote:
BTW: Rather than using a huge in-memory dict, I'd suggest to either
use an on-disk dictionary such as the ones found in mxBeeBase or
a database.
I really want this to work in-memory. I have 64G RAM, and I'm only
trying to
On Mon, Dec 22, 2008 at 11:01 AM, Mike Coleman tutu...@gmail.com wrote:
Thanks for all of the useful suggestions. Here are some preliminary results.
With gc still disabled, at the end of the program I first did a
gc.collect(), which took about five minutes. (So, reason enough not
to
Or perhaps there's a smarter way to manage the list of
arena/free pool info.
If that code is the real problem (in a reproducible test case),
then this approach is the only acceptable solution. Disabling
long-running code is not acceptable.
Regards,
Martin
On Mon, Dec 22, 2008 at 2:38 PM, Martin v. Löwis mar...@v.loewis.de wrote:
Or perhaps there's a smarter way to manage the list of
arena/free pool info.
If that code is the real problem (in a reproducible test case),
then this approach is the only acceptable solution. Disabling
long-running
On Dec 22, 2008, at 1:13 PM, Mike Coleman wrote:
On Mon, Dec 22, 2008 at 6:20 AM, M.-A. Lemburg m...@egenix.com wrote:
BTW: Rather than using a huge in-memory dict, I'd suggest to either
use an on-disk dictionary such as the ones found in mxBeeBase or
a database.
I really want this to work
On 2008-12-22 19:13, Mike Coleman wrote:
On Mon, Dec 22, 2008 at 6:20 AM, M.-A. Lemburg m...@egenix.com wrote:
BTW: Rather than using a huge in-memory dict, I'd suggest to either
use an on-disk dictionary such as the ones found in mxBeeBase or
a database.
I really want this to work
If that code is the real problem (in a reproducible test case),
then this approach is the only acceptable solution. Disabling
long-running code is not acceptable.
By disabling, I meant disabling the optimization that's trying to
rearrange the arenas so that more memory can be returned to
On Dec 22, 2008, at 4:07 PM, M.-A. Lemburg wrote:
What kinds of objects are you storing in your dictionary ? Python
instances, strings, integers ?
Answered in a previous message:
On Dec 20, 2008, at 8:09 PM, Mike Coleman wrote:
The dict keys were all uppercase alpha strings of length 7. I
On Mon, 22 Dec 2008 11:20:59 pm M.-A. Lemburg wrote:
On 2008-12-20 23:16, Martin v. Löwis wrote:
I will try next week to see if I can come up with a smaller,
submittable example. Thanks.
These long exit times are usually caused by the garbage collection
of objects. This can be a very
Investigating further, from one stop, I used gdb to follow the chain
of pointers in the nextarena and prevarena directions. There were
5449 and 112765 links, respectively. maxarenas is 131072.
To reduce the time for keeping sorted lists of arenas, I was first
thinking of a binheap. I had
Martin v. Löwis martin at v.loewis.de writes:
It then occurred that there are only 64 different values for nfreepools,
as ARENA_SIZE is 256kiB, and POOL_SIZE is 4kiB. So rather than keeping
the list sorted, I now propose to maintain 64 lists, accessible in
an array of doubly-linked lists
Allocation of a new pool would have to do a linear search in these
pointers (finding the arena with the least number of pools);
You mean the least number of free pools, right?
Correct.
IIUC, the heuristic is to favour
a small number of busy arenas rather than a lot of sparse ones.
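A toy model (in Python, not the C of obmalloc) of the proposal above: bucket arenas by their nfreepools count, so "find the arena with the fewest free pools" becomes a bounded scan over at most 64 slots instead of maintaining a sorted list. The class and method names here are illustrative, not actual CPython identifiers.

```python
POOLS_PER_ARENA = 64   # ARENA_SIZE (256 KiB) / POOL_SIZE (4 KiB)

class Arena:
    def __init__(self, nfreepools):
        self.nfreepools = nfreepools

class ArenaBuckets:
    def __init__(self):
        # buckets[n] holds arenas with exactly n free pools (1..64)
        self.buckets = [[] for _ in range(POOLS_PER_ARENA + 1)]

    def add(self, arena):
        self.buckets[arena.nfreepools].append(arena)

    def take_busiest(self):
        # Linear scan over at most 64 buckets: the arena with the least
        # number of free pools is the "most used" one, which the heuristic
        # prefers so that sparse arenas can drain and be returned to the OS.
        for n in range(1, POOLS_PER_ARENA + 1):
            if self.buckets[n]:
                return self.buckets[n].pop()
        return None   # no arena has a free pool: a new arena is needed

buckets = ArenaBuckets()
buckets.add(Arena(nfreepools=40))
buckets.add(Arena(nfreepools=3))
chosen = buckets.take_busiest()
print(chosen.nfreepools)   # prints 3: the busier arena wins
```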
On Mon, Dec 22, 2008 at 2:54 PM, Ivan Krstić
krs...@solarsail.hcs.harvard.edu wrote:
It's still not clear to me, from reading the whole thread, precisely what
you're seeing. A self-contained test case, preferably with generated random
data, would be great, and save everyone a lot of
On Mon, Dec 22, 2008 at 2:22 PM, Adam Olsen rha...@gmail.com wrote:
To make sure that's the correct line please recompile python without
optimizations. GCC happily reorders and merges different parts of a
function.
Adding a counter in C and recompiling would be a lot faster than using
a gdb
On Dec 22, 2008, at 6:28 PM, Mike Coleman wrote:
For (2), yes, 100% CPU usage.
100% _user_ CPU usage? (I'm trying to make sure we're not chasing some
particular degeneration of kmalloc/vmalloc and friends.)
--
Ivan Krstić krs...@solarsail.hcs.harvard.edu | http://radian.org
Now, we should find a way to benchmark this without having to steal Mike's
machine and wait 30 minutes every time.
So, I seem to reproduce it. The following script takes about 15 seconds to
run and allocates a 2 GB dict which it deletes at the end (gc disabled of
course).
With 2.4, deleting
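A scaled-down version (reconstructed, not Antoine's actual script) of the reproduction described above: build a large dict with gc disabled, then time its deletion. Raise num_items to approach the multi-gigabyte case.

```python
import gc
import time

gc.disable()   # as in the reproduction: keep cyclic GC out of the picture

num_items = 200_000   # small here; Antoine's version built a 2 GB dict
d = {("K%06d" % i): [("v", i)] for i in range(num_items)}

start = time.time()
del d                              # deallocation is the step being timed
delete_seconds = time.time() - start
print("deleting %d items took %.3fs" % (num_items, delete_seconds))
```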
Steven D'Aprano wrote:
This behaviour appears to be specific to deleting dicts, not deleting
random objects. I haven't yet confirmed that the problem still exists
in trunk (I hope to have time tonight or tomorrow), but in my previous
tests deleting millions of items stored in a list of
2008/12/22 Ivan Krstić krs...@solarsail.hcs.harvard.edu:
On Dec 22, 2008, at 6:28 PM, Mike Coleman wrote:
For (2), yes, 100% CPU usage.
100% _user_ CPU usage? (I'm trying to make sure we're not chasing some
particular degeneration of kmalloc/vmalloc and friends.)
Yes, user. No noticeable
I unfortunately don't have time to work out how obmalloc works myself,
but I wonder if any of the constants in that file might need to scale
somehow with memory size. That is, is it possible that some of them
that work okay with 1G RAM won't work well with (say) 128G or 1024G
(coming soon
Mike Coleman wrote:
If you plot this, it is clearly quadratic (or worse).
Here's another comparison script that tries to probe the vagaries of the
obmalloc implementation. It looks at the proportional increases in
deallocation times for lists and dicts as the number of contained items
increases
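A sketch along the lines of the comparison described above (an illustrative reconstruction, not Nick's actual script): time the deallocation of lists and dicts at increasing sizes. If deallocation were quadratic, the times would grow much faster than the 10x size steps.

```python
import gc
import time

def dealloc_time(build, n):
    """Build a container of n items with gc off, time only its deletion."""
    gc.disable()
    obj = build(n)
    start = time.time()
    del obj
    return time.time() - start

def probe(build, sizes=(10_000, 100_000, 1_000_000)):
    return [dealloc_time(build, n) for n in sizes]

list_times = probe(lambda n: list(range(n)))
dict_times = probe(lambda n: {i: None for i in range(n)})
print("list:", list_times)
print("dict:", dict_times)
```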
On Mon, Dec 22, 2008 at 7:34 PM, Antoine Pitrou solip...@pitrou.net wrote:
Now, we should find a way to benchmark this without having to steal Mike's
machine and wait 30 minutes every time.
So, I seem to reproduce it. The following script takes about 15 seconds to
run and allocates a 2 GB
It is likely that PyMalloc would be better with a way to disable the
free()ing of empty arenas, or move to an arrangement where (like the
various type free-lists in 2.6+) explicit action can force pruning of
empty arenas - there are other usage patterns than yours which would
benefit
On Sat, Dec 20, 2008 at 6:09 PM, Mike Coleman tutu...@gmail.com wrote:
On Sat, Dec 20, 2008 at 5:40 PM, Alexandre Vassalotti
Have you seen any significant difference in the exit time when the
cyclic GC is disabled or enabled?
Unfortunately, with GC enabled, the application is too slow to be
-bounces+kristjan=ccpgames@python.org
[mailto:python-dev-bounces+kristjan=ccpgames@python.org] On Behalf Of Mike
Coleman
Sent: 19. desember 2008 23:30
To: python-dev@python.org
Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict
(python 2.5.2)
I have a program
Mike Coleman wrote:
I have a program that creates a huge (45GB) defaultdict. (The keys
are short strings, the values are short lists of pairs (string, int).)
Nothing but possibly the strings and ints is shared.
The program takes around 10 minutes to run, but longer than 20 minutes
to exit (I
Andrew MacIntyre wrote:
Mike Coleman wrote:
I have a program that creates a huge (45GB) defaultdict. (The keys
are short strings, the values are short lists of pairs (string, int).)
Nothing but possibly the strings and ints is shared.
The program takes around 10 minutes to run, but longer
Steve> Unfortunately there are doubtless programs out there that do rely
Steve> on actions being taken at shutdown.
Indeed. I believe any code which calls atexit.register.
Steve> Maybe os.exit() could be more widely advertised, though ...
That would be os._exit(). Calling it avoids
Andrew, this is on an (intel) x86_64 box with 64GB of RAM. I don't
recall the maker or details of the architecture off the top of my
head, but it would be something off the rack from Dell or maybe HP.
There were other users on the box at the time, but nothing heavy or
that gave me any reason to
Mike Coleman wrote:
... Regarding interning, I thought this only worked with strings.
Is there some way to intern integers? I'm probably creating 300M
integers more or less uniformly distributed across range(1)?
held = list(range(1))
...
troublesome_dict[string] =
for program having huge (45G)
dict (python 2.5.2)
I'm not sure exactly how to attack this. Callgrind is cool, but no
way will work on something this size. Timed ltrace output might be
interesting. Or maybe a gprof'ed Python, though that's more work
On 2008-12-20 17:57, Mike Coleman wrote:
On Sat, Dec 20, 2008 at 4:02 AM, Kristján Valur Jónsson
krist...@ccpgames.com wrote:
Can you distill the program into something reproducible?
Maybe with something slightly less than 45Gb but still exhibiting some
degradation of exit performance?
I
On Sat, Dec 20, 2008 at 3:04 PM, M.-A. Lemburg m...@egenix.com wrote:
These long exit times are usually caused by the garbage collection
of objects. This can be a very time consuming task.
In that case, the question would be why is the interpreter collecting
garbage when it knows we're trying
Leif Walsh wrote:
On Sat, Dec 20, 2008 at 3:04 PM, M.-A. Lemburg m...@egenix.com wrote:
These long exit times are usually caused by the garbage collection
of objects. This can be a very time consuming task.
In that case, the question would be why is the interpreter collecting
garbage
Leif> In that case, the question would be why is the interpreter
Leif> collecting garbage when it knows we're trying to exit anyway?.
Because useful side effects are sometimes performed as a result of this
activity (flushing disk buffers, closing database connections, etc).
Skip
[M.-A. Lemburg]
These long exit times are usually caused by the garbage collection
of objects. This can be a very time consuming task.
[Leif Walsh]
In that case, the question would be why is the interpreter collecting
garbage when it knows we're trying to exit anyway?.
Because user-defined
On 2008-12-20 21:20, Leif Walsh wrote:
On Sat, Dec 20, 2008 at 3:04 PM, M.-A. Lemburg m...@egenix.com wrote:
These long exit times are usually caused by the garbage collection
of objects. This can be a very time consuming task.
In that case, the question would be why is the interpreter
(@Skip, Michael, Tim)
On Sat, Dec 20, 2008 at 3:26 PM, s...@pobox.com wrote:
Because useful side effects are sometimes performed as a result of this
activity (flushing disk buffers, closing database connections, etc).
Of course they are. But what about the case given above:
On Sat, Dec 20,
[Mike Coleman]
... Regarding interning, I thought this only worked with strings.
Implementation details. Recent versions of CPython also, e.g.,
intern the empty tuple, and very small integers.
Is there some way to intern integers? I'm probably creating 300M
integers more or less uniformly
[Leif Walsh]
...
It might be a semantic change that I'm looking for here, but it seems
to me that if you turn off the garbage collector, you should be able
to expect that either it also won't run on exit,
It won't then, but the garbage collector is the gc module, and that
only performs
On Sat, Dec 20, 2008 at 2:50 PM, M.-A. Lemburg m...@egenix.com wrote:
If you want a really fast exit, try this:
import os
os.kill(os.getpid(), 9)
But you better know what you're doing if you take this approach...
This would work, but I think os._exit(EX_OK) is probably just as fast,
and
I will try next week to see if I can come up with a smaller,
submittable example. Thanks.
These long exit times are usually caused by the garbage collection
of objects. This can be a very time consuming task.
I doubt that. The long exit times are usually caused by a bad
malloc
Antoine Pitrou wrote:
Leif Walsh leif.walsh at gmail.com writes:
It might be a semantic change that I'm looking for here, but it seems
to me that if you turn off the garbage collector, you should be able
to expect that either it also won't run on exit, or it should have a
way of letting you
Steve Holden steve at holdenweb.com writes:
I believe the OP engendered a certain amount of confusion by describing
object deallocation as being performed by the garbage collector. So he
perhaps didn't understand that even decref'ing all the objects only
referenced by the dict will take a huge
On Fri, Dec 19, 2008 at 6:29 PM, Mike Coleman tutu...@gmail.com wrote:
I have a program that creates a huge (45GB) defaultdict. (The keys
are short strings, the values are short lists of pairs (string, int).)
Nothing but possibly the strings and ints is shared.
That is, after executing
[Sorry, for the previous garbage post.]
On Fri, Dec 19, 2008 at 6:29 PM, Mike Coleman tutu...@gmail.com wrote:
I have a program that creates a huge (45GB) defaultdict. (The keys
are short strings, the values are short lists of pairs (string, int).)
Nothing but possibly the strings and ints
Tim Peters wrote:
If that is the case here, there's no evident general solution. If you
have millions of objects still alive at exit, refcount-based
reclamation has to visit all of them, and if they've been swapped out
to disk it can take a very long time to swap them all back into memory
s...@pobox.com wrote:
Steve> Unfortunately there are doubtless programs out there that do rely
Steve> on actions being taken at shutdown.
Indeed. I believe any code which calls atexit.register.
Steve> Maybe os.exit() could be more widely advertised, though ...
That would be
Tim, I left out some details that I believe probably rule out the
swapped out theory. The machine in question has 64GB RAM, but only
16GB swap. I'd prefer more swap, but in any case only around ~400MB
of the swap was actually in use during my program's entire run.
Furthermore, during my
Re held and intern_it: Haha! That's evil and extremely evil,
respectively. :-)
I will add these to the Python wiki if they're not already there...
Mike
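A sketch of the two tricks referred to above. Both helpers here are illustrative reconstructions of the ideas discussed in the thread (a pre-built "held" list for canonical ints, and a generic interning dict), not the original code.

```python
# "held": pre-create every int in the range once; indexing always
# returns the same shared object, so duplicates can be dropped.
held = list(range(1000))   # range size is illustrative

def canonical_int(i):
    return held[i]

# "intern_it": a generic interning table for any hashable value.
_interned = {}

def intern_it(value):
    # setdefault stores value on first sight, then always returns
    # the first-stored (canonical) object for equal values.
    return _interned.setdefault(value, value)

a = canonical_int(42)
b = canonical_int(42)
assert a is b               # one shared object, not two

x = intern_it((1, "spam"))
y = intern_it((1, "spam"))
assert x is y               # equal tuples collapse to one object
```

Note that CPython already caches small ints internally, so the held trick mainly pays off for larger values and for non-int objects handled via intern_it.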
___
Python-Dev mailing list
Python-Dev@python.org
On Sat, Dec 20, 2008 at 4:11 PM, Tim Peters tim.pet...@gmail.com wrote:
[Lots of answers]
Thanks. Wish I could have offered something useful.
--
Cheers,
Leif
On Sat, Dec 20, 2008 at 5:40 PM, Alexandre Vassalotti
alexan...@peadrop.com wrote:
Could you give us more information about the dictionary. For example,
how many objects does it contain? Is 45GB the actual size of the
dictionary or of the Python process?
The 45G was the VM size of the process
Mike Coleman wrote:
Andrew, this is on an (intel) x86_64 box with 64GB of RAM. I don't
recall the maker or details of the architecture off the top of my
head, but it would be something off the rack from Dell or maybe HP.
There were other users on the box at the time, but nothing heavy or
that
I have a program that creates a huge (45GB) defaultdict. (The keys
are short strings, the values are short lists of pairs (string, int).)
Nothing but possibly the strings and ints is shared.
The program takes around 10 minutes to run, but longer than 20 minutes
to exit (I gave up at that