Jesus Cea <jcea.es> writes:
>
> Mike Coleman wrote:
> > I guess if ints are 12 bytes (per Beazley's book, but not sure if that
> > still holds), then that would correspond to a 1GB reduction.
>
> Python 2.6.1 (r261:67515, Dec 11 2008, 20:28:07)
> [GCC 4.2.3] on sunos5
> Type "help", "copyright",
On Sun, 21 Dec 2008 06:45:11 am Antoine Pitrou wrote:
> Steven D'Aprano <pearwood.info> writes:
> > In November 2007, a similar problem was reported on the
> > comp.lang.python newsgroup. 370MB was large enough to demonstrate
> > the problem. I don't know if a bug was ever reported.
>
> Do you stil
> I'd like to suggest here, if you are giving this code a facelift,
> that on Windows you use VirtualAlloc and friends to allocate the
> arenas. This gives you the most direct access to the VM manager and
> makes sure that a released arena is immediately available to the rest
> of the system. It a
To: Antoine Pitrou
Cc: python-dev@python.org
Subject: Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)
>> Allocation of a new pool would have to do a linear search in these
>> pointers (finding the arena with the least number of pools);
On Sat, Dec 20, 2008 at 6:22 PM, Mike Coleman wrote:
> Re "held" and "intern_it": Haha! That's evil and extremely evil,
> respectively. :-)
P.S. I tried the "held" idea out (interning integers in a list), and
unfortunately it didn't make that much difference. In the example I
tried, there we
M.-A. Lemburg wrote:
> On 2008-12-22 22:45, Steven D'Aprano wrote:
>> This behaviour appears to be specific to deleting dicts, not deleting
>> random objects. I haven't yet confirmed that the problem still exists
>> in trunk (I hope to have time tonight or tomorrow), but in my previous
>> tests
On 2008-12-22 22:45, Steven D'Aprano wrote:
> On Mon, 22 Dec 2008 11:20:59 pm M.-A. Lemburg wrote:
>> On 2008-12-20 23:16, Martin v. Löwis wrote:
> I will try next week to see if I can come up with a smaller,
> submittable example. Thanks.
These long exit times are usually caused by t
On Mon, Dec 22, 2008 at 7:34 PM, Antoine Pitrou wrote:
>
>> Now, we should find a way to benchmark this without having to steal Mike's
>> machine and wait 30 minutes every time.
>
> So, I seem to reproduce it. The following script takes about 15 seconds to
> run and allocates a 2 GB dict which it
Mike Coleman wrote:
> If you plot this, it is clearly quadratic (or worse).
Here's another comparison script that tries to probe the vagaries of the
obmalloc implementation. It looks at the proportional increases in
deallocation times for lists and dicts as the number of contained items
increases
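The comparison script itself is cut off in the archive; a minimal sketch of the same measurement idea (time deallocation of lists and dicts at doubling sizes with the cyclic collector off, and report the growth ratios; all sizes and names here are illustrative, not the original script) might look like:

```python
import gc
import time

def dealloc_time(n, make):
    """Build a container of n items, then time only its deallocation."""
    obj = make(n)
    start = time.time()
    del obj
    return time.time() - start

gc.disable()  # isolate refcount-driven deallocation from cyclic GC

for name, make in [("list", lambda n: list(range(n))),
                   ("dict", lambda n: {i: None for i in range(n)})]:
    prev = None
    for n in (10**5, 2 * 10**5, 4 * 10**5):
        t = dealloc_time(n, make)
        # With linear deallocation the ratio stays near 2 as n doubles;
        # a steadily growing ratio is the quadratic symptom under discussion.
        ratio = "%.2f" % (t / prev) if prev else "-"
        print("%s n=%d t=%.4fs ratio=%s" % (name, n, t, ratio))
        prev = t
```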
I unfortunately don't have time to work out how obmalloc works myself,
but I wonder if any of the constants in that file might need to scale
somehow with memory size. That is, is it possible that some of them
that work okay with 1G RAM won't work well with (say) 128G or 1024G
(coming soon enough)?
2008/12/22 Ivan Krstić:
> On Dec 22, 2008, at 6:28 PM, Mike Coleman wrote:
>>
>> For (2), yes, 100% CPU usage.
>
> 100% _user_ CPU usage? (I'm trying to make sure we're not chasing some
> particular degeneration of kmalloc/vmalloc and friends.)
Yes, user. No noticeable sys or wait CPU going on.
Steven D'Aprano wrote:
> This behaviour appears to be specific to deleting dicts, not deleting
> random objects. I haven't yet confirmed that the problem still exists
> in trunk (I hope to have time tonight or tomorrow), but in my previous
> tests deleting millions of items stored in a list of t
> Now, we should find a way to benchmark this without having to steal Mike's
> machine and wait 30 minutes every time.
So, I seem to reproduce it. The following script takes about 15 seconds to
run and allocates a 2 GB dict which it deletes at the end (gc disabled of
course).
With 2.4, deleting t
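Antoine's script is not preserved in full above; a scaled-down sketch of the same experiment (build a large string-keyed dict with gc disabled, then time its deletion; the key format and sizes are assumptions) could be:

```python
import gc
import time

gc.disable()  # cyclic GC off, as in the original experiment

# Scale N upward (toward 10**8 entries) to approach the multi-GB
# case discussed in the thread; "k%09d" is an illustrative key format.
N = 10**6
d = {"k%09d" % i: i for i in range(N)}

start = time.time()
del d  # every key and value is decref'ed and freed here
print("deleting the dict took %.2f seconds" % (time.time() - start))
```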
On Dec 22, 2008, at 6:28 PM, Mike Coleman wrote:
For (2), yes, 100% CPU usage.
100% _user_ CPU usage? (I'm trying to make sure we're not chasing some
particular degeneration of kmalloc/vmalloc and friends.)
--
Ivan Krstić | http://radian.org
On Mon, Dec 22, 2008 at 2:22 PM, Adam Olsen wrote:
> To make sure that's the correct line please recompile python without
> optimizations. GCC happily reorders and merges different parts of a
> function.
>
> Adding a counter in C and recompiling would be a lot faster than using
> a gdb hook.
Oka
On Mon, Dec 22, 2008 at 2:54 PM, Ivan Krstić wrote:
> It's still not clear to me, from reading the whole thread, precisely what
> you're seeing. A self-contained test case, preferably with generated random
> data, would be great, and save everyone a lot of investigation time.
I'm still working on
>> Allocation of a new pool would have to do a linear search in these
>> pointers (finding the arena with the least number of pools);
>
> You mean the least number of free pools, right?
Correct.
> IIUC, the heuristic is to favour
> a small number of busy arenas rather than a lot of sparse ones.
Martin v. Löwis <v.loewis.de> writes:
>
> It then occurred that there are only 64 different values for nfreepools,
> as ARENA_SIZE is 256kiB, and POOL_SIZE is 4kiB. So rather than keeping
> the list sorted, I now propose to maintain 64 lists, accessible in
> an array of doubly-linked lists indexed by
> Investigating further, from one stop, I used gdb to follow the chain
> of pointers in the nextarena and prevarena directions. There were
> 5449 and 112765 links, respectively. maxarenas is 131072.
To reduce the time for keeping sorted lists of arenas, I was first
thinking of a binheap. I had f
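Martin's bucketed proposal can be modeled in pure Python (this is an illustrative sketch of the data structure only, not the actual obmalloc C code; the class and method names are invented). With ARENA_SIZE = 256 KiB and POOL_SIZE = 4 KiB there are 64 pools per arena, so nfreepools takes at most 65 distinct values, and one list per value makes "find an arena with the fewest free pools" a scan over at most 64 buckets instead of over all arenas:

```python
POOLS_PER_ARENA = 64  # 256 KiB arena / 4 KiB pool

class Arena:
    def __init__(self):
        self.nfreepools = POOLS_PER_ARENA  # all pools free initially

class ArenaBuckets:
    def __init__(self):
        # buckets[k] holds arenas with exactly k free pools
        self.buckets = [[] for _ in range(POOLS_PER_ARENA + 1)]

    def add(self, arena):
        self.buckets[arena.nfreepools].append(arena)

    def take_busiest(self):
        """Return an arena with the fewest nonzero free pools:
        O(64) regardless of how many arenas exist."""
        for k in range(1, POOLS_PER_ARENA + 1):
            if self.buckets[k]:
                return self.buckets[k].pop()
        return None

    def use_pool(self, arena):
        """Allocate one pool from arena, filing it under its new count."""
        arena.nfreepools -= 1
        self.add(arena)

buckets = ArenaBuckets()
buckets.add(Arena())
arena = buckets.take_busiest()  # bounded scan, no sorting, no binheap
buckets.use_pool(arena)         # arena now lives in bucket 63
```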
On Mon, 22 Dec 2008 11:20:59 pm M.-A. Lemburg wrote:
> On 2008-12-20 23:16, Martin v. Löwis wrote:
> >>> I will try next week to see if I can come up with a smaller,
> >>> submittable example. Thanks.
> >>
> >> These long exit times are usually caused by the garbage collection
> >> of objects. Thi
On Dec 22, 2008, at 4:07 PM, M.-A. Lemburg wrote:
What kinds of objects are you storing in your dictionary ? Python
instances, strings, integers ?
Answered in a previous message:
On Dec 20, 2008, at 8:09 PM, Mike Coleman wrote:
The dict keys were all uppercase alpha strings of length 7. I do
>> If that code is the real problem (in a reproducible test case),
>> then this approach is the only acceptable solution. Disabling
>> long-running code is not acceptable.
>
> By "disabling", I meant disabling the optimization that's trying to
> rearrange the arenas so that more memory can be retu
On 2008-12-22 19:13, Mike Coleman wrote:
> On Mon, Dec 22, 2008 at 6:20 AM, M.-A. Lemburg wrote:
>> BTW: Rather than using a huge in-memory dict, I'd suggest to either
>> use an on-disk dictionary such as the ones found in mxBeeBase or
>> a database.
>
> I really want this to work in-memory. I h
On Dec 22, 2008, at 1:13 PM, Mike Coleman wrote:
On Mon, Dec 22, 2008 at 6:20 AM, M.-A. Lemburg wrote:
BTW: Rather than using a huge in-memory dict, I'd suggest to either
use an on-disk dictionary such as the ones found in mxBeeBase or
a database.
I really want this to work in-memory. I have
On Mon, Dec 22, 2008 at 2:38 PM, "Martin v. Löwis" wrote:
>> Or perhaps there's a smarter way to manage the list of
>> arena/free pool info.
>
> If that code is the real problem (in a reproducible test case),
> then this approach is the only acceptable solution. Disabling
> long-running code is no
> Or perhaps there's a smarter way to manage the list of
> arena/free pool info.
If that code is the real problem (in a reproducible test case),
then this approach is the only acceptable solution. Disabling
long-running code is not acceptable.
Regards,
Martin
On Mon, Dec 22, 2008 at 11:01 AM, Mike Coleman wrote:
> Thanks for all of the useful suggestions. Here are some preliminary results.
>
> With still gc.disable(), at the end of the program I first did a
> gc.collect(), which took about five minutes. (So, reason enough not
> to gc.enable(), at lea
On Mon, Dec 22, 2008 at 6:20 AM, M.-A. Lemburg wrote:
> BTW: Rather than using a huge in-memory dict, I'd suggest to either
> use an on-disk dictionary such as the ones found in mxBeeBase or
> a database.
I really want this to work in-memory. I have 64G RAM, and I'm only
trying to use 45G of it
Thanks for all of the useful suggestions. Here are some preliminary results.
With still gc.disable(), at the end of the program I first did a
gc.collect(), which took about five minutes. (So, reason enough not
to gc.enable(), at least without Antoine's patch.)
After that, I did a .clear() on th
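The measurement pattern Mike describes, timing each teardown phase separately with the collector disabled, can be sketched as follows (sizes and key shapes are illustrative stand-ins for the real table):

```python
import gc
import time

gc.disable()

# A small stand-in for the big table; values are short lists as described.
d = {"key%07d" % i: [("v", i)] for i in range(10**5)}

def timed(label, fn):
    start = time.time()
    fn()
    print("%s took %.2fs" % (label, time.time() - start))

timed("gc.collect()", gc.collect)  # full cycle-detection pass
timed("dict.clear()", d.clear)     # decrefs every key and value
# whatever deallocation remains is paid at interpreter exit
```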
On 2008-12-20 23:16, Martin v. Löwis wrote:
>>> I will try next week to see if I can come up with a smaller,
>>> submittable example. Thanks.
>> These long exit times are usually caused by the garbage collection
>> of objects. This can be a very time consuming task.
>
> I doubt that. The long exi
On Sat, Dec 20, 2008 at 6:09 PM, Mike Coleman wrote:
> On Sat, Dec 20, 2008 at 5:40 PM, Alexandre Vassalotti
>> Have you seen any significant difference in the exit time when the
>> cyclic GC is disabled or enabled?
>
> Unfortunately, with GC enabled, the application is too slow to be
> useful, be
> It is likely that PyMalloc would be better with a way to disable the
> free()ing of empty arenas, or move to an arrangement where (like the
> various type free-lists in 2.6+) explicit action can force pruning of
> empty arenas - there are other usage patterns than yours which would
> benefit (per
Mike Coleman wrote:
Andrew, this is on an (intel) x86_64 box with 64GB of RAM. I don't
recall the maker or details of the architecture off the top of my
head, but it would be something "off the rack" from Dell or maybe HP.
There were other users on the box at the time, but nothing heavy or
that
On Sat, Dec 20, 2008 at 5:40 PM, Alexandre Vassalotti wrote:
> Could you give us more information about the dictionary. For example,
> how many objects does it contain? Is 45GB the actual size of the
> dictionary or of the Python process?
The 45G was the VM size of the process (resident size was
On Sat, Dec 20, 2008 at 4:11 PM, Tim Peters wrote:
> [Lots of answers]
Thanks. Wish I could have offered something useful.
--
Cheers,
Leif
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe
Re "held" and "intern_it": Haha! That's evil and extremely evil,
respectively. :-)
I will add these to the Python wiki if they're not already there...
Mike
Tim, I left out some details that I believe probably rule out the
"swapped out" theory. The machine in question has 64GB RAM, but only
16GB swap. I'd prefer more swap, but in any case only around ~400MB
of the swap was actually in use during my program's entire run.
Furthermore, during my program
s...@pobox.com wrote:
>
> Steve> Unfortunately there are doubtless programs out there that do rely
> Steve> on actions being taken at shutdown.
>
> Indeed. I believe any code which calls atexit.register.
>
> Steve> Maybe os.exit() could be more widely advertised, though ...
>
> Tha
Tim Peters wrote:
> If that is the case here, there's no evident general solution. If you
> have millions of objects still alive at exit, refcount-based
> reclamation has to visit all of them, and if they've been swapped out
> to disk it can take a very long time to swap them all back into memory
[Sorry for the previous garbage post.]
> On Fri, Dec 19, 2008 at 6:29 PM, Mike Coleman wrote:
> I have a program that creates a huge (45GB) defaultdict. (The keys
> are short strings, the values are short lists of pairs (string, int).)
> Nothing but possibly the strings and ints is shared.
Cou
On Fri, Dec 19, 2008 at 6:29 PM, Mike Coleman wrote:
> I have a program that creates a huge (45GB) defaultdict. (The keys
> are short strings, the values are short lists of pairs (string, int).)
> Nothing but possibly the strings and ints is shared.
>
> That is, after executing the final stat
Steve Holden <holdenweb.com> writes:
> I believe the OP engendered a certain amount of confusion by describing
> object deallocation as being performed by the garbage collector. So he
> perhaps didn't understand that even decref'ing all the objects only
> referenced by the dict will take a huge amo
Antoine Pitrou wrote:
> Leif Walsh <gmail.com> writes:
>> It might be a semantic change that I'm looking for here, but it seems
>> to me that if you turn off the garbage collector, you should be able
>> to expect that either it also won't run on exit, or it should have a
>> way of letting you tell
>> I will try next week to see if I can come up with a smaller,
>> submittable example. Thanks.
>
> These long exit times are usually caused by the garbage collection
> of objects. This can be a very time consuming task.
I doubt that. The long exit times are usually caused by a bad
malloc implem
On Sat, Dec 20, 2008 at 2:50 PM, M.-A. Lemburg wrote:
> If you want a really fast exit, try this:
>
> import os
> os.kill(os.getpid(), 9)
>
> But you better know what you're doing if you take this approach...
This would work, but I think os._exit(EX_OK) is probably just as fast,
and allows you to
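Both fast-exit styles can be observed from a parent process for illustration (a sketch; the signal-status behavior assumes POSIX). Either way, no atexit handlers run, no buffers are flushed, and, the point of this thread, no object deallocation pass happens:

```python
import subprocess
import sys

# Option 1: os._exit() terminates immediately; the parent sees a
# normal exit status.
child = subprocess.run([sys.executable, "-c", "import os; os._exit(0)"])
print("os._exit(0) exit status:", child.returncode)  # 0

# Option 2 (M.-A. Lemburg's suggestion): have the kernel kill the process.
# On POSIX the parent sees death-by-signal instead of an exit status.
child = subprocess.run(
    [sys.executable, "-c",
     "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"])
print("SIGKILL exit status:", child.returncode)  # -9 on POSIX
```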
[Leif Walsh]
> ...
> It might be a semantic change that I'm looking for here, but it seems
> to me that if you turn off the garbage collector, you should be able
> to expect that either it also won't run on exit,
It won't then, but "the garbage collector" is the gc module, and that
only performs /
[Mike Coleman]
>> ... Regarding interning, I thought this only worked with strings.
Implementation details. Recent versions of CPython also, e.g.,
"intern" the empty tuple, and very small integers.
>> Is there some way to intern integers? I'm probably creating 300M
>> integers more or less unif
(@Skip, Michael, Tim)
On Sat, Dec 20, 2008 at 3:26 PM, wrote:
> Because useful side effects are sometimes performed as a result of this
> activity (flushing disk buffers, closing database connections, etc).
Of course they are. But what about the case given above:
On Sat, Dec 20, 2008 at 5:55
On 2008-12-20 21:20, Leif Walsh wrote:
> On Sat, Dec 20, 2008 at 3:04 PM, M.-A. Lemburg wrote:
>> These long exit times are usually caused by the garbage collection
>> of objects. This can be a very time consuming task.
>
> In that case, the question would be "why is the interpreter collecting
>
[M.-A. Lemburg]
>> These long exit times are usually caused by the garbage collection
>> of objects. This can be a very time consuming task.
[Leif Walsh]
> In that case, the question would be "why is the interpreter collecting
> garbage when it knows we're trying to exit anyway?".
Because user-de
Leif> In that case, the question would be "why is the interpreter
Leif> collecting garbage when it knows we're trying to exit anyway?".
Because useful side effects are sometimes performed as a result of this
activity (flushing disk buffers, closing database connections, etc).
Skip
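Skip's point, that a normal exit runs registered cleanup while a hard exit skips it, is easy to demonstrate from a parent process (an illustrative sketch, not code from the thread):

```python
import subprocess
import sys

# A program that registers a cleanup handler and exits normally:
normal = "import atexit; atexit.register(lambda: print('cleanup ran'))"
out = subprocess.run([sys.executable, "-c", normal],
                     capture_output=True, text=True).stdout
print("normal exit ->", repr(out))  # the handler runs

# The same program ending with os._exit(): the handler is skipped.
hard = normal + "; import os; os._exit(0)"
out = subprocess.run([sys.executable, "-c", hard],
                     capture_output=True, text=True).stdout
print("os._exit    ->", repr(out))  # empty: no cleanup ran
```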
Leif Walsh wrote:
On Sat, Dec 20, 2008 at 3:04 PM, M.-A. Lemburg wrote:
These long exit times are usually caused by the garbage collection
of objects. This can be a very time consuming task.
In that case, the question would be "why is the interpreter collecting
garbage when it knows w
On Sat, Dec 20, 2008 at 3:04 PM, M.-A. Lemburg wrote:
> These long exit times are usually caused by the garbage collection
> of objects. This can be a very time consuming task.
In that case, the question would be "why is the interpreter collecting
garbage when it knows we're trying to exit anyway
On 2008-12-20 17:57, Mike Coleman wrote:
> On Sat, Dec 20, 2008 at 4:02 AM, Kristján Valur Jónsson
> wrote:
>> Can you distill the program into something reproducible?
>> Maybe with something slightly less than 45Gb but still exhibiting some
>> degradation of exit performance?
>> I can try to poi
I'm not sure exactly how to attack this. Callgrind is cool, but no
way will work on something this size. Timed ltrace output might be
interesting. Or maybe a gprof'ed
Mike Coleman wrote:
... Regarding interning, I thought this only worked with strings.
Is there some way to intern integers? I'm probably creating 300M
integers more or less uniformly distributed across range(1)?
held = list(range(1))
...
troublesome_dict[string] = held[number_to_h
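Tim's "held" snippet is truncated above; the idea, sketched here with assumed details (the bound 10**6 and the full name `number_to_hold` are guesses, since the archive cuts both off), is to pre-create one canonical int object per possible value and always store a reference into that list, so a value that occurs millions of times costs one object plus pointers to it:

```python
# "Interning by hand": one canonical int object per value.
# The upper bound 10**6 is an illustrative assumption.
held = list(range(10**6))

troublesome_dict = {}

def store(key, number_to_hold):
    # held[number_to_hold] is always the same object, so duplicates
    # across the dict share storage instead of each being a fresh int.
    troublesome_dict[key] = held[number_to_hold]

store("AAAAAAA", 123456)
store("BBBBBBB", 123456)
assert troublesome_dict["AAAAAAA"] is troublesome_dict["BBBBBBB"]
```

Note that CPython already caches small ints (roughly -5 through 256), so the trick only pays off for values outside that range.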
Andrew, this is on an (intel) x86_64 box with 64GB of RAM. I don't
recall the maker or details of the architecture off the top of my
head, but it would be something "off the rack" from Dell or maybe HP.
There were other users on the box at the time, but nothing heavy or
that gave me any reason to
On Sat, Dec 20, 2008 at 4:02 AM, Kristján Valur Jónsson wrote:
> Can you distill the program into something reproducible?
> Maybe with something slightly less than 45Gb but still exhibiting some
> degradation of exit performance?
> I can try to point our commercial profiling tools at it and see w
Steve> Unfortunately there are doubtless programs out there that do rely
Steve> on actions being taken at shutdown.
Indeed. I believe any code which calls atexit.register.
Steve> Maybe os.exit() could be more widely advertised, though ...
That would be os._exit(). Calling it avoid
Andrew MacIntyre wrote:
> Mike Coleman wrote:
>> I have a program that creates a huge (45GB) defaultdict. (The keys
>> are short strings, the values are short lists of pairs (string, int).)
>> Nothing but possibly the strings and ints is shared.
>>
>> The program takes around 10 minutes to run, b
Mike Coleman wrote:
I have a program that creates a huge (45GB) defaultdict. (The keys
are short strings, the values are short lists of pairs (string, int).)
Nothing but possibly the strings and ints is shared.
The program takes around 10 minutes to run, but longer than 20 minutes
to exit (I g
From: Mike Coleman
Sent: 19 December 2008 23:30
To: python-dev@python.org
Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)
I have a program that
I have a program that creates a huge (45GB) defaultdict. (The keys
are short strings, the values are short lists of pairs (string, int).)
Nothing but possibly the strings and ints is shared.
The program takes around 10 minutes to run, but longer than 20 minutes
to exit (I gave up at that point).
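A scaled-down model of the workload described (the key shape, uppercase alpha strings of length 7, comes from a later message in the thread; everything else here, including the pair contents and sizes, is an illustrative assumption):

```python
import random
import string
from collections import defaultdict

random.seed(0)

def random_key():
    # 7-character uppercase alpha keys, per Mike's description.
    return "".join(random.choice(string.ascii_uppercase) for _ in range(7))

# Scale n upward (and watch the process VM size) to approach the 45 GB case.
n = 10**5
table = defaultdict(list)
for _ in range(n):
    table[random_key()].append(("tag", random.randrange(1000)))

print(len(table), "distinct keys")
```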