Thomas Ganss wrote:
> Klaas schrieb:
>
> > 4. Insert your keys in sorted order.
> This advice is questionable -
> whether inserting pre-sorted values is better depends
> at least on the db vendor, and probably sometimes on
> the sort method.
The article I wrote that you quoted named a specifi
Marcin Ciura wrote:
> Duncan Booth wrote:
>> Marcin Ciura wrote:
>>>See Figure 8 in
>>>http://sun.iinf.polsl.gliwice.pl/~mciura/shellsort.pdf
>> That isn't what the reference says. It only covers N up to a few
>> thousand. Practical values of N need to at least go up into the
>> millions.
>
Please look at the performance graph
Duncan Booth wrote:
> Marcin Ciura wrote:
>>See Figure 8 in
>>http://sun.iinf.polsl.gliwice.pl/~mciura/shellsort.pdf
> That isn't what the reference says. It only covers N up to a few thousand.
> Practical values of N need to at least go up into the millions.
Please look at the performance graph
Marcin Ciura wrote:
> Tim Peters wrote:
>> shellsort is much more a refinement of insertion-sort than of
>> bubblesort, but is not an O(N log N) algorithm regardless.
>
> With a judiciously chosen increment sequence,
> the number of comparisons made by shellsort
> *is* of the order N log N for practical values of N.
Tim Peters wrote:
> shellsort is much more a refinement of insertion-sort than of
> bubblesort, but is not an O(N log N) algorithm regardless.
With a judiciously chosen increment sequence,
the number of comparisons made by shellsort
*is* of the order N log N for practical values of N.
See Figure 8 in
http://sun.iinf.polsl.gliwice.pl/~mciura/shellsort.pdf
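For concreteness, here is a minimal shellsort sketch (modern Python 3) using Ciura's published increment sequence; extending the gaps for much larger N (commonly by a factor of about 2.25) is an assumption beyond the paper's table:

```python
def shellsort(a, gaps=(701, 301, 132, 57, 23, 10, 4, 1)):
    """In-place shellsort using Marcin Ciura's experimentally
    derived gap sequence. Returns the list for convenience."""
    for gap in gaps:
        # Gapped insertion sort: each pass leaves the list gap-sorted.
        for i in range(gap, len(a)):
            tmp = a[i]
            j = i
            while j >= gap and a[j - gap] > tmp:
                a[j] = a[j - gap]
                j -= gap
            a[j] = tmp
    return a

print(shellsort([5, 2, 9, 1, 7, 3]))
```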
[Tim Peters]
>> ...
>> O(N log N) sorting algorithms helped by pre-existing order are
>> uncommon, unless they do extra work to detect and exploit
>> pre-existing order.
[Lawrence D'Oliveiro]
> Shellsort works well with nearly-sorted data. It's basically a smarter
> version of bubblesort with much
In article <[EMAIL PROTECTED]>,
"Iain King" <[EMAIL PROTECTED]> wrote:
>Lawrence D'Oliveiro wrote:
>> In article <[EMAIL PROTECTED]>,
>> Scott David Daniels <[EMAIL PROTECTED]> wrote:
>>
>> >For example, time timsort (Python's internal sort) on pre-sorted
>> >data; you'll find it is handled faster than random data.
In article <[EMAIL PROTECTED]>,
"Tim Peters" <[EMAIL PROTECTED]> wrote:
>For example, the O(N log N) heapsort is unreasonable compared to the
>O(N**2) bubblesort by that reasoning (pre-sorted is actually a bad
>case for heapsort, but a good case for bubblesort)? O(N log N)
>sorting algorithms helped by pre-existing order are uncommon, unless
>they do extra work to detect and exploit pre-existing order.
In article <[EMAIL PROTECTED]>,
Tim Peters <[EMAIL PROTECTED]> wrote:
>[Jim Segrave]
>> Actually, presorted lists are not a bad case for heapsort - it's quite
>> immune to any existing order or lack thereof,
>
>Write a heapsort and time it. It's not a difference in O() behavior,
>but more memory movement is required for a sorted list because
[Jim Segrave]
> Actually, presorted lists are not a bad case for heapsort - it's quite
> immune to any existing order or lack thereof,
Write a heapsort and time it. It's not a difference in O() behavior,
but more memory movement is required for a sorted list because
transforming the list into a m
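Tim's point is easy to check. Below is a rough in-place heapsort sketch that counts swaps as a crude proxy for memory movement (the counting scheme is an illustration, not Tim's measurement); building the max-heap from ascending input makes every element sink toward the bottom:

```python
def heapsort(a):
    """In-place heapsort; returns the number of element swaps,
    to illustrate that pre-sorted input is no cheaper."""
    moves = 0

    def siftdown(start, end):
        nonlocal moves
        root = start
        while 2 * root + 1 <= end:
            child = 2 * root + 1
            if child + 1 <= end and a[child] < a[child + 1]:
                child += 1
            if a[root] < a[child]:
                a[root], a[child] = a[child], a[root]
                moves += 1
                root = child
            else:
                break

    n = len(a)
    # Build a max-heap; ascending input forces lots of sinking here.
    for start in range(n // 2 - 1, -1, -1):
        siftdown(start, n - 1)
    # Repeatedly move the max to the end and restore the heap.
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        moves += 1
        siftdown(0, end - 1)
    return moves

sorted_moves = heapsort(list(range(1000)))
mixed_moves = heapsort([i * 37 % 1000 for i in range(1000)])
print(sorted_moves, mixed_moves)
```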
In article <[EMAIL PROTECTED]>,
Tim Peters <[EMAIL PROTECTED]> wrote:
>[Scott David Daniels]
>>> For example, time timsort (Python's internal sort) on pre-sorted
>>> data; you'll find it is handled faster than random data.
>
>O(N) vs O(N log N), in fact.
>
>[Lawrence D'Oliveiro]
>> But isn't that how a reasonable sorting algorithm should behave? Less
>> work to do if the data
[Scott David Daniels]
>> For example, time timsort (Python's internal sort) on pre-sorted
>> data; you'll find it is handled faster than random data.
O(N) vs O(N log N), in fact.
[Lawrence D'Oliveiro]
> But isn't that how a reasonable sorting algorithm should behave? Less
> work to do if the data
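The O(N) vs O(N log N) gap is easy to reproduce (sizes and repetition counts below are arbitrary choices):

```python
import random
import timeit

n = 100_000
presorted = list(range(n))
shuffled = presorted[:]
random.shuffle(shuffled)

# timsort scans for existing runs first, so an already-ordered list
# is verified with N-1 comparisons and copied out: O(N) total.
t_sorted = timeit.timeit(lambda: sorted(presorted), number=20)
t_random = timeit.timeit(lambda: sorted(shuffled), number=20)
print('pre-sorted %.3fs vs random %.3fs' % (t_sorted, t_random))
```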
Lawrence D'Oliveiro wrote:
> In article <[EMAIL PROTECTED]>,
> Scott David Daniels <[EMAIL PROTECTED]> wrote:
>
> >For example, time timsort (Python's internal sort) on pre-sorted
> >data; you'll find it is handled faster than random data.
>
> But isn't that how a reasonable sorting algorithm should behave? Less
> work to do if the data
In article <[EMAIL PROTECTED]>,
Lawrence D'Oliveiro <[EMAIL PROTECTED]> wrote:
>In article <[EMAIL PROTECTED]>,
> Scott David Daniels <[EMAIL PROTECTED]> wrote:
>>
>>For example, time timsort (Python's internal sort) on pre-sorted
>>data; you'll find it is handled faster than random data.
>
>But isn't that how a reasonable sorting algorithm should behave? Less
>work to do if the data
Lawrence D'Oliveiro wrote:
> In article <[EMAIL PROTECTED]>,
> Scott David Daniels <[EMAIL PROTECTED]> wrote:
>
>
>>For example, time timsort (Python's internal sort) on pre-sorted
>>data; you'll find it is handled faster than random data.
>
>
> But isn't that how a reasonable sorting algorithm should behave? Less
> work to do if the data
In article <[EMAIL PROTECTED]>,
Scott David Daniels <[EMAIL PROTECTED]> wrote:
>For example, time timsort (Python's internal sort) on pre-sorted
>data; you'll find it is handled faster than random data.
But isn't that how a reasonable sorting algorithm should behave? Less
work to do if the data
Roy Smith wrote:
> My guess would be that each resize grows the table by a constant
> factor, which IIRC, works out to amortized O(n).
Right. For <50000 entries, the size is multiplied by 4 each time,
and doubled each time for large dictionaries.
> It's not as good as
> creating the dict the rig
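The geometric growth is observable from Python itself via sys.getsizeof (the exact growth factors vary across CPython versions, so the printed ratios are illustrative only):

```python
import sys

# Watch the backing table grow during insertion: CPython enlarges a
# dict by a constant multiple when it fills up, so the total resize
# work over n inserts is amortized O(n).
d = {}
sizes = []
for i in range(100_000):
    before = sys.getsizeof(d)
    d[i] = None
    if sys.getsizeof(d) != before:
        sizes.append(sys.getsizeof(d))

ratios = ['%.1f' % (b / a) for a, b in zip(sizes, sizes[1:])]
print(sizes[:5], ratios[:5])
```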
Thomas Ganss wrote:
> Klaas schrieb:
>> 4. Insert your keys in sorted order.
> This advice is questionable -
>
> My gut feeling on this matter is:
> IF the insert times of pre-sorted values is far better
> than the times of unsorted values, there is a chance
> that the resulting tree is unbalanced.
Klaas schrieb:
> 4. Insert your keys in sorted order.
This advice is questionable -
whether inserting pre-sorted values is better depends
at least on the db vendor, and probably sometimes on
the sort method.
My gut feeling on this matter is:
IF the insert times of pre-sorted values is far better
than the times of unsorted values, there is a chance
that the resulting tree is unbalanced.
Chris:
> class StorageBerkeleyDB(StorageTest):
>     def runtest(self, number_hash):
>         db = bsddb.hashopen(None, flag='c', cachesize=8192)
>         for (num, wildcard_digits) in number_hash.keys():
>             key = '%d:%d' % (num, wildcard_digits)
>             db[key] = None
>         db.close()
Chris:
> Berkeley DB is great for accessing data by key for things already
> stored on disk (i.e. read access), but write performance for each
> key-value pair is slow due to it being careful about flushing
> writes to disk by default.
This is absolutely false.
-Mike
--
http://mail.python.org/
Chris Foote wrote:
> Richie Hindle wrote:
>> [Chris]
>>> Has anyone written a fast hash module which is more optimal for
>>> large datasets ?
>>
>> PyJudy might be what you're looking for, though I've never used it:
>>
>> http://www.dalkescientific.com/Python/PyJudy.html
>>
>> "Judy's key benefit
Chris Foote wrote:
> Claudio Grondi wrote:
>> Chris Foote wrote:
>>> Klaas wrote:
>>>
> 22.2s 20m25s[3]
20m to insert 1m keys? You are doing something wrong.
>>>
>>> I've put together some simplified test code, but the bsddb
>>> module gives 11m for 1M keys:
>>>
>> I have run your c
Claudio Grondi wrote:
> Chris Foote wrote:
>> Klaas wrote:
>>
22.2s 20m25s[3]
>>>
>>> 20m to insert 1m keys? You are doing something wrong.
>>
>> I've put together some simplified test code, but the bsddb
>> module gives 11m for 1M keys:
>>
> I have run your code for the bsddb on my P4 2.8 G
Richie Hindle wrote:
> [Chris]
>> Has anyone written a fast hash module which is more optimal for
>> large datasets ?
>
> PyJudy might be what you're looking for, though I've never used it:
>
> http://www.dalkescientific.com/Python/PyJudy.html
>
> "Judy's key benefits are scalability, high per
Chris Foote wrote:
> Klaas wrote:
>
>>> 22.2s 20m25s[3]
>>
>>
>> 20m to insert 1m keys? You are doing something wrong.
>
>
> Hi Mike.
>
> I've put together some simplified test code, but the bsddb
> module gives 11m for 1M keys:
>
I have run your code for the bsddb on my P4 2.8 GHz and have
Klaas wrote:
22.2s 20m25s[3]
20m to insert 1m keys? You are doing something wrong.
Hi Mike.
I've put together some simplified test code, but the bsddb
module gives 11m for 1M keys:
Number generator test for 100 number ranges
with a maximum of 3 wildcard digits.
Wed May 17 22:1
>22.2s 20m25s[3]
20m to insert 1m keys? You are doing something wrong.
With BDBs it is crucial to insert keys in bytestring-sorted order.
Also, be sure to give it a decent amount of cache.
-Mike
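Python 3 no longer bundles bsddb, but the locality argument applies to any B-tree-backed store. Here is a sketch using the stdlib sqlite3 module as a stand-in (the schema and key format are made up for illustration); pre-sorting keeps successive inserts landing on the same B-tree pages:

```python
import sqlite3

def bulk_insert(keys, presort=False):
    """Insert bytestring keys into a B-tree-indexed table (sqlite3
    here, standing in for a Berkeley DB btree). Returns the row
    count as a sanity check."""
    conn = sqlite3.connect(':memory:')
    conn.execute('CREATE TABLE kv (k BLOB PRIMARY KEY, v BLOB)')
    if presort:
        # Bytestring-sorted order: adjacent keys hit adjacent pages.
        keys = sorted(keys)
    conn.executemany('INSERT INTO kv VALUES (?, NULL)',
                     ((k,) for k in keys))
    conn.commit()
    n = conn.execute('SELECT COUNT(*) FROM kv').fetchone()[0]
    conn.close()
    return n

keys = [('%d:%d' % (n, n % 4)).encode() for n in range(10000)]
print(bulk_insert(keys, presort=True))
```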
Chris Foote wrote:
> Claudio Grondi wrote:
>
>> Chris Foote wrote:
>>
>>> However, please note that the Python bsddb module doesn't support
>>> in-memory based databases - note the library documentation's[1] wording:
>>>
>>> "Files never intended to be preserved on disk may be created
>>> by
Claudio Grondi wrote:
> Chris Foote wrote:
>
>> However, please note that the Python bsddb module doesn't support
>> in-memory based databases - note the library documentation's[1] wording:
>>
>> "Files never intended to be preserved on disk may be created
>> by passing None as the filename.
Maric Michaud <[EMAIL PROTECTED]> wrote:
>I don't see a use case where a python programmer should need a
>dictionary "that will probably be big" but can't predict what keys
>will be in it.
I've recently been working on an app with precisely this
characteristic, although there were enough other bott
Duncan Booth wrote:
>> BTW why are python dicts implemented as hash tables and not judy
>> arrays?
>>
> Probably because:
>
> Python predates judy arrays.
> Nobody has proposed replacing the current dict implementation.
> Nobody has demonstrated that it would actually be an advantage to
> repla
[EMAIL PROTECTED] wrote:
> BTW why are python dicts implemented as hash tables and not judy
> arrays?
>
Probably because:
Python predates judy arrays.
Nobody has proposed replacing the current dict implementation.
Nobody has demonstrated that it would actually be an advantage to
replace the cur
BTW why are python dicts implemented as hash tables and not judy arrays?
Chris Foote wrote:
> Claudio Grondi wrote:
>
>> Chris Foote wrote:
>>
>>> p.s. Disk-based DBs are out of the question because most
>>> key lookups will result in a miss, and lookup time is
>>> critical for this application.
>>>
>> Python Bindings (\Python24\Lib\bsddb vers. 4.3.0) and the DLL for
John Machin wrote:
>> (my dictionary values are all None).
>
> So in fact all you need is a set. Have you experimented with the Python
> 2.5 alpha?
>
I don't think that will help him: my timings work out about the same on
2.4.2 or 2.5a2. As I pointed out, the bottleneck is creating the tuples.
Chris Foote wrote:
> lcaamano wrote:
>
>> Sounds like PyTables could be useful.
>>
>> http://www.pytables.org
>>
>
> In browsing their excellent documentation, it seems that all concepts
> are built around storing and reading HDF5 format files.
>
> Not suitable for this project unfortunate
> (my dictionary values are all None).
So in fact all you need is a set. Have you experimented with the Python
2.5 alpha?
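In modern Python the switch is one line (the tuple keys below are made-up stand-ins for the thread's (long_integer, integer) pairs):

```python
# A dict whose values are all None is really just a set of keys:
# a set stores only the keys and gives the same O(1) membership test.
number_hash = {(123456789, 4): None, (987654321, 2): None}

number_set = set(number_hash)        # take the keys, drop the Nones
print((123456789, 4) in number_set)  # same test as `key in number_hash`
```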
lcaamano wrote:
> Sounds like PyTables could be useful.
>
> http://www.pytables.org
In browsing their excellent documentation, it seems that all concepts
are built around storing and reading HDF5 format files.
Not suitable for this project unfortunately.
Cheers,
Chris
Paul McGuire wrote:
> "Claudio Grondi" <[EMAIL PROTECTED]> wrote in message
> news:[EMAIL PROTECTED]
>> Chris Foote wrote:
>>> Hi all.
>>>
>>> I have the need to store a large (10M) number of keys in a hash table,
>>> based on a tuple of (long_integer, integer). The standard python
>>> dictionary
Claudio Grondi wrote:
> Chris Foote wrote:
>> p.s. Disk-based DBs are out of the question because most
>> key lookups will result in a miss, and lookup time is
>> critical for this application.
>>
> Python Bindings (\Python24\Lib\bsddb vers. 4.3.0) and the DLL for
> BerkeleyDB (\Python24\DLLs\_bsd
Chris Foote wrote:
> > Don't forget that adding keys
>> requires resizing the dict, which is a moderately expensive operation.
>
> Yep, that's why I probably need a dictionary where I can pre-specify
> an approximate size at the time of its creation.
Try splitting the creation of the keys from t
On Monday 15 May 2006 21:07, Diez B. Roggisch wrote:
> > d={}.fromkeys(xrange(5*10**6)) ?
>
> That is a totally different beast. You don't want to insert arbitrary
> keys, you want the internal hash-array to be of the right size.
But you can replace the xrange part by any generator function you
Aahz wrote:
> In article <[EMAIL PROTECTED]>,
> Roy Smith <[EMAIL PROTECTED]> wrote:
>> In article <[EMAIL PROTECTED]>, Chris Foote <[EMAIL PROTECTED]>
>> wrote:
>>> I have the need to store a large (10M) number of keys in a hash table,
>>> based on a tuple of (long_integer, integer). The standa
Roy Smith wrote:
> In article <[EMAIL PROTECTED]>, Chris Foote <[EMAIL PROTECTED]>
> wrote:
>
>> I have the need to store a large (10M) number of keys in a hash table,
>> based on a tuple of (long_integer, integer). The standard python
>> dictionary works well for small numbers of keys, but star
Sounds like PyTables could be useful.
http://www.pytables.org
--
lpc
1. Why is two minutes to insert 5M keys "bad" for you? What would be
"good"? What would be good/bad look-up times? Have you measured the
typical look-up time? How often is the dict creation required to be
done? How often does the data change? Is multi-user access required for
(a) look-up (b) updat
Roy Smith:
> I am befuddled as to why
> people thought creating a dict by keyword would be a useful thing to
> do, but left out (and, indeed, eliminated the chance to add the syntax
> later) the obviously useful ability to hint at the size.
Adding the keyword syntax doesn't require much memory to
Maric Michaud schrieb:
> On Monday 15 May 2006 19:24, Roy Smith wrote:
>> d = {}
>> d.reserve (10*1000*1000)
>
> d={}.fromkeys(xrange(5*10**6)) ?
That is a totally different beast. You don't want to insert arbitrary
keys, you want the internal hash-array to be of the right size.
Diez
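Diez's distinction can be checked directly (the key count below is scaled down from the thread's 5*10**6 for brevity):

```python
# dict.fromkeys actually inserts the given keys; it is not a way to
# reserve table space for *different* keys that arrive later.
d = dict.fromkeys(range(50000))  # scaled down from 5*10**6
print(len(d), d[0])              # 50000 real entries, all mapped to None

# Replacing these placeholders with the keys you actually wanted means
# deleting or overwriting 50000 entries -- CPython exposes no public
# API to pre-size just the internal hash table.
```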
"Claudio Grondi" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
> Chris Foote wrote:
> > Hi all.
> >
> > I have the need to store a large (10M) number of keys in a hash table,
> > based on a tuple of (long_integer, integer). The standard python
> > dictionary works well for small numb
On Monday 15 May 2006 19:24, Roy Smith wrote:
> d = {}
> d.reserve (10*1000*1000)
d={}.fromkeys(xrange(5*10**6)) ?
--
_
Maric Michaud
_
Aristote - www.aristote.info
3 place des tapis
69004 Lyon
Tel: +33 426 880 097
Aahz <[EMAIL PROTECTED]> wrote:
> Don't forget that adding keys requires resizing the dict, which is a
> moderately expensive operation.
My guess would be that each resize grows the table by a constant
factor, which IIRC, works out to amortized O(n). It's not as good as
creating the dict the righ
In article <[EMAIL PROTECTED]>,
Roy Smith <[EMAIL PROTECTED]> wrote:
>In article <[EMAIL PROTECTED]>, Chris Foote <[EMAIL PROTECTED]>
>wrote:
>>
>> I have the need to store a large (10M) number of keys in a hash table,
>> based on a tuple of (long_integer, integer). The standard python
>> dicti
Chris Foote wrote:
> Hi all.
>
> I have the need to store a large (10M) number of keys in a hash table,
> based on a tuple of (long_integer, integer). The standard python
> dictionary works well for small numbers of keys, but starts to
> perform badly for me inserting roughly 5M keys:
>
> # keys
In article <[EMAIL PROTECTED]>, Chris Foote <[EMAIL PROTECTED]>
wrote:
> Hi all.
>
> I have the need to store a large (10M) number of keys in a hash table,
> based on a tuple of (long_integer, integer). The standard python
> dictionary works well for small numbers of keys, but starts to
> perfo
[Chris]
> Has anyone written a fast hash module which is more optimal for
> large datasets ?
PyJudy might be what you're looking for, though I've never used it:
http://www.dalkescientific.com/Python/PyJudy.html
"Judy's key benefits are scalability, high performance, and memory
efficiency. A J