Re: Best way to handle large lists?

2006-10-04 Thread durumdara
Hi! Thanks Jeremy. I am in the process of converting my stuff to use sets! I wouldn't have thought it would have made that big a deal! I guess it is live and learn. If you have simple records with a large amount of data, you can try dbhash. With this you won't run out of memory...

Re: Best way to handle large lists?

2006-10-04 Thread GHUM
Maybe the application should use sets instead of lists for these collections. What would sets do for me over lists? Searching for an element in a list is O(n); searching for an element in a set is O(1) (for reasonably distributed elements). Harald --
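A minimal sketch of the O(n) versus O(1) point above, using the standard `timeit` module. The collection size and the names `as_list`, `as_set`, and `missing` are illustrative assumptions, not taken from the thread, and the timings will vary by machine.

```python
import timeit

n = 20_000                      # assumed size, roughly "tens of thousands"
as_list = list(range(n))
as_set = set(as_list)
missing = -1                    # absent value: the list must scan every element

# Time 200 membership tests against each container.
list_time = timeit.timeit(lambda: missing in as_list, number=200)
set_time = timeit.timeit(lambda: missing in as_set, number=200)

print(f"list: {list_time:.4f}s  set: {set_time:.4f}s")
```

The list test compares the value against each element in turn, while the set test hashes the value and checks a single bucket on average.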

Re: Best way to handle large lists?

2006-10-04 Thread Hari Sekhon
So are you saying that using a dict means a faster search since you only need to look up one value? I would think that you would have to look through the keys and stop at the first key that matches since each key has to be unique, so perhaps if it is nearer the front of the set of keys then

Re: Best way to handle large lists?

2006-10-04 Thread Fredrik Lundh
Hari Sekhon wrote: So are you saying that using a dict means a faster search since you only need to look up one value? I would think that you would have to look through the keys and stop at the first key that matches since each key has to be unique, so perhaps if it is nearer the front of the

Best way to handle large lists?

2006-10-03 Thread Chaz Ginger
I have a system that has a few lists that are very large (thousands or tens of thousands of entries) and some that are rather small. Many times I have to produce the difference between a large list and a small one, without destroying the integrity of either list. I was wondering if anyone has any

Re: Best way to handle large lists?

2006-10-03 Thread Duncan Booth
Chaz Ginger wrote: I have a system that has a few lists that are very large (thousands or tens of thousands of entries) and some that are rather small. Many times I have to produce the difference between a large list and a small one, without destroying the integrity of

Re: Best way to handle large lists?

2006-10-03 Thread Paul Rubin
Chaz Ginger writes: I have a system that has a few lists that are very large (thousands or tens of thousands of entries) and some that are rather small. Many times I have to produce the difference between a large list and a small one, without destroying the integrity of

Re: Best way to handle large lists?

2006-10-03 Thread durumdara
Chaz Ginger wrote: I have a system that has a few lists that are very large (thousands or tens of thousands of entries) and some that are rather small. Many times I have to produce the difference between a large list and a small one, without destroying the integrity of either list. I was

Re: Best way to handle large lists?

2006-10-03 Thread Bill Williams
I don't know enough about Python internals, but the suggested solutions all seem to involve scanning bigList. Can this presumably linear operation be avoided by using dict or similar to find all occurrences of smallist items in biglist and then deleting those occurrences? Bill Williams In
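One way to approach the question above, as a sketch only: build a set from the small list once, then make a single pass over the big list. The function name `list_difference` and the sample domain names are hypothetical. The pass over the big list is still linear, but each membership check against the set is a hash lookup rather than a scan of the small list, and neither input list is modified.

```python
def list_difference(big, small):
    """Return the items of `big` that are not in `small`, preserving order.

    Neither input list is modified. Items must be hashable.
    """
    exclude = set(small)                      # built once
    return [item for item in big if item not in exclude]


# Hypothetical sample data.
big_list = ["a.example.com", "b.example.com", "c.example.com"]
small_list = ["b.example.com"]

result = list_difference(big_list, small_list)
print(result)
```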

Re: Best way to handle large lists?

2006-10-03 Thread Hari Sekhon
I don't know much about the python internals either, so this may be the blind leading the blind, but aren't dicts much slower to work with than lists and therefore wouldn't your suggestion to use dicts be much slower? I think it's something to do with the comparative overhead of using keys in

Re: Best way to handle large lists?

2006-10-03 Thread Sybren Stuvel
Bill Williams enlightened us with: I don't know enough about Python internals, but the suggested solutions all seem to involve scanning bigList. Can this presumably linear operation be avoided by using dict or similar to find all occurrences of smallist items in biglist and then deleting those

Re: Best way to handle large lists?

2006-10-03 Thread Chaz Ginger
I've done that and decided that Python's 'list comprehension' isn't the way to go. I was hoping that perhaps someone had some experience with some C or C++ library that has a Python interface that would make a difference. Chaz Sybren Stuvel wrote: Bill Williams enlightened us with: I don't know

Re: Best way to handle large lists?

2006-10-03 Thread Paul Rubin
Sybren Stuvel writes: I don't know enough about Python internals, but the suggested solutions all seem to involve scanning bigList. Can this presumably linear operation be avoided by using dict or similar to find all occurrences of smallist items in biglist and then

Re: Best way to handle large lists?

2006-10-03 Thread Chaz Ginger
Paul Rubin wrote: Sybren Stuvel writes: I don't know enough about Python internals, but the suggested solutions all seem to involve scanning bigList. Can this presumably linear operation be avoided by using dict or similar to find all occurrences of smallist items in biglist

Re: Best way to handle large lists?

2006-10-03 Thread Larry Bates
Chaz Ginger wrote: I have a system that has a few lists that are very large (thousands or tens of thousands of entries) and some that are rather small. Many times I have to produce the difference between a large list and a small one, without destroying the integrity of either list. I was

Re: Best way to handle large lists?

2006-10-03 Thread Jeremy Sanders
Chaz Ginger wrote: What would sets do for me over lists? It's faster to tell whether something is in a set or dict than in a list (for some minimum size). Jeremy -- Jeremy Sanders http://www.jeremysanders.net/ -- http://mail.python.org/mailman/listinfo/python-list

Re: Best way to handle large lists?

2006-10-03 Thread Chaz Ginger
Larry Bates wrote: Chaz Ginger wrote: I have a system that has a few lists that are very large (thousands or tens of thousands of entries) and some that are rather small. Many times I have to produce the difference between a large list and a small one, without destroying the integrity of

Re: Best way to handle large lists?

2006-10-03 Thread Richard Brodie
Chaz Ginger wrote: Each item in the list is a fully qualified domain name, e.g. foo.bar.com. The order in the list has no importance. So you don't actually need to use lists at all, then. You can just use sets and write: newSet = bigSet -
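The snippet above is cut off in the archive. As an illustration of the general idea only, here is a sketch of set difference on unordered domain names. The variable names and sample values are assumptions made for this example and are not the elided text of the original message.

```python
# Hypothetical sample data: order does not matter, so sets are a natural fit.
big_set = {"foo.bar.com", "www.example.org", "mail.example.org"}
small_set = {"www.example.org"}

# The - operator returns a new set; both operands are left unchanged.
remaining = big_set - small_set

print(sorted(remaining))
```

Because the operator builds a new set, the integrity of both original collections is preserved.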

Re: Best way to handle large lists?

2006-10-03 Thread Hari Sekhon
Jeremy Sanders wrote: Chaz Ginger wrote: What would sets do for me over lists? It's faster to tell whether something is in a set or dict than in a list (for some minimum size). Jeremy That is surprising since I read on this list recently that lists were faster

Re: Best way to handle large lists?

2006-10-03 Thread Jeremy Sanders
Jeremy Sanders wrote: Chaz Ginger wrote: What would sets do for me over lists? It's faster to tell whether something is in a set or dict than in a list (for some minimum size). As a footnote, this program import random num = 10 a = set( range(num) ) for i in range(10): x =

Re: Best way to handle large lists?

2006-10-03 Thread Chaz Ginger
Jeremy Sanders wrote: Jeremy Sanders wrote: Chaz Ginger wrote: What would sets do for me over lists? It's faster to tell whether something is in a set or dict than in a list (for some minimum size). As a footnote, this program import random num = 10 a = set( range(num) ) for

Re: Best way to handle large lists?

2006-10-03 Thread Fredrik Lundh
Hari Sekhon wrote: That is surprising since I read on this list recently that lists were faster than dicts depends on what you're doing with them, of course. It was one reason that was cited as to why local vars are better than global vars. L[int] is indeed a bit faster than D[string]
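To see the difference between indexing a list by integer and looking up a dict by string on your own machine, a rough `timeit` sketch follows. The names `L` and `D` echo the message above; the sizes and keys are assumptions, and the absolute numbers depend on the Python version and hardware.

```python
import timeit

L = list(range(1000))                    # indexed by position
D = {str(i): i for i in range(1000)}     # keyed by string

# Time 100,000 lookups of each kind.
list_time = timeit.timeit(lambda: L[500], number=100_000)
dict_time = timeit.timeit(lambda: D["500"], number=100_000)

print(f"L[int]: {list_time:.4f}s  D[str]: {dict_time:.4f}s")
```

This compares direct indexing with keyed lookup, which is a different operation from testing whether a value is present in a list or a set.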