Hi,
You're really asking about optimisation incidentally. On Friday 10 August 2007 10:54, Jaggo wrote: > Hello! > > I desperately need a simple way to compare whether any item of SmallList is > in BigList. A simple way: True in [x in BigList for x in SmallList] Not efficient necessarily, but it is a simple way. Sounds like you want a faster way. > My current way, > > def IsAPartOfList(SmallList,BigList) > for item in SmallList: > if item in BigList: > return True > return False This is directly equivalent to my approach above BUT I *expect* your approach will run faster. *Since you short circuit the result by returning true. > Takes up waay too much time to process. > Can anyone think of any better way? So what you're really after is a quicker way, yes? The problem you have here is worst case IsAPartOfList will run through all the elements of BigList a number of times (specifically len(SmallList). Sorting the two lists and then working through may be quicker, but I'm struck that the cost of sorting BigList itself may be more costly than you want given you say the above function you wrote (which is pretty efficient) is too slow. Two immediate thoughts - use psyco on the function. It's a simple enough function and psyco may be able to give you a dramatic speedup. Alternatively you could reverse your loop. Rather than do SmallList * BigList number of comparisons, if there are duplicates in BigList (rather than being a set), then you could iterate through BigList more efficiently. If I read your spec correctly, then the following holds true even though its the inverse of what you stated as the probem: def IsAPartOfList_2(SmallList,BigList) for item in BigList: if item in SmallList: return True return False Incidentally this simple reversal may in itself be quicker depending on your data. If the numbers are generated from (say) a list of events and you're correlating events, then you may find this quicker: def IsAPartOfList_3(SmallList,BigList) for item in reversed(BigList): if item in SmallList: return True return False Since that exploits domain knowledge about the two lists. This becomes especially likely if the lists are likely to be already sorted because they've come from some ordered source (eg a server log). Furthermore if BigList is a list of times all in ascending order, and SmallList is a list of times in ascending order, and the latter represents a list of "recent" events and BigList represents a list of all events, then you can make another optimsation to take advantage of that knowledge: def IsAPartOfList_4(SmallList,BigList) for item in reversed(BigList): if item in reversed(SmallList): return True return False Why? Consider the following two lists: BigList: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39] SmallList: [33, 34, 35, 36] Then IsAPartOfList_4 will make 13 comparisons before returning True. Whereas IsAPartOfList_2 will make 133 comparisons, and your original will make 34 comparisons. However, if you have duplicates in BigList, we can skip the "if item in SmallList" every time we have a duplicate: def IsAPartOfList(SmallList,BigList) seen = {} for item in BigList: if not seen.get(item, False): if item in SmallList: return True return False This will only give you a potential speedup IF two things hold: * BigList contains duplicates * seen.get(item, False) is quicker IF these things don't hold then you won't see any improvement. Personally I'd try using psyco on the function since then you leave the function looking clean and simple. Which is things is quickest will depend heavily on the likely structure of the data, likelihood of duplicate, ordering and likely similarities between data. Overall though you have two choices: * Exploit your knowledge of the distribution and ordering of values * Use psyco These aren't mutually exclusive options :-) Michael. _______________________________________________ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor