[Python-ideas] Re: Set operations with Lists

2019-10-14 Thread Richard Higginbotham
Steve Jorgensen wrote: > What if, instead, there was an OrderedBag class that acts like a > list and supports set operations. > The advantage of this is that any hidden indexing that is used to optimize the > operations could be efficiently created in the object that results from the > set operati

[Python-ideas] Re: Set operations with Lists

2019-10-13 Thread Andrew Barnert via Python-ideas
On Oct 13, 2019, at 19:15, Steve Jorgensen wrote: > > What if, instead, there was an `OrderedBag` class that acts like a `list` and > supports set operations. > > The advantage of this is that any hidden indexing that is used to optimize > the operations could be efficiently created in the obj

[Python-ideas] Re: Set operations with Lists

2019-10-13 Thread Steve Jorgensen
What if, instead, there was an `OrderedBag` class that acts like a `list` and supports set operations. The advantage of this is that any hidden indexing that is used to optimize the operations could be efficiently created in the object that results from the set operation during its construction

[Python-ideas] Re: Set operations with Lists

2019-09-25 Thread Richard Higginbotham
Kyle Lahnakoski wrote: > On 2019-09-20 9:52 p.m., Richard Higginbotham wrote: > > Andrew Barnert wrote: > > set(b).intersection(a) or set(a) & set(b) or > > sb = > > set(b) then [x for x in a if x in sb] and you’re done. They can easily > > understand why it works. If they want to know why it’s fas

[Python-ideas] Re: Set operations with Lists

2019-09-25 Thread Kyle Lahnakoski
On 2019-09-20 9:52 p.m., Richard Higginbotham wrote: Andrew Barnert wrote: set(b).intersection(a) or set(a) & set(b) or sb = set(b) then [x for x in a if x in sb] and you’re done. They can easily understand why it works. If they want to know why it’s faster, you can easily explain it, and they

[Python-ideas] Re: Set operations with Lists

2019-09-23 Thread Richard Higginbotham
Hey Everybody, I do want to thank you for this souvenir. I've having it blown up and mounted on the wall. I'm going to leave some space for the benchmarks that compare my module and his with the standard set implementation. "I haven’t tested performance at all. I expect it to be slower (when

[Python-ideas] Re: Set operations with Lists

2019-09-23 Thread Richard Higginbotham
Andrew Barnert wrote: > On Sep 23, 2019, at 15:32, Richard Higginbotham higgi...@gmail.com wrote: > > Considering > > your use case however, I wonder, if you would not be better > > going with the iterator approach (as Andrew has been hinting already for > > some time in his posts). > > Richard M.

[Python-ideas] Re: Set operations with Lists

2019-09-23 Thread Andrew Barnert via Python-ideas
On Sep 23, 2019, at 15:32, Richard Higginbotham wrote: >> Considering your use case however, I wonder, if you would not be better >> going with the iterator approach (as Andrew has been hinting already for >> some time in his posts). >> Richard M. > They will most likely have good performance on

[Python-ideas] Re: Set operations with Lists

2019-09-23 Thread Richard Higginbotham
Richard Musil wrote: > On Mon, Sep 23, 2019 at 12:21 AM Richard Higginbotham higgi...@gmail.com > wrote: > > I really appreciate the time and thought you have put > > into it (and others > > here as well), and its been educational / fun for me too. One of the > > concerns I have about using timeit

[Python-ideas] Re: Set operations with Lists

2019-09-23 Thread Richard Higginbotham
Andrew Barnert wrote: > On Sep 22, 2019, at 15:28, Tim Peters tim.pet...@gmail.com wrote: > > That's not by accident - the inspiration for > > CPython's sort's basic > > "galloping" approach was taken from this paper, which wasn't about > > sorting at all: > > "Adaptive Set Intersections, Unions, a

[Python-ideas] Re: Set operations with Lists

2019-09-23 Thread Richard Higginbotham
Tim Peters wrote: > That's not by accident - the inspiration for CPython's sort's basic > "galloping" approach was taken from this paper, which wasn't about > sorting at all: > "Adaptive Set Intersections, Unions, and Differences" (2000) > Erik D. Demaine, Alejandro López-Ortiz, J. Ian Munro > >

[Python-ideas] Re: Set operations with Lists

2019-09-23 Thread Richard Musil
On Mon, Sep 23, 2019 at 12:21 AM Richard Higginbotham wrote: > I really appreciate the time and thought you have put into it (and others > here as well), and its been educational / fun for me too. One of the > concerns I have about using timeit is that it puts a lot of focus on the > exact statem

[Python-ideas] Re: Set operations with Lists

2019-09-22 Thread Andrew Barnert via Python-ideas
On Sep 22, 2019, at 15:28, Tim Peters wrote: > > > That's not by accident - the inspiration for CPython's sort's basic > "galloping" approach was taken from this paper, which wasn't about > sorting at all: > >"Adaptive Set Intersections, Unions, and Differences" (2000) >Erik D. Demaine,

[Python-ideas] Re: Set operations with Lists

2019-09-22 Thread Tim Peters
Note that CPython's sort is in the business of merging sorted (sub)lists. Any mergesort is. But CPython's adapts to take advantage, when possible, of "lumpy" distributions. For example, if you sort list(range(100, 200)) + list(range(0, 100)) it goes very fast (O(N)). Because

[Python-ideas] Re: Set operations with Lists

2019-09-22 Thread Richard Higginbotham
Richard Musil wrote: > On Sat, Sep 21, 2019 at 1:44 AM Richard Higginbotham higgi...@gmail.com > wrote: . > I wrote all this to show that without an insight it might be sometimes > difficult or impossible to do it right (I was caught myself in several > pitfalls on my original benchmarks I post

[Python-ideas] Re: Set operations with Lists

2019-09-22 Thread Richard Higginbotham
Steven D'Aprano wrote: > On Fri, Sep 20, 2019 at 11:44:17PM -, Richard Higginbotham wrote: > > Let me expand on the reasons for my post here. It > > seems like > > motivation has turned into a bit of a bike-shead. > > I don't think that anyone is questioning your motivation. (Although one > o

[Python-ideas] Re: Set operations with Lists

2019-09-22 Thread Andrew Barnert via Python-ideas
Let me go back to the top here. On Sep 18, 2019, at 12:29, Richard Higginbotham wrote: > > I have frequently come across cases where I would like to compare items in > one list in another similar to relational algebra. I’ve put together an itertools-style implementation at https://github.com/

[Python-ideas] Re: Set operations with Lists

2019-09-22 Thread Andrew Barnert via Python-ideas
On Sep 22, 2019, at 12:50, Richard Musil wrote: > >> On Sun, Sep 22, 2019 at 8:25 PM Andrew Barnert wrote: > >> One case I think you didn’t test is when the strings are generated in >> already-sorted order. > For the sake of completeness I did some benchmarks also with already sorted > stri

[Python-ideas] Re: Set operations with Lists

2019-09-22 Thread Richard Musil
On Sun, Sep 22, 2019 at 8:25 PM Andrew Barnert wrote: > One case I think you didn’t test is when the strings are generated in > already-sorted order. In that case, as opposed to the case where you > generate in random order and then sort, I think the PyUnicode objects and > the actual character a

[Python-ideas] Re: Set operations with Lists

2019-09-22 Thread Andrew Barnert via Python-ideas
On Sep 21, 2019, at 04:35, Richard Musil wrote: > > I wrote all this to show that without an insight it might be sometimes > difficult or impossible to do it right (I was caught myself in several > pitfalls on my original benchmarks I posted here) and also because it was > actually a fun to le

[Python-ideas] Re: Set operations with Lists

2019-09-21 Thread Richard Musil
On Sat, Sep 21, 2019 at 1:44 AM Richard Higginbotham wrote: > It's written in Python. It used to require containers with 100k or more > elements to see a speed improvement over some other unnamed methods. These > days its closer to 50 million with the only other contender being set (hash > based

[Python-ideas] Re: Set operations with Lists

2019-09-21 Thread Steven D'Aprano
On Fri, Sep 20, 2019 at 11:44:17PM -, Richard Higginbotham wrote: > Let me expand on the reasons for my post here. It seems like > motivation has turned into a bit of a bike-shead. I don't think that anyone is questioning your motivation. (Although one of your recent replies to Andrew seems

[Python-ideas] Re: Set operations with Lists

2019-09-20 Thread Tim Peters
[Tim] >> Something I haven't seen mentioned here, but I may have missed it: >> when timing algorithms with sets and dicts, people often end up merely >> measuring the bad effects of hardware data cache misses when the >> containers get large, and especially in contrived benchmarks. >> >> In those t

[Python-ideas] Re: Set operations with Lists

2019-09-20 Thread Richard Higginbotham
Andrew Barnert wrote: > On Sep 20, 2019, at 14:51, Richard Higginbotham higgi...@gmail.com wrote: > > Andrew Barnert wrote: > > What’s the goal here? > > I'm not concerned about micro op, I'm concerned with macro op on large data > > sets. > > Right now if someone comes to you with that simple use

[Python-ideas] Re: Set operations with Lists

2019-09-20 Thread Richard Higginbotham
Let me expand on the reasons for my post here. It seems like motivation has turned into a bit of a bike-shead. I have an algorithm I wanted to see if the python devs would be interested in. It fulfills my use case as is. I've found it useful for different problem areas and I haven't seen anythi

[Python-ideas] Re: Set operations with Lists

2019-09-20 Thread Andrew Barnert via Python-ideas
On Sep 20, 2019, at 14:51, Richard Higginbotham wrote: > > Andrew Barnert wrote: >> What’s the goal here? > > I'm not concerned about micro op, I'm concerned with macro op on large data > sets. Right now if someone comes to you with that simple use case of two > large lists you have to tell t

[Python-ideas] Re: Set operations with Lists

2019-09-20 Thread Chris Angelico
On Sat, Sep 21, 2019 at 7:54 AM Richard Higginbotham wrote: > I'm not concerned about micro op, I'm concerned with macro op on large data > sets. Right now if someone comes to you with that simple use case of two > large lists you have to tell them to convert to sets and then back again. > Forg

[Python-ideas] Re: Set operations with Lists

2019-09-20 Thread Richard Higginbotham
Andrew Barnert wrote: > On Sep 20, 2019, at 09:21, Richard Higginbotham higgi...@gmail.com wrote: > > Richard Musil wrote: > > On Fri, > > Sep 20, 2019 at 1:52 AM Andrew Barnert abarn...@yahoo.com wrote: > > On Sep 19, 2019, at 15:18, Richard Musil risa20...@gmail.com wrote: > > After some thought

[Python-ideas] Re: Set operations with Lists

2019-09-20 Thread Richard Higginbotham
Chris Angelico wrote: > On Fri, Sep 20, 2019 at 6:31 AM Richard Higginbotham higgi...@gmail.com wrote: > > > > That's exactly what I expected. It's the opposite > > though. The time to check one list for all the elements > > Please quote some text to give context. WHAT is exactly what you expected

[Python-ideas] Re: Set operations with Lists

2019-09-20 Thread Tim Peters
BTW, this thread seems a good place to mention the under-appreciated SortedContainers package from Grant Jenks: http://www.grantjenks.com/docs/sortedcontainers/ This supplies sorted containers (lists, dicts, sets) coded in pure Python, which generally run at least as fast as C extensions impl

[Python-ideas] Re: Set operations with Lists

2019-09-20 Thread Andrew Barnert via Python-ideas
On Sep 20, 2019, at 09:21, Richard Higginbotham wrote: > > Richard Musil wrote: >>> On Fri, Sep 20, 2019 at 1:52 AM Andrew Barnert abarn...@yahoo.com wrote: >>> On Sep 19, 2019, at 15:18, Richard Musil risa20...@gmail.com wrote: > > After some thought I don't think the the time comparisons are u

[Python-ideas] Re: Set operations with Lists

2019-09-20 Thread Andrew Barnert via Python-ideas
On Sep 19, 2019, at 20:25, Tim Peters wrote: > > Something I haven't seen mentioned here, but I may have missed it: > when timing algorithms with sets and dicts, people often end up merely > measuring the bad effects of hardware data cache misses when the > containers get large, and especially in

[Python-ideas] Re: Set operations with Lists

2019-09-20 Thread Richard Higginbotham
Richard Musil wrote: > On Fri, Sep 20, 2019 at 1:52 AM Andrew Barnert abarn...@yahoo.com wrote: > > On Sep 19, 2019, at 15:18, Richard Musil risa20...@gmail.com wrote: After some thought I don't think the the time comparisons are useful unless we include the set init time. If we already have two

[Python-ideas] Re: Set operations with Lists

2019-09-20 Thread Andrew Barnert via Python-ideas
On Sep 20, 2019, at 03:18, Richard Musil wrote: > > This is an interesting point, which is difficult to deduce without an > implementation insight. I would for example assume that there is a "list > construct" which holds just the references to the actual objects (strings in > our example here

[Python-ideas] Re: Set operations with Lists

2019-09-20 Thread Andrew Barnert via Python-ideas
On Sep 20, 2019, at 02:41, Richard Musil wrote: > > I just added another implementation to the test (which I believe you had on > mind): > ``` > def relcomp_set_list(a, b): > bset = set(b) > return [ia for ia in a if ia not in bset] Or just `set(b).intersection(a)` and maybe checking th

[Python-ideas] Re: Set operations with Lists

2019-09-20 Thread Richard Higginbotham
Tim Peters wrote: > RE [Richard Higginbotham higgi...@gmail.com] .. > Traversing via a set or dict instead reads the objects' data in > pseudo-random memory order instead, which can give a very high rate of > cache misses. In addition, the internal hash table probes also occur > in pseudo-rand

[Python-ideas] Re: Set operations with Lists

2019-09-20 Thread Richard Higginbotham
Andrew Barnert wrote: > On Sep 19, 2019, at 19:18, Richard Higginbotham higgi...@gmail.com wrote: > > It's not really constant though. > > It’s really hard to have a discussion when all of your posts are all > > replies, but > you don’t give us any clue what you’re replying to. There are multiple

[Python-ideas] Re: Set operations with Lists

2019-09-20 Thread Rhodri James
You know, if anyone wanted a good example of why Computer Science courses should carry on teaching about sorting algorithms despite them being already implemented for you these days, this thread is it. Very nicely explained, folks! -- Rhodri James *-* Kynesim Ltd _

[Python-ideas] Re: Set operations with Lists

2019-09-20 Thread Richard Musil
> > On Thu, Sep 19, 2019 at 10:25:20PM -0500, Tim Peters wrote: > [...] > > Something I haven't seen mentioned here, but I may have missed it: > > when timing algorithms with sets and dicts, people often end up merely > > measuring the bad effects of hardware data cache misses when the > > containe

[Python-ideas] Re: Set operations with Lists

2019-09-20 Thread Richard Musil
On Fri, Sep 20, 2019 at 1:52 AM Andrew Barnert wrote: > On Sep 19, 2019, at 15:18, Richard Musil wrote: > > > > Ok, I misread the original code, the lists were not sorted in the > previous results (and their should be). So with the correction to have them > sorted, > > I think to be fair, you wa

[Python-ideas] Re: Set operations with Lists

2019-09-20 Thread Steven D'Aprano
On Thu, Sep 19, 2019 at 10:25:20PM -0500, Tim Peters wrote: [...] > Something I haven't seen mentioned here, but I may have missed it: > when timing algorithms with sets and dicts, people often end up merely > measuring the bad effects of hardware data cache misses when the > containers get large,

[Python-ideas] Re: Set operations with Lists

2019-09-19 Thread Tim Peters
[Richard Higginbotham ] > If you start including the time it takes to convert lists to sets its even > more > pronounced. hash values can collide and the bigger the data set the > more likely it is to happen. That's not so. Python's hash tables dynamically resize so that the load factor never ex

[Python-ideas] Re: Set operations with Lists

2019-09-19 Thread Andrew Barnert via Python-ideas
On Sep 19, 2019, at 19:18, Richard Higginbotham wrote: > > It's not really constant though. It’s really hard to have a discussion when all of your posts are all replies, but you don’t give us any clue what you’re replying to. There are multiple responses from multiple people since your last em

[Python-ideas] Re: Set operations with Lists

2019-09-19 Thread Richard Higginbotham
It's not really constant though. Look at all of the test results here. Richard Musil in the following post shows one value of A=1000,B=1 with .034 ms and the next point A=1,B=5 If the hash were constant time and A increased by 10x then the time should be 0.34ms but its not. It's 0.54

[Python-ideas] Re: Set operations with Lists

2019-09-19 Thread Richard Higginbotham
I think there is a problem here using timeit for this type of problem, look at the large amount of time used by the set init time. Noticing this I changed my sorts to in place and got better results. I assume there is some copying of locals or other overhead but I can't explain the results. htt

[Python-ideas] Re: Set operations with Lists

2019-09-19 Thread Andrew Barnert via Python-ideas
On Sep 19, 2019, at 15:18, Richard Musil wrote: > > Ok, I misread the original code, the lists were not sorted in the previous > results (and their should be). So with the correction to have them sorted, I think to be fair, you want to show it _both_ ways, just as you’re showing sets both with

[Python-ideas] Re: Set operations with Lists

2019-09-19 Thread Andrew Barnert via Python-ideas
On Sep 19, 2019, at 13:18, Richard Higginbotham wrote: >>> For example are the file names in A in B and if so return a new list with >>> those items. Long story short, I wrote some functions to do that. They are >>> quite simple and fast (due to timsort in no small part). >>> Even the plain pyt

[Python-ideas] Re: Set operations with Lists

2019-09-19 Thread Dominik Vilsmeier
It might be interesting to note that Numpy provides various set routines operating on their arrays (and hence lists as well, by conversion): https://docs.scipy.org/doc/numpy/reference/routines.set.html For [intersection](https://docs.scipy.org/doc/numpy/reference/generated/numpy.intersect1d.htm

[Python-ideas] Re: Set operations with Lists

2019-09-19 Thread Richard Musil
On Thu, Sep 19, 2019 at 11:58 PM Richard Musil wrote: > I did not verify the results (I hope they are correct), but the timing > suggest that the set operation is always faster. When however we add the > set setup to the mix, the build up time, which is O(n + m) becomes more > significant than ac

[Python-ideas] Re: Set operations with Lists

2019-09-19 Thread Richard Musil
On Thu, Sep 19, 2019 at 10:18 PM Richard Higginbotham wrote: > I'm sorry I haven't used the mail list before and didn't send my comments > to the list. Please see my response to Chris for some example times. I'm > reluctant to use anything less than about a ms for timing purposes. There's > too m

[Python-ideas] Re: Set operations with Lists

2019-09-19 Thread Richard Higginbotham
I meant a near linear time complexity. n log n is very very close to n even when n is large. It's not constant, if it were it could be dropped entirely... It's pretty close though. ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe

[Python-ideas] Re: Set operations with Lists

2019-09-19 Thread Chris Angelico
On Fri, Sep 20, 2019 at 6:31 AM Richard Higginbotham wrote: > > That's exactly what I expected. It's the opposite though. The time to check > one list for all the elements in another list was far longer. Granted this > was a Linux box using ext2. The other helpful element is that the filesystem

[Python-ideas] Re: Set operations with Lists

2019-09-19 Thread Richard Higginbotham
That's exactly what I expected. It's the opposite though. The time to check one list for all the elements in another list was far longer. Granted this was a Linux box using ext2. The other helpful element is that the filesystem returned the files in the proper order. One of the tests runs I ment

[Python-ideas] Re: Set operations with Lists

2019-09-19 Thread Chris Angelico
On Fri, Sep 20, 2019 at 6:19 AM Richard Higginbotham wrote: > >c = [val for val in a if val in filterset] > This is the crux I think. The complexity is the number of elements in > filterset. Since filterset has to be check for every value in a its O(len(a) > * len(b) * filter search time). You w

[Python-ideas] Re: Set operations with Lists

2019-09-19 Thread Richard Higginbotham
>> For example are the file names in A in B and if so return a new list with >> those items. Long story short, I wrote some functions to do that. They are >> quite simple and fast (due to timsort in no small part). >> Even the plain python code is faster than the built in set functions (afaik). >

[Python-ideas] Re: Set operations with Lists

2019-09-19 Thread Andrew Barnert via Python-ideas
On Sep 19, 2019, at 12:09, Richard Higginbotham wrote: > > It might / probably? even be adapted to be used in the built in set > operations. How? I’m not sure if you get how sets work, and maybe you think they’re based on a red-black tree or skiplist or some other logarithmic collection, like

[Python-ideas] Re: Set operations with Lists

2019-09-19 Thread Andrew Barnert via Python-ideas
> On Sep 19, 2019, at 12:07, Richard Higginbotham wrote: > > Its faster than the built in set type. Not according to timeit. If you want to dispute that, you need a proper benchmark, not just repeating the claims from your inaccurate one. I’m not sure what your relationtype is supposed to do (

[Python-ideas] Re: Set operations with Lists

2019-09-19 Thread Chris Angelico
On Fri, Sep 20, 2019 at 5:11 AM Richard Higginbotham wrote: > The original use case was detecting new files in various directories. Each > directory was expected to have millions of 4KB files and maybe at most 100k > of new files. This was in around 2002 and it was a huge time savings for me. >

[Python-ideas] Re: Set operations with Lists

2019-09-19 Thread Chris Angelico
On Fri, Sep 20, 2019 at 5:08 AM Richard Higginbotham wrote: > > Its faster than the built in set type. Uniqueness of values isn't a > requirement. The values need to be comparable so that they can be sorted. > This allows a linear time complexity. > Note that the linear time complexity assumes

[Python-ideas] Re: Set operations with Lists

2019-09-19 Thread Richard Higginbotham
The value of the order is actually the critical part. If they are both ordered I can complete all comparisons in one pass over both lists at the same time. Lets think about in A and not B a = [1,2,3,6] b = [1,4,5,6,7] start at a[0] and b[0], 0 -- a[i] == b[j] so increment both indexes. 1 -- a[i]

[Python-ideas] Re: Set operations with Lists

2019-09-19 Thread Richard Higginbotham
Its faster than the built in set type. Uniqueness of values isn't a requirement. The values need to be comparable so that they can be sorted. This allows a linear time complexity. That's basically the only restraint on the values. Here are some speed test results and pseudocode. def naive(a,b):

[Python-ideas] Re: Set operations with Lists

2019-09-19 Thread Andrew Barnert via Python-ideas
On Sep 18, 2019, at 12:29, Richard Higginbotham wrote: > > I'm not sure if there is any interest by others but I have frequently come > across cases where I would like to compare items in one list in another > similar to relational algebra. For anyone who doesn’t want to read this whole reply:

[Python-ideas] Re: Set operations with Lists

2019-09-19 Thread Chris Angelico
On Thu, Sep 19, 2019 at 10:32 PM Steven D'Aprano wrote: > > On Thu, Sep 19, 2019 at 03:26:27PM +1000, Chris Angelico wrote: > > On Thu, Sep 19, 2019 at 3:08 PM Richard Higginbotham > > wrote: > > > > > > I'm not sure if there is any interest by others but I have frequently > > > come across cas

[Python-ideas] Re: Set operations with Lists

2019-09-19 Thread Steven D'Aprano
On Thu, Sep 19, 2019 at 03:26:27PM +1000, Chris Angelico wrote: > On Thu, Sep 19, 2019 at 3:08 PM Richard Higginbotham > wrote: > > > > I'm not sure if there is any interest by others but I have frequently come > > across cases where I would like to compare items in one list in another > > simi

[Python-ideas] Re: Set operations with Lists

2019-09-18 Thread Chris Angelico
On Thu, Sep 19, 2019 at 3:08 PM Richard Higginbotham wrote: > > I'm not sure if there is any interest by others but I have frequently come > across cases where I would like to compare items in one list in another > similar to relational algebra. For example are the file names in A in B and > if