There is an "apple to apple" compare function. It's unsafe_object_compare.
If you look at the pre-sort check, you will find that if the list is
homogeneous but not of float, int, string, or tuple, it sets
compare_funcs.key_richcompare = ob_type->tp_richcompare and sets
compare_funcs.key_compare = u
Just submitted a PR implementing this:
https://github.com/python/cpython/pull/582
-- just need someone to review it now :)
Thanks for all your feedback, everyone!
On Sun, Mar 5, 2017 at 12:19 AM Elliot Gorokhovsky <
elliot.gorokhov...@gmail.com> wrote:
> (Summary of results: my patch
On Thu, Mar 9, 2017 at 3:04 AM Barry Scott wrote:
>
> So you do O(nlogn)*2 pointer compares with my suggestion it seems? Which
> is smaller the O(n) pointer checks?
>
Not sure what you mean here... pretty sure your inequality is backwards?
Look -- the point is, we already *do* the pointer check
On Wed, Mar 8, 2017 at 2:14 PM Barry wrote:
> Can you assume that list of of type(list[0]) and use that type's optimised
> sort?
> But in the optimised sort code check that the types are as required.
> If you hit an element that is not of the required type then fall back to
> the unoptimised sort
On Tue, Mar 7, 2017 at 4:53 PM David Mertz wrote:
>
>
> In [22]: class Eq(int):
> def __eq__(self, other):
> return True
>:
> In [23]: four, five, six = Eq(4), Eq(5), Eq(6)
> In [24]: lst = [four, five, six]
> In [25]: lst.count(Eq(7))
> Out[25]: 3
>
>
> How would this work (o
On Tue, Mar 7, 2017 at 1:47 PM Erik wrote:
>
> I'd prefer the sort optimization to be based on what my list contains
> NOW, not on what it may have contained some time in the past, so I'm not
> a fan of the "once heterogeneous, always considered heterogeneous"
> behaviour if it's cheap enough to
On Mon, Mar 6, 2017, 4:42 PM Jim J. Jewett wrote:
(1) Good Job.
Thanks!
(3) Ideally, your graph would have the desired-to-be lines after the
as-is lines; for English writing, that would mean putting your (short
red) lines to the right of the (tall blue) lines.
(4) I suspect colors other t
On Sun, Mar 5, 2017 at 11:31 PM Tim Peters wrote:
>
> The best approach is indeed to pass the function pointer to every
> location that needs it. Note that a MergeState struct is already
> created per sort invocation, That isn't a file global for much the
> same reason.
>
Right. It's a real sha
On Sun, Mar 5, 2017 at 11:39 PM INADA Naoki wrote:
>
> I think there is another safe solution: Gave up unsafe_object_compare.
>
> Compare function of long, float, and unicode must not call list.sort(),
> and must not release GIL.
> So all you need is, backup old compare_function before sort, and
On Sun, Mar 5, 2017 at 11:21 PM Jelle Zijlstra
wrote:
>
> I think using a global is unsafe even without multithreading, because
> the compare function itself could end up doing list.sort() (it's
> calling arbitrary Python code after all).
>
Right, of course. So, clearly, the only safe solution i
Another solution: check if there is more than one thread; if there is, then
disable optimization. Is sorting in multithreaded programs common enough to
warrant adding the complexity to deal with it?
On Sun, Mar 5, 2017 at 10:52 PM Elliot Gorokhovsky <
elliot.gorokhov...@gmail.com> wrote:
On Sun, Mar 5, 2017 at 10:45 PM Elliot Gorokhovsky <
elliot.gorokhov...@gmail.com> wrote:
>
> the problem is, how can we get a unique identifier for the thread in a
> platform-independent way? Any ideas?
>
Oh, I could probably just copy over code from threading.get_ident()... n
On Sun, Mar 5, 2017 at 10:25 PM David Mertz wrote:
>
> Could we make the file global a table of comparison functions and have
> each thread reference a position in the table? It would be fine if multiple
> threads happened to use the position of e.g. the int comparison, just so
> long as each cho
On Sun, Mar 5, 2017 at 9:25 PM Tim Peters wrote:
>
> Would someone please move the patch along? I expect it's my fault it's
> languished so long, since I'm probably the natural person to review it, but
> I've been buried under other stuff.
>
> But the patch doesn't change anything about the sort
On Sun, Mar 5, 2017 at 9:12 PM MRAB wrote:
>
> Although it's true that both programmers and Python might treat 10 as
> functionally identical to 10.0, in practice the numbers that are being
> added to the list probably come from some code that returns integers
> /or/ floats, rather than a mixture
On Sun, Mar 5, 2017 at 8:29 PM Nick Timkovich
wrote:
> I see the benchmarks, and while I assume the asymptotic complexity is the
> same, is there a longer "start-up" time your optimizations need? Do you
> have benchmarks where you sort 10, 100...10**6 items to show that beyond
> the types you're
On Sun, Mar 5, 2017 at 8:23 PM David Mertz wrote:
>
> But also, it's not just the number of list objects that are sorted, but
> how often it's done. I could have a 10,000 line program with only one call
> to `my_list.sort()` in it... but that one line is something that is called
> inside an inne
On Sun, Mar 5, 2017 at 8:09 PM David Mertz wrote:
>
> If we added __type_hint__ as None/type-object and added those comparisons
> to it on .insert()/.append()/etc, then we would be slower by some increment
> while all we were doing was adding things. There could only be a win when
> the list is
On Sun, Mar 5, 2017 at 8:20 PM David Mertz wrote:
>
> B. I think a very large percentage of lists are heterogeneous. But most
> of the time when they are, it's not because there are several different
> numeric types but rather because a list collects some sort of custom
> objects. Maybe those e
On Sun, Mar 5, 2017 at 8:10 PM Chris Angelico wrote:
>
> Also, the performance hit is so small, and even that is in the very
> worst case (a homogeneous list with one different type at the end). I
> like changes that make stuff run faster.
>
I agree.
_
On Sun, Mar 5, 2017 at 7:50 PM Chris Angelico wrote:
>
> I would be rather curious to know how frequently a list consists of
> "numbers", but a mix of ints and floats. Does it happen a
> lot in real-world code?
>
>
This is of course undecidable to verify statically, so we can't just crawl
PyPI...
pending on whether the problem raised in
the previous paragraph can be addressed. What do you think?
On Sun, Mar 5, 2017 at 7:47 PM Steven D'Aprano wrote:
On Sun, Mar 05, 2017 at 07:19:43AM +, Elliot Gorokhovsky wrote:
> You may remember seeing some messages on here about optimi
On Sun, Mar 5, 2017 at 6:30 PM INADA Naoki wrote:
> Good job. I'll read your work later.
>
Thanks!
So please don't use string-key dict specialization to rationalize
> list-sort specialization.
>
I'm not quite doing that: I agree that dict is a special case because of
internal use. I'm just sa
On Sun, Mar 5, 2017 at 10:51 AM David Mertz wrote:
> Thanks for the hard work, this looks very promising.
>
Thank you!
> Real world data tends to be mostly-sorted. So it would be useful for your
> benchmarks to include:
>
> A) Performance on completely sorted data
> i) Of homogenous type
>
(Summary of results: my patch at https://bugs.python.org/issue28685 makes
list.sort() 30-50% faster in common cases, and at most 1.5% slower in the
uncommon worst case.)
Hello all,
You may remember seeing some messages on here about optimizing list.sort()
by exploiting type-homogeneity: since com
n
the implementation I was able to figure out that Py_SIZE is interpreted as
the sign times the number of digits (unless I'm missing something), but
this should be in the docs IMO.
On Wed, Oct 19, 2016 at 7:24 PM Thomas Nyberg wrote:
> On 10/19/2016 09:04 PM, Elliot Gorokhovsky wrote:
>
A quick note:
I'm working on a special-case compare function for bounded integers for the
sort stuff. By looking at the implementation, I figured out that Py_SIZE of
a long is the sign times the number of digits (...right?). Before looking
at the implementation, though, I had looked for this info
On Wed, Oct 12, 2016 at 4:26 PM Terry Reedy wrote:
> I suspect that optimizing string sorting
> will take some experimentation. If the initial item is str, it might be
> worthwhile to record the highest 'kind' during the type scan, so that
> strncmp can be used if all are ascii or latin-1.
>
My
at 4:08 PM Alexander Belopolsky <
alexander.belopol...@gmail.com> wrote:
>
> On Wed, Oct 12, 2016 at 5:57 PM, Elliot Gorokhovsky <
> elliot.gorokhov...@gmail.com> wrote:
>
> On Wed, Oct 12, 2016 at 3:51 PM Nathaniel Smith wrote:
>
> But this isn't relevant
On Wed, Oct 12, 2016 at 3:51 PM Nathaniel Smith wrote:
> But this isn't relevant to Python's str, because Python's str never uses
> UTF-8.
>
Really? I thought in python 3, strings are all unicode... so what encoding
do they use, then?
___
Python-ideas
> So what's Python doing? Is it a codepoint ordering?
>
...ya...how is the python interpreter supposed to know what language
strings are in? There is a unique ordering of unicode strings defined by
the unicode standard, AFAIK.
If you want to sort by natural language ordering, see here:
https://pyp
On Wed, Oct 12, 2016 at 3:39 PM Nathaniel Smith wrote:
> It looks like PyUnicode_Compare already has a special case to use
> memcmp when both of the strings fit into latin1:
>
Wow! That's great! I didn't even try reading through unicode_compare,
because I felt I might miss some subtle detail tha
On Wed, Oct 12, 2016 at 5:36 AM Paul Moore wrote:
> On 12 October 2016 at 11:16, Steven D'Aprano wrote:
> > On Wed, Oct 12, 2016 at 12:25:16AM +, Elliot Gorokhovsky wrote:
> >
> >> Regarding generalization: the general technique for special-casing is
> y
On Tue, Oct 11, 2016 at 9:56 PM Nick Coghlan wrote:
> Once you get to the point of being able to do performance mentions on
> a CPython build with a modified list.sort() implementation, you'll
> want to take a look at the modern benchmark suite in
> https://github.com/python/performance
>
Yup, t
On Wed, Oct 12, 2016 at 9:20 AM Tim Peters wrote:
> > What I'm *not* quite clear on is why Python 3's change to reject
> > comparisons between unrelated types makes this optimisation possible.
>
> It doesn't. It would also apply in Python 2. I simply expect the
> optimization will pay off more
quoting private email as a general
> courtesy, but I hereby give you permission if it was mine that was private.
> [Though I think your summary was better than a quote anyhow.]
>
> -jJ
>
> On Oct 11, 2016 4:58 PM, "Elliot Gorokhovsky" <
> elliot.gorokhov...@gmail.co
So I got excited here. And the reason why is that I got those numbers *on
Tim's benchmark*. When I got these kinds of numbers on my benchmarks, I
figured there was probably a problem with they way I was timing, and
certainly the gains couldn't be as extreme as they suggested. But this is
on a bench
Warning: the contents of this message may be dangerous for readers with
heart conditions.
So today I looked at PyFloat_RichCompare. I had been scared initially
because it was so complicated, I was worried I would miss some important
error check or something if I special cased it. So I took the fol
On Mon, Oct 10, 2016 at 10:15 PM Guido van Rossum wrote:
> But that's suspicious in itself -- since no comparisons are needed to
>
> sort a 1-element list, if it's still faster, there must be something
>
> else you're doing (or not doing) that's affecting the time measured.
>
Oh, ya. Duh. So th
co wrote:
> On Tue, Oct 11, 2016 at 2:41 PM, Elliot Gorokhovsky
> wrote:
> > Oh no, the idea here is just you would copy over the floats associated
> with
> > the PyObject* and keep them in an array of such structs, so that we know
> > which PyObject* are associat
quotes from now
on.
On Mon, Oct 10, 2016 at 9:38 PM Chris Angelico wrote:
> On Tue, Oct 11, 2016 at 2:29 PM, Elliot Gorokhovsky
> wrote:
> > Ya, I think this may be a good approach for floats: if the list is all
> > floats, just copy all the floats into a seperate array, use t
Thanks for looking at this!
That's why I spent months of my life (overall) devising a sequence of
sorting algorithms for Python that reduced the number of comparisons needed.
Yes, that's why I think this is so cool: for a couple dozen lines of code,
we can get (at least for some cases, according
42 matches
Mail list logo