Re: [HACKERS] A worst case for qsort

2014-08-08 Thread Peter Geoghegan
On Fri, Aug 8, 2014 at 5:54 AM, Robert Haas wrote: > Well, I'm not sure why you're having a hard time imagining it. > Presorted input is a common case in general; that's why we have a > check for it. That check adds overhead in the non-pre-sorted case to > improve the pre-sorted case, and nobody'

Re: [HACKERS] A worst case for qsort

2014-08-08 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote: > On Thu, Aug 7, 2014 at 5:52 PM, Peter Geoghegan wrote: > > On Thu, Aug 7, 2014 at 2:34 PM, Rod Taylor wrote: > >> This one is frequently sorted as batch operations against the files are > >> performed in alphabetical order to reduce conflict issues t

Re: [HACKERS] A worst case for qsort

2014-08-08 Thread Robert Haas
On Thu, Aug 7, 2014 at 5:52 PM, Peter Geoghegan wrote: > On Thu, Aug 7, 2014 at 2:34 PM, Rod Taylor wrote: >> This one is frequently sorted as batch operations against the files are >> performed in alphabetical order to reduce conflict issues that a random >> ordering may cause between jobs. > >

Re: [HACKERS] A worst case for qsort

2014-08-07 Thread Peter Geoghegan
On Thu, Aug 7, 2014 at 2:34 PM, Rod Taylor wrote: > This one is frequently sorted as batch operations against the files are > performed in alphabetical order to reduce conflict issues that a random > ordering may cause between jobs. Sure. There are cases out there. But, again, I have a hard time

Re: [HACKERS] A worst case for qsort

2014-08-07 Thread Rod Taylor
Sigh. Found another example. A table with 15 million entries and a unique key on filesystem location for things users created via a web interface. Entries all begin with /usr/home/ ... This one is frequently sorted as batch operations against the files are performed in alphabetical order to redu

Re: [HACKERS] A worst case for qsort

2014-08-07 Thread Peter Geoghegan
On Thu, Aug 7, 2014 at 2:23 PM, Rod Taylor wrote: > While I'm sure it's not common, I've seen a couple of ten-million tuple > tables having a URL column as primary key where 98% of the entries begin > with 'http://www.' > > So, that exact scenario is out there. Sure, that scenario is relatively c

Re: [HACKERS] A worst case for qsort

2014-08-07 Thread Rod Taylor
On Thu, Aug 7, 2014 at 3:06 PM, Peter Geoghegan wrote: > I think that pre-sorted, all-unique text datums, that have all > differences beyond the first 8 bytes, that the user happens to > actually want to sort are fairly rare. While I'm sure it's not common, I've seen a couple of ten-million tup

Re: [HACKERS] A worst case for qsort

2014-08-07 Thread Peter Geoghegan
On Thu, Aug 7, 2014 at 12:06 PM, Peter Geoghegan wrote: > I think that pre-sorted, all-unique text datums, that have all > differences beyond the first 8 bytes, that the user happens to > actually want to sort are fairly rare. Actually, you could use that case to justify not doing strxfrm() trans

Re: [HACKERS] A worst case for qsort

2014-08-07 Thread Peter Geoghegan
On Thu, Aug 7, 2014 at 11:33 AM, Robert Haas wrote: > I think that's actually not a very unrealistic case at all. In > general, I think that if a particular data distribution is a > reasonable scenario, that data distribution plus "it's already sorted" > is also reasonable. Data gets presorted i

Re: [HACKERS] A worst case for qsort

2014-08-07 Thread Robert Haas
On Thu, Aug 7, 2014 at 1:16 PM, Peter Geoghegan wrote: > On Thu, Aug 7, 2014 at 8:07 AM, Robert Haas wrote: >> So here. You may not agree that the mitigation strategies for which >> others are asking for are worthwhile, but you can't expect everyone >> else to agree with your assessment of which

Re: [HACKERS] A worst case for qsort

2014-08-07 Thread Peter Geoghegan
On Thu, Aug 7, 2014 at 8:07 AM, Robert Haas wrote: > So here. You may not agree that the mitigation strategies for which > others are asking for are worthwhile, but you can't expect everyone > else to agree with your assessment of which cases are likely to occur > in practice. The case of a coho

Re: [HACKERS] A worst case for qsort

2014-08-07 Thread John Cochran
On Thu, Aug 7, 2014 at 11:07 AM, Robert Haas wrote: > On Tue, Aug 5, 2014 at 8:15 PM, Peter Geoghegan wrote: > > "The adversarial method works for almost any polymorphic program > > recognizable as quicksort. The subject quicksort may copy values at > > will, or work with lists rather than array

Re: [HACKERS] A worst case for qsort

2014-08-07 Thread Robert Haas
On Tue, Aug 5, 2014 at 8:15 PM, Peter Geoghegan wrote: > "The adversarial method works for almost any polymorphic program > recognizable as quicksort. The subject quicksort may copy values at > will, or work with lists rather than arrays. It may even pick the > pivot at random. The quicksort will

Re: [HACKERS] A worst case for qsort

2014-08-07 Thread Fabien COELHO
Hello John, [...] In fact, the mentioned paper says this about the subject "Moreover, if worst-case performance is important, Quicksort is the wrong algorithm." I fully agree with this conclusion. -- Fabien -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make chan

Re: [HACKERS] A worst case for qsort

2014-08-07 Thread Fabien COELHO
IMHO, while worst case performance is a very useful tool for analyzing algorithms (particularly their worst case time complexity), a worst case should be put in its practical context. For example, if we had reason to be concerned about *adversarial* inputs, I think that there is a good chance th

Re: [HACKERS] A worst case for qsort

2014-08-06 Thread Fabien COELHO
Random pivot selection will certainly result in more frequent lopsided partitions without any malicious intent. Yep. It makes "adversary input" attacks more or less impossible, at the price of higher average cost. Maybe a less randomized version would do, i.e. select randomly one of the "3"

Re: [HACKERS] A worst case for qsort

2014-08-06 Thread Peter Geoghegan
On Wed, Aug 6, 2014 at 9:18 AM, John Cochran wrote: > So it seems to me that the vulnerability only exits if an attacker supplied > comparison function is permitted. For all other cases, assuming that only > vetted comparison functions are permitted, the selection of a random pivot > would make an

Re: [HACKERS] A worst case for qsort

2014-08-06 Thread John Cochran
I just browsed the paper linked by Peter and it looks like the attack has to be active against a currently executing qsort. In the paper, what happens is the comparison function is supplied by the attacker and effectively lies about the result of a comparison. It keeps the lies consistent in a very

Re: [HACKERS] A worst case for qsort

2014-08-06 Thread Fabien COELHO
If so, adding some randomness in the decision process would suffice to counter the adversarial input argument you raised. This is specifically addressed by the paper. Indeed, randomly choosing a pivot is a common strategy. It won't fix the problem. Too bad. I must admit that I do not see how

Re: [HACKERS] A worst case for qsort

2014-08-05 Thread Peter Geoghegan
On Tue, Aug 5, 2014 at 11:49 PM, Fabien COELHO wrote: > If so, adding some randomness in the decision process would suffice to > counter the adversarial input argument you raised. This is specifically addressed by the paper. Indeed, randomly choosing a pivot is a common strategy. It won't fix th

Re: [HACKERS] A worst case for qsort

2014-08-05 Thread Fabien COELHO
For example, if we had reason to be concerned about *adversarial* inputs, I think that there is a good chance that our qsort() actually would be problematic to the point of driving us to prefer some generally slower alternative. That is an interesting point. Indeed, a database in general of

[HACKERS] A worst case for qsort

2014-08-05 Thread Peter Geoghegan
There was some informal discussion of worst case performance of my text sort support patch at the most recent developer's meeting in Ottawa. There is no need to repeat the details here, which aren't terribly interesting - I made a point about putting worst case performance in context. I also made a