I'm not sure what the convention is for resurrecting old threads
but I felt it would be best to start fresh. For reference, the
older thread is here:
http://forum.dlang.org/thread/pyduqwmskkkoicsxi...@forum.dlang.org
tl;dr - I studied a paper with the goal of implementing a stable
3-way partitioning algorithm which runs in linear time without
allocating any memory on the heap. I spent over a month studying
that paper and only managed to understand about 80% of the
content. However, that was enough for me to derive a partitioning
algorithm which runs in O(n) time using O(log n) space.
The good news is that the O(log n) space can easily be allocated
on the stack so this achieves the original goal of avoiding
memory allocations. The bad news is that, despite running in
linear time, it comes at the cost of a huge constant factor which
makes it too slow for any practical purposes. It's a complex
algorithm which I never fully implemented but preliminary
benchmarks were discouraging.
With that said, I've been sitting on this for several months and
Phobos is still lacking a stable partition3 implementation so
it's about time I did something about that. I implemented a few
"potential candidates" with various pros and cons, two of which
can accept a variable-length buffer of a specific type. DPaste
isn't the best platform for running benchmarks so I suggest
compiling and running on your own machine.
http://dpaste.dzfl.pl/2a22af60baf9
Notes:
* "Assignments" requires the range have assignable elements;
"Swaps" and "Insertions" works on both swappable and assignable
elements.
* "Assignments Large Buffer" is how you would classically
implement a stable partitioning algorithm using O(n) space.
Clearly, it's by far the fastest.
* "Swaps Large Buffer" vigorously thrashes the cache so it gets
much slower on larger inputs.
* The recursive code is identical in all three implementations;
all that changes is the method for partitioning small sublists.
So it's possible to merge all these techniques into a single
function or even make them external functions separate from the
recursive code.
Any feedback is appreciated. I'll make a pull request once we
have consensus on what is the best solution.