Re: [Tutor] sorting algorithm

Jeff Johnson Thu, 11 Mar 2010 11:49:20 -0800

C.T. Matsumoto wrote:

Dave Angel wrote:
(You forgot to do a Reply-All, so your message went to just me, rather than to me and the list.)
C.T. Matsumoto wrote:
Dave Angel wrote:
C.T. Matsumoto wrote:
Hello,
This is a follow-up on a question I had about algorithms. In the thread it was suggested I make my own sorting algorithm.
Here are my results.

#!/usr/bin/python

def sort_(list_):
   for item1 in list_:
       pos1 = list_.index(item1)
       pos2 = pos1 + 1
       try:
           item2 = list_[pos2]
       except IndexError:
           pass

       if item1 >= item2:
           try:
               list_.pop(pos2)
               list_.insert(pos1, item2)
               return True
           except IndexError:
               pass

def mysorter(list_):
   while sort_(list_) is True:
       sort_(list_)
I found this to be a great exercise. In doing the exercise, I got pretty stuck. I consulted another programmer (my dad) who described how to go about sorting. As it turned out, what he described was the Bubble sort algorithm. Since coding the solution I know the Bubble sort is inefficient because of repeated iterations over the entire list. This shed light on the quick sort algorithm, which I'd like to have a go at.
Something I haven't tried is sticking in really large lists. I was told that with really large lists you break down the input list into smaller lists. Sort each list, then go back and use the same swapping procedure for each of the different lists. My question is, at what point do you start breaking things up? Is that based on list elements or is it based on memory(?) resources python is using?
One thing I'm not pleased about is the while loop and I'd like to replace it with a for loop.
Thanks,

T
There are lots of references on the web about Quicksort, including a video at:
    http://www.youtube.com/watch?v=y_G9BkAm6B8
which I think illustrates it pretty well. It would be a great learning exercise to implement Python code directly from that description, without using the sample C++ code available.
(Incidentally, there are lots of variants of Quicksort, so I'm not going to quibble about whether this is the "right" one to be called that.)
I don't know what your earlier thread was, since you don't mention the subject line, but there are a number of possible reasons you might not have wanted to use the built-in sort. The best one is for educational purposes. I've done my own sort for various reasons in the past, even though I had a library function, since the library function had some limits. One time I recall, the situation was that the library sort was limited to 64k of total data, and I had to work with much larger arrays (this was in 16-bit C++, in "large" model). I solved the size problem by using the C++ sort library on 16k subsets (because a pointer was 2*2 bytes). Then I merged the results of the sorts. At the time, and in the circumstances involved, there were seldom more than a dozen or so sublists to merge, so this approach worked well enough.
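As a rough illustration of the sort-the-subsets-then-merge idea described above, here is a minimal Python 3 sketch. It is not the C++ code from the anecdote: the function name sort_in_chunks and the chunk_size parameter are made up for this example, and it leans on the built-in sorted() plus heapq.merge() from the standard library.

```python
import heapq

def sort_in_chunks(items, chunk_size=4):
    """Sort fixed-size slices with the built-in sort, then merge the results."""
    # chunk_size is a hypothetical knob standing in for a library size limit.
    chunks = [sorted(items[i:i + chunk_size])
              for i in range(0, len(items), chunk_size)]
    # heapq.merge combines inputs that are already sorted into one sorted stream.
    return list(heapq.merge(*chunks))

print(sort_in_chunks([5, 3, 9, 1, 7, 2, 8, 0, 6, 4]))
```

The point of the design is that the existing sort does the hard work on each piece, and only the merge step is new code.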
Generally, it's better for both your development time and the efficiency and reliability of the end code to base a new sort mechanism on the existing one. In my case above, I was replacing what amounted to an insertion sort, and achieved a 50x improvement for a real customer. It was fast enough that other factors completely dominated his running time.
But for learning purposes? Great plan. So now I'll respond to your other questions, and comment on your present algorithm.
It would be useful to understand about algorithmic complexity, the so-called Order Function. In a bubble sort, if you double the size of the array, you quadruple the number of comparisons and swaps. It's order N-squared or O(n*n). So what works well for an array of size 10 might take a very long time for an array of size 10000 (like a million times as long). You can do much better by sorting smaller lists, and then combining them together. Such an algorithm can be O(n*log(n)).
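To make the "double the size, quadruple the work" claim concrete, here is a small Python 3 sketch that counts comparisons. The helper name bubble_comparisons is invented for this example, and it assumes a plain bubble sort run on a fully reversed list (the worst case), not the exact code posted earlier in the thread.

```python
def bubble_comparisons(n):
    """Count the comparisons a plain bubble sort makes on a reversed list of n items."""
    items = list(range(n, 0, -1))      # worst case: fully reversed input
    comparisons = 0
    for end in range(len(items) - 1, 0, -1):
        for i in range(end):
            comparisons += 1
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
    return comparisons

small = bubble_comparisons(100)
large = bubble_comparisons(200)
print(small, large, large / small)     # the ratio comes out close to 4
```

Each pass does one fewer comparison than the one before, so the total is n*(n-1)/2, which is where the n-squared growth comes from.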
You ask at what point you consider sublists? In a language like C, the answer is when the list is size 3 or more. For anything larger than 2, you divide into sublists, and work on them.
Now, if I may comment on your code. You're modifying a list while you're iterating through it in a for loop. In the most general case, that's undefined. I think it's safe in this case, but I would avoid it anyway, by just using xrange(len(list_)-1) to iterate through it. You use the index function to find something you would already know -- the index function is slow. And the first try/except isn't needed if you use a -1 in the xrange argument, as I do above.
You use pop() and insert() to exchange two adjacent items in the list. Both operations shift the remainder of the list, so they're rather slow. Since you're exchanging two items in the list, you can simply do that:
    list_[pos1], list_[pos2] = list_[pos2], list_[pos1]

That also eliminates the need for the second try/except.
You mention being bothered by the while loop. You could replace it with a simple for loop with xrange(len(list_)), since you know that N passes will always be enough. But if the list is partially sorted, your present scheme will end sooner. And if it's fully sorted, it'll only take one pass over the data.
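Here is one way the for-loop version might look, as a Python 3 sketch (range instead of the thread's xrange). The names one_full_pass and mysorter_fixed are made up for this example. Note the assumption: the "N passes are enough" bound holds when each pass walks the whole list and swaps every out-of-order pair, not when the pass returns after its first swap.

```python
def one_full_pass(list_):
    """Walk the list once, swapping every adjacent pair that is out of order."""
    for pos in range(len(list_) - 1):
        if list_[pos] > list_[pos + 1]:
            list_[pos], list_[pos + 1] = list_[pos + 1], list_[pos]

def mysorter_fixed(list_):
    # With full passes, N passes always suffice for a list of N items.
    for _ in range(len(list_)):
        one_full_pass(list_)

data = [3, 1, 2, 5, 4, 0]
mysorter_fixed(data)
print(data)
```

The trade-off is the one mentioned above: this always does N passes, even when the list is already sorted.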
There are many refinements you could do. For example, you don't have to stop the inner loop after the first swap. You could finish the buffer, swapping any other pairs that are out of order. You'd then be saving a flag indicating if you did any swaps. You could keep an index pointing to the last pair you swapped on the previous pass, and use that for a limit next time. Then you just terminate the outer loop when that limit value is 1. You could even keep two limit values, and bubble back and forth between them, as they gradually close into the median of the list. You quit when they collide in the middle.
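The "last pair swapped" refinement could be sketched roughly like this in Python 3. The function name bubble_sort_with_limit is invented here, and the index convention is an assumption of this sketch: limit counts how many starting positions the next pass must visit, so the loop ends when it reaches 0 rather than 1.

```python
def bubble_sort_with_limit(list_):
    """Bubble sort that shrinks each pass to end at the previous pass's last swap."""
    limit = len(list_) - 1              # the next pass visits pos1 = 0 .. limit-1
    while limit > 0:
        last_swap = 0
        for pos1 in range(limit):
            pos2 = pos1 + 1
            if list_[pos1] > list_[pos2]:
                list_[pos1], list_[pos2] = list_[pos2], list_[pos1]
                last_swap = pos1        # nothing beyond pos2 moved after this point
        # Items from last_swap + 1 onward are in their final places.
        limit = last_swap

data = [4, 2, 5, 1, 3]
bubble_sort_with_limit(data)
print(data)
```

On an already sorted list the first pass makes no swaps, limit drops to 0, and the function stops after a single pass.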
The resultant function should be much faster for medium-sized lists, but it still will slow down quadratically as the list size increases. You still need to divide and conquer, and quicksort is just one way of doing that.
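For a feel of what divide and conquer looks like, here is a very small Python 3 sketch of one quicksort variant. It is only an illustration: it builds new lists instead of sorting in place, it picks the middle item as the pivot (an arbitrary choice), and it is not claimed to match the version in the video linked earlier.

```python
def quicksort(items):
    """Return a new sorted list; a simple, non-in-place quicksort variant."""
    if len(items) < 2:
        return list(items)              # zero or one item is already sorted
    pivot = items[len(items) // 2]      # middle item as pivot: an arbitrary choice
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]
    # Sort each side independently, then glue the pieces together.
    return quicksort(smaller) + equal + quicksort(larger)

print(quicksort([7, 2, 9, 2, 5, 0]))
```

Writing an in-place version from the description alone is still a worthwhile exercise; this sketch only shows the overall shape of the recursion.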
DaveA
Thanks a lot Dave,

Sorry the original thread is called 'Python and algorithms'.
Yes, I think it's best to use what python provides and build on top of that. I got to asking my original question based on trying to learn more about algorithms in general, through python. Of late many people have been asking me how well I can 'build' algorithms, and this prompted me to start the thread. This is for learning purposes (which the original thread will give you an indication where I'm coming from).
The refactored code looks like this. I have tackled a couple items. First the sub-listing (which I'll wait till I can get the full sort working), then the last couple of paragraphs about refinements. Starting with the first refinement, I'm not sure how *not* to stop the inner loop?
def s2(list_):
   for pos1 in xrange(len(list_)-1):
       item1 = list_[pos1]
       pos2 = pos1 + 1
       item2 = list_[pos2]
       if item1 >= item2:
           list_[pos1], list_[pos2] = list_[pos2], list_[pos1]
           return True

def mysorter(list_):
   # This is the outer loop?
   while s2(list_) is True:
       # Calling s2 kicks off the inner loop?
       s2(list_)

if __name__ == '__main__':
   from random import shuffle
   foo = range(10)
   shuffle(foo)
   mysorter(foo)


Thanks again.
As before, I'm not actually trying this code, so there may be typos. But assuming your code here works, the next refinement would be:
In s2() function, add a flag variable, initially False. Then instead of the return True, just say flag=True
Then at the end of the function, return flag
About the while loop. No need to say 'is True', just use while s2(list_): And no need to call s2() a second time.
while s2(list_):
    pass
Okay up to here I follow. This all makes sense.

def s2(list_):
    flag = False
    for pos1 in xrange(len(list_)-1):
        item1 = list_[pos1]
        pos2 = pos1 + 1
        item2 = list_[pos2]
        if item1 >= item2:
            list_[pos1], list_[pos2] = list_[pos2], list_[pos1]
            flag = True
            return flag
def mysorter(list_):
    while s2(list_):
        pass
Before you can refine the upper limit, you need a way to preserve it between calls. Simplest way to do that is to combine the two functions, as a nested loop. Then, instead of flag, you can have a value "limit" which indicates what index was last swapped. And the inner loop uses that as an upper limit on its xrange.
Where I start to get confused is refining the 'upper limit'. What is the upper limit defining? I'm guessing it is the last position processed.
T
DaveA

_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Take out the = in the following line
         if item1 >= item2:

and it will sort like items together, which I think is what you originally wanted.


Also change:
>             return flag
to:
>    return flag
So that False gets returned if you don't make a swap.
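Put together, the two changes might look like the sketch below. It is written for Python 3, so it uses range where the thread's code used xrange; otherwise the names follow the code quoted above. One extra note beyond what was said above: with '>=' and duplicate values, two equal neighbours would swap on every pass, so the while loop would never finish.

```python
def s2(list_):
    flag = False
    for pos1 in range(len(list_) - 1):
        pos2 = pos1 + 1
        if list_[pos1] > list_[pos2]:   # '>' instead of '>=': equal items stay put
            list_[pos1], list_[pos2] = list_[pos2], list_[pos1]
            flag = True
    return flag                         # function level: False when no swap happened

def mysorter(list_):
    # Keep making passes until one completes without a swap.
    while s2(list_):
        pass

data = [3, 1, 2, 3, 1]
mysorter(data)
print(data)
```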

This worked for me.  Thank you for the interesting thread!

--
Jeff

Jeff Johnson
j...@dcsoftware.com
Phoenix Python User Group - sunpigg...@googlegroups.com
