A fast technique for achieving your objective is what I believe is called a "monkey puzzle" sort. The data itself is not moved; instead, an array of descriptors, one per element, is sorted. The output is realized by scanning the sorted list of descriptors and picking up each associated record from the input list.

On a modest machine your application should run in under ten minutes with that method. One way we use it is as the first stage of building a B-Tree index rapidly.
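
For what it's worth, here is a minimal sketch of the idea in Python. The record layout and key function are made up; only the shape of the technique matters: build small descriptors, sort those, then scan them to emit the records.

    def monkey_puzzle_sort(records, key):
        # Build one small descriptor per record: (sort key, index into the input).
        # The records themselves never move.
        descriptors = [(key(rec), i) for i, rec in enumerate(records)]

        # Sort only the descriptors.
        descriptors.sort()

        # Realize the output by scanning the sorted descriptors and picking up
        # each associated record from the input list.
        return [records[i] for _, i in descriptors]

    # Example with a made-up record format.
    rows = [("carol", 3), ("alice", 1), ("bob", 2)]
    print(monkey_puzzle_sort(rows, key=lambda r: r[0]))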

Chris Jones wrote:
Thanks everyone for your feedback.

I ended up doing a presort on the data and then adding the data in order. At first I was a little concerned about how I was going to implement an external sort on a data set that huge, but then I realized that the unix "sort" command can handle large files, and in fact does so pretty efficiently.

So, I did a "sort -u -S 1800M fenout.txt > fenoutsort.txt"

The sort took about 45 minutes, which is acceptable for me (it took much longer without the -S option telling it to use more memory), and then loading the table was very efficient: inserting all the rows into my table in sorted order took only 18 minutes.
So, all in all, I can now load the table in just about an hour, which is
great news for me.
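
In rough outline the whole flow looks like the sketch below. This is only an illustration: the table name and single-column layout are placeholders, and SQLite stands in for the actual database I'm loading into.

    import sqlite3
    import subprocess

    # Step 1: external presort with the unix "sort" command (same flags as above).
    with open("fenoutsort.txt", "w") as out:
        subprocess.run(["sort", "-u", "-S", "1800M", "fenout.txt"],
                       stdout=out, check=True)

    # Step 2: insert the rows in sorted order.
    conn = sqlite3.connect("example.db")
    conn.execute("CREATE TABLE IF NOT EXISTS fen (line TEXT)")
    with open("fenoutsort.txt") as f:
        conn.executemany("INSERT INTO fen (line) VALUES (?)",
                         ((line.rstrip("\n"),) for line in f))
    conn.commit()
    conn.close()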

Thanks!
Chris


