Chris Jones wrote:
Hi all,

I have a very simple schema.  I need to assign a unique identifier to a
large collection of strings, each at most 80 bytes, though typically
shorter.

The problem is I have 112 million of them.

Maybe you could start by breaking the data into 8 equal groups and making a table for each group. Then merge the groups pairwise, then merge those 4 results, and finally the 2 semifinal groups (kind of like March Madness, come to think of it). Since each merge operates on already sorted/indexed data, it might save a lot of time.
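Roughly, the bracket-style merge above might look like this in Python with sqlite3. This is only a sketch on toy data: the table names (grp0..grp7, merge0...), the 8-way round-robin split, and using UNION (which also deduplicates) to combine the index-sorted inputs are all my assumptions, not anything from the original post.

```python
import sqlite3

# Toy stand-in for the 112M strings (hypothetical sample data).
strings = ["delta", "alpha", "echo", "bravo",
           "golf", "charlie", "hotel", "foxtrot"]

con = sqlite3.connect(":memory:")

# Step 1: split into 8 groups, each in its own indexed table,
# so every later merge reads already-sorted runs.
n_groups = 8
for g in range(n_groups):
    con.execute(f"CREATE TABLE grp{g} (s TEXT)")
    con.executemany(f"INSERT INTO grp{g} VALUES (?)",
                    [(s,) for s in strings[g::n_groups]])
    con.execute(f"CREATE INDEX idx{g} ON grp{g}(s)")

# Step 2: merge pairwise, 8 -> 4 -> 2 -> 1, bracket style.
round_tables = [f"grp{g}" for g in range(n_groups)]
merge_id = 0
while len(round_tables) > 1:
    next_round = []
    for a, b in zip(round_tables[0::2], round_tables[1::2]):
        m = f"merge{merge_id}"
        merge_id += 1
        # UNION combines the two sorted inputs and drops duplicates.
        con.execute(f"CREATE TABLE {m} AS "
                    f"SELECT s FROM {a} UNION SELECT s FROM {b}")
        next_round.append(m)
    round_tables = next_round

final = round_tables[0]
result = [row[0] for row in
          con.execute(f"SELECT s FROM {final} ORDER BY s")]
```

On the real data you'd batch the inserts and keep each group on disk, but the merge structure would be the same.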

Or perhaps do a block sort based on the first character of the string (or nth char if most of the first chars are the same) and have a bunch of smaller tables with that character as (part of) the table name. The global unique identifier could be the character concatenated with the rowid in its table.
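A minimal sketch of the second idea, again with assumed names (the t_<char> tables, the ids dict) and toy data: one small table per leading character, with the global identifier built by concatenating that character with the row's rowid in its table.

```python
import sqlite3

# Hypothetical sample data; partitioned by first character.
strings = ["apple", "avocado", "banana", "blueberry", "cherry"]

con = sqlite3.connect(":memory:")

ids = {}
for s in strings:
    ch = s[0]
    # One small table per leading character (assumes the first
    # chars vary; otherwise key on the nth char as suggested).
    tbl = f"t_{ch}"
    con.execute(f"CREATE TABLE IF NOT EXISTS {tbl} (s TEXT)")
    cur = con.execute(f"INSERT INTO {tbl} VALUES (?)", (s,))
    # Global unique id = leading char + rowid within that table.
    ids[s] = f"{ch}{cur.lastrowid}"
```

Since rowids are unique within a table and the character picks the table, the concatenation is unique across the whole collection.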

HTH,

Gerry
