Hi all,

I have a very simple schema.  I need to assign a unique identifier to a
large collection of strings, each at most 80-bytes, although typically
shorter.

The problem is I have 112 million of them.

My schema looks as follows:

CREATE TABLE rawfen ( fen VARCHAR(80) );
CREATE INDEX rawfen_idx_fen ON rawfen(fen);

Unforuntately, data loading this table takes virtually forever.  After 24
hours, its not finished, and that is inserting rows in transactions of
100,000 rows per transaction.

I tried dropping the index, and building it after row insertion.

That has two problems.  First, since I have no index, I can't guarantee
string uniqueness (and I'd like to).

Second, it still doesn't solve my speed problem.   Insertion without the
index takes a little over an hour, but the index creation never finishes. 
Well, I gave it 6 hours and it was unclear if it was making any progress.

I've read elsewhere that this is a data locality issue, which certainly
makes sense.

And in those threads, a suggestion has been made to insert in sorted order. 
But it's unclear to me exactly what the sorting function would need to be -
it's likely my sorting function (say strcmp()) wouldn't match what sqlite
expects.  It's also a little unclear if I can even sort this many keys
externally with any ease.  After all, that's why I'm trying to use a
database.

Last, is this something that is likely to affect me if I port over to
another database?  Do others, say mySQL for instance, handle this better?

Does anyone have any suggestions?    Upgrading my server isn't an option -
it already has 4Gig of memory and very fast disks in a RAID-5 configuration
(with a 512MB cache on the raid controller).

Thanks!
-- 
View this message in context: 
http://www.nabble.com/Index-creation-on-huge-table-will-never-finish.-tf3444218.html#a9604705
Sent from the SQLite mailing list archive at Nabble.com.


-----------------------------------------------------------------------------
To unsubscribe, send email to [EMAIL PROTECTED]
-----------------------------------------------------------------------------

Reply via email to