At 04:47 22/03/2007, you wrote:

I don't think that solves my problem.  Sure, it guarantees that the IDs are
unique, but not the strings.

My whole goal is to be able to create a unique identifier for each string,
in such a way that I don't have the same string listed twice, with different
identifiers.

In your solution, there is no way to look up a string to see if it already
exists, since there is no index on the string.

Thanks,
Chris

So you have a data file with a large collection of strings, 112 million of them, each at most 80 bytes, although typically shorter.

How do you want to handle repeated data? Replace the existing row? Keep the first one in? Modify the string to make it unique?
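
To make the "first in" option concrete, here is a minimal sketch, assuming a table with a UNIQUE column on the string itself. The table name `strings`, the column names and the helper `intern_strings` are made up for illustration, not taken from your schema. The UNIQUE constraint gives SQLite an index on the string, so a lookup by string works, and INSERT OR IGNORE keeps the first copy and silently skips later ones. INSERT OR REPLACE would give the "replace" policy instead, at the cost of the row getting a new id.

```python
import sqlite3

def intern_strings(conn, values):
    """Store each distinct string once and return {string: id}.

    Hypothetical helper: table and column names are assumptions.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS strings ("
        " id INTEGER PRIMARY KEY,"
        " value TEXT NOT NULL UNIQUE)"  # UNIQUE creates an index on value
    )
    # "First in" policy: a duplicate violates UNIQUE and is ignored.
    conn.executemany(
        "INSERT OR IGNORE INTO strings (value) VALUES (?)",
        ((v,) for v in values),
    )
    conn.commit()
    ids = {}
    for v in set(values):
        # This lookup uses the index created by the UNIQUE constraint.
        row = conn.execute(
            "SELECT id FROM strings WHERE value = ?", (v,)
        ).fetchone()
        ids[v] = row[0]
    return ids

conn = sqlite3.connect(":memory:")
ids = intern_strings(conn, ["alpha", "beta", "alpha"])
row_count = conn.execute("SELECT COUNT(*) FROM strings").fetchone()[0]
```

With 112 million rows the index itself will be large, so this only shows the behaviour, not how fast the load will be.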

You want to put them in a sqlite3 database, but each string must appear only once. The question I see here is whether your data file contains repeated strings or not. I think grep or a Perl script can help you a lot by cleaning your data first. Then the import into the database will be fast.
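
As a rough sketch of that cleaning step (written in Python here rather than Perl, and assuming one string per line), the generator below keeps the first occurrence of each line and drops the rest. The name `unique_lines` is only illustrative.

```python
def unique_lines(lines):
    """Yield each line once, keeping the first occurrence in order."""
    seen = set()
    for line in lines:
        s = line.rstrip("\n")  # compare strings without the newline
        if s not in seen:
            seen.add(s)
            yield s

# Small in-memory sample standing in for the real data file.
sample = ["foo\n", "bar\n", "foo\n", "baz\n"]
cleaned = list(unique_lines(sample))
```

Note that the set holds every distinct string in memory, which may not be practical for 112 million strings. In that case a disk-based pass such as `sort -u` on the file is an alternative, if the original order does not matter.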

HTH




--------------------------------------------------------------------------
"We have met the enemy and he is us"


