Hi Michael,
Two thoughts -- and I hope others will chime in if I'm off-base here:

1) Build just one index, after the import.

From http://www.sqlite.org/lang_createtable.html: "The UNIQUE constraint causes a unique index to be created on the specified columns."

I think that if you then create your own index, it will be redundant and, with millions of records, time-consuming. However, you may well want to KEEP your own index (adding the UNIQUE constraint to it) and *remove* the UNIQUE constraint from the table's column definition. I hear that creating the index after the data has been imported is faster (and produces a less fragmented index) than building an implicit one on-the-fly during the import.

2) Hashing.

If Vinnie [thev...@yahoo.com] was correct in guessing your goal (to determine whether a given string has been examined before), and if performance remains a problem, the design might benefit from using hashing to "pre-qualify" a string and/or to substitute for a direct match.

A short hash (perhaps a 64-bit integer?) could be stored in a separate table, or even a separate database, from the strings themselves. If the hash of a new string does not match any hash in the database, then you know there is no need to search for the string itself. If the entire hash index can be kept in RAM, you might get a big benefit.

You could take this a step further by using a longer hash as a proxy for the string itself. The hash would need to be long enough to make collisions extremely unlikely, yet not so long as to negate the value of using a proxy for the original strings. In practice, you'd probably want to compute a single long hash for each string, then use the first X bytes as the "pre-qualifying" hash and the remainder as the "confirming" hash. If your average string length is short, using *two* hashes may not be much of a gain. (Of course, since this is not a cryptographic application, you don't need to worry about whether your chosen hash algorithm is "secure" or not.)
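To make point 1 concrete, here is a minimal sketch in Python's sqlite3 module. The table and column names ("probes", "seq") are made up for illustration; the point is that the table is created WITHOUT a UNIQUE constraint, the bulk insert runs in one transaction, and the unique index is built once afterwards:

```python
import sqlite3

# Hypothetical one-column table of strings. No UNIQUE constraint here,
# so the bulk insert does not have to maintain an index row-by-row.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE probes (seq TEXT)")

# Bulk import inside a single transaction.
rows = [("ACGT",), ("TTGA",), ("GGCC",)]
with con:
    con.executemany("INSERT INTO probes (seq) VALUES (?)", rows)

# Build the unique index once, after all the data is loaded.
with con:
    con.execute("CREATE UNIQUE INDEX idx_probes_seq ON probes (seq)")

# From now on the index enforces uniqueness on future inserts.
try:
    with con:
        con.execute("INSERT INTO probes (seq) VALUES ('ACGT')")
except sqlite3.IntegrityError:
    print("duplicate rejected")
```

For a file-backed database you would use a real path instead of ":memory:"; the transaction and index-last ordering are the parts that matter.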
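And a sketch of the two-hash idea from point 2, again in Python. Everything here is illustrative: blake2b with a 16-byte digest stands in for "a single long hash", the first 8 bytes play the "pre-qualifying" role (the part you would keep in RAM), and a plain set stands in for the on-disk table of confirming hashes:

```python
import hashlib

def full_hash(s: str) -> bytes:
    # One long hash per string; 16 bytes makes collisions
    # astronomically unlikely for this non-cryptographic use.
    return hashlib.blake2b(s.encode("utf-8"), digest_size=16).digest()

seen_prefixes = set()  # 8-byte pre-qualifying hashes, kept in RAM
seen_full = set()      # stand-in for the on-disk table of full hashes

def probe(s: str) -> bool:
    """Return True if s was seen before; otherwise record it."""
    h = full_hash(s)
    prefix = h[:8]
    if prefix not in seen_prefixes:
        # Definitely new -- no need to touch the database at all.
        seen_prefixes.add(prefix)
        seen_full.add(h)
        return False
    # Prefix matched: confirm against the full hash
    # (a disk lookup in the real design).
    if h in seen_full:
        return True
    seen_prefixes.add(prefix)
    seen_full.add(h)
    return False

print(probe("ACGT"))   # False -- first time seen
print(probe("ACGT"))   # True  -- seen before
print(probe("TTGA"))   # False
```

The win comes from the first branch: for a string never seen before, the in-RAM prefix set answers "definitely new" without any disk access at all.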
Regards,
Donald

-----Original Message-----
From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Vinnie
Sent: Sunday, March 29, 2009 6:14 PM
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] creating unique data takes many hours, help

Michael:

While my answer isn't strictly limited to sqlite, the performance nut in me can't help myself. You have a table with only one column, a string. I could be wrong here, but it seems like you just want to keep a list of values that you have already tried: after you insert a bunch of strings into the table, you want to be able to look up a string quickly to see if it exists, so that you can tell whether you already probed that sequence (taking a guess here).

If my guess is right, and the only thing you are doing is looking up sorted single-column elements, you could probably get away with your own quick disk-based binary tree implementation and avoid sqlite for this particular circumstance altogether. The result would be several orders of magnitude faster, even after you have followed the suggestions others have given.

> Hi,
>
> I am new with sqlite, and I created a program that reads several million
> records and puts them into a sqlite db.
>
> The table has only one column, indexed and unique, but it takes many
> hours.
>
> Is it necessary to pre-allocate the space, or is there anything that I
> can do to reduce the time it takes?
>
> this is how I create the db, table and index.

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users