Re: [sqlite] Index creation on huge table will never finish.

2007-03-22 Thread Joe Wilson
--- [EMAIL PROTECTED] wrote: > Joe Wilson <[EMAIL PROTECTED]> wrote: > > As for the stats from sqlite3_analyzer, they seem to be in the right ballpark. But I'm not sure its heuristic accounts for rows that are significantly larger than the page size, though. In such cases I a...

Re: [sqlite] Index creation on huge table will never finish.

2007-03-22 Thread drh
Joe Wilson <[EMAIL PROTECTED]> wrote: > As for the stats from sqlite3_analyzer, they seem to be in the right ballpark. But I'm not sure its heuristic accounts for rows that are significantly larger than the page size, though. In such cases I am seeing higher than expected fragmentation af...

Re: [sqlite] Index creation on huge table will never finish.

2007-03-22 Thread Joe Wilson
--- [EMAIL PROTECTED] wrote: > Joe Wilson <[EMAIL PROTECTED]> wrote: > > See also: Changes to support fragmentation analysis in sqlite3_analyzer. > > http://www.sqlite.org/cvstrac/chngview?cn=3634 > I'm not real sure those patches are working right. I need to revisit that whole fragm...

Re: [sqlite] Index creation on huge table will never finish.

2007-03-22 Thread drh
Joe Wilson <[EMAIL PROTECTED]> wrote: > See also: Changes to support fragmentation analysis in sqlite3_analyzer. > http://www.sqlite.org/cvstrac/chngview?cn=3634 I'm not real sure those patches are working right. I need to revisit that whole fragmentation analysis thing before the next relea...

Re: [sqlite] Index creation on huge table will never finish.

2007-03-22 Thread Martin Jenkins
P Kishor wrote: > Mac/Unix person meself, but the Windows XP sort is pretty darn good as well. I'll take a look. Last time I used it, it was useless. Win9x days? These days (especially for a one-off) I'd probably go straight to doing it in Python to avoid x-platform syntax issues. Martin

Re: [sqlite] Index creation on huge table will never finish.

2007-03-22 Thread Chris Jones
Ah yes, I should read more carefully :) Thanks, right, I was actually guaranteeing uniqueness originally by just fetching and then inserting only if there wasn't a match (I needed a rowid if the row existed anyway). Now I'm guaranteeing uniqueness by letting sort do the work for me, but simila...
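
A minimal sketch of the check-then-insert pattern described above, assuming the rawfen table from the original post (the preview does not show the actual statements, and the sample value is just a placeholder):

    -- Look the string up first and reuse its rowid if it is already there.
    SELECT rowid FROM rawfen WHERE fen = 'some string';
    -- Only when the SELECT returns no row, insert it and grab the new rowid.
    INSERT INTO rawfen (fen) VALUES ('some string');
    SELECT last_insert_rowid();

With a unique index on fen, the same effect can be had from INSERT OR IGNORE followed by the rowid lookup.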

Re: [sqlite] Index creation on huge table will never finish.

2007-03-22 Thread Joe Wilson
--- [EMAIL PROTECTED] wrote: > Gerry Snyder <[EMAIL PROTECTED]> wrote: > > Chris Jones wrote: > > > Hi all, I have a very simple schema. I need to assign a unique identifier to a large collection of strings, each at most 80 bytes, although typically shorter. ...

Re: [sqlite] Index creation on huge table will never finish.

2007-03-22 Thread Gerry Snyder
Chris Jones wrote: > I probably should have made this more explicit, but in sqlite, every row has a unique identifier named rowid, which exists even if it isn't explicitly declared in the schema, and I was depending on that. If you declare a PRIMARY KEY, then this replaces rowid. A tiny cor...
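
Gerry's correction is cut off in the preview. For context (this is standard SQLite behaviour, not a reconstruction of his exact wording): only a column declared INTEGER PRIMARY KEY becomes an alias for rowid; any other PRIMARY KEY is enforced by a separate unique index, and rowid remains. A small illustration:

    -- id is an alias for rowid: same value, no extra index.
    CREATE TABLE t1 (id INTEGER PRIMARY KEY, fen TEXT);
    INSERT INTO t1 (fen) VALUES ('example');
    SELECT id, rowid FROM t1;          -- both columns return 1
    -- A TEXT primary key does NOT replace rowid; it just adds a unique index.
    CREATE TABLE t2 (fen TEXT PRIMARY KEY);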

Re: [sqlite] Index creation on huge table will never finish.

2007-03-22 Thread Derrell . Lipman
Chris Jones <[EMAIL PROTECTED]> writes: > Derrell.Lipman wrote: >> Chris Jones <[EMAIL PROTECTED]> writes: >> I don't think that your original solution solves that problem either. You first posted this schema: >>> My schema looks as follows: >>> CREATE TABLE rawfen ( fen VARCH...

Re: [sqlite] Index creation on huge table will never finish.

2007-03-22 Thread Chris Jones
Derrell.Lipman wrote: > Chris Jones <[EMAIL PROTECTED]> writes: > I don't think that your original solution solves that problem either. You first posted this schema: >> My schema looks as follows: >> CREATE TABLE rawfen ( fen VARCHAR(80) ); >> CREATE INDEX rawfen_idx_fen ON rawfe...

Re: [sqlite] Index creation on huge table will never finish.

2007-03-22 Thread Dennis Cote
Chris Jones wrote: > So, I did a "sort -u -S 1800M fenout.txt > fenoutsort.txt" The sort took about 45 minutes, which is acceptable for me (it was much longer without the -S option to tell it to make use of more memory), and then loading the table was very efficient. Inserting all the rows into m...
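
A sketch of the kind of bulk load being described, assuming the data has already been deduplicated and sorted externally; the table and index names follow the original post, but the exact loading commands are not shown in the preview:

    -- Load the pre-sorted, pre-deduplicated strings in one transaction,
    -- then build the index afterwards so its pages are written in order.
    PRAGMA cache_size = 100000;   -- give the build a larger page cache
    CREATE TABLE rawfen ( fen VARCHAR(80) );
    BEGIN;
    -- one INSERT per line of fenoutsort.txt, in sorted order:
    INSERT INTO rawfen (fen) VALUES ('aaa1');
    INSERT INTO rawfen (fen) VALUES ('aaa2');
    COMMIT;
    CREATE INDEX rawfen_idx_fen ON rawfen (fen);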

Re: [sqlite] Index creation on huge table will never finish.

2007-03-22 Thread Joe Wilson
--- Dennis Cote <[EMAIL PROTECTED]> wrote: > You could also improve the locality in the database file further by running a vacuum command after it has been created. This will move the pages around so that the pages of the table are contiguous and so are the pages of the index, rather than h...
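
For reference, the command Dennis is describing is just the following; it rewrites the whole database file, so it needs roughly the database's size in free disk space and is best run once, after the load and index build:

    -- Rebuild the file so table pages and index pages each end up contiguous
    -- rather than interleaved in insertion order.
    VACUUM;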

Re: [sqlite] Index creation on huge table will never finish.

2007-03-22 Thread John Stanton
A fast technique to achieve your objective is to perform what I believe is called a "monkey puzzle" sort. The data is not moved; instead, an array of descriptors to each element is sorted. The output is realized by scanning the list of descriptors and picking up the associated record from the...

Re: [sqlite] Index creation on huge table will never finish.

2007-03-22 Thread John Stanton
An issue with cache is cache shadowing, the churning as data is copied from one cache to another to another. An example is the speed-up achieved on network accesses by using sendfile or TransmitFile and bypassing up to four levels of buffering for a message being despatched to a network interf...

Re: [sqlite] Index creation on huge table will never finish.

2007-03-22 Thread P Kishor
On 3/22/07, Martin Jenkins <[EMAIL PROTECTED]> wrote: > Chris Jones wrote: >> realized that the unix "sort" > If I'd known you were on 'nix I'd have suggested using 'sort' and/or 'md5sum' about 12 hours ago. ;) Mac/Unix person meself, but the Windows XP sort is pretty darn good as well. -- Pune...

Re: [sqlite] Index creation on huge table will never finish.

2007-03-22 Thread Martin Jenkins
Chris Jones wrote: > realized that the unix "sort" If I'd known you were on 'nix I'd have suggested using 'sort' and/or 'md5sum' about 12 hours ago. ;) Martin

Re: [sqlite] Index creation on huge table will never finish.

2007-03-22 Thread John Stanton
You could sort the table, then perform a merge which removes duplicates. Chris Jones wrote: > I don't think that solves my problem. Sure, it guarantees that the IDs are unique, but not the strings. My whole goal is to be able to create a unique identifier for each string, in such a way that I...
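
Within SQLite itself, the closest analogue to "sort, then merge out the duplicates" is a sorted SELECT DISTINCT into a fresh table; a sketch, assuming the rawfen table from the original post:

    -- Sort the strings and drop duplicates in one pass into a new table.
    CREATE TABLE rawfen_dedup ( fen VARCHAR(80) );
    INSERT INTO rawfen_dedup SELECT DISTINCT fen FROM rawfen ORDER BY fen;

Note that this still asks SQLite to sort all 112 million rows, which is essentially the same work the stalled index build was doing; the external sort route Chris reports elsewhere in the thread sidesteps that.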

Re: [sqlite] Index creation on huge table will never finish.

2007-03-22 Thread Chris Jones
Thanks everyone for your feedback. I ended up doing a presort on the data, and then adding the data in order. At first I was a little concerned about how I was going to implement an external sort on a data set that huge, and realized that the unix "sort" command can handle large files, and in f...

Re: [sqlite] Index creation on huge table will never finish.

2007-03-22 Thread Dennis Cote
Chris Jones wrote: > I've read elsewhere that this is a data locality issue, which certainly makes sense. And in those threads, a suggestion has been made to insert in sorted order. But it's unclear to me exactly what the sorting function would need to be - it's likely my sorting function (say s...

Re: [sqlite] Index creation on huge table will never finish.

2007-03-22 Thread P Kishor
On 3/22/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: "P Kishor" <[EMAIL PROTECTED]> wrote: > Richard, > On 3/22/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: > ... > > The problem is that your working set is bigger than your cache which is causing thrashing. I suggest a solution lik...

Re: [sqlite] Index creation on huge table will never finish.

2007-03-22 Thread drh
"P Kishor" <[EMAIL PROTECTED]> wrote: > Richard, > > On 3/22/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: > ... > > The problem is that your working set is bigger than your cache > > which is causing thrashing. I suggest a solution like this: > > > > Add entries to table ONE until the table a

Re: [sqlite] Index creation on huge table will never finish.

2007-03-22 Thread Eduardo Morras
At 04:47 22/03/2007, you wrote: > I don't think that solves my problem. Sure, it guarantees that the IDs are unique, but not the strings. My whole goal is to be able to create a unique identifier for each string, in such a way that I don't have the same string listed twice, with different identif...

Re: [sqlite] Index creation on huge table will never finish.

2007-03-22 Thread P Kishor
Richard, On 3/22/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: ... > The problem is that your working set is bigger than your cache which is causing thrashing. I suggest a solution like this: Add entries to table ONE until the table and its unique index get so big that they no longer fit in ca...
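
The quoted suggestion is cut off here and in the other copies of this message, but its shape (a small staging table that is periodically drained, in sorted order, into the big indexed table) can be sketched roughly as follows; the table and column names, and the use of OR IGNORE to skip strings already present, are assumptions rather than drh's exact statements:

    -- Big permanent table with the expensive unique index.
    CREATE TABLE two ( fen VARCHAR(80) UNIQUE );
    -- Small staging table that stays resident in the page cache.
    CREATE TABLE one ( fen VARCHAR(80) UNIQUE );
    -- New strings go into the small table...
    INSERT OR IGNORE INTO one (fen) VALUES ('example');
    -- ...and when it gets too big, it is merged into the big table in index
    -- order (the ORDER BY keeps the index appends sequential), then emptied.
    INSERT OR IGNORE INTO two SELECT fen FROM one ORDER BY fen;
    DELETE FROM one;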

Re: [sqlite] Index creation on huge table will never finish.

2007-03-22 Thread Brad Stiles
Gerry Snyder <[EMAIL PROTECTED]> wrote: > Chris Jones wrote: > Hi all, I have a very simple schema. I need to assign a unique identifier to a large collection of strings, each at most 80 bytes, although typically shorter. Would it help to hash the strings, then save them in the DB, checki...
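
A sketch of the schema side of this hashing idea; stock SQLite has no built-in cryptographic hash function, so the hash would have to be computed by the application, and the table and column names here are illustrative assumptions only:

    -- Index a fixed-width hash instead of the variable-length string itself;
    -- the 16-byte value is computed client-side (e.g. MD5 of the string).
    CREATE TABLE rawfen_hashed (
      hash BLOB PRIMARY KEY,
      fen  TEXT
    );
    INSERT OR IGNORE INTO rawfen_hashed (hash, fen)
      VALUES (x'0123456789abcdef0123456789abcdef', 'example');

Whether this wins anything depends on how much shorter the hash is than the typical string, and hash collisions would map two different strings to one identifier.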

Re: [sqlite] Index creation on huge table will never finish.

2007-03-22 Thread drh
Ion Silvestru <[EMAIL PROTECTED]> wrote: > > drh wrote: > > INSERT INTO two SELECT * FROM one ORDER BY unique_column; > > The ORDER BY is important here. > This is an excerpt from the SQLite documentation: > The second form of the INSERT statement takes its data from a SELE...

Re: [sqlite] Index creation on huge table will never finish.

2007-03-22 Thread Derrell . Lipman
Chris Jones <[EMAIL PROTECTED]> writes: > I don't think that solves my problem. Sure, it guarantees that the IDs are unique, but not the strings. > My whole goal is to be able to create a unique identifier for each string, in such a way that I don't have the same string listed twice, with...

Re[2]: [sqlite] Index creation on huge table will never finish.

2007-03-22 Thread Ion Silvestru
> drh wrote: > INSERT INTO two SELECT * FROM one ORDER BY unique_column; > The ORDER BY is important here. This is an excerpt from the SQLite documentation: The second form of the INSERT statement takes its data from a SELECT statement. The number of columns in the result of the...

Re: [sqlite] Index creation on huge table will never finish.

2007-03-22 Thread drh
Gerry Snyder <[EMAIL PROTECTED]> wrote: > Chris Jones wrote: > > Hi all, I have a very simple schema. I need to assign a unique identifier to a large collection of strings, each at most 80 bytes, although typically shorter. The problem is I have 112 million of them. ...

Re: [sqlite] Index creation on huge table will never finish.

2007-03-21 Thread Gerry Snyder
Chris Jones wrote: > Hi all, I have a very simple schema. I need to assign a unique identifier to a large collection of strings, each at most 80 bytes, although typically shorter. The problem is I have 112 million of them. Maybe you could start by breaking the data into 8 equal groups and make...
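
The suggestion is cut off, but one way to read it is to split the load across several smaller tables so that each table plus its index fits in the cache; a rough sketch under that assumption (bucket routing would be done by the application, for example on a hash of the string modulo 8):

    -- Eight smaller tables instead of one 112-million-row table,
    -- each with its own, much smaller, index.
    CREATE TABLE rawfen_0 ( fen VARCHAR(80) );
    CREATE INDEX rawfen_0_idx ON rawfen_0 (fen);
    CREATE TABLE rawfen_1 ( fen VARCHAR(80) );
    CREATE INDEX rawfen_1_idx ON rawfen_1 (fen);
    -- ...and so on through rawfen_7.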

Re: [sqlite] Index creation on huge table will never finish.

2007-03-21 Thread P Kishor
You stated in your OP, "I need to assign a unique identifier to a large collection of strings." Obviously I misunderstood that to mean you wanted the strings tagged uniquely, not that the strings were unique. In your case, it seems then, you will have to put up with checking each string, and as th...

Re: [sqlite] Index creation on huge table will never finish.

2007-03-21 Thread Chris Jones
I don't think that solves my problem. Sure, it guarantees that the IDs are unique, but not the strings. My whole goal is to be able to create a unique identifier for each string, in such a way that I don't have the same string listed twice, with different identifiers. In your solution, there i...

Re: [sqlite] Index creation on huge table will never finish.

2007-03-21 Thread P Kishor
On 3/21/07, Chris Jones <[EMAIL PROTECTED]> wrote: > Hi all, I have a very simple schema. I need to assign a unique identifier to a large collection of strings, each at most 80 bytes, although typically shorter. The problem is I have 112 million of them. My schema looks as follows: CREATE TAB...

[sqlite] Index creation on huge table will never finish.

2007-03-21 Thread Chris Jones
Hi all, I have a very simple schema. I need to assign a unique identifier to a large collection of strings, each at most 80 bytes, although typically shorter. The problem is I have 112 million of them. My schema looks as follows: CREATE TABLE rawfen ( fen VARCHAR(80) ); CREATE INDEX rawfen_id...
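
The preview truncates the schema in every copy of this message; pieced together from the quotes elsewhere in the thread it is almost certainly the following (the column list of the CREATE INDEX is cut off everywhere, so fen as the indexed column is an inference):

    CREATE TABLE rawfen ( fen VARCHAR(80) );
    CREATE INDEX rawfen_idx_fen ON rawfen (fen);

The question, answered further up the thread, is why building rawfen_idx_fen over 112 million rows never finishes; drh's diagnosis is that the working set of the index build outgrows the page cache and the process starts thrashing.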