On 2016/08/04 5:05 PM, Kevin O'Gorman wrote:
Lots of answers, so thanks all around.  Some more info:

1. All partitions have at least 3 GB free, and it's not changing.  /tmp is
3 TiB and empty.
2. I have a RAID partition, for size, but no RAID controller.  As a hobby
project, I don't have spare parts, and I fear the results of a failure of a
hardware RAID without a spare, so I use Linux mdadm to manage software RAID
across three 4-TB drives.
3. Positions are 64 bytes always, so your size guesses are right.  They are
in no particular order.  I like the suggestion of a separate position
table, because they're going to appear in multiple qmove records, with an
average of about 3 or 4 appearances I think.  Maybe more.

I'm going to retry, using some of the suggestions above: smaller
transactions, IGNORE, position table rowids in the moves table, smaller
subsets being loaded, developing ideas of how time depends on data size.
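
[Those loading suggestions can be sketched together. This is a minimal illustration using Python's built-in sqlite3 module; the table and column names (pos, moves) are made up, not from the poster's actual schema, and the batch size is arbitrary.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path for the real dataset
conn.executescript("""
    CREATE TABLE pos(p BLOB UNIQUE);              -- one row per distinct position
    CREATE TABLE moves(from_id INTEGER, to_id INTEGER);  -- refers to pos rowids
""")

def pos_id(cur, p):
    # INSERT OR IGNORE skips positions already present; the SELECT then
    # returns the rowid whether the row was just inserted or pre-existing.
    cur.execute("INSERT OR IGNORE INTO pos(p) VALUES (?)", (p,))
    cur.execute("SELECT rowid FROM pos WHERE p = ?", (p,))
    return cur.fetchone()[0]

BATCH = 10000  # commit every N records to keep transactions small but not tiny
records = [(b"aaa", b"bbb"), (b"aaa", b"ccc"), (b"bbb", b"ccc")]  # stand-in data
cur = conn.cursor()
for i, (src, dst) in enumerate(records, 1):
    cur.execute("INSERT INTO moves VALUES (?, ?)",
                (pos_id(cur, src), pos_id(cur, dst)))
    if i % BATCH == 0:
        conn.commit()
conn.commit()
```

Storing the small integer rowid in moves instead of the 64-byte position shrinks the big table considerably when each position appears 3-4 times.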

If it doesn't go well fairly quickly, I'll probably go back to flat files
and writing the queries the hard way.  At least I know what I'm dealing
with there.

Note that there is nothing here that SQLite isn't handling extremely well for your purpose - going back to flat files is akin to going back to the dark ages. Databases of this size (and even much larger) work perfectly well all around, and some people here can quote you even more staggering figures. It does require some special processing, which might be thwarted by your current system layout or design choices; done right, however, it should not be significantly slower than writing flat files - so try to persist until you find what "done right" entails.

Remember that once this has worked, you can query the data with SQL... that is an amazingly powerful feature over flat files, and once an index exists (however long it took to build), queries will be lightning fast - something a flat file can never give you.


QUESTIONS:
If it's best in general to build indexes as a separate step, does this also
apply to primary indexes?  Can a table without a primary index have one
added later?  Isn't ROWID the real primary, presuming it has ROWIDs?  And
if so, then is a primary index on a ROWID table just for compliance with
standard SQL, and really no better than any other index?  Obviously, I'm a
bit confused about this.

A primary index is nothing other than a normal unique index which has a guaranteed 1-to-1 key-to-row ratio and look-up priority status. Every SQL table must have such a key to guarantee access to any individual row, and if you omit the primary key, some internal mechanism is used instead - in SQLite's case, that mechanism is the rowid. A relatively recent SQLite feature even lets you get rid of the rowid overhead entirely, by explicitly specifying a PRIMARY KEY and then adding WITHOUT ROWID after the table declaration.

The advantage of NOT using an explicit primary key from the start is that the rowid is simply incremented by the engine on each insert - it is auto-supplied and a plain integer. An explicit primary key, by contrast, has to find the correct sort-order position for every added row, which costs a good bit of time on really large insert operations. The single bulk sort that builds the PK index after the fact takes a lot less time than the repeated look-up-and-insert operations performed per row while loading the table.
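
[The pattern above, sketched with Python's sqlite3 module - the table name and the 64-byte position encoding are made up for illustration: bulk-load into a bare rowid table first, then build the unique index in one pass.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t(position BLOB)")  # no PK: rowid assigned by increment

# Bulk insert inside one transaction; no per-row index maintenance happens here.
with conn:
    conn.executemany("INSERT INTO t VALUES (?)",
                     ((("%064d" % i).encode(),) for i in range(10000)))

# One big sort now, instead of a b-tree lookup-and-insert per row earlier.
conn.execute("CREATE UNIQUE INDEX t_position ON t(position)")
```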

I hope this answers the question, but feel free to ask more if anything remains unclear.


While I'm at it, I may as well ask if ROWID has any physical significance,
such that a VACUUM operation might change it.  Or is it just an arbitrary
ID inserted by SQLite and added to each record, when they exist at all?

It is arbitrary and supplied by the DB engine in general, but with one caveat: VACUUM may renumber the rowids of a table that has no explicit INTEGER PRIMARY KEY alias for them. Once the rowid is aliased by such a declared column (or you supply the values yourself too - and I often advocate that as the better practice), it is real column data and will never be changed by anything; changing it then would go against all SQL and RDBMS premises.

If you declare a table with a primary key specified /exactly/ like this:
CREATE TABLE myTable ("ID" INTEGER PRIMARY KEY, .... )
Then the "ID" in this case becomes an alias for the row_id and manipulating/reading the value in it is in fact reading / manipulating the actual row_id.

(The "ID" name can be anything you like, but the "INTEGER PRIMARY KEY" part needs to be written exactly like that.)
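
[The aliasing is easy to verify; a minimal sketch with Python's sqlite3 module, using a made-up table:]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "ID" declared exactly as INTEGER PRIMARY KEY becomes an alias for the rowid.
conn.execute('CREATE TABLE myTable ("ID" INTEGER PRIMARY KEY, name TEXT)')
conn.execute("INSERT INTO myTable(name) VALUES ('x')")

# Reading ID and rowid returns the same value: they are the same storage.
row = conn.execute("SELECT ID, rowid FROM myTable").fetchone()
```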


The current dataset is intended to solve one particular issue in the
overall project.  It looks like I'd want to build each such dataset
separately, as there will likely be a few hundred, and I gather that adding
to these tables will be pretty slow once the indexes have been built.  Or
is it sensible to drop indexes, add data and rebuild?

No, this is not true - it will be really fast to add another few rows... it is only slow when you add the initial few zillion rows, by simple laws of quantity. If, however, you will be adding rows at an amazing rate, I would suggest a different way of doing it.

Also, you can gain a LOT of speed by simply using the C API directly in your exporter, rather than writing text from it and importing that after the fact - not to mention the advantage of being able to track speed and progress. It's easy, too; is there any reason you are not going this route? (If you are not confident with C or the API, I'm sure 99% of the people here could type out a piece of code in under a minute that does the inserts via the API, if you need it.)
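
[The idea translates to any binding, not just C. A hedged sketch in Python's sqlite3 module - the generator stands in for the solver's output, and in the C version each yielded record would become a sqlite3_bind_blob()/sqlite3_step()/sqlite3_reset() cycle on one prepared statement:]

```python
import sqlite3

def emit_positions():
    # Stand-in for the exporter's record stream; real positions would
    # come from the solver instead of this synthetic loop.
    for i in range(1000):
        yield (("%064d" % i).encode(),)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE q(position BLOB)")
with conn:  # one transaction around the whole batch
    # executemany reuses a single prepared statement, the same pattern
    # as a prepare-once, bind-and-step-per-row loop in the C API.
    conn.executemany("INSERT INTO q VALUES (?)", emit_positions())
```

The win is skipping the text round-trip entirely: records go straight from the generating program into the database.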

Cheers,
Ryan
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
