Re: [sqlite] Bulk load strategy

Simon Slavin Wed, 17 May 2017 09:06:06 -0700

On 17 May 2017, at 4:06pm, Joseph L. Casale <jcas...@activenetwerx.com> wrote:


> CREATE TABLE AdObject (
>    Id                INTEGER PRIMARY KEY NOT NULL,
>    DistinguishedName TEXT    NOT NULL COLLATE NOCASE,
>    SamAccountName    TEXT    COLLATE NOCASE
> );
> CREATE UNIQUE INDEX AdObject_idx_0 ON AdObject (
>    DistinguishedName
> );
> CREATE INDEX AdObject_idx_1 ON AdObject (
>    SamAccountName
> );
> 
> CREATE TABLE AdAttribute (
>    Id       INTEGER PRIMARY KEY NOT NULL,
>    Type     TEXT    NOT NULL COLLATE NOCASE,
>    Value    TEXT    NOT NULL COLLATE NOCASE,
>    AdObjectId INTEGER NOT NULL REFERENCES AdObject ON DELETE CASCADE ON 
> UPDATE CASCADE
> );
> CREATE INDEX AdAttribute_idx_0 ON AdAttribute (
>    Type
> );
> CREATE INDEX AdAttribute_idx_1 ON AdAttribute (
>    Value
> );
> CREATE INDEX AdAttribute_idx_2 ON AdAttribute (
>    AdObjectId
> );
> 
> The bulk of the inserts look like:
> INSERT INTO AdObject
>      (DistinguishedName, SamAccountName)
>  VALUES
>      (@DistinguishedName, @SamAccountName);
> 
> INSERT OR IGNORE INTO AdAttribute
>      (Type, Value, AdObjectId)
>  VALUES
>      (@Type, @Value, @AdObjectId);

Fastest way to do bulk inserts would be to delete all the indexes which don’t 
play any part in identifying duplicates, then do the inserting, then remake the 
indexes.

So if you never insert duplicates on AdObject(DistinguishedName), DROP that 
index.  And definitely DROP all the others.  Then do your users.  The reCREATE 
the indexes.

As for the inserts themselves, batching them up in transactions of 10,000 seems 
acceptably fast.  But depending on the amount of memory you have free 100,000 
may be faster.  Or maybe even 2,000.  You’ll have to try it out.

I’m questioning the point of AdAttribute_idx_0 and AdAttribute_idx_1.  It’s 
rare to usefully index values without types, for instance.  Do you actually 
have a SELECT which uses that one ?  Wouldn’t it be more efficient to do

        CREATE INDEX AdAttribute_idx_tv ON AdAttribute (
           Type, Value
        );

?  That’s assuming that even that one gets used at all, since it seems far more 
likely that you’d use (AdObjectId,Type).

> (just noticed that IGNORE in the second query which serves no purpose).

That won’t slow down SQLite much.  Don’t worry about it.

> Things grind to a halt when I start the following:
> 
> INSERT OR IGNORE INTO AdAttribute
>      (Type, Value, AdObjectId)
>  VALUES
>      (@Type, @Value, (SELECT Id FROM AdObject WHERE DistinguishedName = 
> @DistinguishedName));

The sub-select is killing you.  Since it’s identical for all the INSERT 
commands I suggest that you do that first, and keep the results in memory as a 
lookup table, or a hashed table, or a dictionary, or whatever your preferred 
language does.  You can look up those values in RAM far more quickly than 
SQLite can do the required file handling.

Simon.
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Bulk load strategy

Reply via email to