What I would do is find the max length of each of your data fields.
Then dump the database, change the CREATE TABLE to use default values padded
to those string lengths, and import it.  The idea is that every row gets
written at its full size up front, so in-place updates don't outgrow their
slot and force a relocation.

See what that does for you.  Or just reload your data the way you've been
doing, with the new default string lengths.
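
Something like this would get you the max lengths (a rough sketch only; the
table and column names are taken from the code quoted further down, and the
database path is just a placeholder):

    import sqlite3

    conn = sqlite3.connect('container.db')  # placeholder path
    # Longest value currently stored in each text column of the object table.
    print(conn.execute('''
        SELECT MAX(LENGTH(name)), MAX(LENGTH(created_at)),
               MAX(LENGTH(content_type)), MAX(LENGTH(etag))
        FROM object
    ''').fetchone())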

Also, a compound index on name/created_at could help you a lot, since the
DELETE in your code filters on both of those columns.
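
For example, using the connection from the code further down (the index name
here is my own invention):

    # Covers the WHERE name = ? AND created_at < ? lookup in the DELETE.
    conn.execute('CREATE INDEX IF NOT EXISTS ix_object_name_created_at '
                 'ON object (name, created_at)')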

And since you're in a trigger, I'm not sure INSERT OR REPLACE would help,
since that's essentially what you're doing anyway, and I believe that's all
wrapped inside a transaction inside triggers.  Somebody please correct me if
this is not true.
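
For reference, the single-statement version I'm talking about would look
roughly like this (a sketch only: it needs a UNIQUE constraint on name to
actually replace anything, and I don't know that your object table has one):

    # Sketch: collapses the DELETE + SELECT + INSERT pattern quoted below
    # into one statement.  Only replaces an existing row if object.name has
    # a UNIQUE constraint (an assumption), and it drops the created_at check.
    conn.execute('''
        INSERT OR REPLACE INTO object (name, created_at, size,
            content_type, etag, deleted)
        VALUES (?, ?, ?, ?, ?, ?)
    ''', (rec['name'], rec['created_at'], rec['size'],
          rec['content_type'], rec['etag'], rec['deleted']))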



Michael D. Black
Senior Scientist
Advanced Analytics Directorate
Advanced GEOINT Solutions Operating Unit
Northrop Grumman Information Systems

________________________________________
From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of O'Toole, Eamonn [eamonn.oto...@hp.com]
Sent: Friday, November 09, 2012 8:53 AM
To: General Discussion of SQLite Database
Subject: EXT :Re: [sqlite] Anomalously slow performance on updates to early
entries in a DB

> -----Original Message-----
> From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-
> boun...@sqlite.org] On Behalf Of Black, Michael (IS)
> Sent: 09 November 2012 14:26
> To: General Discussion of SQLite Database
> Subject: Re: [sqlite] Anomalously slow performance on updates to early
> entries in a DB
>
> >[O'Toole, Eamonn] This definitely sounds like it could be an issue.
> There is just one container_stat entry (confirmed by sqlite3_analyzer
> output, which I'll post later).  So you're saying that the single
> container_stat table entry is potentially being relocated very
> frequently the closer the update is to the beginning of the db?
>
> Yes....you didn't say what your data flow is...but since it's account-
> based I assume you have a bunch of accounts that get preloaded.
[O'Toole, Eamonn] In Swift the DB is used to store information on the
containers.  The hierarchy in Swift is account->container->object.  The test
targets a specific container of a specific account; that container holds
information on 10 million objects, and the test changes the "created_at"
field for the first 2 million entries.  First we create the container, add
the 10 million entries to it, then we start the update cycle.  We see this
slow performance on every update run, although if you run updates in
succession without any break between runs you do see an improvement in
performance due to caching.


> So the first n-thousand records are Size1.
> You then start updating each of those...none of them are big
> enough...the new records get inserted at the first available empty slot
> (is that actually how this works or is there another row allocation
> strategy?).
[O'Toole, Eamonn] The SQL transactions are done through Python.  This is the
relevant section of code that deals with the object table; the container_stat
table is updated by the triggers:

            for rec in item_list:
                # Delete any existing row for this object that is older
                # than the incoming record.
                query = '''
                    DELETE FROM object
                    WHERE name = ? AND (created_at < ?)
                '''
                if self.get_db_version(conn) >= 1:
                    query += ' AND deleted IN (0, 1)'
                conn.execute(query, (rec['name'], rec['created_at']))
                # Insert the incoming record only if no row for this name
                # survived the DELETE above.
                query = 'SELECT 1 FROM object WHERE name = ?'
                if self.get_db_version(conn) >= 1:
                    query += ' AND deleted IN (0, 1)'
                if not conn.execute(query, (rec['name'],)).fetchall():
                    conn.execute('''
                        INSERT INTO object (name, created_at, size,
                            content_type, etag, deleted)
                        VALUES (?, ?, ?, ?, ?, ?)
                    ''', ([rec['name'], rec['created_at'], rec['size'],
                          rec['content_type'], rec['etag'], rec['deleted']]))


> Now you go to update those records again...they can't fit in the 1st
> block...and some percentage of the data won't fit into the 2nd block
> (depends on the variability in size).  So, let's say half the records
> get relocated....eventually you reach homeostasis.
>
> Also...disk fragmentation could be affecting you too but I wouldn't
> expect an order of magnitude difference on that.
>
[O'Toole, Eamonn] Disk fragmentation is definitely a factor, but as you say 
defragging doesn't get you an order of magnitude improvement.
> Are you on Windows or Unix?
[O'Toole, Eamonn] Linux (Ubuntu)
>
> Also...are your records indexed for the updates?
[O'Toole, Eamonn] The only object table index that I can see is the 
ix_deleted_name index, and that isn't used by the table update logic.
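
A quick way to double-check what SQLite picks for the update path (a sketch
using the same conn as in the code above; the query shape is copied from the
DELETE there, and the bound values are dummies):

    # Shows the chosen query plan without actually deleting anything.
    for row in conn.execute('''
        EXPLAIN QUERY PLAN
        DELETE FROM object
        WHERE name = ? AND (created_at < ?) AND deleted IN (0, 1)
    ''', ('dummy-name', '0')):
        print(row)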
>
>
>
> Michael D. Black
> Senior Scientist
> Advanced Analytics Directorate
> Advanced GEOINT Solutions Operating Unit
> Northrop Grumman Information Systems
>
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users