Thanks again Michael.
> -----Original Message----- > From: [email protected] [mailto:sqlite-users- > [email protected]] On Behalf Of Black, Michael (IS) > Sent: 09 November 2012 15:07 > To: General Discussion of SQLite Database > Subject: Re: [sqlite] Anomalously slow performance on updates to > earlyentries in a DB > > What I would do is find the max length of your data fields. > Then dump the database, change the create table to use default values > at those string lengths. > Import it. > > See what that does for you. Or just reload your data the way you've > been doing with the new default string lengths. > > Also a compound index on name/created_at could help you a lot. > > And since you're in a trigger I'm not sure if the insert or replace > would help since that's essentially what you're doing anyways and I > believe that's all wrapped inside a transaction inside triggers. > Somebody please correct me if this is not true. > > > > Michael D. Black > Senior Scientist > Advanced Analytics Directorate > Advanced GEOINT Solutions Operating Unit > Northrop Grumman Information Systems > > ________________________________________ > From: [email protected] [[email protected]] > on behalf of O'Toole, Eamonn [[email protected]] > Sent: Friday, November 09, 2012 8:53 AM > To: General Discussion of SQLite Database > Subject: EXT :Re: [sqlite] Anomalously slow performance on updates to > earlyentries in a DB > > > -----Original Message----- > > From: [email protected] [mailto:sqlite-users- > > [email protected]] On Behalf Of Black, Michael (IS) > > Sent: 09 November 2012 14:26 > > To: General Discussion of SQLite Database > > Subject: Re: [sqlite] Anomalously slow performance on updates to > > earlyentries in a DB > > > > >[O'Toole, Eamonn] This definitely sounds like it could be an issue. > > There is just one container_stat entry >confirmed by sqlite3_analyzer > > output which I'll post later). So you're saying that the single > > container_stat table :entry is potentially being relocated very > > frequently the closer the update is to the beginning of the db? > > > > Yes....you didn't say what your data flow is...but since it's > account- > > based I assume you have a bunch of accounts that get preloaded. > [O'Toole, Eamonn] In Swift the db is used to store information on the > containers. The hierarchy in Swift is account->container->object. The > test targets a specific container of a specific account, which contains > information on 10 million objects, and changes the "created_at" field > for the first 2 million entries. First we create the container, add > the 10 million entries to it, then we start the update cycle. We see > this slow performance on every update run, although if you run updates > in succession without any break in between runs you do see an > improvement in performance due to caching. > > > > So the first n-thousand records are Size1. > > You then start updating each of those...none of them are big > > enough...the new records get inserted at the first available empty > slot > > (is that actually how this works or is there another row allocation > > strategy?). > [O'Toole, Eamonn] The SQL transactions are done through python. This > is the relevant section of code that deals with the object table, the > container_stat table is updated by the triggers: > > for rec in item_list: > query = ''' > DELETE FROM object > WHERE name = ? AND (created_at < ?) > ''' > if self.get_db_version(conn) >= 1: > query += ' AND deleted IN (0, 1)' > conn.execute(query, (rec['name'], rec['created_at'])) > query = 'SELECT 1 FROM object WHERE name = ?' > if self.get_db_version(conn) >= 1: > query += ' AND deleted IN (0, 1)' > if not conn.execute(query, (rec['name'],)).fetchall(): > conn.execute(''' > INSERT INTO object (name, created_at, size, > content_type, etag, deleted) > VALUES (?, ?, ?, ?, ?, ?) > ''', ([rec['name'], rec['created_at'], rec['size'], > rec['content_type'], rec['etag'], > rec['deleted']])) > > > > Now you go to update those records again...they can't fit in the 1st > > block...and some percentage of the data won't fit into the 2nd block > > (depends on the variability in size). So, let's say half the records > > get relocated....eventually you reach homeostasis. > > > > Also...disk fragmentation could be affecting you too but I wouldn't > > expect an order of magnitude difference on that. > > > [O'Toole, Eamonn] Disk fragmentation is definitely a factor, but as you > say defragging doesn't get you an order of magnitude improvement. > > Are you on Windows or Unix? > [O'Toole, Eamonn] Linux (Ubuntu) > > > > Also...are your records indexed for the updates? > [O'Toole, Eamonn] The only object table index that I can see is the > ix_deleted_name index, and that isn't used by the table update logic. > > > > > > > > Michael D. Black > > Senior Scientist > > Advanced Analytics Directorate > > Advanced GEOINT Solutions Operating Unit > > Northrop Grumman Information Systems > > > > _______________________________________________ > > sqlite-users mailing list > > [email protected] > > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > _______________________________________________ > sqlite-users mailing list > [email protected] > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > _______________________________________________ > sqlite-users mailing list > [email protected] > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users _______________________________________________ sqlite-users mailing list [email protected] http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

