All,

Everyone knows the "database is locked" issue with SQLite.  According to
http://trac.edgewall.org/wiki/MostFrequentDuplicates it should be gone
as of 0.12dev.  But it isn't: I have a reproducible way to get that
error for 15 minutes at a time on 0.12.5.

The root of the problem I'm seeing is that viewing an individual file
in the source browser can take 15 minutes if that file has only one
commit in the repository history.  The cause is the SQL query run in
CachedRepository._next_prev_rev(), which is called twice during the
request: once from previous_rev() and once from next_rev().  That query
searches the node_change table, and for a node with a single commit it
returns no results.  In my particular case, that table has 3.5 million
entries (from one svn repo).  Running those two queries takes a total
of ~950 seconds on my machine.  During that time, if a commit is made
to the repository, every request will attempt to sync the repository
and fail because it cannot get a write lock.  This includes requests
that would otherwise be read-only operations.
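
To see the cost of that scan directly, here's a rough timing sketch
against the environment's SQLite database.  The paths are placeholders,
and the real query in _next_prev_rev() is more involved (it also
filters on rev relative to the current one, which is the part that
comes up empty for a single-commit node), so treat the numbers as
indicative only:

    import sqlite3, time

    con = sqlite3.connect('/path/to/env/db/trac.db')  # placeholder path
    cur = con.cursor()
    start = time.time()
    # Similar in spirit to the lookup in CachedRepository._next_prev_rev()
    cur.execute("SELECT rev FROM node_change WHERE path = ?",
                ('some/file.c',))  # placeholder file path
    rows = cur.fetchall()
    print "%d rows in %0.2fs" % (len(rows), time.time() - start)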

I'd like to improve this situation in a few ways:

1. Add a try/except around repo.sync() and turn the exception into an
add_warning() call.  While this does nothing to address the real
underlying issue, it makes the problem much less painful and allows the
user to at least look at their data.  (db-locked-mitigation-part1.patch)

2. Add an explicit duration check around SQL execute() calls when [trac]
debug_sql is enabled, and log SQL operations that take over 10 seconds
to complete at warning level.  An admin can then enable debug_sql, set
the logging level to INFO (see the trac.ini sketch after this list),
and see both that the database is the bottleneck and which query is the
cause.  Identifying the exact query will let developers track down the
root of the issue and pinpoint the specific operations that limit
scalability.  (db-locked-mitigation-part2.patch)

3. Address half of the root problem by making
CachedRepository.previous_rev() call self.repos.previous_rev()
unconditionally instead of using the database cache.  The database
lookup takes time proportional to the size of the node_change table,
but the repository already has links to parent revs and can find the
information much more directly.  In the case I was testing, the
previous_rev() call was taking over 200 seconds; with the patch it
takes under a second.  (part of db-locked-mitigation-part3.patch)

4. That leaves the second half of the root problem: under these
conditions, CachedRepository.next_rev() takes a very long time (12+
minutes for my particular testcase).  Worse, self.repos.next_rev()
takes even longer.  (It did not complete over lunch.)  While I do not
have a solution for this, making the problem visible is a step in the
right direction, so I added a duration check around the next_rev() call
that logs a warning if it takes more than 10 seconds.  The warning
includes a suggestion to enable [trac] debug_sql to lead an admin to
the database-level check from part 2.  (the other part of
db-locked-mitigation-part3.patch)  On my system this appears to be I/O
bound in the SQLite database, which leads me to doubt that PostgreSQL
would solve the problem in my case, though I have not tried it.  This
needs further investigation.
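
For reference, here is the configuration mentioned in part 2, as it
would look in trac.ini (both are standard options; with the logging
level at INFO, the warnings from parts 2 and 4 show up in the log):

    [trac]
    debug_sql = true

    [logging]
    log_type = file
    log_level = INFO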


I think patches 1 and 2 should be reasonably uncontroversial.  Patch 3
may warrant some discussion and benchmarking; a rough sketch of the
kind of benchmark I have in mind follows.  The patches are all against
the 0.12-stable branch, but they will also apply (with offsets) to
trunk.
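
For the benchmarking, something along these lines is what I have in
mind; a minimal sketch assuming a 0.12 environment, with the
environment path and file path as placeholders, timing previous_rev()
on a file with a single commit before and after the patch:

    import time
    from trac.env import Environment
    from trac.versioncontrol.api import RepositoryManager

    env = Environment('/path/to/env')   # placeholder path
    repos = RepositoryManager(env).get_repository('')  # default repo
    rev = repos.youngest_rev
    path = 'some/file.c'                # a file with a single commit
    start = time.time()
    prev = repos.previous_rev(rev, path)
    print "previous_rev -> %s in %0.2fs" % (prev, time.time() - start)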

If feedback on the patches is favorable, I'll commit them to 0.12-stable
and trunk.  (Assuming I still can... it's been a while.)


Questions, comments, concerns?

Eli
----------------------. "If it ain't broke now,
Eli Carter             \              it will be soon." -- crypto-gram
[email protected] `---------------------------------------------



Index: trac/versioncontrol/api.py
===================================================================
--- trac/versioncontrol/api.py	(revision 11935)
+++ trac/versioncontrol/api.py	(working copy)
@@ -340,7 +340,12 @@
                 try:
                     repo = self.get_repository(reponame)
                     if repo:
-                        repo.sync()
+                        try:
+                            repo.sync()
+                        except Exception, e:
+                            add_warning(req, "Failed to sync with repository: "
+                                "%s; repository information may be out of "
+                                "date." % to_unicode(e))
                     else:
                         self.log.warning("Unable to find repository '%s' for "
                                          "synchronization",
Index: trac/db/util.py
===================================================================
--- trac/db/util.py	(revision 11935)
+++ trac/db/util.py	(working copy)
@@ -15,7 +15,9 @@
 #
 # Author: Christopher Lenz <[email protected]>
 
+import time
 
+
 def sql_escape_percent(sql):
     import re
     return re.sub("'((?:[^']|(?:''))*)'",
@@ -51,9 +53,22 @@
             try:
                 if args:
                     self.log.debug('args: %r', args)
+                    start = time.time()
                     r = self.cursor.execute(sql_escape_percent(sql), args)
                 else:
+                    start = time.time()
                     r = self.cursor.execute(sql)
+                duration = time.time() - start
+                if duration > 10:
+                    if args:
+                        self.log.warning(
+                            "SQL execution took %0.2fs for %r with args %r",
+                            duration, sql, args)
+                    else:
+                        self.log.warn("SQL execution took %0.2fs for %r",
+                            duration, sql)
+                else:
+                    self.log.debug("SQL execution took %0.2fs", duration)
                 rows = getattr(self.cursor, 'rows', None)
                 if rows is not None:
                     self.log.debug("prefetch: %d rows", len(rows))
Index: trac/versioncontrol/cache.py
===================================================================
--- trac/versioncontrol/cache.py	(revision 11935)
+++ trac/versioncontrol/cache.py	(working copy)
@@ -15,6 +15,7 @@
 # Author: Christopher Lenz <[email protected]>
 
 import os
+import time
 
 from trac.cache import cached
 from trac.core import TracError
@@ -349,16 +350,22 @@
         return self.rev_db(self.metadata.get(CACHE_YOUNGEST_REV))
 
     def previous_rev(self, rev, path=''):
-        if self.has_linear_changesets:
-            return self._next_prev_rev('<', rev, path)
-        else:
-            return self.repos.previous_rev(self.normalize_rev(rev), path)
+        # Hitting the repository is faster than searching the database,
+        # particularly as the number of entries in node_change grows large.
+        return self.repos.previous_rev(self.normalize_rev(rev), path)
 
     def next_rev(self, rev, path=''):
+        start = time.time()
         if self.has_linear_changesets:
-            return self._next_prev_rev('>', rev, path)
+            n_rev = self._next_prev_rev('>', rev, path)
         else:
-            return self.repos.next_rev(self.normalize_rev(rev), path)
+            n_rev = self.repos.next_rev(self.normalize_rev(rev), path)
+        duration = time.time() - start
+        if duration > 10:
+            self.log.warning("Finding rev %s -> %s for %r took %0.2fs.  "
+                "Setting [trac] debug_sql = true may provide clues.",
+                rev, n_rev, path, duration)
+        return n_rev
 
     def _next_prev_rev(self, direction, rev, path=''):
         srev = self.db_rev(rev)
