Hello Eli!

On 8/10/2013 4:54 AM, Eli Carter wrote:
> All,
> 
> Everyone knows the "database is locked" issue with sqlite.  According to
> http://trac.edgewall.org/wiki/MostFrequentDuplicates it should be gone
> with 0.12dev.  But, it isn't; I have a way to get that error for 15
> minutes at a time on 0.12.5.

Indeed. As with any request that takes an unreasonable amount of time,
the whole-database lock used by SQLite makes this error likely for any
request that tries to write, which a lot of requests do, if only to
save cookies or to persist the session.

> 
> The root of the problem that I'm seeing is that viewing an individual
> file in the source browser will take 15 minutes if that file has only
> one commit in the repository history.  This is because of the SQL query
> run in CachedRepository._next_prev_rev() which is called in two forms
> during the request.  That SQL query is searching the node_change table,
> and for a node with a single commit, will return no results.  In my
> particular case, that table has 3.5 million entries (from one svn repo).
>  Running those two queries will take a total of ~950 seconds to complete
> on my machine.  During that time, if a commit is made to the repository,
> every request will attempt to sync the repository, and fail because it
> cannot get a write lock.  This includes requests that would otherwise be
> read-only operations.

Well, the sync-per-request problem itself is simple to avoid by
switching to explicit synchronization:
http://trac.edgewall.org/wiki/TracRepositoryAdmin#ExplicitSync
... which doesn't mean we can't improve the implicit sync case, see
below.
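
For reference, on 0.12 that boils down to disabling the per-request
sync in trac.ini and letting a post-commit hook notify Trac instead,
roughly (environment and repository paths are placeholders):

    [trac]
    repository_sync_per_request =

    # post-commit hook of the Subversion repository
    # (Subversion passes the repository path and the revision):
    REPOS="$1"
    REV="$2"
    trac-admin /path/to/env changeset added "$REPOS" "$REV"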

> 
> I'd like to improve this situation in a few ways:
> 
> 1. Add a try/except around repo.sync() and turn that exception into an
> add_warning().  While this does nothing to address the real underlying
> issue, this makes the problem much less painful and allows the user to
> at least look at their data.  (db-locked-mitigation-part1.patch)

Maybe we should add a hint about configuring explicit sync, something like:
"If this is a recurring problem, you can set up an _explicit
synchronization_" (with that linking to the same URL as given above).

> 
> 2. Add an explicit duration check around sql execute() calls when [trac]
> debug_sql is enabled, and log SQL operations that take over 10 seconds
> to complete at the warn level.  This will let an admin enable debug_sql
> and set the logging level to INFO and see that the database is the
> bottleneck, and what query is the cause.  Identifying the exact query
> will allow developers to figure out the root of the issue and identify
> the specific operations that limit scalability.
> (db-locked-mitigation-part2.patch)

Good idea.
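
For the record, I picture it as a thin wrapper like this (names and the
hard-coded threshold are just for illustration):

    import time

    def timed_execute(cursor, log, sql, args=()):
        # Queries over 10s are logged at WARNING level, so they remain
        # visible when the log level is set to INFO.
        start = time.time()
        result = cursor.execute(sql, args)
        duration = time.time() - start
        if duration > 10:
            log.warning('Slow SQL (%.2f s): %s', duration, sql)
        return result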

> 
> 3. Address half of the root problem by making the
> CachedRepository.previous_rev() call self.repo.previous_rev()
> unconditionally and not use the database cache.  The database solution
> will take time based on the size of the node_change table, but the
> repository already has links to parent revs and can find the information
> much more directly. In the case I was testing, the previous_rev() call
> was taking over 200 seconds, and with the patch now takes under a
> second.  (part of db-locked-mitigation-part3.patch)

Ok, that seems to help. The "real" solution would be to find an
appropriate way to index that table and optimize that query. I have
tried a few things so far, but without success: simply adding an index
on repos/path doesn't help, as SQLite won't use it for that query
(EXPLAIN QUERY PLAN is helpful here).
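
For anyone who wants to experiment, prefixing the suspect query with
EXPLAIN QUERY PLAN is enough (the query below is only a placeholder,
substitute the real one from the debug_sql output):

    import sqlite3

    db = sqlite3.connect('/path/to/env/db/trac.db')
    cursor = db.execute("EXPLAIN QUERY PLAN "
                        "SELECT rev FROM node_change "
                        "WHERE repos=? AND path=?",
                        (1, 'trunk/some/file'))
    for row in cursor:
        print row   # 'SCAN TABLE node_change' means no index was used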

> 
> 4. That leaves the second half of the root problem; namely, that
> CachedRepository.next_rev() takes a very long time under these
> conditions.  (12+ minutes for my particular testcase.)  Worse,
> self.repo.next_rev() takes an even longer time.  (Did not complete over
> lunch.)

That's why in #8813 I said "better not use directly the svnfs API and
use the DB instead". But we should really distinguish between next_rev
and prev_rev here, at least until we find a way to make the DB query
faster.

Note that we already have a ticket for improving this specific "query
of hell": #11104 (reported in the context of changesets and MySQL, but
it's the same underlying query).

>  While I do not have a solution for this, making the problem
> visible is a step in the right direction.  Therefore, I added a duration
> check around the next_rev() call which logs a warning if that takes more
> than 10 seconds.  The warning includes a suggestion to enable [trac]
> debug_sql to lead an admin to the database level check from part 2.
> (the other part of db-locked-mitigation-part3.patch)  It appears that on
> my system this is I/O bound in the sqlite database, leading me to doubt
> that Postgresql would solve the problem in my case, though I have not
> attempted that.  This needs further investigation.

One "solution" proposed in #8639 was to make an indirect link (e.g.
?rev_following=...) with the link showing a hint about the risk that the
request could take a long time to complete.
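
In the spirit of that proposal, something like the following (tag and
req.href are the usual Trac helpers, but the rev_following parameter is
hypothetical and would still need to be handled on the changeset side):

    from trac.util.html import tag

    # Don't compute next_rev eagerly; emit an indirect link instead and
    # warn the user in the tooltip that following it can be slow.
    link = tag.a('Next Changeset',
                 href=req.href.changeset(rev, rev_following=1),
                 title='Warning: determining the next changeset '
                       'can take a very long time')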

> 
> I think patches 1 and 2 should be reasonably uncontroversial.  Patch #3
> may warrant some discussion and benchmarking.  The patches are all
> against the 0.12-stable branch, but they will also apply (with offsets)
> to trunk.

I'll rebase and try them out. Thanks a lot for throwing in some work and
new ideas on this topic. Together with the mergeinfo issue that Jun has
started to tackle recently (#11219), those are some of the worst
performance issues we have.

> 
> If feedback on the patches is favorable, I'll commit them to 0.12-stable
> and trunk.  (Assuming I still can... it's been a while.)

No worries, you don't get removed unless you ask to be removed ;-)
OTOH, if you forgot your password in the meantime, just PM me.

-- Christian
