On 08/10/2013 10:28 AM, Christian Boos wrote:
> Hello Eli! :)

Missed working on Trac; finally had something I *had* to deal with that gave me enough time to tackle this.
> On 8/10/2013 4:54 AM, Eli Carter wrote:
>> All,
>>
>> Everyone knows the "database is locked" issue with sqlite. According to
>> http://trac.edgewall.org/wiki/MostFrequentDuplicates it should be gone
>> with 0.12dev. But it isn't; I have a way to get that error for 15
>> minutes at a time on 0.12.5.
>
> Indeed. As with any request taking an unreasonable amount of time, the
> whole-db-lock nature of SQLite makes it prone to this error when one
> tries to write to it, as many requests do, if only for saving
> cookies or trying to persist the session.
>
>> The root of the problem I'm seeing is that viewing an individual
>> file in the source browser will take 15 minutes if that file has only
>> one commit in the repository history. This is because of the SQL query
>> run in CachedRepository._next_prev_rev(), which is called in two forms
>> during the request. That SQL query searches the node_change table,
>> and for a node with a single commit, will return no results. In my
>> particular case, that table has 3.5 million entries (from one svn repo).
>> Running those two queries will take a total of ~950 seconds to complete
>> on my machine. During that time, if a commit is made to the repository,
>> every request will attempt to sync the repository, and fail because it
>> cannot get a write lock. This includes requests that would otherwise be
>> read-only operations.
>
> Well, the sync-per-request problem is simple to avoid:
> http://trac.edgewall.org/wiki/TracRepositoryAdmin#ExplicitSync
> ... which doesn't mean we can't improve the case of the implicit sync:

Well, simple when things work well... not so simple when the explicit sync fails because someone holds a 15-minute lock on the database. The post-commit hook needs to return quickly and not fail, but it also needs to retry the explicit sync sometime in the future if the sync fails. It looks like contrib/trac-svn-hook is not that sophisticated.
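To make that concrete, the kind of retry logic I have in mind for the hook would look roughly like this (just a sketch, not the actual hook; `sync_with_retry` and `trac_admin_sync` are names I'm making up here, and the trac-admin invocation is the usual explicit-sync command):

```python
import subprocess
import time

def sync_with_retry(run_sync, attempts=5, delay=2.0):
    """Call run_sync() until it succeeds or attempts are exhausted.

    run_sync returns True on success, False on failure (e.g. when the
    Trac database is locked by a long-running request). Waits with
    exponential backoff between attempts.
    """
    for i in range(attempts):
        if run_sync():
            return True
        time.sleep(delay * (2 ** i))  # back off before retrying
    return False

def trac_admin_sync(env_path, reponame, rev):
    """One explicit-sync attempt; a non-zero exit status means failure."""
    status = subprocess.call(
        ['trac-admin', env_path, 'changeset', 'added', reponame, rev])
    return status == 0
```

The hook itself would have to fire this in the background so the commit returns immediately, rather than blocking the committer for the whole retry window.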
>> I'd like to improve this situation in a few ways:
>>
>> 1. Add a try/except around repo.sync() and turn that exception into an
>> add_warning(). While this does nothing to address the real underlying
>> issue, it makes the problem much less painful and allows the user to
>> at least look at their data. (db-locked-mitigation-part1.patch)
>
> Maybe we should add a hint about configuring explicit sync, something like:
> "If this is a recurring problem, you can set up an _explicit
> synchronization_" (with that linking to the same URL as given above).

I didn't know about [trac] repository_sync_per_request; glad to see that's already configurable. I'll rework this patch with that. Right now it's doing an "except Exception", which is a bit broad; should I narrow that to the SQLite-specific exception for a locked db? I can make it fall back to Exception if the import of sqlite fails, so those who use other DBs don't have to have sqlite installed.

>> 2. Add an explicit duration check around sql execute() calls when [trac]
>> debug_sql is enabled, and log SQL operations that take over 10 seconds
>> to complete at the warn level. This will let an admin enable debug_sql,
>> set the logging level to INFO, and see that the database is the
>> bottleneck, and which query is the cause. Identifying the exact query
>> will allow developers to figure out the root of the issue and identify
>> the specific operations that limit scalability.
>> (db-locked-mitigation-part2.patch)
>
> Good idea. I'll commit this change first then.

>> 3. Address half of the root problem by making
>> CachedRepository.previous_rev() call self.repo.previous_rev()
>> unconditionally and not use the database cache. The database solution
>> takes time based on the size of the node_change table, but the
>> repository already has links to parent revs and can find the information
>> much more directly.
>> In the case I was testing, the previous_rev() call
>> was taking over 200 seconds, and with the patch now takes under a
>> second. (part of db-locked-mitigation-part3.patch)
>
> Ok, that seems to help. The "real" solution would be to find an
> appropriate way to index that table and optimize that query. I tried a
> few things so far, but without success, as simply adding an index on the
> repo/path won't help: SQLite won't use it with that query (EXPLAIN
> QUERY PLAN is helpful here).

Yeah, I blindly attempted adding that index and compared performance: no change, as you already knew. IIRC, the indexing took ~7 minutes and added nearly 400MB to the .db file.

>> 4. That leaves the second half of the root problem; namely, that
>> CachedRepository.next_rev() takes a very long time under these
>> conditions. (12+ minutes for my particular testcase.) Worse,
>> self.repo.next_rev() takes an even longer time. (It did not complete
>> over lunch.)
>
> That's why in #8813 I said "better not use directly the svnfs API and
> use the DB instead". But we should really make a difference here between
> next_rev and prev_rev, at least until we find a way to make the DB query
> faster.

I'll split the part3 patch in two: this first, and the rest second.

> Note that we already have a ticket for improving this specific "query of
> hell", it's #11104 (reported in the context of changesets and MySQL, but
> it's the same thing).

>> While I do not have a solution for this, making the problem
>> visible is a step in the right direction. Therefore, I added a duration
>> check around the next_rev() call which logs a warning if that takes more
>> than 10 seconds. The warning includes a suggestion to enable [trac]
>> debug_sql to lead an admin to the database-level check from part 2.
>> (the other part of db-locked-mitigation-part3.patch) It appears that on
>> my system this is I/O bound in the sqlite database, leading me to doubt
>> that Postgresql would solve the problem in my case, though I have not
>> attempted that. This needs further investigation.
>
> One "solution" proposed in #8639 was to make an indirect link (e.g.
> ?rev_following=...) with the link showing a hint about the risk that the
> request could take a long time to complete.

The purpose of the page is to show the content of that node; the link to adjacent versions is secondary to that. I think making the link indirect until there is a better solution would be a good thing.

>> I think patches 1 and 2 should be reasonably uncontroversial. Patch #3
>> may warrant some discussion and benchmarking. The patches are all
>> against the 0.12-stable branch, but they will also apply (with offsets)
>> to trunk.
>
> I'll rebase and try them out. Thanks a lot for throwing in some work and
> new ideas on this topic. Together with the mergeinfo issue that Jun has
> started to tackle recently (#11219), those are some of the worst
> performance issues we have.
>
>> If feedback on the patches is favorable, I'll commit them to 0.12-stable
>> and trunk. (Assuming I still can... it's been a while.)
>
> No worries, you don't get removed unless you ask to be removed ;-)
> OTOH, if you forgot your password in the meantime, just PM me.

I believe I still have it. Thanks. :)

Eli

-- 
You received this message because you are subscribed to the Google Groups
"Trac Development" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/trac-dev.
For more options, visit https://groups.google.com/groups/opt_out.
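P.S. For the narrowed except in part 1, I was thinking of something along these lines (a sketch only, not the actual patch; `sync_with_warning` is a made-up helper standing in for the real call site in the changeset module):

```python
# Narrow the handler to SQLite's OperationalError (which is what a
# "database is locked" failure raises) when pysqlite is available, but
# fall back to Exception so installs using PostgreSQL or MySQL don't
# need sqlite installed at all.
try:
    from sqlite3 import OperationalError as DatabaseLockError
except ImportError:
    DatabaseLockError = Exception

def sync_with_warning(repo, add_warning):
    """Attempt a repository sync; downgrade a locked-db failure to a warning
    so the user can still view the (possibly stale) cached data."""
    try:
        repo.sync()
    except DatabaseLockError as e:
        add_warning("Repository sync failed (%s); displayed data may be "
                    "out of date. If this recurs, consider configuring "
                    "explicit synchronization." % e)
```

One caveat: sqlite3.OperationalError covers more than lock contention, so a stricter version might also inspect the error message before swallowing it.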
