On 09/01/2010 11:41 AM, Taras Glek wrote:
> Hi,
> Currently VACUUM takes care of sqlite-level fragmentation.
> Unfortunately it does little for fs-level fragmentation, since the same
> file is being reused. It would be really beneficial for Mozilla
> performance if we could get a vacuum/hotcopy mode.
>
> As I understand it, vacuum currently works by copying the data out to
> a new file, then copying it back in and truncating the original db file.
>
> It would be really nice to instead do something like:
>
> copy the data to a new file
> swap the underlying filehandles to point at the new file
> remove the old file
> rename the new file to the old name
>
> This yields two benefits:
>
> a less fragmented db
> a ~50% vacuum speedup, since the data is only copied once
>
> Currently we can copy the data to a new file, but it is a pretty
> invasive change to swap all of the current sqlite connections over to the
> new file. Things like prepared statements need to be updated for
> every single db consumer. Thus it would make sense to have this
> feature on the sqlite side.
>
> Is this a reasonable request?
>
> Thanks,
> Taras

I sent this last week; I want to address some issues that were raised about my proposal.
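For the single-connection case, the copy-then-rename flow proposed above can be approximated today from the application side using SQLite's online backup API. A minimal Python sketch (the `copying_vacuum` helper and the `.vacuum` suffix are illustrative; it assumes no other connection has the file open):

```python
import os
import sqlite3

def copying_vacuum(path):
    """Rebuild the db at `path` into a fresh file, then rename it into
    place. Assumes a single connection; callers must reopen afterwards."""
    tmp = path + ".vacuum"
    src = sqlite3.connect(path)
    dst = sqlite3.connect(tmp)
    try:
        src.backup(dst)      # page-by-page copy into the new, unfragmented file
    finally:
        src.close()
        dst.close()
    os.replace(tmp, path)    # atomic rename; this also removes the old file
```

Unlike a true in-engine hot-copy, every consumer has to reopen its connection and re-prepare its statements afterwards — which is exactly the invasiveness the proposal wants pushed down into SQLite.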
Sorry for the weird (and late!) reply; I'm subscribed via a digest (it would be so nice to have a gmane mirror), so I can't reply directly.

Richard Hipp:
> If other connections have the database file open while it is being vacuumed,
> then on unix the other connections will still be left open on the old
> unlinked version of the database file and will never see the new content.
> And on windows, the file swapping and renaming simply is not allowed while
> other connections have the database files open.
> The work around is to modify SQLite so that it is constantly closing and
> reopening the database files. But that adds rather large overheads that
> seem likely to be much greater than any savings seen through a reduction in
> disk FS fragmentation.

I agree with both points. A copying VACUUM should specify that it does not support the multi-connection use-case. It should abort if it detects another db connection (or have this limitation mentioned in the documentation if such detection isn't possible). The wins from avoiding disk fragmentation and copying less data are significant. Punting seems like a reasonable alternative to forcing sqlite to constantly close/reopen the db.

Jay A. Kreibich:
> You're also breaking transactional integrity. You need the option of
> backing-out of the operation right up until the moment it works, and
> this procedure can't do that. For example, if you lose power right
> after "remove old file", your database is no longer there.

You are right, my original sequence of events was flawed. It should be:

copy the data to a new file
swap the underlying filehandles to point at the new file
rename the new file to the old name (this also removes the old file)

This takes care of the "remove old file" problem.

> You'd have to do that anyways, for all connections other than the one
> that issued the VACUUM command.
> Coordinating FD swaps across multiple
> connections in the same process would be confusing enough-- there
> would be no possible way to do it across database connections in
> multiple processes.

As I said above, I think restricting this to single-connection use-cases is reasonable.

> Maybe there would be some way to pre-populate the rollback journal
> with the full contents of the original database. Then the file could
> be truncated before the copy-back procedure. That would make it
> clear to the OS that it is free to allocate whatever file blocks it
> wants, hopefully in better patterns. The copy back could also be
> done in very large chunks.

That sounds like a reasonable alternative to me. It does more IO than copy+rename, but I'm mostly interested in avoiding fragmentation here, and this solves that issue.

> On a personal level, I don't think it is worth it. In the end, you're
> still hoping the OS and filesystem will make smart choices about block
> allocations. An application shouldn't need to be spending a lot
> of time worrying about this level of filesystem performance. No
> matter what, you're just hinting and setting up conditions that
> should allow the filesystem driver to do something smart and fast.
> It may, or it may not, actually do so.

I appreciate your feelings on the matter, but there are APIs (e.g. fallocate) that are specifically designed to minimize fragmentation. The underlying filesystem cannot be expected to guess every possible workload correctly.

Scott Hess:
> I agree with Jay - while it is tempting to have SQLite bite off
> optimizing this kind of thing, it's pretty far out of scope. Next
> we'll be talking about running SQLite on raw partitions! The recent
> addition of SQLITE_FCNTL_CHUNK_SIZE is probably about all the hinting
> you can productively manage.
> [In the end, VACUUM itself is kind of a hack, I sometimes suspect that
> more than half of the time when it is being used it is probably as a
> result of a micro-benchmark finding, and in the long term may actually
> be causing more harm than good.]

I'm not suggesting raw partitions, though they would not have these icky fragmentation issues :)

I strongly disagree that VACUUM is causing more harm than good. A heavily fragmented db (on either the sqlite level or the fs level) causes noticeably/measurably more sluggish Firefox performance.

SQLITE_FCNTL_CHUNK_SIZE solves a different problem:

SQLITE_FCNTL_CHUNK_SIZE: grow the database in a way that minimizes fragmentation.
Copying VACUUM: take an existing db and minimize its fragmentation.

The two features nicely complement each other.

To Ben Danper: Indeed.

To summarize: this won't work for all scenarios, but it would be immensely useful for the single-connection, responsive-io-addicted use-case that is Firefox.

Thanks,
Taras
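For completeness, the kind of growth hint that SQLITE_FCNTL_CHUNK_SIZE enables can be sketched in plain Python. The `preallocate` helper is hypothetical, and `os.posix_fallocate` only exists on some POSIX systems, hence the weaker fallback:

```python
import os

def preallocate(path, size):
    """Ask the filesystem to reserve `size` bytes up front, so the file
    can grow into (hopefully) contiguous blocks instead of fragmenting."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        if hasattr(os, "posix_fallocate"):
            os.posix_fallocate(fd, 0, size)   # actual block reservation
        else:
            os.ftruncate(fd, size)            # fallback: only sets the length
    finally:
        os.close(fd)
```

This is purely a hint: as Jay notes, the filesystem may or may not use the reservation to lay the blocks out contiguously.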