On Tue, Mar 16, 2010 at 06:18:13PM +0300, Max Vlasov scratched on the wall:
> When I read the comments it was obvious that the algorithm uses very simple > approach: > Attach blank database, copy all data, detach, rename. Sure I might be > wrong in details, but generally it looks like this. With one exception. The temporary database is not renamed back to the original database name. Rather, the data is copied back, page by page, from the temp database file back into the original database file. The original database file is then truncated to the correct size. > With this actions journal file(s) for the new database should not contain > much data. That is correct. The comments in the code support this. > So my quess is one only need at most the size of the actual data > from the source base plus very tiny journal file. But in fact (as everyone > sees) it uses much more. Almost. In addition to the original file, you need enough free space for the newly vacuumed database, as well as the journal files for both databases. As mentioned, the journal file for the temporary database is quite small. However, the "copy back" operation creates a journal file on the original database. This file is going to be similar in size to the original database, since every page in that file will be touched as part of the vacuum process. The journal file is required, as any failure or error (including a process or power failure) will require the journal file to recreate the original pre-vacuum database file. > So the question is what is so special about sqlite3RunVacuum that it needs > more space than a simple emulation of its actions? The copy-back operation. Yes, it would be faster and easier to just copy the temp database back over to the original database name, but there are issues with that. First, I'm sure it is nearly impossible to do this as a guaranteed, atomic operation on most OSes and filesystems. That's bad for data security. Second, if this is meant to look like a cleanup operation on the original file, the original file (including any filesystem meta-data) should be kept in-tact. Replacing the file by copying something on top of it won't do this. Last, file manipulations of this sort aren't supported by the current VFS interface, and (IMHO) with good reason. The current vacuum process works regardless of the operating environment, including in-memory databases and any custom VFS modules. Reaching outside of that context to things like filesystem and directory manipulations complicates this. This makes modification of the current vacuum process unlikely. However, it might be nice to have a "VACUUM TO <file>" version of the command. This would allow the user/application to vacuum a database to another non-temporary file and skip the copy-back operation. This would save considerable space and some amount of time. You could then do as you want with the new file. -j -- Jay A. Kreibich < J A Y @ K R E I B I.C H > "Our opponent is an alien starship packed with atomic bombs. We have a protractor." "I'll go home and see if I can scrounge up a ruler and a piece of string." --from Anathem by Neal Stephenson _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users