On Tue, 14 Jun 2016 10:49:05 +0900 ?????????? <nat...@zenlok.com> wrote:
> > On 13 Jun 2016, at 10:13pm, Richard Hipp <d...@sqlite.org> wrote:
> >
> > The rename-is-atomic assumption is so wide-spread in the Linux
> > world, that the linux kernel was modified to make renames closer to
> > being atomic on common filesystems such as EXT4.
>
> http://man7.org/linux/man-pages/man2/rename.2.html

rename(2) *is* atomic.  That doesn't mean it's synchronous with respect
to external storage.  It only means that no two processes will ever see
the file "in flight" in two places.  If process A calls rename(N,M), at
no point will process B have access to both N and M.  Once M is
available, N is extinguished.  That's a useful property for a process
that succeeds, and for which the OS successfully flushes the data to
disk.

When Richard says rename isn't atomic, he means that it's not
synchronous with respect to the disk.  It makes no guarantee that the
directory entries were updated on disk.  The rename happens in the
kernel's filesystem memory structures, which *eventually* are persisted
to disk.  I have heard that that time lag may be measured in seconds.

> I am interested to know what it would take to make linux renames
> fully atomic. Reading it as is it feels like the action of rename
> would be the most important piece to making rename atomic. The docs
> claim this is atomic. What other aspects would be necessary?

To make Linux rename fully synchronous is technically infeasible and
politically impossible.

On the political side, the preference in Linux is invariably for
performance, often via ever-finer divisions of responsibility.  As an
example, Unix fsync(2) traditionally updated both the file and its
metadata; Linux divided those into fsync and fdatasync, and added the
requirement to call fsync on the directory.  What was once a single
call became 2 or 3.

As a technical matter, it's really infeasible because there are too
many moving parts: kernel, filesystem driver, and hardware.
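
To illustrate the fsync-then-rename-then-fsync-the-directory sequence
mentioned above, here is a minimal sketch in Python.  It assumes Linux
and a local filesystem that honors fsync on both files and directories.
The helper name durable_replace is hypothetical, made up for this
example.  Even with all three steps, whether the bytes really reach
stable storage still depends on the filesystem driver and the hardware.

```python
import os
import tempfile


def durable_replace(path: str, data: bytes) -> None:
    """Replace `path` with `data`, then ask the OS to persist the rename.

    Hypothetical helper for illustration only. Assumes Linux and a local
    filesystem; durability still depends on the driver and the hardware.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so the rename
    # never crosses a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())   # 1. flush the new file's contents
        os.rename(tmp_path, path)    # 2. atomic as seen by other processes
    except BaseException:
        # Clean up the temporary file if the write or rename failed.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # 3. flush the directory so the new entry itself is written out
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Step 2 alone gives the atomicity rename(2) promises; steps 1 and 3 are
the extra calls needed before one can even hope the result survives a
power cut.
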
It is possible for a human being to know what kind of disk is installed
and how it is configured, and to know the semantics of a given
filesystem.  It is not possible for the kernel to patrol all those
things, and hence the kernel cannot make any guarantees about them.
(To take an extreme example: NFS.)

By the way, every DBMS I know anything about (SQLite being no
exception) tends to eschew OS services except at the most minimal
level.  The internals of a DBMS carry a lot of state information
unavailable to the kernel that the DBMS uses to prioritize how memory
is used and when and where I/O is required.  That's why every DBMS has
its own logging mechanism, and some bypass the filesystem altogether.

--jkl
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users