On Sun, 03 May 2015 13:09:43 -0700 Scott Doctor <scott at scottdoctor.com> wrote:
> I wonder why the network developers have not appropriately addressed
> this issue. Looks like I need to research this problem more before
> implementing.

From one perspective -- a common use case -- they *have* addressed this
issue. But first let's be clear on what's going on and why network
filesystems are absolutely, positively unreliable for SQLite.

Consider a 1-byte file containing the letter 'A'. Two processes open it
read-write, and read the file. The first time read(2) is called, the 'A'
is read from the server's file into the client NFS (or whatever) driver.
Thereafter, on each machine, each iteration of

    lseek(fd, 0, SEEK_SET); read(fd, buf, 1);

will read from the filebuffer cache, not from the server, at least for a
while.

Now the first machine writes 'B' to the file. On that machine, the above
iteration returns 'B' because that machine's filebuffer cache was updated
when the 'B' was sent to the server. The first machine's cache is
coherent with the file contents on the server.

The second machine is *not* notified of the update. Reads continue to be
satisfied from cache, and the cache continues to contain 'A'. The cache
is inconsistent with the file contents on the server.

That is the way most remote filesystems are designed, implemented, and
documented. Cf. http://www.ietf.org/rfc/rfc1813.txt:

    4.11 Caching policies

       The NFS version 3 protocol does not define a policy for caching
       on the client or server. In particular, there is no support for
       strict cache consistency between a client and server, nor
       between different clients. See [Kazar] for a discussion of the
       issues of cache synchronization and mechanisms in several
       distributed file systems.

If you can find documentation of the cache semantics for CIFS, I'd be
interested to see it.

This is why I keep reminding the list that problems with remote
filesystems aren't due to "bugs". They work as designed and as
documented. They just don't work as expected, when "expected" means
"just like local filesystems (only slower)".

A moment's thought reveals why they work this way. Network-wide
client-cache coherency has O(N) complexity, where N is the number of
clients. It's fraught with timing and performance issues.

We haven't even mentioned SQLite per se. I don't know where SQLite
tracks table-update status. I do know that some of the filebuffer cache
data are in SQLite variables. Some invariants about thread-of-control
consistency of those variables do not hold when the underlying file is
changed by an "external" process, which is what a SQLite client running
on another machine is. The situation invites textbook write-after-read
cache-coherency problems.

The problems that SQLite has with network filesystems aren't in the
least mysterious if you scratch the surface. RFC 1813 is in the public
domain, and references several papers that are freely available. One
evening will convince you.

You might ask, if the problems are hard but solvable, why not solve
them anyway, and perhaps make the semantics configurable on a
correctness-performance tradeoff? I think the answer is that most
applications don't require it! Most remote-file applications read the
entire file into memory, modify the data in memory, and write the file
back out minutes or hours later when the user saves it. For them,
last-writer-wins is OK. For applications that need to coordinate
commingled writes to a shared file, the dominant solution is a
server-based DBMS such as Postgres.

--jkl
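
P.S. If you want to see the stale-cache behavior for yourself, here's a
minimal sketch of the polling reader described above. The mount point
and filename are hypothetical; adjust for your setup. Run it on one NFS
client while a second machine overwrites the byte; depending on the
client's attribute-cache settings, it can keep printing 'A' long after
the server's copy holds 'B'.

    /* Stale-read demo: run on one NFS client while another machine
     * overwrites the file.  Path is hypothetical. */
    #include <stdio.h>
    #include <unistd.h>
    #include <fcntl.h>

    int main(void)
    {
        char buf;
        int fd = open("/mnt/nfs/flag", O_RDONLY);  /* hypothetical mount */
        if (fd < 0) { perror("open"); return 1; }

        for (;;) {
            lseek(fd, 0, SEEK_SET);
            if (read(fd, &buf, 1) == 1)
                printf("%c\n", buf);  /* may be served from the local
                                         filebuffer cache, not the server */
            sleep(1);
        }
    }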
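
P.P.S. And for contrast, a sketch of the whole-file read/modify/write
pattern that makes last-writer-wins tolerable for most remote-file
applications, again with a hypothetical path:

    /* Whole-file pattern: read it all, edit in memory, rewrite on save.
     * Whichever client saves last wins. */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        /* Read the entire file into memory. */
        FILE *f = fopen("/mnt/nfs/doc.txt", "rb");  /* hypothetical path */
        if (!f) { perror("fopen"); return 1; }
        fseek(f, 0, SEEK_END);
        long n = ftell(f);
        rewind(f);
        char *data = malloc(n);
        if (!data || fread(data, 1, n, f) != (size_t)n) {
            fclose(f);
            return 1;
        }
        fclose(f);

        /* ... modify data in memory, possibly for hours ... */

        /* Save: rewrite the whole file in one pass. */
        f = fopen("/mnt/nfs/doc.txt", "wb");
        if (!f) { perror("fopen"); return 1; }
        fwrite(data, 1, n, f);
        fclose(f);
        free(data);
        return 0;
    }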