The SQLite FAQ says [0]: "But use caution: this locking mechanism might not work correctly if the database file is kept on an NFS filesystem. This is because fcntl() file locking is broken on many NFS implementations." Based on reading the archives of this mailing list, this statement seems to have caused some confusion as to how SQLite works on NFS. I did some research on this for my own transaction log system, which I think will be helpful to SQLite users as well. I am by no means an expert on NFS, so don't take anything I say as the final word on the matter. I provide references to substantiate most claims, though some of those sources are questionable.
When I say something is safe or sure to work, I generally mean that the safety depends on bug-free operation of the NFS implementation, that the administrator hasn't explicitly made things unsafe, and that any correctness issues that apply to NFS also apply to local storage. For example, with NFS it would not be surprising for fsync() to successfully persist all the data and then return an error, because of a network partition that occurs after the server commits the data to stable storage but before the client receives confirmation of the commit. However, even with local storage, fsync() can return an error despite having successfully committed the data to disk. So there is no real difference between remote and local storage here, except that such situations are much more likely to occur with remote storage.

Please keep in mind that many NFS clients have had a lot of bugs affecting cache coherency and locking. So, every time you see an asterisk (*), keep in mind the phrase "in theory, but probably not in practice."

If the database is static (no system anywhere is updating it), then it is safe* to access it from multiple computers, via NFS or CIFS or basically any other filesystem.

Prior to NFSv4, the NFS specification defined no cache coherency requirements. However, all NFS clients ended up implementing "close-to-open" file consistency [3][4], and NFSv4 officially mandated it. But beware of the evil "nocto" mount option [4]. Consequently, it should be safe* to share a writable database between multiple clients, as long as every client re-opens the file whenever any *other* system (another client or the server) commits to it. Closing and re-opening files is the surest way to ensure* NFS client cache coherency, except when "nocto" is used. However, the widespread use of "nocto" means that re-opening files needs to be combined with the additional file locking described below.

By far the most common requirement for SQLite over NFS seems to be a single NFS client exclusively reading and writing a SQLite database stored on an NFS server, with no other clients accessing the same database. This is safe* because it is a special case of close-to-open: the application doesn't need to keep closing and re-opening the database, because no other system is committing to it, and no additional byte-range locks are needed, because SQLite's locking convention works fine among processes on the same client.

NFSv4 also requires data cache consistency for fcntl-based locking [5]. If the NFS client implements these cache consistency requirements correctly, the application uses fcntl locking in an NFS-compatible way, and the NFS client updates its cached file size when an fcntl lock is acquired, then it should be safe* to share a file between multiple clients without having to constantly close and re-open the file. NFSv4-capable clients seem to implement these consistency guarantees* even when they speak earlier versions of the protocol on the wire.
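To make that concrete, here is a minimal sketch of the combined pattern: re-open the file to get close-to-open consistency, then take an fcntl() read lock so that a conforming client also revalidates its data cache for the locked region. This is my illustration, not SQLite code; the function name read_coherent is made up and the error handling is abbreviated.

    /* A minimal sketch, not SQLite code.  The fresh open() buys
     * close-to-open consistency; the F_RDLCK makes a conforming NFSv4
     * client revalidate its data cache for the locked region. */
    #include <fcntl.h>
    #include <sys/types.h>
    #include <unistd.h>

    ssize_t read_coherent(const char *path, void *buf, size_t n, off_t off)
    {
        int fd = open(path, O_RDONLY);        /* re-open: close-to-open */
        if (fd < 0) return -1;

        struct flock fl = {0};
        fl.l_type   = F_RDLCK;
        fl.l_whence = SEEK_SET;
        fl.l_start  = off;
        fl.l_len    = (off_t)n;               /* lock what we will read */
        if (fcntl(fd, F_SETLKW, &fl) == -1) { close(fd); return -1; }

        ssize_t got = pread(fd, buf, n, off); /* read under the lock    */

        fl.l_type = F_UNLCK;
        (void)fcntl(fd, F_SETLK, &fl);
        close(fd);
        return got;
    }

Note that under POSIX semantics, closing the descriptor would drop the lock anyway; the explicit unlock just makes the intent clear.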
The statement quoted from the SQLite FAQ says that SQLite-NFS problems are caused by NFS clients that do not implement locking correctly. While that is definitely true, SQLite is also to blame, because it uses fcntl() locks in a way that is incompatible with NFS locking and cache coherency semantics. Section 9.3.2 of the NFSv4 specification [6] describes SQLite's technique, and the problem with it, well:

"For those applications that choose to use file locking instead of share reservations to exclude inconsistent file access, there is an analogous set of constraints that apply to client side data caching. These rules are effective only if the file locking is used in a way that matches in an equivalent way the actual READ and WRITE operations executed. This is as opposed to file locking that is based on pure convention. For example, it is possible to manipulate a two-megabyte file by dividing the file into two one-megabyte regions and protecting access to the two regions by file locks on bytes zero and one. A lock for write on byte zero of the file would represent the right to do READ and WRITE operations on the first region. A lock for write on byte one of the file would represent the right to do READ and WRITE operations on the second region. As long as all applications manipulating the file obey this convention, they will work on a local filesystem. However, they may not work with the NFS version 4 protocol unless clients refrain from data caching."

SQLite uses exactly this type of "locking that is based on pure convention," so it is never safe to use SQLite on NFS in a read-write situation. Future versions of SQLite could alleviate this problem by ensuring that SQLite never reads from any section of *any* file (database files or log files) that is not protected by an F_RDLCK, and that it never writes to a file or fsyncs it without the modified sections being wrapped in an F_WRLCK. Because a transaction may append data to the file, these locks have to extend past the end of the file. Adding these locks will ensure* that SQLite never reads stale data. (A sketch of what this might look like appears below.)

Note that opening with O_DIRECT generally has a similar effect with regard to data cache coherency. However, O_DIRECT is unusable in practice, because SQLite's performance depends greatly on the operating system's file caching. Adding the additional byte-range locks will also probably degrade performance, but only over NFS, not for local access.

Even if these additional locks are added, there is still a problem for some NFS clients. SQLite may resize the file, the NFS client caches the file size separately from the data, and fcntl locking ensures cache coherency only for the data cache, not for the cached file size. Fortunately, some platforms, notably Linux and Solaris, revalidate the file size when a lock is acquired [5], even though the NFS specification doesn't require that. On other platforms, some other technique to flush the attribute cache must be used [7], but personally I wouldn't depend on it.
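Here is the promised sketch of that locking discipline. Again, this is my illustration, not anything SQLite ships: hold an F_WRLCK over the modified region, extended through and past the end of the file (an l_len of zero means "to the largest possible offset," which covers appends), and release it only after the fsync() completes.

    /* A sketch of "lock what you actually touch," my illustration only:
     * take an F_WRLCK from the first modified byte past EOF (l_len == 0
     * covers any growth), write, fsync, then unlock. */
    #include <fcntl.h>
    #include <sys/types.h>
    #include <unistd.h>

    int write_and_sync_locked(int fd, const void *buf, size_t n, off_t off)
    {
        struct flock fl = {0};
        fl.l_type   = F_WRLCK;
        fl.l_whence = SEEK_SET;
        fl.l_start  = off;
        fl.l_len    = 0;              /* from off past EOF: covers appends */
        if (fcntl(fd, F_SETLKW, &fl) == -1) return -1;

        int rc = 0;
        if (pwrite(fd, buf, n, off) != (ssize_t)n || fsync(fd) != 0)
            rc = -1;

        fl.l_type = F_UNLCK;          /* unlock only after the fsync */
        (void)fcntl(fd, F_SETLK, &fl);
        return rc;
    }

On Linux and Solaris, the fcntl() call above would also happen to revalidate the cached file size [5]; on other platforms a separate attribute-cache flush would still be needed, as noted above.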
[0] http://www.sqlite.org/faq.html#q5
[1] http://www.kernel.org/pub/linux/kernel/v2.6/
[2] http://www.apps.ietf.org/rfc/rfc1813.html#sec-4.11
[3] http://www.netapp.com/library/tr/tr2002.pdf
[4] http://nfs.sourceforge.net/#faq_a8
[6] http://www.apps.ietf.org/rfc/rfc3530.html#sec-9.3.2
[7] http://www.unixcoding.org/NFSCoding#head-a092c1477f04fa51abffc99d62cf649296aa4c4c

Cheers,
Brian