On Sat, Sep 13, 2014 at 2:53 PM, Howard Chu <h...@symas.com> wrote:

> Scott Robison wrote:
>
>> On Sat, Sep 13, 2014 at 2:24 PM, Howard Chu <h...@symas.com> wrote:
>>
>>> Scott Robison wrote:
>>>
>>>> A couple of academic thoughts.
>>>>
>>>> 1. If one wanted to embed the journal within the database, would it be
>>>> adequate to reserve a specific page as the "root" page of the journal,
>>>> then allocate the remaining pages as normal (either to the journal or
>>>> the main database)? This does leave the big hole problem, so it may
>>>> still not be ideal, but it would give you a known location to find the
>>>> beginning of the journal without doubling the database size or
>>>> requiring an extra file.
>>>
>>> Starting with a known location is definitely a step in the right
>>> direction.
>>>
>>>> 2. Building on 1, could sparse files be used to accomplish this? Seek
>>>> to "really big constant offset" and do all journaling operations at
>>>> that point, allowing the operating system to manage actual disk
>>>> allocation? If
>>>
>>> We're talking about implementing a filesystem. "The operating system"
>>> is your own code in this case; you don't get to foist the work off onto
>>> anyone else.
>>
>> No, Simon's original question was to the effect of why doesn't SQLite
>> just use the already open database file for journaling purposes as well.
>
> OK, maybe I missed that, but I thought that question itself arose from how
> to use SQLite to implement a filesystem, on a raw partition. And the answer
> to that question (operating inside a raw partition) could apply equally
> well to operating inside a single file.
>
> If you preassign a fixed maximum size to the file, you could e.g. reserve
> the tail of the file for the journal, growing backward toward the head of
> the file, while the main data grows the usual direction from the head of
> the file toward the tail. This would basically be your (2) above. On HDDs
> this approach would have horrible seek latencies but it could work OK on
> SSDs.
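
For what it's worth, the offset arithmetic for that tail-growing layout is
simple enough to sketch. This is just an illustration in Python with made-up
names and sizes; it assumes a fixed maximum file size and page-aligned
records, as described above:

```python
PAGE_SIZE = 4096
FILE_MAX = 1 << 30  # fixed maximum file size: 1 GiB (assumed for the sketch)

def data_page_offset(pgno):
    """Main data grows the usual direction, forward from the head."""
    return pgno * PAGE_SIZE

def journal_page_offset(jpgno):
    """Journal grows backward from the tail: journal page 0 is the very
    last page of the file, page 1 the one before it, and so on."""
    return FILE_MAX - (jpgno + 1) * PAGE_SIZE

def regions_collide(n_data_pages, n_journal_pages):
    """The file is full when the two regions meet in the middle."""
    if n_journal_pages == 0:
        return data_page_offset(n_data_pages) > FILE_MAX
    return data_page_offset(n_data_pages) > journal_page_offset(n_journal_pages - 1)
```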
>
> The other point though - like the existing journaling filesystems, you
> should not limit yourself to using a single file/storage device. Allow the
> option of storing the journal somewhere else - the performance potential is
> worth it.
>
>> My point 1 was in response to the need to know where the journal file
>> is, so just pick a dedicated page in the file as the root page of the
>> journal, allowing the two files to be co-mingled. It doesn't address
>> every possible bad reason for co-mingling the data, but it would at
>> least answer the question "how do you find the journal".
>>
>> My second point was about existing SQLite database files that live in a
>> file system managed by some operating system. SQLite already foists that
>> work off onto someone else; this would be no different. It still may be
>> a bad idea, but that's not the reason why it wouldn't work. :)
To be fair, I may have read something out of context. As I said originally,
the questions were academic. I have not thought about these problems to
anywhere near the extent or depth the SQLite devs have; I was just thinking
out loud.

You are absolutely right: there are very good reasons to allow the
possibility of external journals, perhaps on different physical devices
(much as the test_multiplex VFS uses multiple files), to support much
higher throughput and/or larger database sizes. At the same time, there
certainly could be cases where keeping everything self-contained in a
single file would be useful, and a VFS could be written to accommodate
that (similar to the test_onefile VFS) without needing to modify SQLite.
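
As a toy illustration of the "dedicated root page" idea from my point 1
above, here is an in-memory sketch in Python. All the names, the page size,
and the chained-page format are made up for the example; a real VFS would of
course do this against the actual file with proper syncing:

```python
import struct

PAGE_SIZE = 512     # tiny pages, just for the sketch (assumed)
JOURNAL_ROOT = 1    # fixed, well-known page reserved as the journal root

class OneFile:
    """Toy single-file container: page 1 is the journal root; all other
    pages are allocated on demand to either the main data or the journal,
    co-mingled in the same page space."""
    def __init__(self):
        self.pages = {}      # pgno -> bytes
        self.next_free = 2   # first allocatable page

    def alloc(self):
        pgno = self.next_free
        self.next_free += 1
        return pgno

    def journal_head(self):
        # The root page holds the pgno of the newest journal page (0 = empty).
        root = self.pages.get(JOURNAL_ROOT, b"\x00\x00\x00\x00")
        return struct.unpack("<I", root[:4])[0]

    def append_journal_page(self, payload):
        # Each journal page starts with the pgno of the previous journal
        # page (0 terminates the chain), followed by the payload.
        prev = self.journal_head()
        pgno = self.alloc()
        self.pages[pgno] = struct.pack("<I", prev) + payload
        self.pages[JOURNAL_ROOT] = struct.pack("<I", pgno)  # new chain head
        return pgno

    def walk_journal(self):
        """Yield journal payloads newest-first by following the chain."""
        pgno = self.journal_head()
        while pgno:
            page = self.pages[pgno]
            yield page[4:]
            pgno = struct.unpack("<I", page[:4])[0]
```

The point is only that "how do you find the journal" has a one-word answer
here: start at the root page and follow the chain.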

Some time back I was thinking about a VFS that communicates with a network
service. The network service would be a replacement for NFS / SMB / CIFS
that would get file locking semantics "right" through a custom protocol,
instead of hoping (and being disappointed) that the system's standard file
handling APIs got it right. The network service would essentially be a page
server. It would likely not be as fast as a local file, but it could allow
distributed access to a shared database in a secure manner.
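
The wire format for such a page server could be very small. The following
framing is entirely hypothetical (opcodes, field widths, and names are all
invented for the example); it only shows the shape of a request:

```python
import struct

# Hypothetical message: fixed header of (opcode, page number, payload
# length), followed by the payload bytes.
OP_READ, OP_WRITE, OP_LOCK, OP_UNLOCK = 1, 2, 3, 4
HEADER = struct.Struct("<BIH")   # opcode:u8, pgno:u32, length:u16

def encode(op, pgno, payload=b""):
    return HEADER.pack(op, pgno, len(payload)) + payload

def decode(msg):
    op, pgno, length = HEADER.unpack_from(msg)
    payload = msg[HEADER.size:HEADER.size + length]
    return op, pgno, payload
```

The interesting part would not be the framing but the lock protocol the
server enforces, which is exactly the part the standard network filesystems
get wrong.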

Anyway, my point with the last paragraph was that a SQLite database
privately owned by the network service could serve as the container holding
the pages read and written by the remote clients. A VFS can do almost
anything it wants as long as it has a reliable means of locking out access.
Not that it would be *easy*, just possible.
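
The container side of that is almost trivial to sketch: one row per page,
keyed by page number. The schema and function names below are made up for
illustration:

```python
import sqlite3

def open_store(path=":memory:"):
    """Open the service's private page store (hypothetical schema)."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS pages("
        "pgno INTEGER PRIMARY KEY, data BLOB NOT NULL)")
    return db

def write_page(db, pgno, data):
    with db:  # one transaction per write keeps the store consistent
        db.execute(
            "INSERT OR REPLACE INTO pages(pgno, data) VALUES(?, ?)",
            (pgno, data))

def read_page(db, pgno):
    row = db.execute(
        "SELECT data FROM pages WHERE pgno = ?", (pgno,)).fetchone()
    return bytes(row[0]) if row else None
```

The service would layer its locking protocol on top; the store itself just
has to be durable.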

Sorry for the ramble there...

-- 
Scott Robison
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
