Re: [sqlite] [sqlite-dev] Can I safely use the pragma synchronization = OFF?

Shuki Sasson Mon, 28 Jan 2013 02:57:44 -0800

UFS is not a fully journaled FS; it just journals the metadata.
With a fully journaled file system that journals both metadata and data,
there is no possibility of losing unsaved data.
Anything that was handed to fwrite, and for which fwrite returned OK, is
backed by the journal.
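
For reference, here is a minimal sketch of the buffering layers a write
passes through on a POSIX-like system. It is written in Python for brevity:
f.write corresponds roughly to fwrite, f.flush to fflush, and os.fsync to
fsync. The helper name durable_write and the file name example.db are
hypothetical. Whether a given file system's journal already covers the data
at step 1 or 2 is exactly the question in this thread; the sketch only names
the layers and does not settle it.

```python
import os
import tempfile


def durable_write(path, payload):
    """Write payload and push it through each buffering layer in turn."""
    with open(path, "wb") as f:
        f.write(payload)      # 1. data may still sit in the process's own buffer
        f.flush()             # 2. data is handed to the kernel (buffer/page cache)
        os.fsync(f.fileno())  # 3. kernel is asked to push the data to the device
    return os.path.getsize(path)


with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "example.db")  # hypothetical file name
    size = durable_write(target, b"x" * 4096)
```

Dropping step 3 is, roughly, the application-level analogue of running
without sync calls: the write call has returned, but nothing in the code has
asked for the data to reach stable storage.
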
See http://en.wikipedia.org/wiki/Journaling_file_system, in particular the
section "Physical journals":

A *physical journal* logs an advance copy of every block that will later be
written to the main file system. If there is a crash when the main file
system is being written to, the write can simply be replayed to completion
when the file system is next mounted. If there is a crash when the write is
being logged to the journal, the partial write will have a missing or
mismatched checksum and can be ignored at next mount.
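
The replay rule described above can be sketched as a toy model. This is an
illustration under stated assumptions, not how any particular file system
implements it: the names make_record and replay, the dict standing in for the
disk, and the use of CRC32 as the checksum are all choices made here for the
example.

```python
import zlib


def make_record(block_no, data):
    """Journal record: an advance copy of the block plus its checksum."""
    return {"block": block_no, "data": data, "crc": zlib.crc32(data)}


def replay(journal, disk):
    """At mount time, apply every intact journal record to the main image."""
    for rec in journal:
        if zlib.crc32(rec["data"]) != rec["crc"]:
            break  # torn record: the crash hit while logging, so ignore it
        disk[rec["block"]] = rec["data"]
    return disk


# Case 1: crash while writing the main file system. The journal is complete,
# but only block 0 reached its home location. Replay finishes the write.
disk_a = {0: b"old-0", 1: b"old-1"}
journal_a = [make_record(0, b"new-0"), make_record(1, b"new-1")]
disk_a[0] = b"new-0"
replay(journal_a, disk_a)

# Case 2: crash while logging. The tail of the second record never reached
# the journal (modelled here as zero bytes), so its checksum does not match.
disk_b = {0: b"old-0", 1: b"old-1"}
torn = make_record(1, b"new-1")
torn["data"] = b"new\x00\x00"
journal_b = [make_record(0, b"new-0"), torn]
replay(journal_b, disk_b)
```

In case 1 both blocks end up with the new contents; in case 2 block 1 keeps
its old contents. The model only covers what replay does with records that
are in the journal; it says nothing about when a record reaches the journal
device.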

Physical journals impose a significant performance penalty because every
changed block must be committed *twice* to storage, but may be acceptable
when *absolute fault protection is required*.

Hope this clears things up.
Shuki
On Sun, Jan 27, 2013 at 10:14 PM, Pavel Ivanov <[email protected]> wrote:

> OK. I picked this one:
> http://www.freebsd.org/doc/en/articles/gjournal-desktop/article.html.
> It says:
>
> A journaling file system uses a log to record all transactions that
> take place in the file system, and preserves its integrity in the
> event of a system crash or power failure. Although it is still
> possible to lose unsaved changes to files, journaling almost
> completely eliminates the possibility of file system corruption caused
> by an unclean shutdown.
>
> So with UFS you have guarantees that the file system won't be corrupted.
> But there are absolutely no durability guarantees ("it is possible to lose
> unsaved changes") and I don't see guarantees that the SQLite file format
> won't be corrupted (the FS may be non-corrupt while file data are bogus).
> While I agree the latter is arguable and could be preserved, durability is
> a big reason to use pragma synchronous = normal. Sure, if you don't care
> about it you may not use that; you may as well use WAL journal mode
> (which AFAIR can also lose some of the last changed data with pragma
> synchronous = normal). But still, your claim that UFS with full
> journaling is a complete replacement for pragma synchronous = normal
> is false.
>
>
> Pavel
>
> On Sun, Jan 27, 2013 at 5:20 PM, Shuki Sasson <[email protected]> wrote:
> > Pick up any book about UFS and read about the journal...
> >
> > Shuki
> >
> > On Sun, Jan 27, 2013 at 7:56 PM, Pavel Ivanov <[email protected]> wrote:
> >
> >> > So in any file system that supports journaling, fwrite is blocked until
> >> > all metadata and data changes are made to the buffer cache and the
> >> > journal is updated with the changes.
> >>
> >> Please give us some links showing where you got all this info, along
> >> with benchmarks. What you are trying to convince us of is that with a
> >> journaling FS write() doesn't return until the journal record is
> >> guaranteed to have physically made it to disk. First of all, I don't see
> >> what the benefit of that is compared to writing directly to disk
> >> without a write-back cache. And second, do you realize that in this case
> >> you can't make more than 30-50 journal records per second? Do you
> >> really believe that for good OS performance it's enough to make fewer
> >> than 30 calls to write() per second (on any file, not on each file)? I
> >> won't believe that until I see data and benchmarks from reliable
> >> sources.
> >>
> >>
> >> Pavel
> >>
> >>
> >> On Sun, Jan 27, 2013 at 8:53 AM, Shuki Sasson <[email protected]> wrote:
> >> > Hi Pavel, thanks a lot for your answer. Assuming xWrite is using fwrite,
> >> > here is what is going on in the file system:
> >> > In a legacy UNIX File System (UFS) the journaling protects only the
> >> > metadata (inode structures, directory blocks, indirect blocks, etc.) but
> >> > not the data itself.
> >> > In more modern file systems (usually enterprise ones, like EMC OneFS on
> >> > the Isilon product) both data and metadata are journaled.
> >> >
> >> > How does journaling work?
> >> > The file system has a cache of the file system blocks it deals with
> >> > (both metadata and data). When changes are made to a cached block, they
> >> > are made in memory only, and the set of changes is saved to the journal
> >> > persistently. When the persistent journal is on disk, saving both data
> >> > and metadata changes takes too long, so only metadata changes are
> >> > journaled. If the journal is placed on NVRAM, then it is fast enough to
> >> > save both data and metadata changes to the journal.
> >> > So in any file system that supports journaling, fwrite is blocked until
> >> > all metadata and data changes are made to the buffer cache and the
> >> > journal is updated with the changes.
> >> > The only question then is whether the file system keeps a journal of
> >> > both metadata and data. If your system has a file system that journals
> >> > both metadata and data blocks, then you are theoretically (if there are
> >> > no bugs in the FS) guaranteed against data loss in case of a system
> >> > panic or loss of power.
> >> > So in short, a fully journaled file system gives you the safety of
> >> > synchronous = FULL (or even better) without the huge performance
> >> > penalty associated with fsync (or fdatasync).
> >> >
> >> > Additional explanation: why is it cheaper to save the changes to the log
> >> > rather than the whole cached buffer (block)?
> >> > Explanation: Each file system block is 8K in size; some changes touch
> >> > only small areas of a block, and only those changed areas are recorded.
> >> > What happens if a change to the file system involves multiple changes
> >> > to data blocks as well as metadata blocks, as when an fwrite operation
> >> > increases the file size and induces the addition of an indirect metadata
> >> > block?
> >> > Answer: The journal is organized in transactions, each of which is
> >> > atomic, so all the buffer cache changes for such an operation are put
> >> > into one transaction. Only fully completed transactions are replayed
> >> > when the system is recovering from a panic or power loss.
> >> >
> >> > In short, in most file systems like UFS, using synchronous = NORMAL
> >> > makes a lot of sense, as data blocks are not protected by the journal.
> >> > However, with a more robust file system that has a full journal for
> >> > metadata as well as data, it makes all the sense in the world to run
> >> > with synchronous = OFF and gain the additional performance benefits.
> >> >
> >> > Let me know if I missed something; I hope this makes things clearer.
> >> > Shuki
> >> >
> >> > On Sat, Jan 26, 2013 at 10:31 PM, Pavel Ivanov <[email protected]> wrote:
> >> >
> >> >> On Sat, Jan 26, 2013 at 6:50 PM, Shuki Sasson <[email protected]> wrote:
> >> >> >
> >> >> > Hi all, I read the documentation about the synchronous pragma.
> >> >> > It has to do with how often the xSync method is called.
> >> >> > With synchronous = FULL, xSync is called after each and every change
> >> >> > to the database file, as far as I understand...
> >> >> >
> >> >> > Observing the VFS interface used by SQLite:
> >> >> >
> >> >> > typedef struct sqlite3_io_methods sqlite3_io_methods;
> >> >> > struct sqlite3_io_methods {
> >> >> >   int iVersion;
> >> >> >   int (*xClose)(sqlite3_file*);
> >> >> >   int (*xRead)(sqlite3_file*, void*, int iAmt, sqlite3_int64 iOfst);
> >> >> >   int (*xWrite)(sqlite3_file*, const void*, int iAmt, sqlite3_int64 iOfst);
> >> >> >   int (*xTruncate)(sqlite3_file*, sqlite3_int64 size);
> >> >> >   int (*xSync)(sqlite3_file*, int flags);
> >> >> >
> >> >> > I see both xWrite and xSync...
> >> >> >
> >> >> > Does this mean that xWrite initiates an FS write to the file?
> >> >>
> >> >> Yes, in the sense that a subsequent read, without a power cut to the
> >> >> machine in between, will return the written data.
> >> >>
> >> >> >
> >> >> > Does that mean that xSync makes sure that the FS buffered changes
> >> >> > are synced to disk?
> >> >>
> >> >> Yes.
> >> >>
> >> >> > I guess it calls fsync on Linux/FreeBSD, am I right?
> >> >>
> >> >> fdatasync() I think.
> >> >>
> >> >> > If the above is correct and SQLite operates over a modern reliable FS
> >> >> > that journals each write, then despite the fact that the write
> >> >> > buffer caches are not fully synced, they are protected by the FS
> >> >> > journal, which fully records all the changes to the file and which
> >> >> > is going to be replayed on an FS mount after a system crash.
> >> >> >
> >> >> > If my understanding is correct, then assuming the FS journaling is
> >> >> > bulletproof, I can safely operate with synchronous = OFF with
> >> >> > SQLite and still be fully protected by the FS journal in case of a
> >> >> > system crash, right?
> >> >>
> >> >> I really doubt journaling filesystems work like that. Yes, your file
> >> >> will be restored using the journal if the journal records made it to
> >> >> disk. But the FS just can't physically write every record of the
> >> >> journal to disk at the moment that record is created. If it did that,
> >> >> your computer would be really slow. And since the FS doesn't do that,
> >> >> fdatasync still makes sense if you want to guarantee that when COMMIT
> >> >> execution is finished it's safe to cut the power off or crash.
> >> >>
> >> >> > Meaning synchronous = NORMAL doesn't buy me anything; in fact it
> >> >> > severely slows the database operations.
> >> >> >
> >> >> > Am I missing something here?
> >> >>
> >> >> Please re-check the documentation on how journaling file systems work.
> >> >>
> >> >>
> >> >> Pavel
> >> >> _______________________________________________
> >> >> sqlite-dev mailing list
> >> >> [email protected]
> >> >> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-dev
> >> >>
> >> > _______________________________________________
> >> > sqlite-users mailing list
> >> > [email protected]
> >> > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users