Hi Pavel, thanks a lot for your answer. Assuming xWrite uses fwrite,
here is what happens in the file system:
In a legacy UNIX File System (UFS), journaling protects only the
metadata (inode structures, directory blocks, indirect blocks, etc.) but
not the data itself.
In more modern file systems (usually enterprise ones, such as EMC
OneFS on the Isilon product), both data and metadata are journaled.

How does journaling work?
The file system keeps a cache of the blocks it works with (both
metadata and data). When a change is made to a cached block, it is
applied in memory only, and the set of changes is saved persistently to
the journal. When the persistent journal is on disk, saving both data
and metadata changes takes too long, so only metadata changes are
journaled. If the journal is placed on NVRAM, it is fast enough to save
both data and metadata changes to the journal.
So in any file system that supports journaling, fwrite blocks until all
metadata and data changes are made to the buffer cache and the journal is
updated with the changes.
The only question then is whether the file system journals both
metadata and data. If your file system journals both metadata and data
blocks, then you are theoretically (if there are no bugs in the FS)
guaranteed against data loss in case of a system panic or loss of power.
So in short, a fully journaled file system gives you the safety of
synchronous = FULL (or even better) without the huge performance penalty
associated with fsync (or fdatasync).
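
For comparison, the explicit sync step whose cost I am talking about
looks roughly like the sketch below on a POSIX system. This is only an
illustration, not SQLite's actual VFS code; write_record and its do_sync
parameter are names I made up for the example.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: append len bytes from buf to the file at path.
 * With do_sync != 0 the call returns only after fsync() has reported
 * success; with do_sync == 0 the data may still sit only in the OS
 * buffer cache when the call returns.
 * Returns 0 on success, -1 on error. */
int write_record(const char *path, const void *buf, size_t len, int do_sync)
{
    assert(path != NULL && buf != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;

    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);  /* may write fewer bytes than asked */
        if (n <= 0) {
            close(fd);
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }

    if (do_sync && fsync(fd) != 0) {     /* the expensive step */
        close(fd);
        return -1;
    }
    return close(fd);
}
```

Calling write_record(path, buf, len, 1) corresponds to the "write, then
sync" pattern; passing 0 as the last argument skips the sync and leaves
durability to whatever the file system does on its own.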

Additional explanation: Why is it cheaper to save the changes to the log
rather than the whole cached buffer (block)?
Explanation: Each file system block is 8K in size. Many changes touch
only a small area of the block, and only those changed areas are
recorded.
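
To make the size argument concrete, here is a small sketch. The record
header layout is purely an assumption of mine for illustration (real
file systems use their own on-disk formats); only the 8K block size comes
from the explanation above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FS_BLOCK_SIZE 8192u   /* 8K file system block */

/* Hypothetical header of one journal record: which block changed,
 * where inside the block, and how many changed bytes follow. */
struct journal_rec_hdr {
    uint64_t block_no;
    uint32_t offset;
    uint32_t length;
};

/* Bytes the journal must store for a change of `length` bytes starting
 * at `offset` inside one block: the header plus the changed bytes. */
size_t journal_bytes_for_change(uint32_t offset, uint32_t length)
{
    assert(offset <= FS_BLOCK_SIZE && length <= FS_BLOCK_SIZE - offset);
    return sizeof(struct journal_rec_hdr) + length;
}
```

With this layout, a 64-byte change costs the header plus 64 bytes in the
journal, a small fraction of the 8192 bytes needed to write the whole
block.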
What happens if a change to the file system involves multiple changes to
data blocks as well as metadata blocks, such as when an fwrite operation
increases the file size and causes an indirect metadata block to be
added?
Answer: The journal is organized in transactions, each of which is
atomic, so all the buffer cache changes for such an operation are put
into one transaction. Only fully completed transactions are replayed when
the system is recovering from a panic or power loss.
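
The replay rule can be sketched with a toy journal as follows. The record
types and the idea of "blocks" holding a single int are simplifications I
invented for the example; they do not describe any particular file
system.

```c
#include <assert.h>
#include <stddef.h>

enum rec_type { REC_BEGIN, REC_UPDATE, REC_COMMIT };

/* Hypothetical journal record for a toy file system whose blocks are ints. */
struct rec {
    enum rec_type type;
    int txn;      /* transaction id */
    int block;    /* block index (REC_UPDATE only) */
    int value;    /* new block content (REC_UPDATE only) */
};

/* Returns 1 if a REC_COMMIT for txn exists anywhere in the journal. */
static int is_committed(const struct rec *j, size_t n, int txn)
{
    for (size_t i = 0; i < n; i++)
        if (j[i].type == REC_COMMIT && j[i].txn == txn)
            return 1;
    return 0;
}

/* Replay the journal in order onto blocks[], skipping every update that
 * belongs to a transaction without a commit record.
 * Returns the number of updates applied. */
size_t replay(const struct rec *j, size_t n, int *blocks, size_t nblocks)
{
    size_t applied = 0;
    for (size_t i = 0; i < n; i++) {
        if (j[i].type != REC_UPDATE || !is_committed(j, n, j[i].txn))
            continue;
        assert(j[i].block >= 0 && (size_t)j[i].block < nblocks);
        blocks[j[i].block] = j[i].value;
        applied++;
    }
    return applied;
}
```

If the journal holds a committed transaction 1 followed by a transaction
2 that was cut off before its commit record, replay applies only the
updates of transaction 1 and leaves the blocks touched by transaction 2
as they were.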

In short, on most file systems like UFS, using synchronous = NORMAL
makes a lot of sense, as data blocks are not protected by the journal.
However, with a more robust file system that has a full journal for
metadata as well as data, it makes all the sense in the world to run with
synchronous = OFF and gain the additional performance benefits.

Let me know if I missed something and I hope this makes things clearer.
Shuki




On Sat, Jan 26, 2013 at 10:31 PM, Pavel Ivanov <paiva...@gmail.com> wrote:

> On Sat, Jan 26, 2013 at 6:50 PM, Shuki Sasson <gur.mons...@gmail.com>
> wrote:
> >
> > Hi all, I read the documentation about the synchronous pragma.
> > It has to do with how often the xSync method is called.
> > With synchronous = FULL, xSync is called after each and every change
> > to the database file, as far as I understand...
> >
> > Observing the VFS interface used by the SQLITE:
> >
> > typedef struct sqlite3_io_methods sqlite3_io_methods;
> > struct sqlite3_io_methods {
> >   int iVersion;
> >   int (*xClose)(sqlite3_file*);
> >   int (*xRead)(sqlite3_file*, void*, int iAmt, sqlite3_int64 iOfst);
> >   int (*xWrite)(sqlite3_file*, const void*, int iAmt, sqlite3_int64 iOfst);
> >   int (*xTruncate)(sqlite3_file*, sqlite3_int64 size);
> >   int (*xSync)(sqlite3_file*, int flags);
> >   /* ... remaining methods omitted ... */
> > };
> >
> > I see both xWrite and xSync...
> >
> > Does this mean that xWrite initiates an FS write to the file?
>
> Yes, in the sense that a subsequent read, absent a power cut to the
> machine, will return the written data.
>
> >
> > Does that mean that xSync makes sure that the FS buffered changes are
> > synced to disk?
>
> Yes.
>
> > I guess it calls fsync on Linux/FreeBSD, am I right?
>
> fdatasync() I think.
>
> > If the above is correct and SQLITE operates over a modern, reliable FS
> > that journals each write, then even though the write buffer cache is
> > not fully synced, the changes are protected by the FS journal, which
> > fully records all changes to the file and is replayed when the FS is
> > mounted after a system crash.
> >
> > If my understanding is correct, then assuming the FS journaling is
> > bulletproof, I can safely operate with synchronous = OFF in
> > SQLITE and still be fully protected by the FS journal in case of a
> > system crash, right?
>
> I really doubt journaling filesystems work like that. Yes, your file
> will be restored using journal if the journal records made it to disk.
> But FS just can't physically write every record of the journal to disk
> at the moment of that record creation. If it did that your computer
> would be really slow. But as FS doesn't do that fdatasync still makes
> sense if you want to guarantee that when COMMIT execution is finished
> it's safe to cut the power off or crash.
>
> > Meaning synchronous = NORMAL doesn't buy me anything; in fact, it
> > severely slows the database operations.
> >
> > Am I missing something here?
>
> Please re-check documentation on how journaling FS work.
>
>
> Pavel
> _______________________________________________
> sqlite-dev mailing list
> sqlite-...@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-dev
>
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
