Pick up any book about UFS and read about the journal...

Shuki

On Sun, Jan 27, 2013 at 7:56 PM, Pavel Ivanov <paiva...@gmail.com> wrote:

> > So in any file system that supports journaling fwrite is blocked until all
> > metadata and data changes are made to the buffer cache and journal is
> > updated with the changes.
>
> Please give us some links showing where you got all this info, along with
> benchmarks. What you are trying to convince us of is that with a
> journaling FS write() doesn't return until the journal record is
> guaranteed to have physically made it to disk. First of all, I don't see
> what the benefit of that is compared to writing directly to disk without
> a write-back cache. And second, do you realize that in this case
> you can't make more than 30-50 journal records per second? Do you
> really believe that for good OS performance it's enough to make fewer
> than 30 calls to write() per second (on any file, not on each file)? I
> won't believe that until I see data and benchmarks from reliable
> sources.
>
>
> Pavel
>
>
> On Sun, Jan 27, 2013 at 8:53 AM, Shuki Sasson <gur.mons...@gmail.com>
> wrote:
> > Hi Pavel, thanks a lot for your answer. Assuming xWrite is using fwrite,
> > here is what is going on in the File System:
> > In a legacy UNIX File System (UFS) the journaling protects only the
> > metadata (inode structure, directory block, indirect block, etc.) but not
> > the data itself.
> > In more modern File Systems (usually ones that are enterprise-based, like
> > EMC OneFS on the Isilon product) both data and metadata are journaled.
> >
> > How does journaling work?
> > The File System has a cache of the File System blocks it deals with (both
> > metadata and data). When a change is made to a block in the buffer cache,
> > it is made in memory only, and the set of changes is saved to the journal
> > persistently. When the persistent journal is on disk, saving both data
> > and metadata changes takes too long, so only metadata changes are
> > journaled. If the journal is placed on NVRAM then it is fast enough to
> > save both data and metadata changes to the journal.
> > So in any file system that supports journaling, fwrite is blocked until
> > all metadata and data changes are made to the buffer cache and the
> > journal is updated with the changes.
> > The only question then is whether the File System keeps a journal of both
> > metadata and data. If your system has a file system that journals both
> > metadata and data blocks, then you are theoretically (if there are
> > no bugs in the FS) guaranteed against data loss in case of a system panic
> > or loss of power.
> > So in short, a fully journaled File System gives you the safety of
> > synchronous = FULL (or even better) without the huge performance penalty
> > associated with fsync (or fdatasync).
> >
> > Additional explanation: Why is it cheaper to save the changes to the log
> > rather than the whole cached buffer (block)?
> > Explanation: Each File System block is 8K in size. Some changes touch
> > only small areas of a block, and only those changed areas are
> > recorded.
> > What happens if a change to the File System involves multiple changes to
> > data blocks as well as metadata blocks, as when an fwrite operation
> > increases the file size and causes an indirect metadata block to be
> > added?
> > Answer: The journal is organized in transactions, each of which is
> > atomic, so all the buffer cache changes for such an operation are put into
> > one transaction. Only fully completed transactions are replayed when the
> > system is recovering from a panic or power loss.
> >
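A minimal C sketch of the replay rule described above, assuming a toy in-memory journal. The names `journal_rec`, `journal_replay`, `NBLOCKS` and `BLOCK_SIZE` are hypothetical, and the tiny block size is for illustration only; no real file system's journal format is implied. Each update record carries just the changed byte rather than the whole block, and an update is applied on recovery only if its transaction also has a commit record.

```c
#include <assert.h>
#include <stddef.h>

#define NBLOCKS    4   /* toy "disk": 4 blocks (assumption for the sketch) */
#define BLOCK_SIZE 8   /* toy block size; the message above mentions 8K    */

enum rec_type { REC_UPDATE, REC_COMMIT };

/* One journal record: either a small change to a block or a commit mark. */
struct journal_rec {
    int txn;              /* transaction id                      */
    enum rec_type type;   /* update or commit                    */
    size_t block;         /* target block index (updates only)   */
    size_t offset;        /* byte offset in block (updates only) */
    unsigned char value;  /* new byte value (updates only)       */
};

/* Returns 1 if the journal contains a commit record for txn, else 0. */
static int txn_committed(const struct journal_rec *j, size_t n, int txn)
{
    for (size_t i = 0; i < n; i++)
        if (j[i].type == REC_COMMIT && j[i].txn == txn)
            return 1;
    return 0;
}

/* Replays the journal onto disk in order, skipping every update that
 * belongs to a transaction without a commit record.
 * Returns the number of updates applied. */
size_t journal_replay(const struct journal_rec *j, size_t n,
                      unsigned char disk[NBLOCKS][BLOCK_SIZE])
{
    size_t applied = 0;
    for (size_t i = 0; i < n; i++) {
        if (j[i].type != REC_UPDATE)
            continue;
        assert(j[i].block < NBLOCKS && j[i].offset < BLOCK_SIZE);
        if (!txn_committed(j, n, j[i].txn))
            continue;   /* incomplete transaction: ignore its changes */
        disk[j[i].block][j[i].offset] = j[i].value;
        applied++;
    }
    return applied;
}
```

Note that the sketch says nothing about when the journal records themselves reach stable storage, which is the point being argued in this thread.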
> > In short, in most file systems like UFS, using synchronous = NORMAL
> > makes a lot of sense as data blocks are not protected by the journal.
> > However, with a more robust File System that has a full journal for
> > metadata as well as data, it makes all the sense in the world to run with
> > synchronous = OFF and gain the additional performance benefits.
> >
> > Let me know if I missed something and I hope this makes things clearer.
> > Shuki
> >
> >
> >
> >
> > On Sat, Jan 26, 2013 at 10:31 PM, Pavel Ivanov <paiva...@gmail.com> wrote:
> >
> >> On Sat, Jan 26, 2013 at 6:50 PM, Shuki Sasson <gur.mons...@gmail.com>
> >> wrote:
> >> >
> >> > Hi all, I read the documentation about the synchronous pragma.
> >> > It has to do with how often the xSync method is called.
> >> > With synchronous = FULL, xSync is called after each and every change
> >> > to the database file, as far as I understand...
> >> >
> >> > Looking at the VFS interface used by SQLite:
> >> >
> >> > typedef struct sqlite3_io_methods sqlite3_io_methods;
> >> > struct sqlite3_io_methods {
> >> >   int iVersion;
> >> >   int (*xClose)(sqlite3_file*);
> >> >   int (*xRead)(sqlite3_file*, void*, int iAmt, sqlite3_int64 iOfst);
> >> >   int (*xWrite)(sqlite3_file*, const void*, int iAmt, sqlite3_int64 iOfst);
> >> >   int (*xTruncate)(sqlite3_file*, sqlite3_int64 size);
> >> >   int (*xSync)(sqlite3_file*, int flags);
> >> >   /* ... remaining methods omitted ... */
> >> > };
> >> >
> >> > I see both xWrite and xSync...
> >> >
> >> > Does this mean that xWrite initiates an FS write to the file?
> >>
> >> Yes, in the sense that a subsequent read, absent a power cut to the
> >> machine, will return the written data.
> >>
> >> >
> >> > Does that mean that xSync makes sure that the FS buffered changes are
> >> > synced to disk?
> >>
> >> Yes.
> >>
> >> > I guess it is calling fsync in the case of Linux/FreeBSD, am I right?
> >>
> >> fdatasync() I think.
> >>
> >> > If the above is correct and SQLite operates over a modern reliable FS
> >> > that journals each write, then even though the write buffer cache is
> >> > not fully synced, the writes are protected by the FS journal, which
> >> > fully records all the changes to the file and is going to be
> >> > replayed on FS mount after a system crash.
> >> >
> >> > If my understanding is correct, then assuming the FS journaling is
> >> > bulletproof, I can safely operate with synchronous = OFF in
> >> > SQLite and still be fully protected by the FS journal in case of a
> >> > system crash, right?
> >>
> >> I really doubt journaling filesystems work like that. Yes, your file
> >> will be restored using the journal if the journal records made it to
> >> disk. But the FS just can't physically write every journal record to
> >> disk at the moment that record is created. If it did, your computer
> >> would be really slow. And since the FS doesn't do that, fdatasync still
> >> makes sense if you want to guarantee that when COMMIT execution is
> >> finished it's safe to cut the power off or crash.
> >>
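To make the write-versus-sync distinction concrete, here is a minimal C sketch assuming a POSIX system. The helper name `durable_write` is hypothetical and this is not SQLite's VFS code; fsync() stands in for the fdatasync() call mentioned above, which can be substituted where it is available. The pwrite() call only has to reach the OS buffer cache before returning; the fsync() call is the step that asks the OS to push the data to stable storage.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: writes len bytes from buf to path at the given
 * offset, then forces them to stable storage before returning.
 * Returns 0 on success, -1 on any error. */
int durable_write(const char *path, const void *buf, size_t len, off_t offset)
{
    assert(path != NULL && buf != NULL);

    int fd = open(path, O_WRONLY | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    const unsigned char *p = buf;
    int rc = 0;
    while (len > 0) {
        /* May return as soon as the data is in the buffer cache. */
        ssize_t n = pwrite(fd, p, len, offset);
        if (n < 0 && errno == EINTR)
            continue;            /* interrupted: retry */
        if (n <= 0) {
            rc = -1;             /* real error or no progress */
            break;
        }
        p += n;
        len -= (size_t)n;
        offset += n;
    }

    /* Only after fsync() succeeds is it reasonable to treat the data
     * as having been handed to the storage device. */
    if (rc == 0 && fsync(fd) != 0)
        rc = -1;
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

Dropping the fsync() call roughly corresponds to what running without syncs trades away: the data is readable immediately, but nothing in this code guarantees it has reached the disk when the function returns.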
> >> > Meaning synchronous = NORMAL doesn't buy me anything; in fact it
> >> > severely slows down database operations.
> >> >
> >> > Am I missing something here?
> >>
> >> Please re-check the documentation on how journaling file systems work.
> >>
> >>
> >> Pavel
> >> _______________________________________________
> >> sqlite-dev mailing list
> >> sqlite-...@sqlite.org
> >> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-dev
> >>
> > _______________________________________________
> > sqlite-users mailing list
> > sqlite-users@sqlite.org
> > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users