Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in Glusterfs

Joseph Fernandes Mon, 24 Nov 2014 05:24:13 -0800


----- Original Message -----
From: "James K. Lowden" <jklow...@schemamania.org>
To: sqlite-users@sqlite.org
Sent: Monday, November 24, 2014 3:56:14 AM
Subject: Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in 
Glusterfs

On Sat, 22 Nov 2014 23:25:16 -0500 (EST)
Joseph Fernandes <josfe...@redhat.com> wrote:

> 2) Using the changelog to feed the db has another issue i.e freshness
> of data in the DB w.r.t the IO. Few of our data maintainer scanners
> would require the freshness of the feed to be close to real. [...]
> Your thoughts on this. 

If your in-memory LRU structure suffices to describe all "hot" files,
you're in good shape.  Rather than dumping periodically, I would
consider placing it in shared memory and write a virtual table
function for it in SQLite, such that it can be queried directly as
needed.  

To me based on your description your choice isn't how best to use
SQLite in line with I/O, but how best to capture the data such that
they can be aggregated with SQLite at time of update.  That choice is
one of two: 1) capture each I/O event in a sequential file, always
appending, or 2) maintain per-file counts in a hash or map.  Which is
better depends on how much you're willing to pay for each I/O.  By
paying the lookup cost of #2 each time, the total space is smaller and
the maintenance-time computation less.  

> 3) Now that we would use Sqlite3(with WAL) to be direcly feed by the
> IO path(in the absence of changelog) we are looking to get the best
> performance from it. 

Metadata updates to Posix filesystems are seen as so costly that 
fsync(2) on the datafile descriptor doesn't update them.  A separate
sync on the directory is required.  Compared to an in-memory update
(of metadata, in kernel space) and a single fsync call, the price of a
SQLite transaction is enormous, at a guess an order of magnitude more.
Bear in mind that WAL buys you not efficiency but consistency, the very
thing you don't really need.  The data are written sequentially to the
log and then inserted into the table.  You can expect no better than
O(n log n) performance.  Filesystems generally would never tolerate
that, but for your application you'd be the judge. 

>> Ok few questions on the WAL journal mechanism, 
1) As far as I understand(I may be wrong), During a insert or update WAL
 just records it sequentially in WAL file. And during a checkpoint (manual or 
automatic)
 a. Flush the in-memory appends in the WAL file
 b. and then Merges with the actually tables. 
 And therefore check pointing takes a toll on the performance. But if I don't 
 do check point often i.e set auto checkpoint to say 1 GB or 2 GB.
 Following would be the implications of doing so
 a. Read will be slow because now I will have a large amount of data in the log 
    and it would take time to read and collate data from the log. We are fine 
with
    this in our usage case as we DON'T read (select queries) from database in 
the file IO path ever.
    These are done by data maintaining scanner which are scheduled. When tested 
with 1 million records
    and joining two tables and having the WAL log file to grow to 10 gb , 
    it takes almost 1 min to retrieve 1 million records. Which is fine for data 
scanners as they
    the waited for hour-hours.
 b. we will occupy huge space for the WAL Log file.

 Here is the question does WAL during an insert/update in the log file do any 
internal search/sort 
 and then insert/update to the log or just appends the WAL log with the 
incoming insert/update entry ?

2) Glusterfs is not replacing POSIX atime,mtime and ctime with this db (Sqlite 
is not a metadata db). i.e
   stat will always read from actual atime/mtime/ctime of the inode. Therefore 
as mention in [1]
   we don't do any db read operations in the file IO path.

--jkl
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in Glusterfs

Reply via email to