It's not that clear cut.

Firstly, pausing the indexer is an async process and does not have to be
synchronous at all, so there is no round trip with direct access. In most
cases the indexer will be idle when querying anyway. A client could even
bypass the pause request entirely by checking whether tracker-indexer is
running.
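As a rough illustration (untested, and the org.freedesktop.Tracker.Indexer
bus name is an assumption - substitute whatever name tracker-indexer
actually owns), that check is a single libdbus call:

/* Sketch: skip the pause round trip entirely when the indexer
 * isn't running. The bus name below is assumed, not verified. */
#include <dbus/dbus.h>

static dbus_bool_t
indexer_is_running (void)
{
        DBusConnection *conn;
        DBusError       error;
        dbus_bool_t     running;

        dbus_error_init (&error);

        conn = dbus_bus_get (DBUS_BUS_SESSION, &error);
        if (!conn) {
                dbus_error_free (&error);
                return FALSE;
        }

        running = dbus_bus_name_has_owner (conn,
                                           "org.freedesktop.Tracker.Indexer",
                                           &error);
        if (dbus_error_is_set (&error)) {
                dbus_error_free (&error);
                return FALSE;
        }

        return running;
}

If that returns FALSE there is nothing to pause, so the direct-access
path never needs to touch the bus at all.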

DBus has overhead with millions of strings - each one has to be type
verified and copied several times. Unless you are testing with huge
result sets, like fetching 100,000 music files with all their metadata,
any comparison is invalid. Also, as we move to flattened tables the query
time will get faster, and that will make the DBus overhead more
prominent.
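To make the "copied several times" point concrete, here is a rough sketch
of what the sending side has to do with libdbus for a big string-array
reply (the iterator calls are real libdbus API; the function around them
is made up for the example):

/* Sketch: every row is UTF-8 validated and copied into the message
 * buffer, and the receiving side unmarshals and copies each one again -
 * with 100,000+ rows that per-string cost dominates. */
#include <dbus/dbus.h>

static void
append_result_rows (DBusMessage *reply, char **rows, int n_rows)
{
        DBusMessageIter iter, array;
        int i;

        dbus_message_iter_init_append (reply, &iter);
        dbus_message_iter_open_container (&iter, DBUS_TYPE_ARRAY, "s", &array);

        for (i = 0; i < n_rows; i++) {
                /* validate + copy, once per string */
                dbus_message_iter_append_basic (&array, DBUS_TYPE_STRING, &rows[i]);
        }

        dbus_message_iter_close_container (&iter, &array);
}

A direct SQLite cursor just hands back a pointer per column instead, so
none of that per-string marshalling happens.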

In other cases, like using tracker as a gconf backend, you will often
get calls to fetch individual keys, and these need to be in-process to
avoid a round trip for each key.
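Something along these lines is what I mean - just a sketch, and the
Options (OptionKey, OptionValue) table is invented for the example, not
the real schema:

/* Sketch: fetch one key in-process, no IPC round trip per lookup.
 * The table and column names are assumptions made up for the example. */
#include <sqlite3.h>
#include <string.h>

static char *
get_option (sqlite3 *db, const char *key)
{
        sqlite3_stmt *stmt;
        char         *value = NULL;

        if (sqlite3_prepare_v2 (db,
                                "SELECT OptionValue FROM Options "
                                "WHERE OptionKey = ?",
                                -1, &stmt, NULL) != SQLITE_OK)
                return NULL;

        sqlite3_bind_text (stmt, 1, key, -1, SQLITE_STATIC);

        if (sqlite3_step (stmt) == SQLITE_ROW &&
            sqlite3_column_text (stmt, 0) != NULL)
                value = strdup ((const char *) sqlite3_column_text (stmt, 0));

        sqlite3_finalize (stmt);

        return value;
}

A gconf-style client doing hundreds of those lookups at startup gets them
at function-call cost instead of one DBus round trip each.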

Ergo it's premature to rule out direct access at this point.


jamie


On Thu, 2008-11-06 at 11:29 +0100, Philip Van Hoof wrote:
> Hi, this is a mail that I once sent to a few people at Nokia who wanted
> direct access to SQL too:
> 
> 
> > Hi guys,
> > 
> > I made a very simple test case that selects a LIMIT of 1 up to 100 of
> > a few columns out of the Services table.
> > 
> > After I finished the "over DBus" version of it, and while I was
> > measuring its performance, I was already confident that, as konttori
> > pointed out too, the DBus overhead truly is minimal when compared to
> > the query time.
> > 
> > I could write the same test with direct access and it would most
> > likely shave off another few tenths of a millisecond. But for a UI
> > application I don't really see the point in that (a mainloop iteration
> > that has to do a few exposes and draws is likely going to take
> > longer).
> > 
> > So I attached a Vala app for testing this and I included the generated
> > C source code for it. Compile it with `pkg-config dbus-glib-1 --cflags
> > --libs`, and for the Vala stuff take a look here:
> > http://live.gnome.org/Vala/DBusSample
> > 
> > So this is over DBus:
> > 
> >         [EMAIL PROTECTED]:~/test$ ./test-sql-tracker 
> >         
> >         ...
> >        
> >         0.035126 seconds elapsed
> >         [EMAIL PROTECTED]:~/test$
> > 
> > 
> > Now with relation to the queuing: 
> > 
> > SQLite has a write lock per transaction, and it keeps all tables in
> > the connection that are involved in the transaction locked. A write
> > lock means that while we (Tracker) are writing, you (your process,
> > your connection, your *direct* connection to the SQLite tables indeed)
> > are locked out.
> > 
> > Tracker writes in long transactions because SQLite is 50 or 60 times
> > faster at writing if you group lots of writes together. If you don't
> > do this you also hit the fsync() problem more often. This is a
> > similar problem to the one Firefox started having when it switched to
> > SQLite for several things.
> > 
> > Short: This means that we can't turn off our use of transactions. It's
> > vital in Tracker's design and performance.
> > 
> > Preempting these transactions requires very strict communication
> > between the indexer and the front-end query mechanism. This implies a
> > synchronous DBus message to the indexer. This DBus message will
> > instruct the indexer to do a preemptive commit of its standing
> > transaction.
> > 
> > This is possible because the transaction is not done for atomicity but
> > only for improving the write speed of SQLite (so we can commit it
> > early, and we do that).
> > 
> > So even if we made a library that has a direct (in-process, with the
> > app linking against that library) connection to the SQLite database,
> > we would still need to execute a DBus message to request a preemptive
> > commit from the indexer, just like trackerd (the front-end query
> > mechanism) has to do. That makes a direct connection pointless in the
> > first place (because you still have the DBus overhead for this one
> > message anyway).
> > 
> > 
> > Some pointers:
> > 
> > http://www.sqlite.org/faq.html#q5
> > http://www.sqlite.org/lang_transaction.html
> > 
> > 
> > What would be possible, in case a lot of applications want to query
> > Tracker concurrently, is to avoid queuing by introducing a connection
> > pool to Tracker's query front end, with a queue for each connection in
> > the pool.
> > 
> > I must warn that, although very much possible, this solution adds
> > complexity. Unless it's proven that concurrent access will occur very
> > often, I don't think it's worthwhile to implement this solution right
> > now.
> > 
> > But we can keep it in mind for when the day comes.
> > 
> > /EO my advice ;)
> 
> 
> 
> 
> 
> 
> On Tue, 2008-10-21 at 08:47 -0400, Jamie McCracken wrote:
> > On Tue, 2008-10-21 at 09:36 +0100, Martyn Russell wrote:
> > 
> > > 
> > > > Also, in the future I want to support direct access to sqlite via a
> > > > client lib so we can bypass dbus (and trackerd) for select queries where
> > > > speed is paramount and the volume of data is too big for dbus to handle
> > > > optimally (think "get all my 100,000 music tracks with metadata"). So this
> > > > library would have to handle all querying and any future query languages
> > > > (like SPARQL) - so you will have no problem from me with implementing that
> > > > support in a lib.
> > > 
> > > Hmm, I would like to see the difference it makes using DBus and if it
> > > really is an issue. We have an API like this in DBus now which Phillip
> > > added - I really don't like the idea of people executing random SQL on
> > > the databases. It can lead to much bigger problems. Phillip stresses
> > > this in the .xml file where we document this API. I think quite rightly
> > > so too.  
> > 
> > That should be moved to a direct access lib.
> > 
> > The advantage of a direct access lib is that it removes DBus overhead
> > when large amounts of data are required.
> > 
> > Rob Taylor probably knows this better, but from what I understand DBus
> > is not optimal for large payloads (~1MB+), and something like "get all
> > music and metadata" might involve a million-plus strings which DBus
> > would have to marshal, strdup and validate individually into multiple
> > packets (IIRC the packet size is 4KB?), so you are looking at massive
> > overhead with multiple IPC calls.
> > 
> > 
> > 

_______________________________________________
tracker-list mailing list
tracker-list@gnome.org
http://mail.gnome.org/mailman/listinfo/tracker-list
