On Thu, 2010-08-12 at 14:54 -0400, Jamie McCracken wrote:
> your proposal sounds fine  - what are you complaining about?
> 
> Only one thing stands out - direct access. You will surely need IPC to
> signal changes made by a direct access user as well as to receive them

Direct-access is always read-only (SELECT only): direct-access users
can't make any changes that would cause a signal (which is also why
SQLite's WAL is sufficiently MVCC-ish for our use-case).

Updates go over either traditional D-Bus or D-Bus with FD-passing (the
Steroids D-Bus object). Those can of course cause the signal.
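
To make the read-only point concrete, here is a minimal Python sketch. It
assumes a plain SQLite file standing in for meta.db; the "resource" table
and its columns are made up for illustration (Tracker's real schema is
different), and a real client would go through libtracker-sparql rather
than talk to SQLite itself.

```python
import os
import sqlite3
import tempfile


def open_read_only(path):
    # mode=ro makes SQLite reject every write attempted on this connection.
    return sqlite3.connect(f"file:{path}?mode=ro", uri=True)


def demo():
    # Hypothetical stand-in for meta.db, created in a scratch directory.
    path = os.path.join(tempfile.mkdtemp(), "meta.db")

    # This connection plays the role of tracker-store, the only writer.
    writer = sqlite3.connect(path)
    writer.execute("CREATE TABLE resource (id INTEGER PRIMARY KEY, uri TEXT)")
    writer.execute("INSERT INTO resource VALUES (800, 'urn:example:a')")
    writer.commit()

    # This connection plays the role of a direct-access client.
    reader = open_read_only(path)
    rows = reader.execute("SELECT id, uri FROM resource").fetchall()
    try:
        reader.execute("INSERT INTO resource VALUES (801, 'urn:example:b')")
        write_rejected = False
    except sqlite3.OperationalError:
        write_rejected = True

    reader.close()
    writer.close()
    return rows, write_rejected
```

The reader can SELECT but its INSERT fails, so nothing it does could ever
need a signal.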

Cheers,

Philip


> On Thu, 2010-08-12 at 20:43 +0200, Philip Van Hoof wrote:
> > Come on guys,
> > 
> > I know I'm a natural born pessimist, and I know I shouldn't be. But
> > still, there must be *something* wrong about this proposal?!
> > 
> > Nobody is commenting at all? You know, the idea of posting it here is to
> > get some discussion going "before" I implement it ;-)
> > 
> > Ping, everybody!
> > 
> > Cheers,
> > 
> > Philip
> > 
> > 
> > On Thu, 2010-08-12 at 15:03 +0200, Philip Van Hoof wrote:
> > > A new class signal for Tracker
> > > 
> > > Today's situation
> > > 
> > > Today we have a simple signal system that causes quite a bit of
> > > overhead which we over time tried to reduce. The overhead comes from: 
> > >      A. Having to store the URIs of the resources involved in a
> > >         changeset in tracker-store's memory; 
> > >      B. Having to store the predicates involved in a changeset in
> > >         tracker-store's memory (although far less severe than A); 
> > >      C. Having to UTF-8 validate the strings when we emit them over
> > >         D-Bus (D-Bus does this implicitly); 
> > >      D. D-Bus's own copying and handling of string data; 
> > >      E. Heavy traffic on D-Bus; 
> > >      F. Context switching between tracker-store and dbus-daemon; 
> > >      G. We have to wait with turning on the D-Bus objects until after
> > >         we have the latest ontology. So after journal replay. And we
> > >         need to reset the situation after a backup restore. Complex!
> > > Besides this overhead there are problems the consumers have too. I'll
> > > make a list in the next section.
> > > 
> > > Problems of today's signal 
> > >      1. Aforementioned overhead: it generates a lot of D-Bus traffic.
> > >         This is caused by sending over URLs for the subjects and the
> > >         predicates; 
> > >      2. Doesn't make it possible, in case of a delete of <a>, to know
> > >         <b> in <a> nfo:isLogicalPartOf <b>, as <a> is removed at the
> > >         point of signal emission; 
> > >      3. Round trips to know the literals create more D-Bus traffic; 
> > >      4. Transactional changes can't be reliably identified with
> > >         SubjectsAdded, SubjectsChanged and SubjectsRemoved being
> > >         separate signals; 
> > >      5. A lot of D-Bus objects, instead of letting clients use D-Bus's
> > >         filtering system.
> > > 
> > > The drive for a solution
> > > 
> > > Jürg Billeter and I brainstormed a bit about all these problems. Over
> > > the last few months, while optimizing tracker-store's INSERT
> > > performance and memory utilization, we brainstormed a lot about how we
> > > could reduce the overhead. I believe we have a good idea of the current
> > > situation, its internal problems and our current solution (hey, of
> > > course, we implemented it :p).
> > > 
> > > We also gained know-how about most of the problems consumers have from
> > > the maintainer of libqttracker, Petteri Iridian Kiiskinen. Thanks
> > > Iridian!
> > > 
> > > Today I believe that we must abandon the old ship, redo the signal
> > > system, break the API. Break it all. Get over it, heal our wounds.
> > > Even if that means taking the stress away from all sorts of people
> > > who've been using the old signal system, offering massages, giving out
> > > sauna coupons. You know, the usual stuff that we won't do for real.
> > > Although I'm sure that at a next code-camp in Helsinki we'll have a
> > > good sauna to burn all our own stress away.
> > > 
> > > Anyway ... *shrug*
> > > 
> > > A proposed solution
> > > 
> > > Part one: Direct access
> > > With direct-access we will reduce the round-trip cost of a query from
> > > a consumer who wants a literal object involved in a changeset: the
> > > query will be executed directly on meta.db. You won't use libsqlite's
> > > API yourself but libtracker-sparql, which for direct-access is a layer
> > > on top of the aforementioned libsqlite. The so-called "round-trip"
> > > won't even involve IPC: by utilizing the TrackerSparqlCursor API,
> > > you'll end up doing sqlite3_step() in your own process, directly on
> > > meta.db.
> > > 
> > > For the consumers of the signal, this removes 3.
> > > 
> > > Part two: Sending IDs
> > > A while ago we introduced the SPARQL function tracker:id(). The
> > > tracker:id() function gives you the unique number that Tracker's RDF
> > > store uses internally for a resource. It's not RDF; RDF uses subject
> > > URL strings. We just convert these to numbers internally for
> > > performance reasons, and with tracker:id() you can access that number.
> > > 
> > > Each resource, each class and each predicate (the latter two are
> > > resources like any other) has such a unique internal ID.
> > > 
> > > Given that Tracker's class signal system isn't RDF anyway, we decided
> > > not to give you subject URL strings in it anymore. Instead, we'll give
> > > you these integer IDs.
> > > 
> > > This for us removes A, B, C, D and E. For the consumers of the signal,
> > > this removes 1. Whoohoo!
> > > 
> > > Part three: Combine SubjectsAdded and SubjectsChanged, and put
> > > SubjectsRemoved in the same signal
> > > So we give you two arrays: Inserts and Deletes. 
> > > 
> > > For consumers of the signal, this removes 4.
> > > 
> > > Part five: Add the class name to the signal
> > > This allows you to use a string filter on your signal subscription in
> > > D-Bus.
> > > 
> > > For us this removes G. For consumers of the signal, this removes 5.
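
As a rough illustration of the string filter, here is a small Python
sketch that builds a D-Bus match rule keyed on the first signal argument
(arg0, which would be class-name). The interface and member names are
taken from the proposal as-is; the helper name and the "nmm:MusicPiece"
example value are only assumptions, and whether class names travel as
prefixed names or full URIs is not settled here.

```python
def class_signal_match_rule(class_name,
                            interface="org.freedesktop.Tracker1.Resources.Class",
                            member="class-signal"):
    """Build a match rule that only lets through signals for one class.

    Hypothetical helper: no escaping is done, so class_name must not
    contain an apostrophe.
    """
    parts = [
        ("type", "signal"),
        ("interface", interface),
        ("member", member),
        # arg0 matches the first string argument of the signal (class-name).
        ("arg0", class_name),
    ]
    return ",".join(f"{key}='{value}'" for key, value in parts)
```

A client would hand a rule like this to its D-Bus binding when
subscribing, so that dbus-daemon does the filtering instead of the
client.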
> > > 
> > > Part six: Pass the object-id for resource objects
> > > You'll get a third number in the Inserts and Deletes arrays:
> > > object-id. We won't send you object literals, although for integral
> > > objects we're still discussing this. But for resource objects we can
> > > give you the object-id without much extra cost.
> > > 
> > > For consumers of the signal, this removes 2. Whoohoo (this was a hard
> > > one)!
> > > 
> > > Part seven: SPARQL IN, tracker:id() and tracker:subject()
> > > We recently added support for SPARQL IN, we already have tracker:id()
> > > and we'll implement tracker:subject().
> > > 
> > > This makes things like this possible:
> > > 
> > > SELECT ?t { ?r nie:title ?t .
> > >             FILTER (tracker:id(?r) IN (800, 801, 802, 807)) }
> > > 
> > > Where 800, 801, 802 and 807 will be the IDs that you receive in the
> > > class signal.
> > > 
> > > The tracker:subject() SPARQL function will allow you to make a very
> > > fast version of this:
> > > 
> > > SELECT ?s { ?s a rdfs:Resource .
> > >             FILTER (tracker:id(?s) IN (800)) }
> > > 
> > > So it would be something like ... (not sure that you can omit { } in
> > > SPARQL, though):
> > > 
> > > SELECT tracker:subject (800)
> > > 
> > > For consumers this removes most of the burden introduced by IDs.
> > > Consumers are also advised to keep a local Map<tracker:id(), subject>
> > > to avoid a lot of SPARQL queries. Although with direct-access it might
> > > be just fine.
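
The local Map<tracker:id(), subject> could look roughly like this in
Python. This is only a sketch: SubjectCache and the resolve_many hook are
made-up names, and the hook stands in for whatever single SPARQL query
(for example one using tracker:id() with IN) the client uses to turn IDs
into subjects.

```python
class SubjectCache:
    """Maps tracker:id() integers to subject strings, resolving misses in bulk."""

    def __init__(self, resolve_many):
        # resolve_many is a hypothetical callable: it takes a list of ids and
        # returns a dict {id: subject}, ideally from one SPARQL query.
        self._resolve_many = resolve_many
        self._subjects = {}

    def lookup(self, ids):
        # Only ids we have never seen before cost a query.
        missing = sorted({i for i in ids if i not in self._subjects})
        if missing:
            self._subjects.update(self._resolve_many(missing))
        return {i: self._subjects[i] for i in ids if i in self._subjects}
```

Each ID is resolved at most once, and all misses from one signal are
batched into a single call.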
> > > 
> > > Part eight: What is left?
> > > 
> > > What is left is context switching between tracker-store and
> > > dbus-daemon, F. But that's our problem. We'll reduce them by grouping
> > > transactions and signals together. It's mostly a problem on ARM
> > > hardware, but yeah that's a major and important target platform for
> > > us. We're on it, we will care about this!
> > > 
> > > Let's take a look!
> > > 
> > > <node name="/org/freedesktop/Tracker1/Resources">
> > >   <interface name="org.freedesktop.Tracker1.Resources.Class">
> > >     <signal name="class-signal">
> > >       <arg type="s" name="class-name" />
> > >       <arg type="a(iii)" name="inserts" />
> > >       <arg type="a(iii)" name="deletes" />
> > >     </signal>
> > >   </interface>
> > > </node>
> > > 
> > > Or in short: sa(iii)a(iii). Here's a bit of pseudo code showing how
> > > it'll look client-side:
> > > 
> > > void m_callback (cursor) {
> > >   while (cursor.next ()) {
> > >     // With direct-access, each cursor.next () is a sqlite3_step () call
> > >     print ("title: %s", cursor.get_string (0));
> > >   }
> > > }
> > > 
> > > // Argument order follows the signal signature: inserts, then deletes
> > > void on_signal (class_name, inserted, deleted) {
> > >   string in_qry = "", qry;
> > >   bool first = true;
> > > 
> > >   foreach (insert in inserted) {
> > >     if (insert.subject_id is_in (my_resources)) {
> > >       if (!first) { in_qry += ", "; }
> > >       in_qry += insert.subject_id.to_string ();
> > >       first = false;
> > >     }
> > >   }
> > > 
> > >   // Nothing we care about changed: don't send an empty IN () filter
> > >   if (first) { return; }
> > > 
> > >   qry = string.printf ("SELECT ?titles { ?r nie:title ?titles . " +
> > >                        "FILTER (tracker:id(?r) IN (%s)) }", in_qry);
> > > 
> > >   connection.query_async (qry, m_callback);
> > > }
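
The query-building half of on_signal could be sketched in Python like
this. Assumptions: each a(iii) entry is ordered (subject-id,
predicate-id, object-id), which the proposal implies but does not spell
out, and Change and build_title_query are made-up names. Running the
query is left to whatever connection API the client has.

```python
from collections import namedtuple

# Field order is an assumption; the proposal only gives the type as (iii).
Change = namedtuple("Change", "subject_id predicate_id object_id")


def build_title_query(inserts, my_resources):
    """Return a SPARQL query for the titles of the changed resources that
    this client tracks, or None when no inserted triple concerns it."""
    changed = {Change(*entry).subject_id for entry in inserts}
    wanted = sorted(changed & set(my_resources))
    if not wanted:
        return None  # avoid emitting an empty "IN ()" filter

    # The ids are integers, so formatting them cannot inject SPARQL.
    id_list = ", ".join(str(i) for i in wanted)
    return ("SELECT ?title { ?r nie:title ?title . "
            f"FILTER (tracker:id(?r) IN ({id_list})) }}")
```

Deduplicating the subject IDs first keeps the IN list short when one
resource had several predicates changed in the same transaction.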
> > > 
> > > 
> > > Cheers! :-)
> > > 
> > > Philip
> > > 
> > > 
> > > -- 
> > > 
> > > 
> > > Philip Van Hoof
> > > phi...@codeminded.be
> > > freelance software developer
> > > Codeminded BVBA - http://codeminded.be
> > > _______________________________________________
> > > tracker-list mailing list
> > > tracker-list@gnome.org
> > > http://mail.gnome.org/mailman/listinfo/tracker-list
> > 
> 
> 
> 

