Hi guys,

Under the radar, as an in-my-free-time experiment, I have made a very simple IPC for the tracker-store branch that uses a raw Unix socket instead of D-Bus for the BatchSparqlUpdate and BatchCommit calls [0].
In tracker-store you use BatchSparqlUpdate and BatchCommit to request storage of larger amounts of your metadata in Tracker. With the Unix-socket based IPC I made, you instead have a library and two async functions:

/* The define is just to help a bit with the layout of this E-mail */
#define X TrackerSocketIpcSparqlUpdateCallback

typedef void (*X) (GError *error, gpointer user_data);

void tracker_socket_ipc_queue_sparql_update (const gchar    *sparql,
                                             X               callback,
                                             gpointer        user_data,
                                             GDestroyNotify  destroy);

void tracker_socket_ipc_queue_commit        (X               callback,
                                             gpointer        user_data,
                                             GDestroyNotify  destroy);

Looking at the implementation you'll find GIOChannels on both the client and the service side. This means it isn't using a thread; instead it uses the GMainLoop of tracker-store on the service side and of your own application on the client side.

The improvised protocol has fixed-size 'commands', which allowed me to keep it simple and read the data with blocking I/O, meaning that the recv() call will block until all requested data has arrived. It goes like this:

> UPDATE {0000000000} {0000000033}\nINSERT { <test0> a nfo:Document }
> UPDATE {0000000001} {0000000033}\nINSERT { <test1> a nfo:Document }
< OK:0000000000:{0000000004}:none
> UPDATE {0000000002} {0000000033}\nINSERT { <test2> a nfo:Document }
< OK:0000000001:{0000000004}:none
< OK:0000000002:{0000000004}:none
> COMMIT {0000000003} {0000000006}\nCOMMIT
< OK:0000000003:{0000000004}:none

It might be slightly better for the scheduling of the two processes involved if this wasn't done blocking, but the difference in throughput is already impressive. I'm already pipelining, so tracker-store won't require you to wait for the OK or ER reply: you can simply continue pumping UPDATEs and COMMITs while receiving your OKs and ERs. Internally it uses a queue that runs at a lower priority on the GMainLoop of tracker-store than the GIOChannel (which of course also uses a GSource, just like the queue does, but at a higher priority).

The numbers are kinda meaningless for comparison with our current D-Bus communication, because we have not yet tried converting the format of the DBusMessage that we send to an array of strings (which would marshal and demarshal a lot faster than the dict we currently use). Fair is fair, but I want to stress this as a warning. The Unix-socket IPC experiment doesn't need any marshalling or demarshalling other than the client having to make the SPARQL update sentence. Jürg has made an API called TrackerSparqlQueryBuilder that'll help a developer make such queries (it's available in the tracker-store branch) [1].

To nonetheless give you a figure (because the difference is impressive): in a test that Jürg did with the current DBusMessage format, we achieved transferring 12,000 statements in about 1.3 seconds. We had indeed already grouped the statements so that the DBusMessages would be about 4k in size each; don't worry, we too know about this in the D-Bus world. With my test I achieved 10,000 statements in about 0.135 seconds and 100,000 statements in 1.6 seconds, without grouping of statements. A Unix socket is also page-based, so grouping would make this even faster.

We have not yet decided whether we'll support this experimental metadata import mode officially. At this moment this Unix-socket IPC stuff is a nice experiment which will help us identify other bottlenecks that are more urgent to solve.
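To give you an idea of how this looks from an application's point of view, here's a minimal sketch of queueing a few updates followed by a commit. The two queue functions are the ones declared above; the header name and the main() scaffolding are my own (the real test app is at [2]), and I'm assuming the library takes its own copy of the SPARQL string:

#include <glib.h>
#include "tracker-socket-ipc.h"   /* hypothetical header name */

static void
on_update_done (GError *error, gpointer user_data)
{
        if (error)
                g_printerr ("UPDATE failed: %s\n", error->message);
}

static void
on_commit_done (GError *error, gpointer user_data)
{
        GMainLoop *loop = user_data;

        if (error)
                g_printerr ("COMMIT failed: %s\n", error->message);

        /* Replies come back in request order, so the commit's reply
         * is the last one we are waiting for */
        g_main_loop_quit (loop);
}

int
main (void)
{
        GMainLoop *loop = g_main_loop_new (NULL, FALSE);
        guint i;

        /* Pipelining: queue everything without waiting for replies */
        for (i = 0; i < 3; i++) {
                gchar *sparql;

                sparql = g_strdup_printf ("INSERT { <test%u> a nfo:Document }", i);
                tracker_socket_ipc_queue_sparql_update (sparql, on_update_done,
                                                        NULL, NULL);
                g_free (sparql);
        }

        tracker_socket_ipc_queue_commit (on_commit_done, loop, NULL);

        g_main_loop_run (loop);
        g_main_loop_unref (loop);

        return 0;
}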
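For the curious, the 'blocks until all requested data arrived' part is nothing more than reading the fixed-size command header, parsing the {length} field, and then reading exactly that many payload bytes. A sketch (not the branch's actual code) of such a read:

#include <glib.h>
#include <sys/socket.h>

/* Keep calling recv() until exactly 'len' bytes have arrived;
 * on a stream socket recv() may return less than requested */
static gboolean
read_exact (gint fd, gchar *buf, gsize len)
{
        gsize got = 0;

        while (got < len) {
                gssize n = recv (fd, buf + got, len - got, 0);

                if (n <= 0)
                        return FALSE;   /* error or peer closed the socket */

                got += (gsize) n;
        }

        return TRUE;
}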
Note that for 'Query' I would first have to make a serialization format to serialize the resulting statements of a query, to send() them to the application's process. This being merely an experiment, I have not yet done this. Serializing a bunch of strings wouldn't be very hard.

I personally think that such a 'Query' would probably be faster than letting app processes access the sqlite3 DB directly, because each connection requires a new page-cache, too. A Unix socket means a send(), a recv(), a memcpy() to the skb and a memcpy() from kernel space to the user space of the receiving process. It's kinda hard to beat that in raw performance, and it's unlikely that processes will ever require more data throughput, or that, due to other bottlenecks, we could ever deliver more.

I made a little test app, which you can find at [2]. It illustrates how an app developer would use the API. Unlike with D-Bus, there wouldn't be activation of tracker-store, though; instead your callback would be called with its GError set, indicating that the service was not ready for your request.

--
[0] http://git.gnome.org/cgit/tracker/log/?h=tracker-store-ipc
[1] http://git.gnome.org/cgit/tracker/tree/src/libtracker-common/tracker-sparql-builder.vala?h=tracker-store
[2] http://git.gnome.org/cgit/tracker/tree/src/libtracker-socket-ipc/tracker-socket-ipc-test.c?h=tracker-store-ipc

--
Philip Van Hoof, freelance software developer
home: me at pvanhoof dot be
gnome: pvanhoof at gnome dot org
http://pvanhoof.be/blog
http://codeminded.be