Re: S4 Status Report

Erik Massop Thu, 09 Jul 2009 11:11:46 -0700

On Thu, 9 Jul 2009 17:51:40 +0200
Sivert Berg <sive...@stud.ntnu.no> wrote:
> Benchmarks: So.. was it all worth it? Well, you tell me:
> (Warning: highly unscientific benchmarks)
> +---------+------------+------------+-------------+
> | Backend | Disk space | List time* | Query time**|
> +---------+------------+------------+-------------+
> | SQlite  | 36.85 MB   | ~10.6 s    | ~0.22 s     |
> | S4      | 18.52 MB   | ~ 2.0 s    | ~0.14 s     |
> +---------+------------+------------+-------------+
> *  xmms2 list > /dev/null
> ** xmms2 mlib search "~foo" AND "id<4000" > /dev/null


I have several comments here.

First of all the indices in the sqlite database (which take a lot of
the diskspace) need to be cleaned up. Also does S4 have
case-insensitive as well as case-sensitive indices?

Secondly a generic match like ~foo (which is equivalent with
the collparser-pattern ":*foo*"), is exactly where S4's stringstore pays
off and sqlite is at a disadvantage. I think it would be fair to also
test the performance of patterns like ":foo" and ":foo*".

Moreover it is worth noting that the SQL-query xmms2 currently
generates is sub-optimal. It is something like this:
 SELECT m0.id FROM Media AS m0
              JOIN Media AS m1 ON m0.id = m1.id AND m1.key = 'album'
              JOIN Media AS m2 ON m0.id = m2.id AND m2.key = 'artist'
 WHERE m0.key = 'title'
 AND (
      (m0.value LIKE '%foo%' OR
       m1.value LIKE '%foo%' OR
       m2.value LIKE '%foo')
      AND m0.id < 4000);
Which in my database takes about 0.027 seconds. (This was measured
directly on the database.)

In coll2 (of which I am currently cleaning the history) it would be
rendered differently. Something like this:
 SELECT id FROM (
  SELECT id FROM Media WHERE key = 'artist' AND value LIKE '%foo%' UNION
  SELECT id FROM Media WHERE key = 'album' AND value LIKE '%foo%' UNION
  SELECT id FROM Media WHERE key = 'title' AND value LIKE '%foo%')
 WHERE id < 4000;
Which in my database takes only 0.003 seconds, which is faster.

> As you can see just listing all entries in the playlist S4 is about
> 5 times as fast, or so it would seem, however running it in callgrind
> reveals that only about 15% of the time is spent in S4, the rest is
> spent packing the values into dicts and whatnot. For SQlite the time
> spent inside the backend is ~55%.

This is a bit weird, as 100% - 55% = 45% of 10.6 is 4.77, which should
be the time spent outside the backend. 4.77 is however more than S4's
2.0 seconds total time?!

> Drawbacks: You didn't think the performance improvements came for free
> did you? The biggest drawback is that S4 isn't ACID-compliant, meaning
> we don't guarantee anything. So if your little kid thinks your
> powerplug looks like a fun toy and unplugs it all bets are off and you
> may lose all data that has not been synced yet. However in most cases
> this will not be a big deal as the medialib are mostly just read.

You don't accidentally have S4 implemented in Berkeley DB as well?


Great work!

Erik Massop / nesciens


P.S. Part of the sqlite session of my timing.
sqlite> .timer on
sqlite> SELECT m0.id FROM Media AS m0 JOIN Media AS m1 ON m0.id = m1.id AND 
m1.key = 'album' JOIN Media AS m2 ON m0.id = m2.id AND m2.key = 'artist' WHERE 
m0.key = 'title' AND ((m0.value LIKE '%foo%' OR m1.value LIKE '%foo%' OR 
m2.value LIKE '%foo') AND m0.id < 4000);
1276
145
155
132
CPU Time: user 0.029998 sys 0.003333
sqlite> SELECT id FROM (SELECT id FROM Media WHERE key = 'artist' AND value 
LIKE '%foo%' UNION SELECT id FROM Media WHERE key = 'album' AND value LIKE 
'%foo%' UNION SELECT id FROM Media WHERE key = 'title' AND value LIKE '%foo%') 
WHERE id < 4000;
132
145
155
1276
CPU Time: user 0.003333 sys 0.000000

--
_______________________________________________
Xmms2-devel mailing list
Xmms2-devel@lists.xmms.se
http://lists.xmms.se/cgi-bin/mailman/listinfo/xmms2-devel

Re: S4 Status Report

Reply via email to