On 19/04/14 18:33, Saud Aljaloud wrote:
Dear Jena folks,
We are investigating how efficiently different triple stores, including
Jena TDB, handle literal strings within SPARQL. To this end, we are
now benchmarking these triple stores against a set of
specific queries, using the Berlin Benchmark (BSBM) test driver [1],
dataset and metrics [2].
BSBM measures a particular kind of workload (actually two kinds: the
Explore and BI use cases). The benchmark driver in their SVN repository is
somewhat ahead of the last formal release. You are actually benchmarking
TDB+Fuseki, not TDB in isolation, because the workload has a significant
proportion of network communication.
There isn't much on matching parts of strings in BSBM.
As Paul observes, a text index can make a big difference.
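To make the text index point concrete, here is a minimal sketch that builds the two query shapes one might compare: a FILTER regex scan and a jena-text lookup through the text:query property function. The choice of rdfs:label as the indexed property, the helper names, and the endpoint URL in the usage note are all assumptions for illustration, and the text index has to be configured on the dataset before the second form returns anything. The two queries are not strictly equivalent, since regex matches substrings while the index matches tokens.

```python
from urllib.parse import urlencode

# rdfs: is the standard RDF Schema namespace; text: is the jena-text namespace.
PREFIXES = (
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
    "PREFIX text: <http://jena.apache.org/text#>\n"
)


def regex_query(word):
    """Substring match with FILTER regex: every candidate label is tested.

    Assumes `word` is a plain word with no quotes or regex metacharacters.
    """
    return PREFIXES + (
        "SELECT ?s ?label WHERE {\n"
        "  ?s rdfs:label ?label .\n"
        f'  FILTER regex(str(?label), "{word}", "i")\n'
        "}"
    )


def text_index_query(word):
    """Token match via jena-text: the index supplies matching subjects first.

    Assumes rdfs:label is the property indexed in the text dataset config.
    """
    return PREFIXES + (
        "SELECT ?s ?label WHERE {\n"
        f"  ?s text:query (rdfs:label '{word}') .\n"
        "  ?s rdfs:label ?label .\n"
        "}"
    )


def query_url(endpoint, query):
    """Build a SPARQL protocol GET URL for the given query endpoint."""
    return endpoint + "?" + urlencode({"query": query})
```

For example, query_url("http://localhost:3030/ds/query", text_index_query("camera")) gives a URL that can be fetched from a Fuseki server, where the dataset name /ds is hypothetical. Timing both forms over the same data would show how much the index helps for a given workload.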
We are using the latest Jena releases: Jena VERSION: 2.11.1, Fuseki:
VERSION: 1.0.1.
To get the best out of Jena, we would like to ask for your feedback
and for any optimisations that could boost the performance of
Jena. I can provide more info, but would prefer non-public communication
with someone or a group from Jena who is willing to be contacted directly
by email.
Jena is an open source project and works in public. I don't work
offlist unless there is a specific (usually, commercial) reason.
We can discuss TDB here. As TDB is not a product, there is no reason not
to discuss both good and bad features here with the developers. Why are
you suggesting non-public communication?
There are other benchmark frameworks, e.g.
http://www.slideshare.net/RobVesse/practical-sparql-benchmarking
which may be easier to use for a new set of queries and data.
Andy
The configurations will be made publicly
available later as part of the benchmark.
Kind Regards,
Saud
[1] http://sourceforge.net/projects/bsbmtools/
[2] http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/