Re: Fuseki: strange (and disappointing) performance when compared to a simple servlet that calls ARQ

Andy Seaborne Mon, 07 Sep 2015 09:50:08 -0700

Hi there,

Thanks for the jvisual info - a CSV file only goes so far though. It seems a lot of time is spent waiting, but that might be an artifact of profiling. There are a number of formats jvisual can load (see the file dialog under "load").

I did see one possible oddity - the latest development build (build 20150907.115322-22) has a fix for the form of storage of the in-memory data. That might help.


        Andy

On 07/09/15 10:49, François-Paul Servant wrote:

Andy,

Le 5 sept. 2015 à 18:18, Andy Seaborne <a...@apache.org> a écrit :

On 05/09/15 16:19, François-Paul Servant wrote:

Le 4 sept. 2015 à 10:21, Rob Vesse <rve...@dotnetrdf.org> a écrit :

You haven't shown your code so I can only guess at what may/may not be
going on


Hi Rob,

note that while the difference in performance is surprising, and the most
plausible cause is an error on my side, I’m still concerned about Fuseki’s
performance: if it can’t do better with the queries I made, it doesn’t seem
to be a viable solution for me. One of these queries is just part of what is
displayed at:
http://www.semanlink.net/tag/linked_data.html
(developed years ago with Jena)
and I won’t be able to use Fuseki if the response time for such a query is in
the range of seconds.
So I hope that the final answer will be: “here is how to use Fuseki correctly,
and then it will be fast” :-)


It is DESCRIBE that your figures point to, not SELECT, so let's focus on those.


OK.
Note however that, depending on the query, we may also have significant 
differences with select, cf.:
SELECT ?tag WHERE {
        ?tag skos:broader* tag:science.
}
SIMPLE FIRST CALL: 0.172
SIMPLE MEAN: 0.0225
FUSEKI FIRST CALL: 3.981
FUSEKI MEAN: 3.1274
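
For readers unfamiliar with the path operator in this query: in SPARQL 1.1, skos:broader* matches a chain of zero or more skos:broader links, so the query returns tag:science itself plus every tag below it at any depth. Answering it means computing a transitive closure rather than doing a single lookup. A rough sketch of that computation in plain Java follows. It is not ARQ's implementation; the class name, method name, and the map-based graph representation are made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustration only: what "?tag skos:broader* <target>" asks for,
// computed over a hypothetical in-memory map of broader links.
class BroaderStarSketch {

    // 'broader' maps each tag to the tags that are directly broader than it.
    static Set<String> narrowerOrSelf(Map<String, List<String>> broader, String target) {
        // Invert the edges so we can walk from a tag down to its narrower tags.
        Map<String, List<String>> narrower = new HashMap<>();
        for (Map.Entry<String, List<String>> e : broader.entrySet()) {
            for (String b : e.getValue()) {
                narrower.computeIfAbsent(b, k -> new ArrayList<>()).add(e.getKey());
            }
        }

        // Breadth-first walk. The target is included because '*' also
        // matches a path of length zero.
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        seen.add(target);
        queue.add(target);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            for (String n : narrower.getOrDefault(current, Collections.emptyList())) {
                if (seen.add(n)) {   // add() is false for already visited tags
                    queue.add(n);
                }
            }
        }
        return seen;
    }
}
```

The visited set keeps the walk finite even if the data contains cycles of broader links.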


Could you please run a profiler on fuseki and run some DESCRIBE tests?


Yes I can, and I did, using jvisualvm (but it doesn’t work with queries that
are too long). What do you want me to do exactly? I am sending some output in
a separate message to your address.


Also - Rob had some questions about the client-side handling of results that 
are important here.


if I understood correctly, the point is to be sure that the client does read a 
complete answer. Here is the code that I use to read the data from one URI (I 
can send the complete test class if you want).

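import javax.ws.rs.client.Client;
import javax.ws.rs.client.Invocation;
import javax.ws.rs.client.WebTarget;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;

/**
 * GET the given URI and return the response body as a string.
 * The elapsed time is added to the chrono, if one is supplied
 * (Chrono is my own stopwatch class).
 */
public static String getIt(String uri, Client client, MediaType mediaType, Chrono ch) {
    if (ch != null) ch.start();
    WebTarget webTarget = client.target(uri);
    Invocation.Builder invocationBuilder = webTarget.request(mediaType);
    // Ask for a fresh response rather than a cached one.
    invocationBuilder.header("Cache-Control", "no-cache");
    invocationBuilder.header("Pragma", "no-cache");

    Response response = invocationBuilder.get();
    int status = response.getStatus();
    if (status != 200) {
        throw new RuntimeException("Unexpected status: " + status + " getting " + uri); // TODO
    }
    // Reading the entity as a String consumes the whole response body.
    String s = response.readEntity(String.class);
    if (ch != null) ch.stop();
    return s;
}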

When it is an RDF query, I then convert the string to a Jena model, check that
it contains a decent number of triples, and check that my servlet and Fuseki
return the same number of triples (when they are supposed to, that is, not
when the query contains a LIMIT clause).

fps


        Andy

fps


Firstly did you actually consume the result set in your servlet?

A ResultSet is typically streamed, so the fact that execSelect() returned
doesn't mean the query was fully evaluated, simply that the first
result is available.  So if you did something like the following:

long start = System.currentTimeMillis();
qe.execSelect();
long elapsed = System.currentTimeMillis() - start;

Then all you have measured is the time to the first solution, not the time to
get all results. If this is the case, you need to ensure you fully
consume the ResultSet somehow (whether by iterating over it, passing it to
some IO method that writes it out, calling ResultSetFormatter.consume() on it,
etc.), thus forcing ARQ to fully evaluate the query.
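
To make the point about streamed results concrete, here is a small analogy in plain Java. It does not use the Jena API: execLazily() and consume() are hypothetical stand-ins for execSelect() and for reading the ResultSet to the end, and the row counter exists only to show when the work actually happens.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Analogy only: a lazily evaluated iterator standing in for a streamed
// ResultSet. None of these names come from Jena.
class LazyTimingSketch {

    // Counts how many rows have actually been computed so far.
    static int rowsComputed = 0;

    // Hypothetical stand-in for execSelect(): returns at once, computes nothing.
    static Iterator<Integer> execLazily(final int rows) {
        return new Iterator<Integer>() {
            private int next = 0;

            @Override
            public boolean hasNext() {
                return next < rows;
            }

            @Override
            public Integer next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                rowsComputed++;   // the per-row work happens here, not earlier
                return next++;
            }
        };
    }

    // Reads every row, which is what forces the full evaluation.
    static int consume(Iterator<Integer> results) {
        int count = 0;
        while (results.hasNext()) {
            results.next();
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        Iterator<Integer> results = execLazily(100000);
        long toIterator = System.nanoTime() - start;   // excludes the row work
        int count = consume(results);
        long toLastRow = System.nanoTime() - start;    // includes the row work
        System.out.println("rows=" + count
                + " toIterator(ns)=" + toIterator
                + " toLastRow(ns)=" + toLastRow);
    }
}
```

Stopping the clock right after execLazily() corresponds to the measurement above that only captures the time to the first solution; stopping it after consume() corresponds to timing the whole query.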

On the point of IO, did your servlet actually write the results back to
the client? Depending on the size of the results, that can add
significant overhead relative to the actual query execution, and Fuseki is
always going to do this.

Finally, most of the queries exhibiting large differences are DESCRIBE
queries, which are evaluated in two passes: first the WHERE clause is
evaluated (via execSelect() internally) and then the description is built.
If your servlet is only calling execSelect() for those queries, then it is
only timing the first pass over the WHERE clause (and possibly only the
first result, as noted above) rather than the full query evaluation that
Fuseki will be doing.
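
As a rough illustration of why the second pass matters for timing, here is a simplified sketch of the two passes over a tiny in-memory list of triples. This is not how ARQ implements DESCRIBE: the class, the method names, and the sample data are invented, and a real description may contain more than the triples whose subject is the matched resource.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Simplified illustration of DESCRIBE as two passes. Each triple is a
// String[3] of {subject, predicate, object}; the data is made up.
class DescribeTwoPassSketch {

    static List<String[]> sampleData() {
        return Arrays.asList(
            new String[] {"tag:rdf", "skos:broader", "tag:semantic_web"},
            new String[] {"tag:rdf", "skos:prefLabel", "RDF"},
            new String[] {"tag:sparql", "skos:broader", "tag:semantic_web"},
            new String[] {"tag:java", "skos:prefLabel", "Java"});
    }

    // Pass 1: evaluate the pattern "?tag <predicate> <object>".
    static Set<String> matchSubjects(List<String[]> triples, String predicate, String object) {
        Set<String> subjects = new LinkedHashSet<>();
        for (String[] t : triples) {
            if (t[1].equals(predicate) && t[2].equals(object)) {
                subjects.add(t[0]);
            }
        }
        return subjects;
    }

    // Pass 2: build the description from every triple about a matched resource.
    static List<String[]> describe(List<String[]> triples, String predicate, String object) {
        Set<String> resources = matchSubjects(triples, predicate, object);
        List<String[]> description = new ArrayList<>();
        for (String[] t : triples) {
            if (resources.contains(t[0])) {
                description.add(t);
            }
        }
        return description;
    }

    public static void main(String[] args) {
        List<String[]> data = sampleData();
        System.out.println("pass 1 matches: "
                + matchSubjects(data, "skos:broader", "tag:semantic_web").size());
        System.out.println("description triples: "
                + describe(data, "skos:broader", "tag:semantic_web").size());
    }
}
```

A timer that stops after matchSubjects() sees only the first pass, while a server that returns the description has to pay for both.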

Rob

On 03/09/2015 23:19, "François-Paul Servant"
<francoispaulserv...@gmail.com> wrote:

Hi,

shouldn’t we have the same level of performance with Fuseki and with a
simple servlet that calls ARQ?

I hadn’t tried Fuseki until now. Yesterday, I downloaded the 2.3.0 release,
started the server in a terminal window on my Mac (OS X 10.10.5) with:
./fuseki-server --mem /ds
I uploaded an RDF file (SKOS-like data, 21K triples), and I began to make
some queries. I’m used to playing with that data in Jena memory models, and
to querying it. Getting results in the Fuseki GUI seemed slow to me, so I
decided to compare with a simple servlet that loads a memory model with the
same data on init, and calls ARQ in its doGet method.

I loaded both Fuseki and my simple servlet in an instance of Tomcat 8,
both loaded with the same data (default graph, memory model), and I
measured the time for some GET queries as seen by a client I wrote using
Jersey.

Here are the results. For each SPARQL query, I give the times with the simple
servlet and with Fuseki: the time for the first call, and the mean when
calling it 10 times (with the simple servlet, it is generally much faster
after the first call, but this is not related to HTTP caching: I paid
attention to that, and I verified, in the case of the simple servlet, that
its doGet method actually gets called).
Depending on the query, differences are small, or huge.
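
For anyone wanting to reproduce this kind of measurement, a minimal timing harness could look like the sketch below. It is not the test class used for the figures in this message. The names are hypothetical, and it assumes the mean is taken over the repeated calls that follow the first one, which may differ from how the numbers here were computed.

```java
import java.util.function.Supplier;

// Minimal sketch of a "first call, then mean of N calls" measurement.
// Hypothetical helper, not the author's test class.
class FirstCallAndMeanSketch {

    // Returns {first call in seconds, mean of the following calls in seconds}.
    // Assumption: the first call is reported separately and is not part of the mean.
    static double[] time(Supplier<?> call, int repetitions) {
        double first = timeOnce(call);
        double total = 0.0;
        for (int i = 0; i < repetitions; i++) {
            total += timeOnce(call);
        }
        return new double[] { first, total / repetitions };
    }

    // Times a single invocation using the monotonic clock.
    static double timeOnce(Supplier<?> call) {
        long start = System.nanoTime();
        call.get();
        return (System.nanoTime() - start) / 1e9;
    }

    public static void main(String[] args) {
        // Placeholder workload; a real test would issue the HTTP GET here
        // and read the whole response body before returning.
        double[] t = time(() -> String.valueOf(Math.sqrt(12345.678)), 10);
        System.out.println("FIRST CALL: " + t[0]);
        System.out.println("MEAN: " + t[1]);
    }
}
```

Reporting the first call separately keeps one-off costs such as class loading and JIT warm-up from distorting the mean.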

PREFIX tag: <http://127.0.0.1:8080/fuseki/ds/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?tag WHERE {
        ?tag skos:broader tag:semantic_web.
}
SIMPLE FIRST CALL: 0.039
SIMPLE MEAN: 0.0213
FUSEKI FIRST CALL: 0.025
FUSEKI MEAN: 0.0215

PREFIX tag: <http://127.0.0.1:8080/fuseki/ds/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
DESCRIBE ?tag WHERE {
        ?tag skos:broader tag:afrique.
}
SIMPLE FIRST CALL: 0.039
SIMPLE MEAN: 0.0216
FUSEKI FIRST CALL: 0.485
FUSEKI MEAN: 0.2284

PREFIX tag: <http://127.0.0.1:8080/fuseki/ds/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?tag WHERE {
        ?tag skos:broader* tag:science.
}
SIMPLE FIRST CALL: 0.172
SIMPLE MEAN: 0.0225
FUSEKI FIRST CALL: 3.981
FUSEKI MEAN: 3.1274

PREFIX tag: <http://127.0.0.1:8080/fuseki/ds/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
DESCRIBE ?tag WHERE {
        ?tag skos:broader* tag:linked_data.
}
SIMPLE FIRST CALL: 0.131
SIMPLE MEAN: 0.0417
FUSEKI FIRST CALL: 1.46
FUSEKI MEAN: 1.3244

PREFIX tag: <http://127.0.0.1:8080/fuseki/ds/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?tag WHERE {
        ?tag a <http://www.semanlink.net/2001/00/semanlink-schema#Tag>.
}
LIMIT 1000
SIMPLE FIRST CALL: 0.07
SIMPLE MEAN: 0.0269
FUSEKI FIRST CALL: 0.037
FUSEKI MEAN: 0.024399999999999998

PREFIX tag: <http://127.0.0.1:8080/fuseki/ds/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
DESCRIBE ?tag WHERE {
        ?tag a <http://www.semanlink.net/2001/00/semanlink-schema#Tag>.
}
LIMIT 1000
SIMPLE FIRST CALL: 0.181
SIMPLE MEAN: 0.13440000000000002
FUSEKI FIRST CALL: 6.471
FUSEKI MEAN: 5.497999999999999

Do you have an explanation?

Best Regards,

fps
