Re: FUSEKI Concurrent runs issues

Rob Vesse Wed, 29 Jan 2014 09:26:47 -0800

We don't give any specific guarantees about concurrent query performance
because it depends on various factors.

At the TDB layer you have the issue of caches. Depending on the query load,
different queries may be modifying the caches, which can result in data that
one query uses dropping out of the cache, forcing it to go to the
memory-mapped data structures for the lookups.  Potentially you could get
into a situation where the queries keep forcing each other into cache
misses (cache thrashing).
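
As a rough illustration of that cache behaviour, here is a toy Python
simulation. It is only a sketch under assumptions: a single shared LRU cache,
made-up sizes (capacity 100, 80 items per query), and queries that loop over
their own data in lock step. None of this reflects TDB's real cache
implementation or sizes.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny LRU cache that counts hits and misses (toy model only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            self.entries[key] = True
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # evict least recently used


def hit_rate(num_queries, working_set=80, capacity=100, passes=20):
    """Interleave num_queries queries, each looping over its own working set."""
    cache = LRUCache(capacity)
    for _ in range(passes):
        for item in range(working_set):
            for q in range(num_queries):
                cache.get((q, item))
    return cache.hits / (cache.hits + cache.misses)


if __name__ == "__main__":
    print("1 query  :", hit_rate(1))
    print("5 queries:", hit_rate(5))
```

With these made-up numbers a single query fits in the cache and mostly hits,
while five interleaved queries evict each other's entries before they are
reused, so nearly every lookup falls through to the slower layer.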

At the ARQ layer the evaluation of the query produces some load on the JVM
(both in terms of heap memory usage and CPU usage). Depending on the
query, more CPU or memory (or both) may be needed, resulting in
slowdowns for all queries.  On the other hand, for simple queries this load
may be light and the queries don't compete for resources too much, so the
benefits of concurrent evaluation show through.
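
To put the timings quoted below in perspective, this small Python calculation
compares each concurrent run with running the same five queries back to back.
It assumes the 24 mins 27 seconds figure is the wall-clock time for the whole
batch of five; the function names are illustrative only.

```python
def to_seconds(minutes, seconds):
    """Convert a 'mins, seconds' timing to seconds."""
    return minutes * 60 + seconds


def concurrency_ratio(single, concurrent, n):
    """Time of n concurrent runs relative to n back-to-back runs.

    Below 1.0 means concurrency helped, above 1.0 means it hurt.
    """
    return concurrent / (single * n)


# First query: 2 mins 44 s alone, 24 mins 27 s for 5 concurrent runs.
q1 = concurrency_ratio(to_seconds(2, 44), to_seconds(24, 27), 5)

# Second query: 5 mins 35 s alone, 5 mins 35 s average for 5 concurrent runs.
q2 = concurrency_ratio(to_seconds(5, 35), to_seconds(5, 35), 5)

if __name__ == "__main__":
    print("first query :", round(q1, 2))   # about 1.79
    print("second query:", round(q2, 2))   # 0.2
```

Under that assumption the first query is slower run concurrently than run in
series, while the second gets close to ideal scaling.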

At the Fuseki layer you may also have outbound bandwidth constraints.
Generally Fuseki writes results for most formats to the client as soon as
it has them available, so if you have poor outbound bandwidth there may be
a bottleneck here.  Also, if you choose a format that cannot be stream
written (e.g. Text) then all the results get buffered in memory before
being written out, which can result in high heap memory usage, and you
may start to hit swap and GC issues.
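
The difference between a streamable and a non-streamable format can be
sketched in Python as below. This is a toy model, not Fuseki's writer code:
the row generator, the two writer functions and the idea that an aligned text
table needs every row to compute column widths are assumptions made for
illustration.

```python
import io


def rows(n):
    """Stand-in for a query iterator producing result rows lazily."""
    for i in range(n):
        yield ("entity%d" % i, "x" * (i % 7))


def write_streaming(results, out):
    """Write each row as soon as it is available (tab-separated)."""
    peak_rows_held = 0
    for row in results:
        out.write("\t".join(row) + "\n")
        peak_rows_held = max(peak_rows_held, 1)  # only the current row
    return peak_rows_held


def write_text_table(results, out):
    """Write an aligned table; column widths need every row first."""
    buffered = list(results)  # whole result set held in memory
    widths = [max((len(r[c]) for r in buffered), default=0) for c in range(2)]
    for row in buffered:
        out.write(" | ".join(v.ljust(w) for v, w in zip(row, widths)) + "\n")
    return len(buffered)


if __name__ == "__main__":
    print("streaming holds :", write_streaming(rows(1000), io.StringIO()))
    print("text table holds:", write_text_table(rows(1000), io.StringIO()))
```

The streaming writer never holds more than one row, whereas the table writer
holds the full result set, which is where the heap pressure comes from when
several large queries run at once.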

In terms of your specific queries, your first query makes heavy use of
STR() which, as Andy pointed out previously, requires looking at the actual
string value of the term, which means a round trip to the node table to
look up this value.  AFAIK there is a cache on top of this, so you can get
into the frequent cache miss behaviour I talked about.  In your second
query the functions are predominantly on dates, which are inlined in TDB,
meaning the value can be read directly from the internal node ID without a
trip to the node table. This will be much faster since there is no cache
contention involved.
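
The inlining idea can be sketched with a toy Python model. Everything here is
hypothetical (the flag bit, the class, the use of a date ordinal as the
packed value) and is not TDB's actual node ID layout; it only shows why an
inlined value needs no table lookup while a string value does.

```python
from datetime import date

INLINE_FLAG = 1 << 62  # hypothetical tag bit marking an inlined value


class ToyNodeTable:
    """Toy model: dates live inside the ID, everything else in a table."""

    def __init__(self):
        self.terms = []    # index in this list is the node ID
        self.lookups = 0   # round trips to the table

    def encode(self, term):
        if isinstance(term, date):
            return INLINE_FLAG | term.toordinal()  # value packed into the ID
        self.terms.append(term)
        return len(self.terms) - 1

    def decode(self, node_id):
        if node_id & INLINE_FLAG:
            return date.fromordinal(node_id & (INLINE_FLAG - 1))  # no lookup
        self.lookups += 1  # string value must be fetched from the table
        return self.terms[node_id]


if __name__ == "__main__":
    table = ToyNodeTable()
    url_id = table.encode("http://en.wikipedia.org/wiki/Example")
    dob_id = table.encode(date(1950, 1, 1))
    table.decode(dob_id)
    print("lookups after decoding the date:", table.lookups)
    table.decode(url_id)
    print("lookups after decoding the URL :", table.lookups)
```

In this model only the URL costs a lookup, which is the part that can then
compete for cache space under concurrent load.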

Rob

On 29/01/2014 05:11, "Ewa Szwed" <ewaszy...@gmail.com> wrote:

>Hi,
>
>Since this group is so responsive I would like to seek advice in another
>field:
>
>
>Area: concurrent calls to Fuseki:
>
>
>I am performing concurrent SPARQL queries against freebase data using
>Fuseki and have noticed that for some queries running them in parallel
>versus in series results in a big difference in running time, whereas for
>others the difference in time is minimal or non-existent.
>
>
>For example my first query (notice new FILTER placement that improves
>performance a lot for me!):
>
>
>prefix fb: <http://rdf.freebase.com/ns/>
>
>prefix fn: <http://www.w3.org/2005/xpath-functions#>
>
>prefix xsd: <http://www.w3.org/2001/XMLSchema#>
>
>select ?entity ?mID ?height ?wikipedia_url
>
>where
>
>{
>
>    {
>
>         ?mID_raw fb:type.object.type fb:people.person .
>
>         ?mID_raw fb:type.object.name ?entity .
>
>         ?mID_raw fb:people.person.height_meters ?height_in_meters .
>
>         ?mID_raw fb:common.topic.topic_equivalent_webpage ?wikipedia_url .
>
>         FILTER (lang(?entity) = "en" && regex (str(?wikipedia_url),
>"en.wikipedia", "i") && !regex (str(?wikipedia_url), "curid=", "i")) .
>
>    }
>
>    BIND(REPLACE(str(?mID_raw), "http://rdf.freebase.com/ns/", "") as ?mID)
>
>    BIND(round(xsd:float(?height_in_meters)* xsd:float("100"))/
>xsd:float("100") as ?height_rounded)
>
>    BIND(xsd:float(?height_in_meters)* xsd:float("3.2808") AS
>?height_in_feet)
>
>    BIND(str(?height_in_feet) AS ?feet_str_value)
>
>    BIND(str(floor(xsd:decimal(?feet_str_value))) AS ?feet_final)
>
>    BIND(round(xsd:float(?height_in_feet -
>floor(xsd:decimal(?feet_str_value))) * 12) AS ?inches)
>
>    BIND(str(floor(xsd:decimal(str(?inches)))) as ?inches_final)
>
>    BIND(fn:concat(?feet_final, "' ",?inches_final,"\" (",?height_rounded, " m)" ) AS ?height)
>
>}
>
>
>Has the following runtime for a single query: 2 mins, 44 seconds
>
>and for 5 concurrent queries: 24 mins, 27 seconds
>
>Whereas for our second query:
>
>
> prefix fb: <http://rdf.freebase.com/ns/>
>
> prefix fn: <http://www.w3.org/2005/xpath-functions#>
>
> select ?entity ?mID ?age_at_death ?wikipedia_url
>
> where
>
>{
>
>   {
>
>        ?mID_raw fb:type.object.type fb:people.person .
>
>        ?mID_raw fb:type.object.type fb:people.deceased_person .
>
>        ?mID_raw fb:type.object.name ?entity .
>
>        ?mID_raw fb:people.deceased_person.date_of_death ?date_of_death .
>
>        ?mID_raw fb:people.person.date_of_birth ?date_of_birth .
>
>        ?mID_raw fb:common.topic.topic_equivalent_webpage ?wikipedia_url .
>
>        FILTER (lang(?entity) = "en" && regex (str(?wikipedia_url),
>"en.wikipedia", "i") && !regex (str(?wikipedia_url), "curid=", "i")).
>
>   }
>
>   BIND(REPLACE(str(?mID_raw), "http://rdf.freebase.com/ns/", "") as ?mID)
>
>   BIND(fn:year-from-dateTime(?date_of_birth) AS ?year_of_birth)
>
>   BIND(fn:year-from-dateTime(?date_of_death) AS ?year_of_death)
>
>   BIND(str(floor(fn:days-from-duration(?date_of_death -
>?date_of_birth) / 365)) as ?age)
>
>   BIND(fn:concat(?age, " (", ?year_of_birth, "-", ?year_of_death, ")"
>) AS ?age_at_death)
>
>}
>
>
>Has the following runtime for a single query: 5 mins, 35 seconds
>
>Average for 5 concurrent queries: 5 mins, 35 seconds
>
>Does anybody have any insight into why we are seeing such different behavior
>between the two queries when we run them concurrently?
>
>What should our expectations be when we run concurrent queries against
>Fuseki?
>
>I would guess that the time should be more or less the same no matter the
>load, but if this is the expectation in general, why do we see such a big
>difference for the first query?
>
>
>Also, when executing the second query above we see 100s of
>lines similar to the following being printed to the log:
>
>05:39:35 WARN  NodeValue            :: Datatype format exception:
>"2008-05-16T09"^^xsd:dateTime
>
>We know that this problem originates with the import - we got a number of
>WARNs while importing the data using tdbloader.
>
>When we remove the bindings we do not see these warnings in the log and the
>query runs a lot faster. Any ideas how to overcome this?
