Re: Parallel SELECT queries in Fuseki or Jena's CLI

Jakub Jałowiec Sun, 21 Nov 2021 14:51:18 -0800

Hi Andy,
sorry for late reply.

For this data, what does it look like?
>    How deep are the foaf:knows chains?
>    How many foaf:knows triples overall are there?
>    Are there cycles?
>    (presumably there are shared
>     substrees p1->p2->p3 and p1->p4->p3
>     as well)



To simplify things:
1) foaf:knows chains can be up to ~10 nodes long
2) there are ~1 Million foaf:knows triples (the graph is actually a union
of disconnected complete graphs with ~10 nodes)
3) there are cycles  (e.g. within a single complete graph with ~10 nodes,
the building block of the input graph)
4) there can be shared subtrees (again, e.g. within the single complete
graph)
I suspect it is safe to assume that the 1M triple graph has good locality
because it is a union of smaller disconnected complete graphs.
To give a bit of background, I am artificially generating my 1M triple
graph based on some real-life distributions I obtained in a research
project. I am playing around with those generated graphs in Apache Jena and
trying to perform some benchmarking on it.

If designing a system bottom-to-top for the :p+ and other path-like
> queries, the data can be arranged in datastructures suitable to
> partitioning across CPUs. Neither the in-memory nor TDB data storage is
> designed for that. Mid-scale LPG systems are, typically (adjacency
> layout). That design favours one set of use cases over others.


I found particularly interesting that a non-LPG in-memory version could not
be suitable here. Could you elaborate on that? Do you mean that the limit
here is the available RAM or something else?
Also could you point me to any resources on memory management in Apache
Jena? Is it possible to run TDB or TDB2 fully in an in-memory mode?

I'm sure that path evaluation could be improved.
> A good first step is to check that there aren't any basic, silly errors
> in the existing code (it wouldn't be the first time). If none, then
> there are specific algorithms like Floyd–Warshall (and I think there are
> parallel versions of that).


I'm curious how can I get involved in this. Is* abstract public class
PathEngine
<https://github.com/apache/jena/blob/08404760a0ac10c615b006eff52a9a9963ccec0b/jena-arq/src/main/java/org/apache/jena/sparql/path/eval/PathEngine.java>
*&
its implementations (e.g. PathEngineSPARQL
<https://github.com/apache/jena/blob/08404760a0ac10c615b006eff52a9a9963ccec0b/jena-arq/src/main/java/org/apache/jena/sparql/path/eval/PathEngineSPARQL.java>)
a good place to start?

Thanks for your suggestions.

Best regards,
Jakub

sob., 6 lis 2021 o 11:34 Andy Seaborne <[email protected]> napisał(a):

> Hi Jakub,
>
> For this data, what does it look like?
>    How deep are the foaf:knows chains?
>    How many foaf:knows triples overall are there?
>    Are there cycles?
>    (presumably there are shared
>     substrees p1->p2->p3 and p1->p4->p3
>     as well)
>
> If designing a system bottom-to-top for the :p+ and other path-like
> queries, the data can be arranged in datastructures suitable to
> partitioning across CPUs. Neither the in-memory nor TDB data storage is
> designed for that. Mid-scale LPG systems are, typically (adjacency
> layout). That design favours one set of use cases over others.
>
> The actual work per step per node is small so if there is poor
> co-locality to a CPU, then there is a good chance that inter-thread
> costs are going to dominate. Project Loom (java virtual threads) will
> help but not solve that because thread switching will be in the JVM, not
> the OS kernel.
>
> There is also the compute synchronization costs and related impact on
> CPU caches. Stalling to read memory (or competing for L2 cache) or to
> synchronize between threads is expensive because it breaks up the
> single-thread execution pipeline on a core.
>
> So the layout across cores benefits from clustering.
>
> If it done during query execution, rather than ahead-of-time, that more
> cost.
>
> It's a tradeoff - the other use of multiple cores is serving multiple
> requests concurrently - the Fuseki case.
>
> At the moment:
>
> I'm sure that path evaluation could be improved.
>
> A good first step is to check that there aren't any basic, silly errors
> in the existing code (it wouldn't be the first time). If none, then
> there are specific algorithms like Floyd–Warshall (and I think there are
> parallel versions of that).
>
>      Andy
>
>
> On 05/11/2021 12:25, Jakub Jałowiec wrote:
> > I've noticed that the figure from my last message has not been sent
> > properly, here is a link to it: https://kubajal.github.io/covidepid/
> > <https://kubajal.github.io/covidepid/>
> >
> > Best regards,
> > Jakub
> >
> > pt., 5 lis 2021 o 13:00 Jakub Jałowiec <[email protected]
> > <mailto:[email protected]>> napisał(a):
> >
> >     Hi Andy,
> >     thank for your input.
> >     Let me drill it down a little more so I understand it better. Let's
> >     say I have the following query:
> >
> >         prefix foaf: <http://xmlns.com/foaf/0.1/
> >         <http://xmlns.com/foaf/0.1/>>
> >         prefix ex: <http://example.net/ <http://example.net/>>
> >         SELECT ?person1 (count(?label) as ?labelCounter)
> >         WHERE {
> >            ?p1 foaf:knows+ ?p2 .
> >            ?p2 ex:hasLabel ?label .
> >         }
> >         GROUP BY ?p1
> >
> >
> >     Let's also say I have multiple cores on my machine and I want to
> >     speed up a single query just by utilizing their parallelism. The
> >     bottleneck here is the complex property path (/foaf:knows+/).
> >     To speed things up I want to split the search space into chunks
> >     processed by each core. Intuitively to me the best solution would be
> >     to split by /p1/ equally between all the cores (e.g. if I have /N/
> >     persons and /k/ cores then each core receives /N///k/ persons to
> >     evaluate as /p1/). The figure below shows workload distribution I am
> >     trying to achieve (green circles are persons, brown arcs are
> >     instances of the /foaf:knows/ relationship, instances of
> >     /ex:hasLabel/ have been left out).
> >     cores.png
> >     The query engine would start the evaluation at a "person" node
> >     (/p1/ in the query) and then just do a closure of the /foaf:knows
> >     /relationship (/p1 foaf:knows+ p2/). This would require shared
> >     memory between all the threads.
> >     I have three questions:
> >
> >      1. How would the SPARQL query engine know that it needs to split
> >         the workload in the 'per root of the pattern' manner and not in
> >         a different way? Is there a mechanism in the SPARQL interpreter
> >         for that?
> >      2. Can a single transaction be shared between multiple threads
> (cores)?
> >      3. Do I need a transaction if the threads I am running are
> >         guaranteed to not modify anything? (the query is a SELECT so it
> >         is 'read-only')
> >
> >     Best regards,
> >     Jakub
> >
> >     niedz., 31 paź 2021 o 11:29 Andy Seaborne <[email protected]
> >     <mailto:[email protected]>> napisał(a):
> >
> >         Hi Jakub,
> >
> >         The preferred way to have parallel actions on a dataset is via
> >         transactions.
> >
> >         concurrency-howto covers threading within a transaction.
> >         Possible with
> >         further MRSW (multiple reader or single writer) locking.
> >
> >         This is how Fuseki executes multiple requests.  Each HTTP
> >         request that
> >         is executing in true parallel is executed on a separate thread
> and
> >         inside a transaction.
> >
> >         So have each thread start a transaction, execute as many
> sequential
> >         queries as it needs and end the transaction.
> >
> >         In fact, only TDB2 enforces this; TDB1 only enforces it if it has
> >         already been used transactionally. Other datasets are
> >         multiple-reader
> >         safe anyway.  But placing inside a transaction is the correct
> way.
> >
> >                Andy
> >
> >         On 30/10/2021 15:44, Jakub Jałowiec wrote:
> >          > Dear community,
> >          > is there any high-level user interface to execute parallel
> >         SELECT queries in Apache Fuseki or the CLI of Apache Jena?
> >          > I've found a short note on parallelism in Apache Jena here:
> >
> https://jena.apache.org/documentation/notes/concurrency-howto.html
> >         <
> https://jena.apache.org/documentation/notes/concurrency-howto.html>.
> >         But that is not really what I am looking for as it is a general
> >         note on how to implement low-level parallelism in Apache Jena.
> >          > I am interested in analytic benchmarking of Apache Jena.
> >         Ideally, I am looking for something that works out-of-the-box
> >         just for SELECT queries (no need to modify the model in a
> >         parallel fashion or synchronize state etc.).
> >          >
> >          > I'd appreciate any suggestions or pointing out to any
> >         resources as I am new to Apache Jena. I couldn't find a lot in
> >         the archives of the list using the keywords "parallelism" and
> >         "concurrent".
> >          >
> >          > Best regards,
> >          > Jakub
> >          >
> >
>

Re: Parallel SELECT queries in Fuseki or Jena's CLI

Reply via email to