Hi Jakub,
For this data, what does it look like?
How deep are the foaf:knows chains?
How many foaf:knows triples overall are there?
Are there cycles?
(presumably there are shared
substrees p1->p2->p3 and p1->p4->p3
as well)
If designing a system bottom-to-top for the :p+ and other path-like
queries, the data can be arranged in datastructures suitable to
partitioning across CPUs. Neither the in-memory nor TDB data storage is
designed for that. Mid-scale LPG systems are, typically (adjacency
layout). That design favours one set of use cases over others.
The actual work per step per node is small so if there is poor
co-locality to a CPU, then there is a good chance that inter-thread
costs are going to dominate. Project Loom (java virtual threads) will
help but not solve that because thread switching will be in the JVM, not
the OS kernel.
There is also the compute synchronization costs and related impact on
CPU caches. Stalling to read memory (or competing for L2 cache) or to
synchronize between threads is expensive because it breaks up the
single-thread execution pipeline on a core.
So the layout across cores benefits from clustering.
If it done during query execution, rather than ahead-of-time, that more
cost.
It's a tradeoff - the other use of multiple cores is serving multiple
requests concurrently - the Fuseki case.
At the moment:
I'm sure that path evaluation could be improved.
A good first step is to check that there aren't any basic, silly errors
in the existing code (it wouldn't be the first time). If none, then
there are specific algorithms like Floyd–Warshall (and I think there are
parallel versions of that).
Andy
On 05/11/2021 12:25, Jakub Jałowiec wrote:
I've noticed that the figure from my last message has not been sent
properly, here is a link to it: https://kubajal.github.io/covidepid/
<https://kubajal.github.io/covidepid/>
Best regards,
Jakub
pt., 5 lis 2021 o 13:00 Jakub Jałowiec <[email protected]
<mailto:[email protected]>> napisał(a):
Hi Andy,
thank for your input.
Let me drill it down a little more so I understand it better. Let's
say I have the following query:
prefix foaf: <http://xmlns.com/foaf/0.1/
<http://xmlns.com/foaf/0.1/>>
prefix ex: <http://example.net/ <http://example.net/>>
SELECT ?person1 (count(?label) as ?labelCounter)
WHERE {
?p1 foaf:knows+ ?p2 .
?p2 ex:hasLabel ?label .
}
GROUP BY ?p1
Let's also say I have multiple cores on my machine and I want to
speed up a single query just by utilizing their parallelism. The
bottleneck here is the complex property path (/foaf:knows+/).
To speed things up I want to split the search space into chunks
processed by each core. Intuitively to me the best solution would be
to split by /p1/ equally between all the cores (e.g. if I have /N/
persons and /k/ cores then each core receives /N///k/ persons to
evaluate as /p1/). The figure below shows workload distribution I am
trying to achieve (green circles are persons, brown arcs are
instances of the /foaf:knows/ relationship, instances of
/ex:hasLabel/ have been left out).
cores.png
The query engine would start the evaluation at a "person" node
(/p1/ in the query) and then just do a closure of the /foaf:knows
/relationship (/p1 foaf:knows+ p2/). This would require shared
memory between all the threads.
I have three questions:
1. How would the SPARQL query engine know that it needs to split
the workload in the 'per root of the pattern' manner and not in
a different way? Is there a mechanism in the SPARQL interpreter
for that?
2. Can a single transaction be shared between multiple threads (cores)?
3. Do I need a transaction if the threads I am running are
guaranteed to not modify anything? (the query is a SELECT so it
is 'read-only')
Best regards,
Jakub
niedz., 31 paź 2021 o 11:29 Andy Seaborne <[email protected]
<mailto:[email protected]>> napisał(a):
Hi Jakub,
The preferred way to have parallel actions on a dataset is via
transactions.
concurrency-howto covers threading within a transaction.
Possible with
further MRSW (multiple reader or single writer) locking.
This is how Fuseki executes multiple requests. Each HTTP
request that
is executing in true parallel is executed on a separate thread and
inside a transaction.
So have each thread start a transaction, execute as many sequential
queries as it needs and end the transaction.
In fact, only TDB2 enforces this; TDB1 only enforces it if it has
already been used transactionally. Other datasets are
multiple-reader
safe anyway. But placing inside a transaction is the correct way.
Andy
On 30/10/2021 15:44, Jakub Jałowiec wrote:
> Dear community,
> is there any high-level user interface to execute parallel
SELECT queries in Apache Fuseki or the CLI of Apache Jena?
> I've found a short note on parallelism in Apache Jena here:
https://jena.apache.org/documentation/notes/concurrency-howto.html
<https://jena.apache.org/documentation/notes/concurrency-howto.html>.
But that is not really what I am looking for as it is a general
note on how to implement low-level parallelism in Apache Jena.
> I am interested in analytic benchmarking of Apache Jena.
Ideally, I am looking for something that works out-of-the-box
just for SELECT queries (no need to modify the model in a
parallel fashion or synchronize state etc.).
>
> I'd appreciate any suggestions or pointing out to any
resources as I am new to Apache Jena. I couldn't find a lot in
the archives of the list using the keywords "parallelism" and
"concurrent".
>
> Best regards,
> Jakub
>