There are a number of points here, but I'm only qualified to speak to one of them. :)
On Sun, Nov 21, 2021, 5:51 PM Jakub Jałowiec <j.jalow...@student.uw.edu.pl>
wrote:

> Hi Andy,
> sorry for the late reply.
> <snipped>
>
> > Neither the in-memory nor TDB data storage is designed for that.
> > Mid-scale LPG systems are, typically (adjacency layout). That design
> > favours one set of use cases over others.
>
> I found it particularly interesting that a non-LPG in-memory version
> would not be suitable here. Could you elaborate on that? Do you mean
> that the limit here is the available RAM or something else?

Andy can correct me, but what I understood him to be saying was not that
a non-LPG in-memory data layout _could not_ be designed to meet this use
case, but that our particular current in-memory data layouts _do not_. In
particular, I can speak to the default "TIM" (transactional in-memory)
layout. It adopts a straightforward Java translation of functional data
structures from Scala, and those structures make some assumptions: that
they are running in a single JVM, with a single contiguous heap, accessed
at uniform speed by a single processor with many threads. One could
implement a similar design with other structures that support
partitioning across a network, plus new logic to account for that, but it
would be far from trivial and it's not obvious to me that you would end
up with reasonable performance.

I'm not familiar myself with any open-source distributed triplestores
that aren't academic products, but there are a few of those, e.g. TriAD
or Trinity.RDF. I found it useful to read papers about such systems to
learn about the design questions involved. There has been some discussion
of efforts from within the Jena community to develop a distribution
mechanism on top of the extant store designs, but we haven't been able to
get a finished contribution committed.

Adam

> Also could you point me to any resources on memory management in Apache
> Jena? Is it possible to run TDB or TDB2 fully in an in-memory mode?
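As to that last question: yes. The TIM dataset Adam describes is what
DatasetFactory.createTxnMem() returns, and TDB2 can be created in memory
rather than on disk (intended mainly for testing). A minimal sketch; the
class name and the empty transaction body are illustrative only:

    import org.apache.jena.query.Dataset;
    import org.apache.jena.query.DatasetFactory;
    import org.apache.jena.query.ReadWrite;
    import org.apache.jena.tdb2.TDB2Factory;

    public class InMemoryDatasetSketch {
        public static void main(String[] args) {
            // The default transactional in-memory (TIM) dataset.
            Dataset tim = DatasetFactory.createTxnMem();

            // A TDB2 dataset held in memory instead of on disk.
            Dataset tdb2 = TDB2Factory.createDataset();

            // Both are driven through the same transaction API.
            tim.begin(ReadWrite.READ);
            try {
                // ... execute SELECT queries against 'tim' here ...
            } finally {
                tim.end();
            }
        }
    }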
> > I'm sure that path evaluation could be improved.
> >
> > A good first step is to check that there aren't any basic, silly
> > errors in the existing code (it wouldn't be the first time). If none,
> > then there are specific algorithms like Floyd–Warshall (and I think
> > there are parallel versions of that).
>
> I'm curious how I can get involved in this. Is the abstract class
> PathEngine
> <https://github.com/apache/jena/blob/08404760a0ac10c615b006eff52a9a9963ccec0b/jena-arq/src/main/java/org/apache/jena/sparql/path/eval/PathEngine.java>
> (and its implementations, e.g. PathEngineSPARQL
> <https://github.com/apache/jena/blob/08404760a0ac10c615b006eff52a9a9963ccec0b/jena-arq/src/main/java/org/apache/jena/sparql/path/eval/PathEngineSPARQL.java>)
> a good place to start?
>
> Thanks for your suggestions.
>
> Best regards,
> Jakub
>
> On Sat, 6 Nov 2021 at 11:34, Andy Seaborne <a...@apache.org> wrote:
>
> > Hi Jakub,
> >
> > For this data, what does it look like?
> > How deep are the foaf:knows chains?
> > How many foaf:knows triples overall are there?
> > Are there cycles?
> > (presumably there are shared subtrees p1->p2->p3 and p1->p4->p3
> > as well)
> >
> > If designing a system bottom-to-top for the :p+ and other path-like
> > queries, the data can be arranged in datastructures suitable to
> > partitioning across CPUs. Neither the in-memory nor TDB data storage
> > is designed for that. Mid-scale LPG systems are, typically (adjacency
> > layout). That design favours one set of use cases over others.
> >
> > The actual work per step per node is small, so if there is poor
> > co-locality to a CPU, then there is a good chance that inter-thread
> > costs are going to dominate. Project Loom (Java virtual threads) will
> > help but not solve that, because thread switching will be in the JVM,
> > not the OS kernel.
> >
> > There are also the compute synchronization costs and the related
> > impact on CPU caches. Stalling to read memory (or competing for L2
> > cache) or to synchronize between threads is expensive because it
> > breaks up the single-thread execution pipeline on a core.
> >
> > So the layout across cores benefits from clustering. If it is done
> > during query execution, rather than ahead of time, that is more cost.
> >
> > It's a tradeoff - the other use of multiple cores is serving multiple
> > requests concurrently - the Fuseki case.
> >
> > At the moment:
> >
> > I'm sure that path evaluation could be improved.
> >
> > A good first step is to check that there aren't any basic, silly
> > errors in the existing code (it wouldn't be the first time). If none,
> > then there are specific algorithms like Floyd–Warshall (and I think
> > there are parallel versions of that).
> >
> >     Andy
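For reference, the reachability variant of the algorithm Andy names looks
like the sketch below. This is standalone illustrative Java, not Jena
code; it assumes the foaf:knows edges have already been loaded into a
boolean adjacency matrix, which is itself only practical for modest graph
sizes.

    /**
     * Warshall's algorithm: the boolean transitive-closure variant of
     * Floyd-Warshall. On return, reach[i][j] is true iff node j is
     * reachable from node i - i.e. the foaf:knows+ relation over the
     * whole graph. O(V^3) time and O(V^2) space.
     */
    public class TransitiveClosureSketch {
        public static boolean[][] closure(boolean[][] adj) {
            int n = adj.length;
            boolean[][] reach = new boolean[n][n];
            for (int i = 0; i < n; i++)
                reach[i] = adj[i].clone();
            // After round k, reach[i][j] holds iff i reaches j using
            // only intermediate nodes drawn from {0, ..., k}.
            for (int k = 0; k < n; k++)
                for (int i = 0; i < n; i++)
                    if (reach[i][k])
                        for (int j = 0; j < n; j++)
                            if (reach[k][j])
                                reach[i][j] = true;
            return reach;
        }
    }

The parallel versions Andy alludes to typically split the middle i-loop
across threads for each fixed k; the k-rounds carry a data dependency and
stay sequential.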
> > On 05/11/2021 12:25, Jakub Jałowiec wrote:
> > > I've noticed that the figure from my last message has not been sent
> > > properly, here is a link to it: https://kubajal.github.io/covidepid/
> > >
> > > Best regards,
> > > Jakub
> > >
> > > On Fri, 5 Nov 2021 at 13:00, Jakub Jałowiec
> > > <j.jalow...@student.uw.edu.pl> wrote:
> > >
> > > Hi Andy,
> > > thanks for your input.
> > > Let me drill into it a little more so I understand it better. Let's
> > > say I have the following query:
> > >
> > >     prefix foaf: <http://xmlns.com/foaf/0.1/>
> > >     prefix ex: <http://example.net/>
> > >     SELECT ?p1 (count(?label) as ?labelCounter)
> > >     WHERE {
> > >       ?p1 foaf:knows+ ?p2 .
> > >       ?p2 ex:hasLabel ?label .
> > >     }
> > >     GROUP BY ?p1
> > >
> > > Let's also say I have multiple cores on my machine and I want to
> > > speed up a single query just by utilizing their parallelism. The
> > > bottleneck here is the complex property path (foaf:knows+).
> > > To speed things up I want to split the search space into chunks
> > > processed by each core. Intuitively, the best solution to me would
> > > be to split by ?p1 equally between all the cores (e.g. if I have N
> > > persons and k cores, then each core receives N/k persons to
> > > evaluate as ?p1). The figure below shows the workload distribution
> > > I am trying to achieve (green circles are persons, brown arcs are
> > > instances of the foaf:knows relationship; instances of ex:hasLabel
> > > have been left out).
> > >
> > > [figure: cores.png]
> > >
> > > The query engine would start the evaluation at a "person" node (?p1
> > > in the query) and then just compute the closure of the foaf:knows
> > > relationship (?p1 foaf:knows+ ?p2). This would require shared
> > > memory between all the threads.
> > > I have three questions:
> > >
> > > 1. How would the SPARQL query engine know that it needs to split
> > >    the workload in the 'per root of the pattern' manner and not in
> > >    a different way? Is there a mechanism in the SPARQL interpreter
> > >    for that?
> > > 2. Can a single transaction be shared between multiple threads
> > >    (cores)?
> > > 3. Do I need a transaction if the threads I am running are
> > >    guaranteed to not modify anything? (the query is a SELECT so it
> > >    is 'read-only')
> > >
> > > Best regards,
> > > Jakub
> > >
> > > On Sun, 31 Oct 2021 at 11:29, Andy Seaborne <a...@apache.org>
> > > wrote:
> > >
> > > > Hi Jakub,
> > > >
> > > > The preferred way to have parallel actions on a dataset is via
> > > > transactions.
> > > >
> > > > The concurrency-howto covers threading within a transaction,
> > > > possibly with further MRSW (multiple reader or single writer)
> > > > locking.
> > > >
> > > > This is how Fuseki executes multiple requests. Each HTTP request
> > > > that is executing in true parallel runs on a separate thread and
> > > > inside a transaction.
> > > >
> > > > So have each thread start a transaction, execute as many
> > > > sequential queries as it needs, and end the transaction.
> > > >
> > > > In fact, only TDB2 enforces this; TDB1 only enforces it if it has
> > > > already been used transactionally. Other datasets are
> > > > multiple-reader safe anyway. But placing inside a transaction is
> > > > the correct way.
> > > >
> > > >     Andy
> > > >
> > > > On 30/10/2021 15:44, Jakub Jałowiec wrote:
> > > > > Dear community,
> > > > > is there any high-level user interface to execute parallel
> > > > > SELECT queries in Apache Fuseki or the CLI of Apache Jena?
> > > > > I've found a short note on parallelism in Apache Jena here:
> > > > > https://jena.apache.org/documentation/notes/concurrency-howto.html.
> > > > > But that is not really what I am looking for, as it is a
> > > > > general note on how to implement low-level parallelism in
> > > > > Apache Jena.
> > > > > I am interested in analytic benchmarking of Apache Jena.
> > > > > Ideally, I am looking for something that works out of the box
> > > > > just for SELECT queries (no need to modify the model in a
> > > > > parallel fashion or synchronize state, etc.).
> > > > >
> > > > > I'd appreciate any suggestions or pointers to resources, as I
> > > > > am new to Apache Jena. I couldn't find a lot in the archives of
> > > > > the list using the keywords "parallelism" and "concurrent".
> > > > >
> > > > > Best regards,
> > > > > Jakub
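Pulling Andy's advice together: each thread begins its own READ
transaction, runs its queries, and ends it. That also answers question 3
above - the read transaction is still the correct pattern even though
nothing is modified, and TDB2 enforces it. A minimal sketch; the query
string, thread count, and class name are illustrative, and 'ds' is
assumed to be an already-populated Dataset:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    import org.apache.jena.query.Dataset;
    import org.apache.jena.query.QueryExecution;
    import org.apache.jena.query.QueryExecutionFactory;
    import org.apache.jena.query.ReadWrite;
    import org.apache.jena.query.ResultSet;

    public class ParallelSelectSketch {
        // Illustrative read-only query over the foaf:knows data.
        static final String QUERY =
            "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n" +
            "SELECT (COUNT(*) AS ?c) WHERE { ?p1 foaf:knows+ ?p2 }";

        public static void run(Dataset ds, int nThreads)
                throws InterruptedException {
            ExecutorService pool = Executors.newFixedThreadPool(nThreads);
            for (int t = 0; t < nThreads; t++) {
                pool.submit(() -> {
                    // Each thread owns its transaction: begin, query, end.
                    ds.begin(ReadWrite.READ);
                    try (QueryExecution qExec =
                             QueryExecutionFactory.create(QUERY, ds)) {
                        ResultSet rs = qExec.execSelect();
                        rs.forEachRemaining(row -> { /* consume row */ });
                    } finally {
                        ds.end();
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.HOURS);
        }
    }

Splitting the work by ?p1 as in the figure earlier in the thread would
then be a matter of giving each thread's query its own subset of starting
nodes, for example via a per-thread VALUES ?p1 { ... } block - an
illustration, not something the thread prescribes.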