Hi Andy, thanks for your input. Let me drill down into it a little more so I understand it better. Let's say I have the following query:
   prefix foaf: <http://xmlns.com/foaf/0.1/>
   prefix ex: <http://example.net/>
   SELECT ?p1 (count(?label) as ?labelCounter)
   WHERE {
     ?p1 foaf:knows+ ?p2 .
     ?p2 ex:hasLabel ?label .
   }
   GROUP BY ?p1

Let's also say I have multiple cores on my machine and I want to speed up a single query just by exploiting their parallelism. The bottleneck here is the complex property path (*foaf:knows+*). To speed things up I want to split the search space into chunks, one processed by each core. Intuitively, the best solution would be to split on *?p1* equally between the cores (e.g. if I have *N* persons and *k* cores, then each core receives *N*/*k* persons to evaluate as *?p1*). The figure below shows the workload distribution I am trying to achieve (green circles are persons, brown arcs are instances of the *foaf:knows* relationship; instances of *ex:hasLabel* have been left out).

[image: cores.png]

The query engine would start the evaluation at a "person" node (*?p1* in the query) and then compute the closure of the *foaf:knows* relationship (*?p1 foaf:knows+ ?p2*). This would require shared memory between all the threads.

I have three questions:
1. How would the SPARQL query engine know that it needs to split the workload in the "per root of the pattern" manner and not in a different way? Is there a mechanism in the SPARQL interpreter for that?
2. Can a single transaction be shared between multiple threads (cores)?
3. Do I need a transaction at all if the threads I am running are guaranteed not to modify anything? (The query is a SELECT, so it is read-only.)

(I put a rough sketch of what I have in mind at the very bottom of this mail, below the quoted thread.)

Best regards,
Jakub

On Sun, 31 Oct 2021 at 11:29, Andy Seaborne <a...@apache.org> wrote:

> Hi Jakub,
>
> The preferred way to have parallel actions on a dataset is via
> transactions.
>
> concurrency-howto covers threading within a transaction. Possible with
> further MRSW (multiple reader or single writer) locking.
>
> This is how Fuseki executes multiple requests. Each HTTP request that
> is executing in true parallel is executed on a separate thread and
> inside a transaction.
>
> So have each thread start a transaction, execute as many sequential
> queries as it needs and end the transaction.
>
> In fact, only TDB2 enforces this; TDB1 only enforces it if it has
> already been used transactionally. Other datasets are multiple-reader
> safe anyway. But placing inside a transaction is the correct way.
>
>      Andy
>
> On 30/10/2021 15:44, Jakub Jałowiec wrote:
> > Dear community,
> > is there any high-level user interface to execute parallel SELECT
> > queries in Apache Fuseki or the CLI of Apache Jena?
> > I've found a short note on parallelism in Apache Jena here:
> > https://jena.apache.org/documentation/notes/concurrency-howto.html.
> > But that is not really what I am looking for, as it is a general note
> > on how to implement low-level parallelism in Apache Jena.
> > I am interested in analytic benchmarking of Apache Jena. Ideally, I am
> > looking for something that works out-of-the-box just for SELECT queries
> > (no need to modify the model in a parallel fashion or synchronize state,
> > etc.).
> >
> > I'd appreciate any suggestions or pointers to any resources, as I am
> > new to Apache Jena. I couldn't find a lot in the archives of the list
> > using the keywords "parallelism" and "concurrent".
> >
> > Best regards,
> > Jakub
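
P.S. To make questions 2 and 3 concrete, here is a rough, untested sketch of what I have in mind, written against the plain Jena API rather than Fuseki. I am assuming a TDB2 dataset at a made-up location ("target/tdb2-example"), and the manual partitioning of *?p1* into VALUES chunks plus the thread pool are purely my own illustration, not an existing Jena mechanism; the Txn / QueryExecutionFactory calls are the ones I understood from the concurrency-howto page.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.jena.query.Dataset;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.QuerySolution;
import org.apache.jena.query.ResultSet;
import org.apache.jena.system.Txn;
import org.apache.jena.tdb2.TDB2Factory;

public class ParallelKnowsCount {

    static final String PREFIXES =
        "prefix foaf: <http://xmlns.com/foaf/0.1/>\n" +
        "prefix ex: <http://example.net/>\n";

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical TDB2 location, just for illustration.
        Dataset dataset = TDB2Factory.connectDataset("target/tdb2-example");
        int cores = Runtime.getRuntime().availableProcessors();

        // Step 1: enumerate candidate roots (?p1) in one read transaction.
        List<String> roots = Txn.calculateRead(dataset, () -> {
            List<String> uris = new ArrayList<>();
            String q = PREFIXES + "SELECT DISTINCT ?p1 WHERE { ?p1 foaf:knows ?someone }";
            try (QueryExecution qe = QueryExecutionFactory.create(q, dataset)) {
                ResultSet rs = qe.execSelect();
                while (rs.hasNext()) {
                    uris.add(rs.next().getResource("p1").getURI());
                }
            }
            return uris;
        });

        // Step 2: split the roots into one chunk per core and evaluate each
        // chunk on its own thread, each thread inside its own read transaction.
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        int chunkSize = (roots.size() + cores - 1) / cores;
        for (int start = 0; start < roots.size(); start += chunkSize) {
            List<String> chunk = roots.subList(start, Math.min(start + chunkSize, roots.size()));
            pool.submit(() -> Txn.executeRead(dataset, () -> {
                // Restrict ?p1 to this chunk via a VALUES block.
                StringBuilder values = new StringBuilder("VALUES ?p1 {");
                for (String uri : chunk) values.append(" <").append(uri).append(">");
                values.append(" }");
                String q = PREFIXES +
                    "SELECT ?p1 (count(?label) as ?labelCounter) WHERE { " +
                    values + " ?p1 foaf:knows+ ?p2 . ?p2 ex:hasLabel ?label . } GROUP BY ?p1";
                try (QueryExecution qe = QueryExecutionFactory.create(q, dataset)) {
                    ResultSet rs = qe.execSelect();
                    while (rs.hasNext()) {
                        QuerySolution row = rs.next();
                        System.out.println(row.getResource("p1") + " -> " + row.getLiteral("labelCounter"));
                    }
                }
            }));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }
}

Each worker opens its own read transaction (rather than sharing a single transaction across threads), and because the chunks are disjoint in *?p1*, the per-chunk GROUP BY results can simply be concatenated.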