I've noticed that the figure from my last message was not sent properly. Here is a link to it: https://kubajal.github.io/covidepid/
Best regards,
Jakub

On Fri, 5 Nov 2021 at 13:00, Jakub Jałowiec <j.jalow...@student.uw.edu.pl> wrote:

> Hi Andy,
> thanks for your input.
> Let me drill down a little more so I understand it better. Let's say I
> have the following query:
>
>> prefix foaf: <http://xmlns.com/foaf/0.1/>
>> prefix ex: <http://example.net/>
>> SELECT ?p1 (count(?label) as ?labelCounter)
>> WHERE {
>>   ?p1 foaf:knows+ ?p2 .
>>   ?p2 ex:hasLabel ?label .
>> }
>> GROUP BY ?p1
>
> Let's also say I have multiple cores on my machine and I want to speed up
> a single query just by utilizing their parallelism. The bottleneck here is
> the complex property path (*foaf:knows+*).
> To speed things up I want to split the search space into chunks, each
> processed by one core. Intuitively, the best solution to me would be to
> split by *p1* equally between all the cores (e.g. if I have *N* persons and
> *k* cores, then each core receives *N*/*k* persons to evaluate as *p1*). The
> figure below shows the workload distribution I am trying to achieve (green
> circles are persons, brown arcs are instances of the *foaf:knows*
> relationship; instances of *ex:hasLabel* have been left out).
> [image: cores.png]
> The query engine would start the evaluation at a "person" node (*p1* in
> the query) and then just compute the closure of the *foaf:knows*
> relationship (*p1 foaf:knows+ p2*). This would require shared memory
> between all the threads.
> I have three questions:
>
>    1. How would the SPARQL query engine know that it needs to split the
>    workload in the 'per root of the pattern' manner and not in a different
>    way? Is there a mechanism in the SPARQL interpreter for that?
>    2. Can a single transaction be shared between multiple threads (cores)?
>    3. Do I need a transaction if the threads I am running are guaranteed
>    not to modify anything?
>    (the query is a SELECT, so it is 'read-only')
>
> Best regards,
> Jakub
>
> On Sun, 31 Oct 2021 at 11:29, Andy Seaborne <a...@apache.org> wrote:
>
>> Hi Jakub,
>>
>> The preferred way to have parallel actions on a dataset is via
>> transactions.
>>
>> concurrency-howto covers threading within a transaction. Possible with
>> further MRSW (multiple reader or single writer) locking.
>>
>> This is how Fuseki executes multiple requests. Each HTTP request that
>> is executing in true parallel is executed on a separate thread and
>> inside a transaction.
>>
>> So have each thread start a transaction, execute as many sequential
>> queries as it needs and end the transaction.
>>
>> In fact, only TDB2 enforces this; TDB1 only enforces it if it has
>> already been used transactionally. Other datasets are multiple-reader
>> safe anyway. But placing the work inside a transaction is the correct way.
>>
>> Andy
>>
>> On 30/10/2021 15:44, Jakub Jałowiec wrote:
>> > Dear community,
>> > is there any high-level user interface to execute parallel SELECT
>> > queries in Apache Fuseki or the CLI of Apache Jena?
>> > I've found a short note on parallelism in Apache Jena here:
>> > https://jena.apache.org/documentation/notes/concurrency-howto.html. But
>> > that is not really what I am looking for, as it is a general note on how
>> > to implement low-level parallelism in Apache Jena.
>> > I am interested in analytic benchmarking of Apache Jena. Ideally, I am
>> > looking for something that works out-of-the-box just for SELECT queries
>> > (no need to modify the model in a parallel fashion or synchronize state
>> > etc.).
>> >
>> > I'd appreciate any suggestions or pointers to any resources, as I am
>> > new to Apache Jena. I couldn't find a lot in the archives of the list
>> > using the keywords "parallelism" and "concurrent".
>> >
>> > Best regards,
>> > Jakub
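
The "split by p1" idea from the 5 Nov message can be sketched roughly as follows. This is plain Java, not the Jena API, and it says nothing about how ARQ actually evaluates property paths. The class KnowsClosureSketch, its methods, and the map-based graph are hypothetical names made up for illustration. Each of the k workers takes about N/k persons as p1, computes the foaf:knows+ closure over a shared read-only graph, and counts the ex:hasLabel values of the persons it reaches. With a real dataset, each worker would additionally run inside its own read transaction, as Andy suggests.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration: plain maps stand in for the RDF graph.
final class KnowsClosureSketch {

    // Counts the labels of every person reachable from p1 via one or more
    // foaf:knows steps (each reachable person is visited once).
    static long countLabels(String p1, Map<String, List<String>> knows,
                            Map<String, List<String>> labels) {
        Set<String> reached = new HashSet<>();
        Deque<String> frontier = new ArrayDeque<>(knows.getOrDefault(p1, List.of()));
        while (!frontier.isEmpty()) {
            String p2 = frontier.pop();
            if (reached.add(p2)) {
                frontier.addAll(knows.getOrDefault(p2, List.of()));
            }
        }
        long count = 0;
        for (String p2 : reached) {
            count += labels.getOrDefault(p2, List.of()).size();
        }
        return count;
    }

    // Splits the persons into at most k slices of ceil(N / k) and evaluates
    // each slice on its own thread; the graph maps are only read, never written.
    static Map<String, Long> countInParallel(List<String> persons,
                                             Map<String, List<String>> knows,
                                             Map<String, List<String>> labels,
                                             int k) {
        ExecutorService pool = Executors.newFixedThreadPool(k);
        Map<String, Long> result = new ConcurrentHashMap<>();
        try {
            int chunk = (persons.size() + k - 1) / k;
            List<CompletableFuture<Void>> tasks = new ArrayList<>();
            for (int start = 0; start < persons.size(); start += chunk) {
                List<String> slice =
                        persons.subList(start, Math.min(start + chunk, persons.size()));
                tasks.add(CompletableFuture.runAsync(() -> {
                    for (String p1 : slice) {
                        long n = countLabels(p1, knows, labels);
                        // Like GROUP BY, a p1 with no matching rows produces no group.
                        if (n > 0) {
                            result.put(p1, n);
                        }
                    }
                }, pool));
            }
            // Wait for all slices to finish.
            CompletableFuture.allOf(tasks.toArray(new CompletableFuture[0])).join();
        } finally {
            pool.shutdown();
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> knows =
                Map.of("alice", List.of("bob"), "bob", List.of("carol"));
        Map<String, List<String>> labels =
                Map.of("bob", List.of("x"), "carol", List.of("y", "z"));
        System.out.println(
                countInParallel(List.of("alice", "bob", "carol"), knows, labels, 2));
    }
}
```

In this toy graph alice reaches bob and carol (3 labels in total), bob reaches carol (2 labels), and carol reaches nobody, so she does not appear in the result. Slicing by p1 keeps the workers independent of each other; the only shared mutable state is the result map.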