I agree with Dave. We should start with the most important things: i) what is the use case, and ii) what kind of inference is needed here?
There is an obvious difference between a fully OWL 2 compliant DL reasoner, which usually uses a tableau algorithm, and a reasoner based on rules. The most common benchmarks I have come across are LUBM and UOBM, which are used to evaluate the performance of large-scale reasoners that are usually to some extent related to triple stores (or even integrated into them).

I'd not go the map-reduce way. There are already approaches based on Spark and Flink for some (sub)set of the OWL/RDFS inference rules. Those tend to be faster thanks to benefits like in-memory processing, especially when iterative algorithms such as fix-point computation come into play.

Anyway, we should start with i) and ii) here.

On 07.01.20 09:59, Dave Reynolds wrote:
> On 07/01/2020 08:31, Luis Enrique Ramos García wrote:
>> Dear friends,
>>
>> I am currently working in an application in where I have to implement a
>> reasoner, in which I have had some experience, the difference is that
>> this time i have to implement it in a big data environment, where I have
>> to deal with a data set of some giga bytes.
>>
>> About that, my questions are the following:
>>
>> 1. is there a benchmark or evaluation of performance of jena with some
>> reasoners, which consider memory or quantity of triples, and
>> execution time?.
>
> Depends what sort of inference you are talking about.
>
> Apart from the OWL benchmarks you mention, some of the Sparql
> benchmarks do require small amounts of reasoning loosely around
> RDFS++. For example, I seem to remember LUBM requires this but I've
> never worked with it.
>
> Jena's inference is not designed to scale to billions of triples, it's
> a memory-only solution (though "giga bytes" might mean just millions of
> triples and might fit in memory). So reasoning at scale benchmarks on
> Jena are not going to be much use to you. Look at the results for
> commercial stores that do claim inference at scale.
>
>> 2. is elephas, and a map reduce approach a good alternative to deal
>> with a big data environment?
>
> Depends what sort of inference you are talking about and whether you
> care about latency or just overall throughput at scale. Map reduce is
> not good for low latency interactive queries.
>
>> 3. is necessary a triple store to use with reasoner and rule engine?, in
>> that case what do you recommend?
>
> Don't understand the question. Triple stores and reasoners are
> different things. You can have reasoners that have nothing to do with
> RDF/triple-stores and you can have triple stores with no reasoner.
> There are a fair number of commercial and open source tools in both
> categories and in the overlap.
>
> Dave
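
P.S. To make the "rule-based reasoner" and "fix-point" remarks above a bit more concrete, here is a minimal sketch of naive forward chaining over two RDFS rules (rdfs9 and rdfs11, i.e. type inheritance and subClassOf transitivity). It is only an illustration under my own assumptions: triples are plain Python tuples, the ex: names and the rdfs_closure function are made up for this example, and it is not how Jena, Spark or Flink based reasoners are actually implemented.

```python
# Naive fix-point (forward-chaining) evaluation of two RDFS rules.
# Illustrative sketch only: triples are plain tuples, names are hypothetical.

SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def rdfs_closure(triples):
    """Apply rdfs11 and rdfs9 repeatedly until no new triple is derived."""
    closure = set(triples)
    while True:
        derived = set()
        # All (subclass, superclass) pairs known so far.
        sub = [(s, o) for (s, p, o) in closure if p == SUBCLASS]
        for (a, b) in sub:
            # rdfs11: a subClassOf b, b subClassOf d  =>  a subClassOf d
            for (c, d) in sub:
                if b == c:
                    derived.add((a, SUBCLASS, d))
            # rdfs9: x type a, a subClassOf b  =>  x type b
            for (s, p, o) in closure:
                if p == TYPE and o == a:
                    derived.add((s, TYPE, b))
        new = derived - closure
        if not new:
            # Fix-point reached: another pass would add nothing.
            return closure
        closure |= new


data = {
    ("ex:GraduateStudent", SUBCLASS, "ex:Student"),
    ("ex:Student", SUBCLASS, "ex:Person"),
    ("ex:alice", TYPE, "ex:GraduateStudent"),
}

# Only the triples that were inferred, not the asserted ones.
inferred = rdfs_closure(data) - data
for triple in sorted(inferred):
    print(triple)
```

Every pass re-scans the whole set of triples, which is why keeping the working set in memory matters so much for this kind of iterative computation, and why a plain map-reduce job per iteration tends to be slow.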
