On 07/01/2020 08:31, Luis Enrique Ramos García wrote:
Dear friends,
I am currently working on an application in which I have to implement a
reasoner. I have some experience with that; the difference is that this
time I have to implement it in a big data environment, where I have to deal
with a data set of some gigabytes.
About that, my questions are the following:
1. Is there a benchmark or performance evaluation of Jena with some
reasoners that considers memory, number of triples, and
execution time?
Depends what sort of inference you are talking about.
Apart from the OWL benchmarks you mention, some of the SPARQL benchmarks
do require small amounts of reasoning, loosely around RDFS++. For
example, I seem to remember LUBM requires this, but I've never worked
with it.
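To illustrate the kind of RDFS-level reasoning meant here, the following is a minimal sketch in plain Python, not Jena's API. It applies only two RDFS entailment rules (subclass transitivity and type propagation) by naive forward chaining. The function name, the tuple representation of triples, and the `ex:` vocabulary are assumptions made up for the example.

```python
# Minimal sketch: naive forward chaining over two RDFS rules.
# Plain Python, not Jena. Triples are (subject, predicate, object) tuples
# and the ex: names are invented for illustration.
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def rdfs_closure(triples):
    """Return the input triples plus those derivable by rdfs9 and rdfs11."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        new = set()
        for s, p, o in inferred:
            if p != SUBCLASS:
                continue
            for s2, p2, o2 in inferred:
                # rdfs11: (X subClassOf s), (s subClassOf o) => (X subClassOf o)
                # rdfs9:  (x type s),       (s subClassOf o) => (x type o)
                if o2 == s and p2 in (SUBCLASS, RDF_TYPE):
                    new.add((s2, p2, o))
        if not new <= inferred:
            inferred |= new
            changed = True  # repeat until no new triples appear
    return inferred


data = {
    ("ex:GraduateStudent", SUBCLASS, "ex:Student"),
    ("ex:Student", SUBCLASS, "ex:Person"),
    ("ex:alice", RDF_TYPE, "ex:GraduateStudent"),
}
closure = rdfs_closure(data)
print(("ex:alice", RDF_TYPE, "ex:Person") in closure)
```

The whole triple set is held in memory and rescanned on every pass, which is fine for a toy example but is the kind of approach that stops being practical as the data grows.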
Jena's inference is not designed to scale to billions of triples; it's a
memory-only solution (though a few gigabytes might be just millions of
triples and might fit in memory). So reasoning-at-scale benchmarks on
Jena are not going to be much use to you. Look at the results for
commercial stores that do claim inference at scale.
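One quick way to check whether the data set really is in the millions of triples is to count its statements. The sketch below assumes the data is available as N-Triples, where each non-blank, non-comment line holds one statement. The function name and the sample lines are illustrative only.

```python
def count_ntriples(lines):
    """Count statements in N-Triples text, skipping blank lines and comments."""
    count = 0
    for line in lines:
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            count += 1
    return count


sample = [
    "# a comment line",
    "<http://example.org/a> <http://example.org/p> <http://example.org/b> .",
    "",
    '<http://example.org/a> <http://example.org/name> "A" .',
]
print(count_ntriples(sample))  # 2
```

Because the function accepts any iterable of lines, an open file object can be passed directly, so the file is streamed rather than loaded whole. Note that it counts statements, not distinct triples, so duplicates are counted more than once.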
2. Are Elephas and a map-reduce approach a good alternative for dealing
with a big data environment?
Depends what sort of inference you are talking about and whether you
care about latency or just overall throughput at scale. Map-reduce is
not good for low-latency interactive queries.
3. Is a triple store necessary in order to use a reasoner and rule engine?
If so, what do you recommend?
Don't understand the question. Triple stores and reasoners are different
things. You can have reasoners that have nothing to do with
RDF/triple stores and you can have triple stores with no reasoner. There
are a fair number of commercial and open source tools in both categories
and in the overlap.
Dave