Dear friends,

Thanks so much for your quick answers.
First, about our use case: I estimate that we will be working with around 100 million triples at the beginning, so, according to Dave's answer, this size should be manageable by Jena, or am I wrong? Of course, we will surely grow quickly, and then I think we should look at other stores, as you recommend. I think this benchmark could be a good starting point [1].

Second, about the reasoning, our task is as follows. Let us say we have a knowledge base of people (p1, p2, ..., pn) and friendships (f1, f2, ..., fn), where p1, p2, ..., pn and f1, f2, ..., fn are individuals of the respective concepts (people and friendship). People are related by friendships: every friendship occurs between two different people and has a start date, an end date (if any), and a validity, which is a Boolean. In our reasoning we want to find friendly people, and for us a friendly person is one who has more than "X" valid friendships. For this, I think I have to follow this workflow:

1. Run a rule to evaluate the validity of each friendship, setting it to true or false.
2. Perform inference on the result to get the valid friendships, if any.

About my third question (is a triple store necessary to use with a reasoner and rule engine, and in that case what do you recommend?): most of my experience has been with Protégé and the OWL API, and I understood that a back-end repository was recommended for dealing with large datasets. Perhaps I misunderstood it, and I know that stores and reasoners are different things.

Well, thank you in advance for all the recommendations you can give me.

Best regards,
Luis Ramos

[1] https://www.w3.org/wiki/LargeTripleStores

On Tue, 7 Jan 2020 at 11:01, Lorenz Buehmann (<[email protected]>) wrote:
> I agree with Dave, we should start with the most important things:
>
> i) what is the use case
> ii) what kind of inference is needed here
>
> There is an obvious difference between a full OWL 2 compliant DL
> reasoner, usually based on a tableau algorithm, and a reasoner based on
> rules.
>
> The most common benchmarks I have used are LUBM and UOBM, for evaluating
> the performance of large-scale reasoners, which are usually to some extent
> related to triple stores (or even integrated with them).
>
> I'd not go the map-reduce way; there are already approaches based
> on Spark and Flink for some (sub)set of OWL/RDFS inference rules. Those
> tend to be faster due to benefits like in-memory processing, especially
> when iterative algorithms like fixpoint computation come into play.
>
> Anyway, we should start with i) and ii) here.
>
> On 07.01.20 09:59, Dave Reynolds wrote:
> > On 07/01/2020 08:31, Luis Enrique Ramos García wrote:
> >> Dear friends,
> >>
> >> I am currently working on an application where I have to implement a
> >> reasoner, with which I have had some experience; the difference is that
> >> this time I have to implement it in a big data environment, where I
> >> have to deal with a dataset of some gigabytes.
> >>
> >> About that, my questions are the following:
> >>
> >> 1. Is there a benchmark or evaluation of the performance of Jena with
> >> some reasoners that considers memory or quantity of triples, and
> >> execution time?
> >
> > Depends what sort of inference you are talking about.
> >
> > Apart from the OWL benchmarks you mention, some of the SPARQL
> > benchmarks do require small amounts of reasoning loosely around
> > RDFS++. For example, I seem to remember LUBM requires this but I've
> > never worked with it.
> >
> > Jena's inference is not designed to scale to billions of triples; it's
> > a memory-only solution (though "gigabytes" might mean just millions of
> > triples and might fit in memory).
> > So reasoning-at-scale benchmarks on Jena are not going to be much use
> > to you. Look at the results for commercial stores that do claim
> > inference at scale.
> >
> >> 2. Are Elephas and a map-reduce approach a good alternative for
> >> dealing with a big data environment?
> >
> > Depends what sort of inference you are talking about and whether you
> > care about latency or just overall throughput at scale. Map-reduce is
> > not good for low-latency interactive queries.
> >
> >> 3. Is a triple store necessary to use with a reasoner and rule engine?
> >> In that case, what do you recommend?
> >
> > Don't understand the question. Triple stores and reasoners are
> > different things. You can have reasoners that have nothing to do with
> > RDF/triple stores and you can have triple stores with no reasoner.
> > There are a fair number of commercial and open source tools in both
> > categories and in the overlap.
> >
> > Dave
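The two-step friendship workflow in the reply above (a rule that sets each friendship's validity, then counting valid friendships per person against the threshold X) can be sketched in plain Java (16+, for records) without any RDF library. This is a minimal illustration of the logic only, not Jena's API: the `Friendship` record, the `isValid` criterion and the `friendlyPeople` helper are hypothetical, and because the thread does not say how validity is decided, the sketch assumes a friendship is valid when it links two different people, has started, and has not ended as of a reference date. It also shows, in line with Dave's remark, that the reasoning logic itself does not depend on a triple store.

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class FriendlyPeople {

    // Hypothetical data model: a friendship between two people with an
    // optional end date (null means the friendship is still ongoing).
    record Friendship(String id, String personA, String personB,
                      LocalDate start, LocalDate end) {}

    // Step 1: the validity "rule". ASSUMPTION for illustration only:
    // the two people must differ, and the friendship must have started
    // on or before the reference date and not ended by it.
    static boolean isValid(Friendship f, LocalDate asOf) {
        if (f.personA().equals(f.personB())) {
            return false;
        }
        boolean started = !f.start().isAfter(asOf);
        boolean notEnded = f.end() == null || f.end().isAfter(asOf);
        return started && notEnded;
    }

    // Step 2: count valid friendships per person and keep the people
    // with strictly more than x of them ("friendly" people).
    static Set<String> friendlyPeople(List<Friendship> friendships,
                                      LocalDate asOf, int x) {
        Map<String, Integer> validCount = new HashMap<>();
        for (Friendship f : friendships) {
            if (isValid(f, asOf)) {
                validCount.merge(f.personA(), 1, Integer::sum);
                validCount.merge(f.personB(), 1, Integer::sum);
            }
        }
        Set<String> result = new TreeSet<>();
        validCount.forEach((person, n) -> {
            if (n > x) {
                result.add(person);
            }
        });
        return result;
    }

    public static void main(String[] args) {
        LocalDate asOf = LocalDate.of(2020, 1, 7);
        List<Friendship> fs = List.of(
            new Friendship("f1", "p1", "p2", LocalDate.of(2015, 3, 1), null),
            new Friendship("f2", "p1", "p3", LocalDate.of(2016, 5, 1), null),
            // Ended before the reference date, so not valid.
            new Friendship("f3", "p2", "p3", LocalDate.of(2017, 1, 1),
                           LocalDate.of(2018, 1, 1)),
            // Same person on both sides, so not valid.
            new Friendship("f4", "p4", "p4", LocalDate.of(2018, 1, 1), null));
        // With x = 1 a person needs at least two valid friendships.
        System.out.println(friendlyPeople(fs, asOf, 1)); // prints [p1]
    }
}
```

Since "more than X" is an aggregate over many facts, in an RDF setting the counting step is usually easier to express as a query over the data after the validity rule has run (for example a SPARQL 1.1 `GROUP BY ... HAVING (COUNT(...) > X)`) than as a forward rule.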
