On 21/12/2016 14:17, A. Soroka wrote:
> DatasetGraph/Graph implementations are smart enough not to store
> duplicate tuples. So adding (let's say) a graph with 50 triples to a
> graph with 50 triples, of which 25 are common between the two, should
> result in a graph with 75 triples to be searched. On the other hand,
> a union graph between the two will have to search 100 triples. Is
> that what you mean?

No. The graphs I'm merging will probably have less than 1% of their
triples duplicated (if the users do things as expected).

The issue is that I want to merge the graphs only for some SPARQL
searches. Therefore I don't want a deep copy of the graphs; I just want
to use the original graphs as if they were one. Let's say it's like
pointers in C: I just want to keep pointers to the original graphs so
that no data is replicated. For that, based on A. Seaborne's answer, it
seems that createUnion will do it.

I don't know what is usual, but let's say I will have to deal with
several thousand triples, which is why I don't want to copy them again
and again.

I hope the explanation is OK.

Regards,
Jorge

> ---
> A. Soroka
> The University of Virginia Library
>
>> On Dec 21, 2016, at 8:13 AM, George News <george.n...@gmx.net>
>> wrote:
>>
>> On 21/12/2016 13:54, Andy Seaborne wrote:
>>>
>>> On 21/12/16 12:31, George News wrote:
>>>> Hi,
>>>>
>>>> Today is the day of questions to the mailing list ;) Sorry for
>>>> the "spam" ;)
>>>>
>>>> I would like to know what the internal implementation of the
>>>> functions used for merging graphs is.
>>>>
>>>> 1) ModelFactory.createUnion(Model m1, Model m2)
>>>> It seems, from what I have read and inferred from some websites,
>>>> that there is no actual copy of the data into a new graph. It is
>>>> more that the graph pointers (like in C) are linked internally,
>>>> but the data is the original and is not copied. Is that right?
>>>
>>> Correct - it is a new model that internally provides the union
>>> view of two other models.
>>
>> Great, no copy then ;)
>>
>>>>
>>>> 2) org.apache.jena.graph.compose.MultiUnion
>>>> How does addGraph() work? Does it copy the original graph or
>>>> just link the data? I'm confused by the help: "Note that the
>>>> requirement to remove duplicates from the union means that this
>>>> will be an expensive operation for large (and especially for
>>>> persistent) graphs."
>>>
>>> That comment is on find()
>>
>> Oops, my fault. You are completely right :(
>>
>>> A graph is a set of triples - the key here is "set" - only one
>>> instance.
>>>
>>> To make that appear to be true in the union, the code needs to
>>> remember what it has iterated over. If it is doing (in the
>>> extreme) find(null,null,null), that's a lot of space.
>>>
>>>> Besides, how do I retrieve the merged/joint graph? Do I have to
>>>> use option 1) in an iterative way, reusing the returned graph to
>>>> add the additional one?
>>>
>>> add(Model) copies the one model into another - a true merge.
>>
>> That was what I thought. Now the confirmation from experts ;)
>>
>>>
>>> From your previous question, you don't want this - you want
>>> TDB's "default union graph" mode. It's a lot cheaper at scale.
>>>
>>> https://jena.apache.org/documentation/tdb/datasets.html
>>
>> I already have that for the whole dataset. However, I was thinking
>> of creating smaller named graphs. In my mind, this is going to make
>> SPARQL queries and calls to the Jena API quicker, as the set of
>> data to search is smaller. Is this right?
>>
>> If it is, I was thinking, based also on your response, of creating
>> a Model that is the union of all the ones I want (which should be
>> quick), and then using this Model as the input for the SPARQL
>> engine.
>>
>> Besides, I was also thinking of having multiple datasets (TDB), but
>> I don't know if that would make any sense.
>>
>> The issue is that the amount of data that I will have to handle is
>> quite huge, and I want, as much as possible, to make the searchable
>> sets as small as possible.
>>
>>>>
>>>> Thanks in advance for the help.
>>>> Jorge