Sure, that makes sense. And as Andy says, a union won't copy data.

---
A. Soroka
The University of Virginia Library

> On Dec 21, 2016, at 9:43 AM, George News <george.n...@gmx.net> wrote:
>
> On 21/12/2016 14:17, A. Soroka wrote:
>> DatasetGraph/Graph implementations are smart enough not to store
>> duplicate tuples. So adding (let's say) a graph with 50 triples to a
>> graph with 50 triples, of which 25 are common between the two, should
>> result in a graph with 75 triples to be searched. On the other hand,
>> a union graph between the two will have to search 100 triples. Is
>> that what you mean?
>
> No. The graphs I'm merging will probably have less than 1% of triples
> duplicated (if the users do things as expected).
>
> The issue is that I want to merge only for some SPARQL searches.
> Therefore I don't want a deep copy of the graphs, I just want to use the
> original graphs but as one. Let's say it's like pointers in C: I just
> want to keep the original graphs' pointers so no replication of data is done.
>
> For that, based on A. Seaborne's answer, it seems that createUnion will do
> it. I don't know what is usual, but let's say I will have to deal
> with several thousands of triples, and this is why I don't want to copy
> them again and again.
>
> Hope the explanation is ok.
>
> Regards,
> Jorge
>
>> --- A. Soroka The University of Virginia Library
>>
>>> On Dec 21, 2016, at 8:13 AM, George News <george.n...@gmx.net>
>>> wrote:
>>>
>>>
>>> On 21/12/2016 13:54, Andy Seaborne wrote:
>>>>
>>>>
>>>> On 21/12/16 12:31, George News wrote:
>>>>> Hi,
>>>>>
>>>>> Today is the day of questions to the mailing list ;) Sorry for
>>>>> the "spam" ;)
>>>>>
>>>>> I would like to know what is the internal implementation of
>>>>> the functions used for merging graphs.
>>>>>
>>>>> 1) ModelFactory.createUnion(Model m1, Model m2) It seems from
>>>>> what I have read and inferred from some websites that there is
>>>>> not an actual copy of data on a new graph.
>>>>> It is more that internally the graph pointers (like in C) are
>>>>> linked, but the data is the original one and not copied. Is that
>>>>> right?
>>>>
>>>> Correct - it is a new model that internally provides the union
>>>> view of two other models.
>>>
>>> Great, no copy then ;)
>>>
>>>>>
>>>>> 2) org.apache.jena.graph.compose.MultiUnion How does
>>>>> addGraph() work? Is it copying the original graph or is it
>>>>> just linking the data? I'm confused by the help: "Note that
>>>>> the requirement to remove duplicates from the union means that
>>>>> this will be an expensive operation for large (and especially
>>>>> for persistent) graphs."
>>>>
>>>> That comment is on find()
>>>
>>> Oops, my fault. You are completely right :(
>>>
>>>> A graph is a set of triples - the key here is "set" - only one
>>>> instance.
>>>>
>>>> To make that appear to be true in the union, the code needs to
>>>> remember what it has iterated over. If it is doing (in the extreme)
>>>> find(null,null,null) that's a lot of space.
>>>>
>>>>
>>>>
>>>>> Besides, how do I retrieve the merged/joint graph? Do I have
>>>>> to use option 1) in an iterative way, reusing the returned
>>>>> graph to add the additional one?
>>>>
>>>> add(Model) copies the one model into another - a true merge.
>>>
>>> That was what I thought. Now the confirmation from experts ;)
>>>
>>>>
>>>> From your previous question, you don't want this - you want
>>>> TDB's "default union graph" mode. It's a lot cheaper at scale.
>>>>
>>>> https://jena.apache.org/documentation/tdb/datasets.html
>>>
>>> I already have that for the whole dataset. However, I was thinking
>>> of creating smaller named graphs. In my mind, this is going to make
>>> SPARQL queries and calls to the Jena API quicker, as the set of data
>>> to search is smaller. Is this right?
>>>
>>> If it is, I was thinking, based also on your response, of creating a
>>> Model that is the union of all the ones I want (which should be
>>> quick), and then using this Model as the input for the SPARQL engine.
>>>
>>> Besides, I was also thinking of having multiple datasets (TDB), but
>>> I don't know if that would make any sense.
>>>
>>> The issue is that the amount of data that I will have to handle is
>>> quite huge, and I want, as much as possible, to make the searchable
>>> sets as small as possible.
>>>
>>>>>
>>>>> Thanks in advance for the help. Jorge
>>>>>
>>>>
>>
>>
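
The difference discussed in this thread between a union view (no copy, duplicates removed while iterating) and a true merge (copy) can be illustrated with a small toy model. This is only a sketch under stated assumptions: a "triple" is a plain String, a "graph" is a java.util.Set, and UnionViewSketch, UnionView, findAll, mergeCopy and graphOf are hypothetical names invented for the illustration. It is not Jena code and does not claim to mirror how Jena implements createUnion, MultiUnion or add(Model).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: a "triple" is a String and a "graph" is a Set<String>.
// None of these names are Jena classes or methods.
final class UnionViewSketch {

    /** Read-only union over two graphs: keeps references, copies nothing up front. */
    public static final class UnionView {
        private final Set<String> left;
        private final Set<String> right;

        public UnionView(Set<String> left, Set<String> right) {
            this.left = left;
            this.right = right;
        }

        /** Returns each distinct triple once by remembering what was already emitted. */
        public List<String> findAll() {
            Set<String> seen = new HashSet<>(); // grows with the size of the result
            List<String> out = new ArrayList<>();
            for (String t : left) {
                if (seen.add(t)) out.add(t);
            }
            for (String t : right) {
                if (seen.add(t)) out.add(t);
            }
            return out;
        }
    }

    /** True merge: copies both graphs into a new, independent set. */
    public static Set<String> mergeCopy(Set<String> a, Set<String> b) {
        Set<String> merged = new HashSet<>(a);
        merged.addAll(b);
        return merged;
    }

    /** Builds a graph holding one triple per index in [from, to). */
    public static Set<String> graphOf(int from, int to) {
        Set<String> g = new HashSet<>();
        for (int i = from; i < to; i++) {
            g.add("s" + i + " p o");
        }
        return g;
    }

    public static void main(String[] args) {
        Set<String> g1 = graphOf(0, 50);   // 50 triples
        Set<String> g2 = graphOf(25, 75);  // 50 triples, 25 shared with g1
        UnionView view = new UnionView(g1, g2);
        Set<String> copy = mergeCopy(g1, g2);

        System.out.println("view: " + view.findAll().size() + ", copy: " + copy.size());

        // A later change to a source graph shows through the view but not the copy.
        g1.add("s999 p o");
        System.out.println("view: " + view.findAll().size() + ", copy: " + copy.size());
    }
}
```

In this toy model the view costs nothing to create but pays for duplicate tracking on every full scan, while the copy pays once at merge time and then no longer follows changes to the source graphs. That is the same trade-off described above for a union versus add(Model).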