On 21/12/2016 14:17, A. Soroka wrote:
> DatasetGraph/Graph implementations are smart enough not to store
> duplicate tuples. So adding (let's say) a graph with 50 triples to a
> graph with 50 triples, of which 25 are common between the two, should
> result in a graph with 75 triples to be searched. On the other hand,
> a union graph between the two will have to search 100 triples. Is
> that what you mean?

No. The graphs I'm merging will probably have fewer than 1% of their
triples duplicated (if the users do things as expected).

The issue is that I want to merge the graphs only for some SPARQL
queries. I therefore don't want a deep copy of the graphs; I just want
to use the original graphs as if they were one. It's like pointers in C:
I just want to keep pointers to the original graphs so that no data is
replicated.

For that, based on A. Seaborne's answer, it seems that createUnion will
do the job. I don't know what is typical, but let's say I will have to
deal with several thousand triples, which is why I don't want to copy
them again and again.
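
To make the copy-versus-view distinction concrete, here is a toy sketch
in plain Java. It is not Jena code and not how Jena implements
createUnion or MultiUnion: ordinary sets of strings stand in for graphs,
and every name in it (UnionViewSketch, mergeByCopy, UnionView, findAll,
scanned, triples) is made up for illustration. It uses the 50 + 50
triples with 25 in common from the quoted example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustration only: plain Java sets stand in for graphs. Not Jena code.
class UnionViewSketch {

    // "True merge": copies every element of both inputs into a new set.
    public static Set<String> mergeByCopy(Set<String> a, Set<String> b) {
        Set<String> merged = new HashSet<>(a);
        merged.addAll(b);
        return merged;
    }

    // "Union view": keeps references to the inputs and copies nothing up front.
    public static final class UnionView {
        private final Set<String> left;
        private final Set<String> right;
        public int scanned; // elements visited by the most recent findAll()

        public UnionView(Set<String> left, Set<String> right) {
            this.left = left;
            this.right = right;
        }

        // Walks both inputs and remembers what it has already returned,
        // so that the result still behaves like a set (no duplicates).
        public List<String> findAll() {
            scanned = 0;
            Set<String> seen = new HashSet<>();
            List<String> out = new ArrayList<>();
            for (String t : left) {
                scanned++;
                if (seen.add(t)) out.add(t);
            }
            for (String t : right) {
                scanned++;
                if (seen.add(t)) out.add(t);
            }
            return out;
        }
    }

    // Hypothetical helper: builds the stand-in "triples" t<from> .. t<to-1>.
    public static Set<String> triples(int from, int to) {
        Set<String> s = new HashSet<>();
        for (int i = from; i < to; i++) s.add("t" + i);
        return s;
    }

    public static void main(String[] args) {
        Set<String> a = triples(0, 50);   // 50 elements
        Set<String> b = triples(25, 75);  // 50 elements, 25 shared with a

        Set<String> merged = mergeByCopy(a, b);
        UnionView view = new UnionView(a, b);

        System.out.println("merged size:   " + merged.size());
        System.out.println("view distinct: " + view.findAll().size());
        System.out.println("view scanned:  " + view.scanned);

        // The view sees later changes to the originals; the copy does not.
        a.add("t999");
        System.out.println("view after add:   " + view.findAll().size());
        System.out.println("merged after add: " + merged.size());
    }
}
```

The trade-off in the toy matches what was described in the thread: the
copy pays once in memory and then searches 75 elements, while the view
stores nothing extra but scans all 100 on a full search and has to
remember what it has already returned.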

Hope the explanation is ok.


> --- A. Soroka The University of Virginia Library
>> On Dec 21, 2016, at 8:13 AM, George News <george.n...@gmx.net>
>> wrote:
>> On 21/12/2016 13:54, Andy Seaborne wrote:
>>> On 21/12/16 12:31, George News wrote:
>>>> Hi,
>>>> Today is the day of questions to the mailing list ;) Sorry for
>>>> the "spam" ;)
>>>> I would like to know how the functions used for merging graphs
>>>> are implemented internally.
>>>> 1) ModelFactory.createUnion(Model m1, Model m2) It seems from
>>>> what I have read and inferred from some websites that the data
>>>> is not actually copied into a new graph. Rather, the graph
>>>> pointers (like in C) are linked internally, and the data is
>>>> the original, not a copy. Is that right?
>>> Correct - it is a new model that internally provides the union
>>> view of two other models.
>> Great, no copy then ;)
>>>> 2) org.apache.jena.graph.compose.MultiUnion How does
>>>> addGraph() work? Does it copy the original graph or does it
>>>> just link the data? I'm confused by the help: "Note that
>>>> the requirement to remove duplicates from the union means that
>>>> this will be an expensive operation for large (and especially
>>>> for persistent) graphs."
>>> That comment is on find()
>> Oops, my fault. You are completely right :(
>>> A graph is a set of triples - the key here is "set" - only one
>>> instance.
>>> To make that appear to be true in the union, the code needs to
>>> remember what it has iterated over. If it is doing (in the extreme)
>>> find(null,null,null), that's a lot of space.
>>>> Besides, how do I retrieve the merged/joint graph? Do I have
>>>> to use option 1) in an iterative way, reusing the returned
>>>> graph to add the additional one?
>>> add(Model) copies the one model into another - a true merge.
>> That was what I thought. Now the confirmation from experts ;)
>>> From your previous question, you don't want this - you want
>>> TDB's "default union graph" mode.  It's a lot cheaper at scale.
>>> https://jena.apache.org/documentation/tdb/datasets.html
>> I already have that for the whole dataset. However, I was thinking
>> of creating smaller named graphs. In my mind, this is going to make
>> SPARQL queries and calls to the Jena API quicker, as the set of data
>> to search is smaller. Is this right?
>> If it is, I was thinking, based also on your response, of creating a
>> Model that is the union of all the ones I want (which should be
>> quick), and then using this Model as the input for the SPARQL engine.
>> Besides, I was also thinking of having multiple datasets (TDB), but
>> I don't know if that would make any sense.
>> The issue is that the amount of data that I will have to handle is
>> huge, and I want, as far as possible, to make the searchable sets
>> as small as possible.
>>>> Thanks in advance for the help. Jorge
