Also:

There is a Union graph - this would be useful if you are not deleting triples. It keeps one of the graphs untouched.
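
For example, a rough (untested) sketch of sharing one base Model between several views - this assumes additions made through a union model go to the first operand:

    Model base  = ModelFactory.createDefaultModel(); // stands in for the shared source model
    Model delta = ModelFactory.createDefaultModel(); // per-copy additions
    Model view  = ModelFactory.createUnion(delta, base);

    view.add(view.createResource("http://example/s"),
             view.createProperty("http://example/p"),
             "o");                                   // lands in 'delta'; 'base' stays untouched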

    Andy

On 22/10/2018 12:36, Andy Seaborne wrote:
>>> I have an application using Jena where I frequently have to create copies of Models in order to then process them individually, i.e. all triples of one source Model are added to k new Models which are then mutated.

When the lower-level Graph (not Model) is copied within a JVM there is still sharing. The RDF terms - the Nodes: URIs, blank nodes, literals - are not duplicated.

RDFNode (really, EnhNode) is a pair of pointers (graph, node), but it is not used in the data structures for the graph, so these objects are transient and the GC recycles them.

You can think of Model as a presentation of the basic storage - the Graph.
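
A small illustration, given some in-memory Model "model":

    Graph g  = model.getGraph();                    // the underlying storage
    Model m2 = ModelFactory.createModelForGraph(g); // another Model view over the same Graph

    // Copying at the Model level copies the triples, but within one JVM
    // the Node objects (URIs, blank nodes, literals) are shared:
    Model copy = ModelFactory.createDefaultModel().add(model);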

     Andy


On 22/10/2018 11:49, Kevin Dreßler wrote:
Thanks for your quick reply!

On 22. Oct 2018, at 12:19, ajs6f <aj...@apache.org> wrote:

The TIM dataset implementation [1] is backed by persistent data structures (for the confused, the term "persistent" here means in the sense of immutable [2]-- it has nothing to do with disk storage). However, nothing there goes beyond the Node/Triple/Graph/DatasetGraph SPI-- the underlying structures aren't exposed and can't be reused by clients.
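
For example (from memory, so treat as a sketch):

    Dataset ds = DatasetFactory.createTxnMem();   // the TIM dataset
    ds.begin(ReadWrite.WRITE);
    try {
        ds.getDefaultModel().add(someModel);      // 'someModel' stands in for your data
        ds.commit();
    } finally {
        ds.end();
    }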

This looks interesting, but I don't think it actually matches my use case. That said, I would probably want transactional commits in my own implementation to improve performance: I could collect a set of statements and only create a new immutable instance of the model when committing all of them together, instead of after each single statement.

This sounds like an interesting and powerful use case, although I'm not sure how easily it could be accomplished within the current API. For one thing, we don't have a good way of distinguishing mutable and immutable models in Jena's type system right now.

Are the "k new Models" both adding and removing triples? If they're just adding triples, perhaps a clever wrapper might work.

Both addition and deletion of triples are possible. But the wrapper idea is nice and might actually work for both, as I could cache the set of Statements that have been deleted, as long as this cache's size stays under x% of the base model's size.
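
Very roughly what I have in mind (hypothetical sketch, all names made up):

    // A copy-on-write view over a shared base Model: additions and
    // deletions go into small side sets, the base is never mutated.
    class DeltaView {
        private final Model base;                          // shared, read-only
        private final Set<Statement> added   = new HashSet<>();
        private final Set<Statement> deleted = new HashSet<>();

        DeltaView(Model base) { this.base = base; }

        void add(Statement s)    { deleted.remove(s); added.add(s); }
        void delete(Statement s) { added.remove(s);   deleted.add(s); }

        boolean contains(Statement s) {
            return added.contains(s)
                || (base.contains(s) && !deleted.contains(s));
        }

        // Materialise a real Model only when, say, 'deleted' grows
        // beyond x% of the base size.
        Model materialise() {
            Model m = ModelFactory.createDefaultModel().add(base);
            added.forEach(m::add);
            deleted.forEach(m::remove);
            return m;
        }
    }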

Otherwise, have you tried using an intermediating caching setup, wherein statements that are copied are routed through a cache that prevents duplication? I believe Andy deployed a similar technique for some of the TDB loading code and saw great improvement therefrom.
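
Something along these lines, perhaps (untested sketch; names illustrative):

    // Route every term through a small interning cache while copying,
    // so that equal Nodes end up as a single object in the copies.
    Map<Node, Node> seen = new HashMap<>();
    Function<Node, Node> intern = n -> seen.computeIfAbsent(n, x -> x);

    Graph src = sourceModel.getGraph();
    Graph dst = targetModel.getGraph();
    src.find(Node.ANY, Node.ANY, Node.ANY).forEachRemaining(t ->
        dst.add(Triple.create(intern.apply(t.getSubject()),
                              intern.apply(t.getPredicate()),
                              intern.apply(t.getObject()))));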

I just started researching this, so I haven't done anything in this direction yet. Do you believe the wrapper / caching approach would be feasible with the current API? I am not very familiar with Jena's implementations, but from my experience with the API it seems that every RDFNode has a reference to the model from which it was retrieved (if any). So in order not to violate API contracts, I think I would also need to wrap each resource upon retrieval so that it points to the wrapper model instead of the base model?

ajs6f

[1] https://jena.apache.org/documentation/rdf/datasets.html
[2] https://en.wikipedia.org/wiki/Persistent_data_structure

On Oct 22, 2018, at 12:08 PM, Kevin Dreßler <kvndrs...@gmail.com> wrote:

Hello everyone,

I have an application using Jena where I frequently have to create copies of Models in order to then process them individually, i.e. all triples of one source Model are added to k new Models which are then mutated.

For larger Models this obviously takes some time and, more relevant for me, creates a considerable amount of memory pressure. However, with a Model implementation based on persistent data structures, I could eliminate most of these issues, as the amount of data changed is typically under 5% of the overall Model size.

Has anyone ever done something like this before, i.e. is anyone aware of immutable Model implementations with structural sharing? If not, what would be your advice on how to approach implementing this in one's own code base?

Best regards,
Kevin

