The TIM dataset implementation [1] is backed by persistent data structures 
("persistent" here means immutable, in the sense of [2]; it has nothing to do 
with disk storage). However, nothing there goes beyond the 
Node/Triple/Graph/DatasetGraph SPI: the underlying structures aren't exposed 
and can't be reused by clients.

This sounds like an interesting and powerful use case, although I'm not sure 
how easily it could be accomplished within the current API. For one thing, we 
don't have a good way of distinguishing mutable and immutable models in Jena's 
type system right now.

Do the mutations on the "k new Models" both add and remove triples? If they 
only add triples, perhaps a clever wrapper might work.
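
To make the wrapper idea concrete, here is a minimal sketch in plain Java, 
assuming the copies only ever add. AddOnlyOverlay is a hypothetical name, not a 
Jena class, and java.util sets of strings stand in for graphs of triples. Each 
overlay shares one base and stores only its own additions, so k copies cost 
roughly the size of their deltas. Jena's ModelFactory.createUnion(Model, Model) 
gives a union view over two models and may be a starting point, though I haven't 
checked how it routes writes.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical add-only overlay: a shared base plus per-copy additions. Not a Jena class. */
final class AddOnlyOverlay<T> {
    // Shared between overlays; assumed not to change while any overlay is in use.
    private final Set<T> base;
    // Private to this overlay; the only storage that grows per copy.
    private final Set<T> added = new HashSet<>();

    AddOnlyOverlay(Set<T> base) {
        this.base = Collections.unmodifiableSet(base);
    }

    /** Records the item only if the base does not already hold it. */
    boolean add(T item) {
        return !base.contains(item) && added.add(item);
    }

    boolean contains(T item) {
        return base.contains(item) || added.contains(item);
    }

    /** Total visible size; base and additions are disjoint because add() checks the base. */
    int size() {
        return base.size() + added.size();
    }

    /** Number of items stored by this overlay alone. */
    int ownSize() {
        return added.size();
    }

    public static void main(String[] args) {
        Set<String> source = new HashSet<>(Set.of("s1 p o", "s2 p o"));
        AddOnlyOverlay<String> a = new AddOnlyOverlay<>(source);
        AddOnlyOverlay<String> b = new AddOnlyOverlay<>(source);
        a.add("s3 p o");
        System.out.println(a.contains("s3 p o")); // true
        System.out.println(b.contains("s3 p o")); // false
        System.out.println(a.size() + " " + a.ownSize()); // 3 1
    }
}
```

Supporting removals would need a second per-copy set of "deleted" items that 
masks the base, which is where this stops being a simple wrapper.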

Otherwise, have you tried putting a cache in between, so that statements being 
copied are routed through it and duplicates are avoided? I believe Andy used a 
similar technique in some of the TDB loading code and saw a great improvement 
from it.
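
A rough sketch of what I mean by such a cache, in plain Java. InternCache is a 
hypothetical name and this is not the TDB loader code; strings stand in for 
nodes or statements. The idea is that every value passes through intern() 
before being stored, so equal values end up as one shared object rather than k 
separate copies.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical interning cache; not Jena or TDB code. Values must be non-null. */
final class InternCache<T> {
    // Maps each distinct value to the first instance seen that equals it.
    private final Map<T, T> canonical = new HashMap<>();

    /** Returns the canonical instance equal to the argument, registering it if new. */
    T intern(T value) {
        T existing = canonical.putIfAbsent(value, value);
        return existing != null ? existing : value;
    }

    int size() {
        return canonical.size();
    }

    public static void main(String[] args) {
        InternCache<String> cache = new InternCache<>();
        String first = new String("http://example.org/s");
        String second = new String("http://example.org/s"); // equal, but a distinct object
        System.out.println(cache.intern(first) == cache.intern(second)); // true
        System.out.println(cache.size()); // 1
    }
}
```

Note that this version holds strong references and never evicts, so a real 
setup would probably want a bounded or weakly referenced cache.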

ajs6f

[1] https://jena.apache.org/documentation/rdf/datasets.html
[2] https://en.wikipedia.org/wiki/Persistent_data_structure

> On Oct 22, 2018, at 12:08 PM, Kevin Dreßler <kvndrs...@gmail.com> wrote:
> 
> Hello everyone,
> 
> I have an application using Jena where I frequently have to create copies of 
> Models in order to then process them individually, i.e. all triples of one 
> source Model are added to k new Models which are then mutated.
> 
> For larger Models this obviously takes some time and, more relevant for me, 
> creates a considerable amount of memory pressure.
> However, with a Model implementation based on persistent data structures I 
> could eliminate most of these issues as the amount of data changed is 
> typically under 5% compared to the overall Model size.
> 
> Has anyone ever done something like this before, i.e. are there immutable 
> Model implementations with structural sharing that someone is aware of? If 
> not what would be your advice on how one would approach implementing this in 
> their own code base?
> 
> Best regards,
> Kevin
