Hi,
I've just read this blog post from Andy:
http://www.epimorphics.com/web/wiki/epimorphics-builds-data-publish-platform-environment-agency

It describes a "quite simple" fault-tolerant and replicated data publishing 
solution using Apache Jena and Fuseki. Interesting.

It's a master/slave architecture. The master (which Andy calls the 
'controller server' in his post) receives all updates and "calculates the 
triples to be added, the triples to be removed" so that changes are 
'idempotent' (i.e. they can be reapplied multiple times, in the same order, 
with the same effect).

It would be interesting to know whether the 'controller server' exposes a full 
SPARQL Update endpoint and/or the Graph Store HTTP Protocol and, if so, how 
the triples to be added/removed are calculated. (This is something I've wanted 
to learn for a while, but I still haven't found the time... a small example 
would be wonderful! ;-)).
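For what it's worth, here is a minimal sketch of what I imagine the change-set 
calculation could look like, assuming the controller diffs the store's triple 
set before and after applying an update (this is purely my guess, not 
Epimorphics' actual implementation; all names and data below are made up):

```python
# Hypothetical sketch of diff-based, idempotent replication.
# Triples are modeled as plain (subject, predicate, object) tuples.

def compute_diff(before, after):
    """Given the triple sets before and after an update, return the
    change-set: (triples to add, triples to remove)."""
    added = after - before
    removed = before - after
    return added, removed

def apply_diff(store, added, removed):
    """Apply a change-set to a replica's triple set."""
    return (store - removed) | added

# A tiny store before the update...
before = {("ex:alice", "foaf:name", '"Alice"'),
          ("ex:alice", "foaf:age", '"41"')}
# ...and after an update that rewrites Alice's age.
after = {("ex:alice", "foaf:name", '"Alice"'),
         ("ex:alice", "foaf:age", '"42"')}

added, removed = compute_diff(before, after)

# Applying the change-set once or twice yields the same state: idempotent.
once = apply_diff(before, added, removed)
twice = apply_diff(once, added, removed)
assert once == after
assert twice == after
```

The nice property is that replicas only need to apply plain add/remove lists, 
so replaying the same change-set (in order) can never diverge.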

To conclude, I fully agree with the "quite simple design" and "simple systems 
are easier to operate". The approach described can work well in many scenarios 
where the rate of updates/writes isn't excessive and reads dominate (which I 
still believe to be the case most of the time with RDF data, since the data is 
often human generated/curated).
My hope is to see something similar in the 'open', so that other Apache Jena 
and Fuseki users can benefit from a highly available, open source publishing 
solution for RDF data (and can focus their energies/efforts elsewhere: on the 
quality of their data modeling, data, applications, user experience, etc.).

Paolo

PS:
Disclaimer: I don't work for Epimorphics, those are just my personal opinions 
and, last but not least, I love simplicity.
