Re: How to deploy a scalable SPARQL Jena service ?

Nicholas Car Sun, 08 Jan 2023 21:51:51 -0800

In case readers of this thread, or the list generally, are interested, we are 
testing out a virtual graph access control system that works nicely with 
Jena/Fuseki. We create Virtual Graphs that are Named Graphs with no content but 
are closures of other Named Graphs that do hold content. In this way, we can 
implement fancy access control - multiple users, groups and roles - to small 
graph parts, using just standard quad store elements + administration data 
holdings.


So here you would break the larger graph into a Named Graph per governance unit 
- whatever your smallest conception of that is - and then build back up access 
to multiple Named Graphs via Virtual Graphs. All done in Fuseki back-end + 
access control API.
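
A minimal sketch of the idea, under assumptions not stated in this message: the membership table, the user grants, and every IRI and function name below are hypothetical. Each Virtual Graph is expanded to the content-bearing Named Graphs it covers, and the result becomes the FROM clauses of an ordinary SPARQL query.

```python
# Hypothetical administration data: the Named Graphs each (empty) Virtual
# Graph stands for, and the Virtual Graphs each user is allowed to read.
VIRTUAL_GRAPHS = {
    "urn:vg:metabolism": {"urn:ng:genes", "urn:ng:proteins"},
    "urn:vg:private": {"urn:ng:tenant-a"},
}
USER_GRANTS = {
    "alice": {"urn:vg:metabolism", "urn:vg:private"},
    "bob": {"urn:vg:metabolism"},
}


def readable_graphs(user):
    """Return the sorted Named Graphs reachable through the user's Virtual Graphs."""
    graphs = set()
    for vg in USER_GRANTS.get(user, set()):
        graphs |= VIRTUAL_GRAPHS.get(vg, set())
    return sorted(graphs)


def scoped_query(user, where_clause):
    """Build a SELECT whose default graph is the union of the readable graphs."""
    graphs = readable_graphs(user)
    if not graphs:
        raise PermissionError(f"{user} has no readable graphs")
    from_lines = "\n".join(f"FROM <{g}>" for g in graphs)
    return f"SELECT *\n{from_lines}\nWHERE {{ {where_clause} }}"


print(scoped_query("bob", "?s ?p ?o"))
```

In a sketch like this the access control API, not the client, would write the FROM clauses before the query reaches Fuseki.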

Happy to share more details if anyone is interested, here or directly.

Cheers, Nick

--
Dr Nicholas Car
Data Architect & Knowledge Graph Specialist
Kurrawong AI
n...@kurrawong.net
0477 560 177
https://kurrawong.net

Honorary Lecturer
College of Engineering, Computing & Cybernetics
Australian National University
https://cecc.anu.edu.au/people/nicholas-car

--


------- Original Message -------
On Monday, January 9th, 2023 at 07:01, Andy Seaborne <a...@apache.org> wrote:


> 
> On 06/01/2023 15:37, Jonathan MERCIER wrote:
> 
> > > Hi Jonathan,
> > 
> > Hi Andy,
> > 
> > > Could you say something about the usage patterns you are interested in
> > > supporting? Size of data? Query load?
> > 
> > Yes of course, we aim to store part of the UniProt ontology in order to
> > study metabolism on multiple layers: Organism/Gene/Protein/Reaction/Pathway.
> > Thus we will have a huge amount of public and private data (both academic
> > research and industrial).
> > So we have to use Apache Shiro to control who can access which data (by
> > tenant)
> 
> 
> Shiro will do the authentication and API security for authorization.
> 
> To get the access control on parts of the overall data, do you split the
> data into separate triplestores? Do you use the per-graph access control
> of Jena to get data level security?
> 
> The per-graph access control works if (1) you can manage the data that
> way with named graphs and (2) the access control is user, or role, based.
> 
> In my day job, I'm working on another data access control system - we have
> existing data which does not decompose into named graphs very easily and
> the access control rules don't fit a user/role basis (Role Based Access
> Control = RBAC).
> 
> Attribute Based Access Control (ABAC) can go down to labelling the
> access conditions on individual triples - and also provides for simple
> triple pattern matching (because sometimes, many triples have the same
> label e.g. they have the same property).
> 
> The "attribute" part comes from having key/value boolean expressions for
> access conditions, such as "department=engineering & status=employee"
> which can be moved around with the data when sharing across enterprise
> boundaries.
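
A minimal sketch of how such a label might be checked, assuming only conjunctions of key=value terms joined by "&" as in the example quoted above. Real label languages are likely richer, and every function name here is hypothetical.

```python
def parse_label(label):
    """Parse 'k1=v1 & k2=v2' into a list of (key, value) pairs."""
    terms = []
    for term in label.split("&"):
        key, sep, value = term.partition("=")
        if not sep or not key.strip() or not value.strip():
            raise ValueError(f"malformed term: {term!r}")
        terms.append((key.strip(), value.strip()))
    return terms


def may_read(user_attrs, label):
    """True if the user's attributes satisfy every term of the label."""
    return all(user_attrs.get(k) == v for k, v in parse_label(label))


def visible_triples(labelled_triples, user_attrs):
    """Keep only the triples whose label the user satisfies."""
    return [t for t, label in labelled_triples if may_read(user_attrs, label)]


# Hypothetical labelled data: (triple, label) pairs.
DATA = [
    (("ex:pump1", "ex:status", "ok"), "department=engineering & status=employee"),
    (("ex:budget", "ex:total", "100"), "department=finance"),
]
engineer = {"department": "engineering", "status": "employee"}
print(visible_triples(DATA, engineer))
```

Because the label is plain text attached to the triple, it can be shipped alongside the data and evaluated wherever the data ends up.
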
> 
> > The current size of the data is estimated at around 1 TB.
> > We will publish a knowledge release from time to time, so most of the
> > time we will be doing read-only queries, and sometimes we will push a
> > new release (1 TB).
> 
> 
> Then the full capabilities of RDF Delta may not be needed. Sounds like
> offline database build, copy DB to multiple triple stores behind a load
> balancer.
> 
> Full 24x7 update with no single point of failure is nice but it is
> complex. More servers (cost), more admin (more cost!).
> 
> Or for a few non-time-critical incremental updates, a simple mode for
> RDF Delta is with a single patch manager with a replicated filesystem.
> This is a single point of failure for updates, but the Fuseki replicas
> can provide query service throughout. It is simpler to operate.
> 
> Andy
> 
> > > There is a Lucene based text index.
> > 
> > Indeed, I see this. I will take a look at how to enable Lucene with TDB.
> > 
> > Also we will take a look at the Fuseki API in order to be able to use it
> > from our Python application (more rarely Kotlin)
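
A minimal sketch of calling a SPARQL endpoint from Python with only the standard library, following the SPARQL 1.1 Protocol (query sent as the POST body, JSON results requested). The endpoint URL assumes a local Fuseki with a dataset named "ds"; that name and the helper functions are assumptions, not something given in this thread.

```python
import json
import urllib.request

# Assumed endpoint: a local Fuseki server with a dataset called "ds".
ENDPOINT = "http://localhost:3030/ds/sparql"


def build_request(query, endpoint=ENDPOINT):
    """Prepare a SPARQL 1.1 Protocol POST carrying the query as the request body."""
    return urllib.request.Request(
        endpoint,
        data=query.encode("utf-8"),
        headers={
            "Content-Type": "application/sparql-query",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


def rows(results):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]


def select(query, endpoint=ENDPOINT):
    """Send the query and return the flattened rows (needs a running server)."""
    with urllib.request.urlopen(build_request(query, endpoint)) as resp:
        return rows(json.load(resp))
```

With a server running, `select("SELECT * WHERE { ?s ?p ?o } LIMIT 5")` would return up to five rows. Authentication headers, if Shiro requires them, would need to be added to the request.
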
> > 
> > We aim to perform some geospatial queries (maybe we would have to write a
> > plugin) in order to have a dedicated algorithm to walk through our
> > knowledge graph
> > 
> > > > 2) can we deploy a distributed TDB service, in order to have efficient
> > > > query ?
> > > 
> > > It can scale sideways with multiple copies of the database kept
> > > consistent across a cluster of replicas using the separate project (it
> > > is not an Apache Foundation project) that provides high availability
> > > and multiple query
> > > 
> > > RDF Delta https://afs.github.io/rdf-delta
> > 
> > Thanks Andy I will take a look
