On 06/01/2023 15:37, Jonathan MERCIER wrote:
Hi Jonathan,

Hi Andy,

Could you say something about the usage patterns you are interested in supporting? Size of data? Query load?

Yes of course. We aim to store part of the UniProt ontology in order to study metabolism on multiple layers: Organism/Gene/Protein/Reaction/Pathway. Thus we will have a huge amount of public and private data (both academic research and industrial). So we have to use Apache Shiro to control who can access some data (by tenant).

Shiro will do the authentication and API security for authorization.

To get the access control on parts of the overall data, do you split the data into separate triplestores? Do you use the per-graph access control of Jena to get data level security?

The per-graph access control works if (1) you can manage the data that way with named graphs and (2) the access control is user, or role, based.
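As a sketch of that per-graph approach, here is a minimal assembler configuration based on the vocabulary in the Fuseki data access control documentation (jena-fuseki-access module) - the user names and graph IRIs are placeholders, and the exact property names should be checked against the current docs:

```turtle
PREFIX access: <http://jena.apache.org/access#>
PREFIX rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

# Wrap an existing dataset in an access-controlled view.
<#access_dataset> rdf:type access:AccessControlledDataset ;
    access:registry <#securityRegistry> ;
    access:dataset  <#tdb_dataset_read> .

# Each entry lists a user and the named graphs that user may see.
<#securityRegistry> rdf:type access:SecurityRegistry ;
    access:entry ("user1" <http://example/graph-public> <http://example/graph-tenantA>) ;
    access:entry ("user2" <http://example/graph-public>) .
```

Authentication (e.g. via Shiro) supplies the user identity; the registry then restricts which graphs each request can see.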

In my day job, I'm working on another data access control system - we have existing data which does not decompose into named graphs very easily, and the access control rules don't fit a user/role basis (Role Based Access Control, RBAC).

Attribute Based Access Control (ABAC) can go down to labelling the access conditions on individual triples - and it also provides simple triple pattern matching (because sometimes many triples have the same label, e.g. they have the same property).

The "attribute" part comes from having key/value boolean expressions for access conditions, such as "department=engineering & status=employee" which can be moved around with the data when sharing across enterprise boundaries.
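To illustrate the idea (this is an illustrative sketch, not the API of the system described above), evaluating such a label is just checking each key=value conjunct against the requester's attributes:

```python
# Minimal ABAC label evaluation sketch: a label like
# "department=engineering & status=employee" grants access only if
# every conjunct matches the requesting user's attributes.
def evaluate(label: str, attributes: dict) -> bool:
    """Return True if every key=value conjunct in the label matches."""
    for conjunct in label.split("&"):
        key, _, value = conjunct.partition("=")
        if attributes.get(key.strip()) != value.strip():
            return False
    return True

label = "department=engineering & status=employee"
print(evaluate(label, {"department": "engineering", "status": "employee"}))  # True
print(evaluate(label, {"department": "sales", "status": "employee"}))        # False
```

Because the label travels with the data, the same condition can be re-evaluated against local attribute stores on either side of an enterprise boundary.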


Currently the size of the data is estimated at around 1 TB.
We will provide a knowledge release from time to time, so most of the time we will be doing read-only queries, and sometimes we will push a new release (1 TB).

Then the full capabilities of RDF Delta may not be needed. Sounds like an offline database build, then copying the DB to multiple triplestores behind a load balancer.

Full 24x7 update with no single point of failure is nice but it is complex. More servers (cost), more admin (more cost!).

Or, for a few non-time-critical incremental updates, a simple mode for RDF Delta is a single patch manager with a replicated filesystem. This is a single point of failure for updates, but the Fuseki replicas can provide query service throughout. It is simpler to operate.

    Andy

There is a Lucene-based text index.
Indeed, I saw this; I will take a look at how to enable Lucene with TDB.
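For reference, a minimal assembler configuration in the style of the jena-text documentation - the TDB2 location, Lucene directory, and indexed property here are placeholders to adapt:

```turtle
PREFIX text: <http://jena.apache.org/text#>
PREFIX tdb2: <http://jena.apache.org/2016/tdb#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# A text-indexed dataset wraps the TDB2 dataset and the Lucene index.
<#text_dataset> a text:TextDataset ;
    text:dataset <#tdb_dataset> ;
    text:index   <#indexLucene> .

<#tdb_dataset> a tdb2:DatasetTDB2 ;
    tdb2:location "DB2" .

<#indexLucene> a text:TextIndexLucene ;
    text:directory <file:Lucene> ;
    text:entityMap <#entMap> .

# Index the rdfs:label property into the "label" field.
<#entMap> a text:EntityMap ;
    text:entityField  "uri" ;
    text:defaultField "label" ;
    text:map ( [ text:field "label" ; text:predicate rdfs:label ] ) .
```

Queries then use the `text:query` property function, e.g. `?s text:query (rdfs:label "metabolism")`.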

Also, we will take a look at the Fuseki API in order to be able to use it from our Python application (more rarely Kotlin).
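From Python, Fuseki is just a SPARQL-over-HTTP endpoint, so no special client is required. A standard-library sketch (the endpoint URL and dataset name "/ds" are assumptions - use whatever the Fuseki configuration exposes):

```python
# Sketch: querying a Fuseki SPARQL endpoint from Python using only the
# standard library. Fuseki accepts a POST with form-encoded "query" and
# returns SPARQL JSON results when asked via the Accept header.
import json
import urllib.parse
import urllib.request

def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL-over-HTTP POST request asking for JSON results."""
    data = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=data,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
    )

query = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"
req = build_request("http://localhost:3030/ds/sparql", query)

# Uncomment against a running Fuseki instance:
# with urllib.request.urlopen(req) as resp:
#     results = json.load(resp)
#     for row in results["results"]["bindings"]:
#         print(row)
```

Libraries such as SPARQLWrapper wrap the same protocol if something higher-level is preferred.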

We aim to perform some geospatial queries (maybe we will have to write a plugin) in order to have a dedicated algorithm to walk through our knowledge graph.
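If Jena's GeoSPARQL support (jena-geosparql) covers the need, a query might look like this sketch - the polygon coordinates and the geometry properties on the data are invented for illustration:

```sparql
PREFIX geo:  <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>

SELECT ?feature WHERE {
  ?feature geo:hasGeometry ?geom .
  ?geom geo:asWKT ?wkt .
  FILTER ( geof:sfWithin(?wkt,
      "POLYGON((0 0, 10 0, 10 10, 0 10, 0 0))"^^geo:wktLiteral) )
}
```

A custom graph-walking algorithm would instead be a Jena extension (e.g. a property function), which is a separate plugin-style effort.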
2) Can we deploy a distributed TDB service, in order to have efficient queries?

It can scale sideways: multiple copies of the database are kept consistent across a cluster of replicas using a separate project (it is not an Apache Foundation project) that provides high availability and multiple query servers:

RDF Delta <https://afs.github.io/rdf-delta>
Thanks Andy, I will take a look.


