In case readers of this thread, or the list generally, are interested, we are testing out a virtual graph access control system that works nicely with Jena/Fuseki. We create Virtual Graphs: Named Graphs that hold no content themselves but are closures of other Named Graphs that do hold content. In this way, we can implement fancy access control - multiple users, groups and roles - over small graph parts, using just standard quad store elements + administration data holdings.
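To make the closure idea concrete, here is a minimal Python sketch. It is an illustration only and is not taken from the system described: the graph IRIs, the user names, the two dictionaries standing in for the administration data, and all function names are hypothetical. It expands the Virtual Graphs a user has been granted into the content-bearing Named Graphs behind them, then renders those as SPARQL FROM NAMED clauses.

```python
from typing import Dict, Optional, Set

# Hypothetical administration data: each Virtual Graph lists its members,
# which may be content-bearing Named Graphs or other Virtual Graphs.
VIRTUAL_GRAPHS: Dict[str, Set[str]] = {
    "urn:vg:metabolism-team": {"urn:ng:genes", "urn:ng:proteins"},
    "urn:vg:all-staff": {"urn:vg:metabolism-team", "urn:ng:public"},
}

# Hypothetical grants: the graphs (virtual or real) each user may read.
USER_GRANTS: Dict[str, Set[str]] = {
    "alice": {"urn:vg:all-staff"},
    "bob": {"urn:ng:public"},
}


def resolve(graph: str, seen: Optional[Set[str]] = None) -> Set[str]:
    """Expand a graph IRI to the content-bearing Named Graphs it covers."""
    seen = set() if seen is None else seen
    if graph in seen:  # already expanded, or a cycle in the admin data
        return set()
    seen.add(graph)
    if graph not in VIRTUAL_GRAPHS:  # a real Named Graph: nothing to expand
        return {graph}
    result: Set[str] = set()
    for member in VIRTUAL_GRAPHS[graph]:
        result |= resolve(member, seen)
    return result


def readable_graphs(user: str) -> Set[str]:
    """All content-bearing Named Graphs a user can read via their grants."""
    result: Set[str] = set()
    for granted in USER_GRANTS.get(user, set()):
        result |= resolve(granted)
    return result


def from_named_clauses(user: str) -> str:
    """Render the user's readable graphs as SPARQL FROM NAMED clauses."""
    return "\n".join(f"FROM NAMED <{g}>" for g in sorted(readable_graphs(user)))
```

An access control layer in front of the store could then prepend from_named_clauses(user) to the dataset description of each incoming query, so that a user only ever sees the Named Graphs their Virtual Graphs resolve to.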
So here you would break the larger graph into a Named Graph per governance unit - whatever your smallest conception of that is - and then build back up access to multiple Named Graphs via Virtual Graphs. All done in the Fuseki back-end + an access control API.

Happy to share more details if anyone is interested, here or directly.

Cheers, Nick

--
Dr Nicholas Car
Data Architect & Knowledge Graph Specialist
Kurrawong AI
n...@kurrawong.net
0477 560 177
https://kurrawong.net

Honorary Lecturer
College of Engineering, Computing & Cybernetics
Australian National University
https://cecc.anu.edu.au/people/nicholas-car
--

------- Original Message -------
On Monday, January 9th, 2023 at 07:01, Andy Seaborne <a...@apache.org> wrote:

>
> On 06/01/2023 15:37, Jonathan MERCIER wrote:
>
> > > Hi Jonathan,
> >
> > Hi Andy,
> >
> > > Could you say something about the usage patterns you are interested in
> > > supporting? Size of data? Query load?
> >
> > Yes of course, we aim to store part of the UniProt ontology in order to
> > study metabolism on multiple layers: Organism/Gene/Protein/Reaction/Pathway.
> > Thus we will get a huge amount of public and private data (both academic
> > research and industrial).
> > So we have to use Apache Shiro to control who can access some data (by
> > tenant)
>
>
> Shiro will do the authentication and API security for authorization.
>
> To get the access control on parts of the overall data, do you split the
> data into separate triplestores? Do you use the per-graph access control
> of Jena to get data level security?
>
> The per-graph access control works if (1) you can manage the data that
> way with named graphs and (2) the access control is user, or role, based.
>
> In my day job, I'm working on another data access control system - we have
> existing data which does not decompose into named graphs very easily and
> the access control rules don't fit a user/role basis (Role Based Access
> Control = RBAC).
>
> Attribute Based Access Control (ABAC) can go down to labelling the
> access conditions on individual triples - and also provides for simple
> triple pattern matching (because sometimes, many triples have the same
> label e.g. they have the same property).
>
> The "attribute" part comes from having key/value boolean expressions for
> access conditions, such as "department=engineering & status=employee"
> which can be moved around with the data when sharing across enterprise
> boundaries.
>
> > Currently the size of the data is estimated at around 1 TB.
> > We will provide a knowledge release from time to time, so most of the
> > time we will be doing read-only queries and sometimes we will push our
> > new release (1 TB).
>
>
> Then the full capabilities of RDF Delta may not be needed. Sounds like
> offline database build, copy DB to multiple triple stores behind a load
> balancer.
>
> Full 24x7 update with no single point of failure is nice but it is
> complex. More servers (cost), more admin (more cost!).
>
> Or for a few non-time-critical incremental updates, a simple mode for
> RDF Delta is with a single patch manager with a replicated filesystem.
> This is a single point of failure for updates, but the Fuseki replicas
> can provide query service throughout. It is simpler to operate.
>
> Andy
>
> > > There is a Lucene based text index.
> >
> > Indeed, I see this. I will take a look at how to enable Lucene with TDB.
> >
> > Also we will take a look at the Fuseki API in order to be able to use it
> > through our Python application (more rarely Kotlin)
> >
> > We aim to perform some geospatial queries (maybe we would have to make a
> > plugin) in order to have a dedicated algorithm to walk through our
> > knowledge graph
> >
> > > 2) can we deploy a distributed TDB service, in order to have efficient
> > > queries?
> > >
> > > It can scale sideways with multiple copies of the database kept
> > > consistent across a cluster of replicas using the separate project (it
> > > is not an Apache Foundation project) that provides high availability
> > > and multiple query
> > >
> > > RDF Delta https://afs.github.io/rdf-delta
>
> > Thanks Andy, I will take a look
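
For anyone following the ABAC part of the quoted message, here is a rough Python sketch of attribute labels on triples. It is an assumption-laden illustration, not the label syntax or API of Jena or of any other product: the label grammar (key=value pairs joined by "&"), the pattern rules, the ex: names and the deny-by-default choice for unlabelled triples are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]  # None = wildcard

# Hypothetical labelling rules: a triple pattern and the access condition
# that applies to every triple matching it (here, matching on the property).
LABEL_RULES: List[Tuple[Pattern, str]] = [
    ((None, "ex:salary", None), "department=hr & status=employee"),
    ((None, "ex:name", None), "status=employee"),
]


def parse_label(label: str) -> Dict[str, str]:
    """Parse 'k1=v1 & k2=v2' into a dict (conjunctions only in this sketch)."""
    conditions: Dict[str, str] = {}
    for part in label.split("&"):
        if part.strip():
            key, _, value = part.partition("=")
            conditions[key.strip()] = value.strip()
    return conditions


def is_visible(label: str, user_attrs: Dict[str, str]) -> bool:
    """True if the user's attributes satisfy every condition in the label."""
    return all(user_attrs.get(k) == v for k, v in parse_label(label).items())


def label_for(triple: Triple) -> Optional[str]:
    """Return the label of the first rule whose pattern matches the triple."""
    for pattern, label in LABEL_RULES:
        if all(p is None or p == t for p, t in zip(pattern, triple)):
            return label
    return None


def visible_triples(triples: List[Triple], user_attrs: Dict[str, str]) -> List[Triple]:
    """Keep only labelled triples whose label the user satisfies."""
    visible = []
    for triple in triples:
        label = label_for(triple)
        # Assumption: unlabelled triples are hidden (deny by default).
        if label is not None and is_visible(label, user_attrs):
            visible.append(triple)
    return visible
```

Because the label travels as a plain key/value expression rather than as a reference to local users or roles, the same labelled data can be evaluated against a different organisation's attribute store, which is the portability point made in the quoted text.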