Hi Vilnis,

On 07/04/2022 11:10, Vilnis Termanis wrote:
Hi,

In brief: Can Fuseki Data ACL be applied to text indexing?

As a general point - a text index itself is not ACL aware. It is set up ahead of time and does not index triples directly. The GeoSPARQL cache is probably similar (I'm less familiar with the GeoSPARQL code).

When the query is under the control of a trusted client, the pattern:

WHERE {
    ?s a ex:Product ;
       text:query (rdfs:label 'printer') ;
       rdfs:label ?lbl
}

can act as a check of the triple: the rdfs:label ?lbl pattern is matched against the dataset itself, not the index.

If the query isn't controlled, then that won't work.
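
For illustration, the pattern above written out as a complete query. This is only a sketch: the ex: namespace is a made-up placeholder, and the SELECT form is an assumption about how the trusted client would use it.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX text: <http://jena.apache.org/text#>
PREFIX ex:   <http://example.org/>    # hypothetical namespace, for illustration only

SELECT ?s ?lbl
WHERE {
    ?s a ex:Product ;
       text:query (rdfs:label 'printer') ;   # candidate subjects come from the index, which is not ACL aware
       rdfs:label ?lbl                       # matched against the data the service exposes
}
```

If a client drops the rdfs:label ?lbl line, the bindings for ?s would come from the index alone, which is why this only helps when the query text is controlled.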

(Has your usage style changed in the last year?)

And is it possible to selectively expose text index access per service for a
shared dataset?

Yes.

The context setting can be set per dataset, per service or per endpoint with ja:context [ ja:cxtName "NAME" ; ja:cxtValue "VALUE" ] ;

E.g.
    fuseki:endpoint [
        fuseki:operation fuseki:query ;
        fuseki:name "sparql" ;
        ja:context [
           ja:cxtName "NAME" ;  ja:cxtValue "VALUE"
        ] ;
    ] ;


In detail:

We're using a single TDB dataset (in unionDefaultGraph mode) with
multiple services, wrapped with both ACL (AccessControlledDataset) as
well as text indexing (TextDataset) and are hoping to provide the
following Fuseki services:

1. "full access" - a) Read/write everything b) including text index
2. "selected graphs only" - a) Read only from selected graphs b) no index access
3. "read all" - a) Read everything b) no index access

In the assembler configuration, datasets for the above services are
respectively defined as (where all use the same underlying dataset):
1. TextDataset(DatasetTDB)
2. AccessControlledDataset(DatasetTDB)
3. DatasetTDB

1a & 1b work as expected, as do 2a & 3a. 2b & 3b however still allow
access to text indexing, despite not being explicitly configured as
such in their respective services.

re: 2b/3b: That could be a bug or a configuration error.

The context value is set on the text dataset. So if the server configuration has a service that does not go through the text dataset, the index should not be visible. There will be an entry in the server log.

You don't actually need the DatasetGraphText if the index is only read (i.e. preloaded and no runtime updates).

From looking at code, I can see that index availability is based on
the TextQuery.textIndex symbol in the execution context
(TextQueryPF.java). This means that, as long as at least one service
enabled text indexing on a dataset, any other services referencing the
same underlying store will also use it.
(Judging by comments in the code, the "instanceof DatasetGraphText"
check is deprecated, even if the logic for now remains in
chooseTextIndex()).

So our questions are:

I) Is it currently possible to disallow access to the text index for
some services but not others (using the same underlying dataset)?

Should be - see above.

II) If not, what might be best approach to implement such a
restriction? (Would traversal of DatasetGraphWrapper to explicitly
find a DatasetGraphText instance make sense?)
III) Or: Is there a different/better approach to solve the index
visibility need described above?

In addition, regarding spatial lookups:
IV) Would GeoSPARQL querying (and its online caching) respect
AccessControlledDataset restrictions (when querying is performed over
multiple services with different levels of ACL)?

The GeoSPARQL cache is like the text index - not request principal sensitive. (see caveat!)

Regards,
Vilnis

    Andy
