On 14/04/2022 20:57, Vilnis Termanis wrote:
Hi,

Specifying the following in each service (which is to ignore text
indexing) now works (tested against 4.4.0):

ja:context [ ja:cxtName "http://jena.apache.org/text#index" ;
ja:cxtValue false ] ;

Doesn't that cause warnings in the Fuseki log?

Is the stack of datasets still the same as earlier?

If you are saying that it is necessary, it looks like the text context contaminates the base dataset, but the fix may break the reverse case (if anyone uses it) of direct access to the storage DB. The fix isn't a quick one, but it so happens that the code exists (a version of Context that keeps changes), albeit not in that codebase.


... but not if the associated dataset is an AccessControlledDataset.

From my understanding, the issue is to do with the fact that
fuseki-access uses QueryExecutionFactory whilst fuseki-core uses
QueryExecDatasetBuilder. The latter takes the HttpAction's context
into account (which presumably leads to the inclusion of the service
context values) while the former does not. (In addition, it would
appear that fuseki-access does not honour query-specific timeouts due
to a similar reason.)

Timeouts ignored because the context is skipped?

This patch seems to fix the issue:
https://github.com/vtermanis/jena/commit/e5cb112f829f305c1f76c8f5305f4394d8e9b04f

Would you be able to turn that into a PR?

I know that this most likely is not the right way to address it.
(Should fuseki-access re-use some common QueryExecution building code
from fuseki-core?) I also wasn't sure how to add an automated (to
jena-integration-tests or with mocking in fuseki-access?) test-case
for this, but I can provide minimal manual steps.

Should I create a Jira ticket for this?

There are several "this" here.

Regards,
Vilnis

    Andy



On Tue, 12 Apr 2022 at 21:44, Vilnis Termanis
<vilnis.terma...@iotics.com> wrote:

Hi Andy,

Thank you for the suggestion of in-config context overrides - I had
not realised that was possible (with the newer style of defining
Fuseki services) - that's really useful.
We'll re-test the aforementioned 2b & 3b cases.

Regards,
Vilnis

On Fri, 8 Apr 2022 at 11:51, Andy Seaborne <a...@apache.org> wrote:

Hi Vilnis,

On 07/04/2022 11:10, Vilnis Termanis wrote:
Hi,

In brief: Can Fuseki Data ACL be applied to text indexing?

As a general point - a text index itself is not ACL aware. It is set up
ahead of time and does not index triples directly. The GeoSPARQL cache
is probably similar (I'm less familiar with the GeoSPARQL code).

When the query is under the control of a trusted client, the pattern:

WHERE {
      ?s a ex:Product ;
         text:query (rdfs:label 'printer') ;
         rdfs:label ?lbl
}

can be a check of the triple.

If the query isn't controlled, then that won't work.

(Has your usage style changed in the last year?)

And is it
possible to selectively expose text index access per service for a
shared dataset?

Yes.

The context setting can be set per dataset, per service or per endpoint
with ja:context [ ja:cxtName "NAME" ;  ja:cxtValue "VALUE" ] ;

E.g.
      fuseki:endpoint [
          fuseki:operation fuseki:query ;
          fuseki:name "sparql" ;
          ja:context [
             ja:cxtName "NAME" ;  ja:cxtValue "VALUE"
          ] ;
      ] ;
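
To make the placeholders concrete, here is a sketch of a whole service with
the override on its query endpoint. The symbol name and the false value are
the ones quoted at the top of this thread; the service name and the
<#dataset_tdb> resource are hypothetical. The thread also leaves open whether
this setting causes warnings in the Fuseki log, so treat it as an
illustration rather than a recommended configuration.

```turtle
PREFIX fuseki: <http://jena.apache.org/fuseki#>
PREFIX ja:     <http://jena.hpl.hp.com/2005/11/Assembler#>

# Hypothetical "read all" service: query only, text index hidden.
<#service_read_all> a fuseki:Service ;
    fuseki:name "read-all" ;                 # hypothetical service name
    fuseki:endpoint [
        fuseki:operation fuseki:query ;
        fuseki:name "sparql" ;
        # Per-endpoint context override, as reported earlier in the thread.
        ja:context [
            ja:cxtName  "http://jena.apache.org/text#index" ;
            ja:cxtValue false
        ] ;
    ] ;
    fuseki:dataset <#dataset_tdb> .          # hypothetical, defined elsewhere
```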


In detail:

We're using a single TDB dataset (in unionDefaultGraph mode) with
multiple services, wrapped with both ACL (AccessControlledDataset) as
well as text indexing (TextDataset) and are hoping to provide the
following Fuseki services:

1. "full access" - a) Read/write everything b) including text index
2. "selected graphs only" - a) Read only from selected graphs b) no index access
3. "read all" - a) Read everything b) no index access

In the assembler configuration, datasets for the above services are
respectively defined as (where all use the same underlying dataset):
1. TextDataset(DatasetTDB)
2. AccessControlledDataset(DatasetTDB)
3. DatasetTDB
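
As a rough sketch, the three dataset definitions might look like the
following in assembler terms. This assumes the TDB1 vocabulary (TDB2 uses a
different one), and every resource name and the location are hypothetical;
<#index> and <#registry> stand for a text index and a security registry
defined elsewhere in the same file.

```turtle
PREFIX tdb:    <http://jena.hpl.hp.com/2008/tdb#>
PREFIX text:   <http://jena.apache.org/text#>
PREFIX access: <http://jena.apache.org/access#>

# 3. Base storage, shared by all three services.
<#dataset_tdb> a tdb:DatasetTDB ;
    tdb:location "DB" ;                      # hypothetical location
    tdb:unionDefaultGraph true .

# 1. Text-indexed view over the same storage.
<#dataset_text> a text:TextDataset ;
    text:dataset <#dataset_tdb> ;
    text:index   <#index> .                  # assumed defined elsewhere

# 2. Access-controlled view over the same storage.
<#dataset_acl> a access:AccessControlledDataset ;
    access:registry <#registry> ;            # assumed defined elsewhere
    access:dataset  <#dataset_tdb> .
```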

1a & 1b work as expected, as do 2a & 3a. 2b & 3b however still allow
access to text indexing, despite not being explicitly configured as
such in their respective services.

re: 2b/3b: That could be a bug or a configuration error.

The context value is set on the text dataset. So if the server
configuration has a service that does not go through the text dataset,
the index should not be visible. There will be an entry in the server log.

You don't actually need the DatasetGraphText if the index is only read
(i.e. preloaded and no runtime updates).

From looking at the code, I can see that index availability is based on
the TextQuery.textIndex symbol in the execution context
(TextQueryPF.java). This means that, as long as at least one service
enabled text indexing on a dataset, any other services referencing the
same underlying store will also use it.
(Judging by comments in the code, the "instanceof DatasetGraphText"
check is deprecated, even if the logic for now remains in
chooseTextIndex()).

So our questions are:

I) Is it currently possible to disallow access to the text index for
some services but not others (using the same underlying dataset)?

Should be - see above.

II) If not, what might be best approach to implement such a
restriction? (Would traversal of DatasetGraphWrapper to explicitly
find a DatasetGraphText instance make sense?)
III) Or: Is there a different/better approach to solve the index
visibility need described above?

In addition, regarding spatial lookups:
IV) Would GeoSPARQL querying (and its online caching) respect
AccessControlledDataset restrictions (when querying is performed over
multiple services with different levels of ACL)?

The GeoSPARQL cache is like the text index - not request principal
sensitive. (see caveat!)

Regards,
Vilnis

      Andy



--
Vilnis Termanis
Senior Software Developer

m | +44 (0) 7521 012309
e | vilnis.terma...@iotics.com
www.iotics.com