Hi, sorry to bring up an old discussion.

One solution, perhaps ideal for us, would be to index all properties returned by a user-defined SPARQL query, in addition to the explicitly configured ones. We would run a query over our RDF schemas to get, for example, all string properties (properties whose range is xsd:string).
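
A minimal sketch of this idea, in Python and with no Jena dependency. It assumes the schema triples are already available as (subject, predicate, object) tuples of full IRIs. In practice they would come from running a query such as SELECT ?p WHERE { ?p rdfs:range xsd:string } against the schemas. The sketch picks out the properties with range xsd:string and renders a jena-text entity map with one text:map entry per property. The field-naming scheme and the choice of default field are hypothetical. The output is only the entity map, not a complete dataset assembler.

```python
"""Hypothetical sketch: build a jena-text entity map from schema triples."""

RDFS_RANGE = "http://www.w3.org/2000/01/rdf-schema#range"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def string_properties(schema_triples):
    """Sorted IRIs of all properties declared with rdfs:range xsd:string.

    Same idea as: SELECT ?p WHERE { ?p rdfs:range xsd:string }
    """
    return sorted({s for s, p, o in schema_triples
                   if p == RDFS_RANGE and o == XSD_STRING})


def local_name(iri):
    """Part of the IRI after the last '#' or '/', used as a field name."""
    return iri.rstrip("/#").rsplit("#", 1)[-1].rsplit("/", 1)[-1]


def entity_map(properties):
    """Render a Turtle text:EntityMap with one text:map entry per property.

    Field names are local names, suffixed with 2, 3, ... on collisions
    (e.g. dc:title and dcterms:title). The first field is used as
    text:defaultField, an arbitrary choice made only for this sketch.
    """
    properties = list(properties)
    if not properties:
        raise ValueError("no properties to index")
    fields = []
    for iri in properties:
        base = local_name(iri)
        field, n = base, 2
        while field in fields:  # avoid two properties sharing one field
            field, n = f"{base}{n}", n + 1
        fields.append(field)
    entries = [f'        [ text:field "{f}" ; text:predicate <{iri}> ]'
               for f, iri in zip(fields, properties)]
    return "\n".join([
        "@prefix text: <http://jena.apache.org/text#> .",
        "",
        "<#entMap> a text:EntityMap ;",
        '    text:entityField "uri" ;',
        f'    text:defaultField "{fields[0]}" ;',
        "    text:map (",
        *entries,
        "    ) .",
    ])


if __name__ == "__main__":
    demo = [
        ("http://example.org/ns#title", RDFS_RANGE, XSD_STRING),
        ("http://example.org/ns#age", RDFS_RANGE,
         "http://www.w3.org/2001/XMLSchema#integer"),
    ]
    print(entity_map(string_properties(demo)))
```

Regenerating the entity map this way whenever the schemas change would keep the index configuration explicit, as jena-text requires, while still covering every string property.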


On 28/02/2018 20:16, Osma Suominen wrote:
Hi Jim!

Your observation is correct. jena-text only indexes the RDF properties you have explicitly configured. The configuration for each property may be different. There is no wildcard setting that would cover all possible properties.

The thinking behind this is that for typical use cases of a text index, there is a fairly limited set of properties that may be relevant (e.g. rdfs:label, rdfs:comment, dc:title, dc:description, skos:prefLabel, skos:altLabel, schema:name) and indexing every possible property would just bloat the index. Other literal values are still in the triple store and can be searched (possibly inefficiently) using SPARQL features such as FILTER with e.g. a REGEX or CONTAINS function.

If you think that e.g. a wildcard property setting would be useful, please open an issue in the Apache Jena JIRA (https://issues.apache.org/jira/projects/JENA/issues). Also, patches and pull requests welcome!

-Osma


On 28.02.2018 at 19:23, McCusker, James Patrick wrote:
From what I can tell from the documentation, we have to configure jena-text to index a fixed set of predicates. The examples use rdfs:label, and as far as I can see I can add more, but there are a lot of potential properties in the world. Is there a way to simply index all predicates into a single field? It seems strange that I would have to enumerate the huge number of text predicates used in the world in order to do a proper *full* text search of my graph.

This capability is provided by other SPARQL implementations (Blazegraph, Virtuoso).

Theoretically, the predicate should just be another field in the Lucene document that can be filtered on, as the graph already is.

Thanks,
Jim

Jim McCusker, Ph.D.
mccu...@rpi.edu
http://tw.rpi.edu/web/person/JamesMcCusker
Director, Data Operations
Tetherless World Constellation
Department of Computer Science
Rensselaer Polytechnic Institute
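
Jim's suggestion that the predicate be just another field in the index document can be illustrated with a toy in-memory index in Python. This is not the Lucene or jena-text API; the class and method names are hypothetical. Each indexed literal becomes one document that stores its subject, predicate and graph, so a search can be narrowed to one predicate exactly as it can be narrowed to one graph, and no per-property configuration is needed.

```python
from collections import defaultdict


class ToyTextIndex:
    """Hypothetical stand-in for a Lucene index: one document per literal.

    Subject, predicate and graph are stored as fields of each document,
    so a search can filter on the predicate just as it filters on the graph.
    """

    def __init__(self):
        self.docs = []                    # the stored "documents"
        self.postings = defaultdict(set)  # token -> ids of documents containing it

    def add(self, subject, predicate, literal, graph=None):
        """Index one literal triple; no list of allowed predicates is consulted."""
        doc_id = len(self.docs)
        self.docs.append({"uri": subject, "predicate": predicate,
                          "graph": graph, "text": literal})
        for token in literal.lower().split():  # naive whitespace tokenizer
            self.postings[token].add(doc_id)

    def search(self, word, predicate=None, graph=None):
        """Subjects of matching documents (one entry per document, in insertion order)."""
        hits = []
        for doc_id in sorted(self.postings.get(word.lower(), ())):
            doc = self.docs[doc_id]
            if predicate is not None and doc["predicate"] != predicate:
                continue
            if graph is not None and doc["graph"] != graph:
                continue
            hits.append(doc["uri"])
        return hits


if __name__ == "__main__":
    idx = ToyTextIndex()
    idx.add("ex:a", "rdfs:label", "Tetherless World")
    idx.add("ex:b", "dc:description", "A world of data")
    print(idx.search("world"))
    print(idx.search("world", predicate="dc:description"))
```

The trade-off is the one Osma describes: every literal in the store ends up in the index, so it grows with the data rather than with a short list of configured properties.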





--
Lingsoft - 30 years of Leading Language Management

www.lingsoft.fi

Speech Applications - Language Management - Translation - Reader's and Writer's Tools - Text Tools - E-books and M-books

Mikael Pesonen
System Engineer

e-mail: mikael.peso...@lingsoft.fi
Tel. +358 2 279 3300

Time zone: GMT+2

Helsinki Office
Eteläranta 10
FI-00130 Helsinki
FINLAND

Turku Office
Kauppiaskatu 5 A
FI-20100 Turku
FINLAND
