Hi Mikael!
Thanks for the idea. However, I don't quite see the picture. Where would
you put that user defined SPARQL query? In the Jena assembler
configuration file? How is that better than just defining the properties
in that same file?
Also, jena-text needs to know more than just the list of properties to
index. Each property has to be configured with a field name (in the
index) and possibly an analyzer too, if you want to override the
default analyzer setting of the index.
But since the Jena assembler configuration is just RDF triples, you
could actually generate it (or perhaps just the entity map part) using a
SPARQL CONSTRUCT query. Store the result in a file and include it as
part of your configuration.
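To make the generation idea concrete, here is a minimal sketch that
writes the entity map part as Turtle. It uses plain Python string
building rather than a SPARQL CONSTRUCT query, so treat it as a stand-in
for that approach and not as the recommended method. The helper name
entity_map_turtle, the PROPERTIES list, the field names and the
<#entMap> node name are all assumptions made for illustration; only the
text: vocabulary terms come from jena-text.

```python
# Hypothetical helper: the property list, field names and node name
# below are illustrative assumptions, not taken from a real setup.
PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "text": "http://jena.apache.org/text#",
}

# (field name in the index, predicate, optional analyzer class)
PROPERTIES = [
    ("label", "rdfs:label", None),
    ("comment", "rdfs:comment", None),
    ("prefLabel", "skos:prefLabel", "text:StandardAnalyzer"),
]


def entity_map_turtle(properties, entity_field="uri", default_field=None):
    """Return a Turtle fragment describing a jena-text entity map."""
    if not properties:
        raise ValueError("at least one property is required")
    # The default field has to be one of the mapped fields; fall back
    # to the first one if none is given.
    default_field = default_field or properties[0][0]

    lines = [f"@prefix {p}: <{iri}> ." for p, iri in sorted(PREFIXES.items())]
    lines.append("")
    lines.append("<#entMap> a text:EntityMap ;")
    lines.append(f'    text:entityField "{entity_field}" ;')
    lines.append(f'    text:defaultField "{default_field}" ;')
    lines.append("    text:map (")
    for field, predicate, analyzer in properties:
        entry = f'text:field "{field}" ; text:predicate {predicate}'
        if analyzer:
            # Per-property analyzer overriding the index default
            entry += f" ; text:analyzer [ a {analyzer} ]"
        lines.append(f"        [ {entry} ]")
    lines.append("    ) .")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(entity_map_turtle(PROPERTIES))
```

The output can be stored in a file and merged into the rest of the
assembler configuration. The list of properties could itself come from
a query over your schemas instead of being hard-coded.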
Also see the last part of my message that you quoted: if you think
something in Jena (or jena-text) needs to be changed, please open an
issue on JIRA and submit a pull request.
-Osma
Mikael Pesonen wrote on 23.1.2019 at 14.09:
Hi, sorry to bring up an old discussion.
One solution, maybe ideal for us, would be to index all properties that
are returned by a user-defined SPARQL query (in addition to the
explicitly configured ones).
We would run a query on RDF schemas to get all string properties
(those whose range is xsd:string), etc.
On 28/02/2018 20:16, Osma Suominen wrote:
Hi Jim!
Your observation is correct. jena-text only indexes the RDF properties
you have explicitly configured. The configuration for each property
may be different. There is no wildcard setting that would cover all
possible properties.
The thinking behind this is that for typical use cases of a text
index, there is a fairly limited set of properties that may be
relevant (e.g. rdfs:label, rdfs:comment, dc:title, dc:description,
skos:prefLabel, skos:altLabel, schema:name) and indexing every
possible property would just bloat the index. Other literal values are
still in the triple store and can be searched (possibly inefficiently)
using SPARQL features such as FILTER with e.g. a REGEX or CONTAINS
function.
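As a rough illustration of the FILTER fallback, the sketch below builds
such a query as a string. The function name contains_query and the
choice of a case-insensitive CONTAINS over all literals are assumptions
for the example; the query scans every triple, which is why this route
can be slow on large datasets.

```python
def contains_query(term, limit=20):
    """Build a SPARQL query that scans all literals for a substring."""
    # Escape characters that would break a double-quoted SPARQL literal
    escaped = (
        term.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )
    return (
        "SELECT ?s ?p ?o WHERE {\n"
        "  ?s ?p ?o .\n"
        "  FILTER(isLiteral(?o) && "
        'CONTAINS(LCASE(STR(?o)), LCASE("' + escaped + '")))\n'
        "}\n"
        f"LIMIT {int(limit)}\n"
    )


if __name__ == "__main__":
    print(contains_query("library"))
```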
If you think that e.g. a wildcard property setting would be useful,
please open an issue in the Apache Jena JIRA
(https://issues.apache.org/jira/projects/JENA/issues). Also, patches
and pull requests are welcome!
-Osma
McCusker, James Patrick wrote on 28.02.2018 at 19:23:
From what I can tell in the documentation, we have to configure Jena
text to index a fixed set of predicates. The examples give
rdfs:label, and from what I see I can add more, but there are a lot
of potential properties in the world. Is there a way to simply index
all predicates into a field? It seems strange that I would have to
enumerate over the tons of text predicates that are used in the world
in order to do a proper *full* text search of my graph.
This is a capability that is covered by other SPARQL implementations
(Blazegraph, Virtuoso).
Theoretically, the predicate should just be another field in the
lucene document that can be filtered on, like with graph.
Thanks,
Jim
Jim McCusker, Ph.D.
mccu...@rpi.edu
http://tw.rpi.edu/web/person/JamesMcCusker
Director, Data Operations
Tetherless World Constellation
Department of Computer Science
Rensselaer Polytechnic Institute
--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 15 (Unioninkatu 36)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.suomi...@helsinki.fi
http://www.nationallibrary.fi