Hi, good question...
Text fields are indexed using tokenizers in Solr. Therefore a search for "Apache" will find all documents (entities) that have this token in the skos:prefLabel field. This is why you also get "Apache foundation", "Apache bylaw", etc., even if the patternType is set to "none".

As far as I know, the only way around this is to deactivate the tokenizer for such a field. However, without a tokenizer a query for "Westenthaler" would no longer return "Rupert Westenthaler", which a lot of users would also find strange.

Deactivating the tokenizer for a natural language field requires modifying the Solr schema (schema.xml). Having both a tokenized and an untokenized version of the same field is currently not possible.

To deactivate tokenizing for skos:prefLabel, you would need to add the following to schema.xml:

    <!-- one field for each language -->
    <field name="@en/skos:prefLabel/" type="lowercase" indexed="true" stored="true" multiValued="true"/>
    <field name="@de/skos:prefLabel/" type="lowercase" indexed="true" stored="true" multiValued="true"/>
    <field name="@it/skos:prefLabel/" type="lowercase" indexed="true" stored="true" multiValued="true"/>
    <field name="@fr/skos:prefLabel/" type="lowercase" indexed="true" stored="true" multiValued="true"/>
    <field name="@/skos:prefLabel/" type="lowercase" indexed="true" stored="true" multiValued="true"/>
    <!-- used for multilingual searches -->
    <field name="_!@/skos:prefLabel/" type="lowercase" indexed="true" stored="false" multiValued="true"/>

If this is a frequently needed feature, I could modify the SolrYard to use suffixes for languages. This would allow indexing multiple versions of natural language texts under different prefixes; the prefixes would then indicate whether a tokenizer should be used or not. However, I expect this would require a lot of changes to the current code, because the code currently assumes that only one of language and data type is present at the same time.
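To make the trade-off between the two field types concrete, here is a small, self-contained Java simulation. It is not Solr or SolrYard code: it assumes that a tokenized text field behaves roughly like splitting the label on whitespace and lowercasing the tokens (real Solr analyzers may also apply stemming, stop words, etc.), and that the "lowercase" field type keeps the whole label as one lowercased term. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

/**
 * Hypothetical, simplified simulation (not Solr code) of how a query term
 * matches a label on a tokenized field vs. an untokenized "lowercase" field.
 */
public class TokenizationDemo {

    // Simplified tokenizer (assumption): lowercase, then split on whitespace.
    static List<String> tokenize(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("\\s+"))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toList());
    }

    // Tokenized field: matches if every query token occurs among the label's
    // tokens (AND semantics assumed; irrelevant for single-word queries).
    static boolean tokenizedMatch(String query, String label) {
        return tokenize(label).containsAll(tokenize(query));
    }

    // Untokenized "lowercase" field: the whole label is a single term, so
    // only the complete label (ignoring case) matches.
    static boolean keywordMatch(String query, String label) {
        return label.toLowerCase(Locale.ROOT).equals(query.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        List<String> labels = List.of(
                "Apache", "Apache foundation", "Apache bylaw", "Rupert Westenthaler");
        for (String query : List.of("Apache", "Westenthaler")) {
            for (String label : labels) {
                System.out.printf("%-13s %-20s tokenized=%-5b lowercase=%b%n",
                        query, label, tokenizedMatch(query, label), keywordMatch(query, label));
            }
        }
    }
}
```

Running main() (Java 9+ for List.of) shows that "Apache" matches "Apache foundation" and "Apache bylaw" only in the tokenized variant, while "Westenthaler" also matches "Rupert Westenthaler" only in the tokenized variant: removing the tokenizer fixes the first behaviour at the cost of the second.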
best
Rupert Westenthaler

On Fri, Jun 10, 2011 at 11:07 AM, florent andré <[email protected]> wrote:
> Hi Rupert, *,
>
> As promised in Berlin, I have a question for you! :)
>
> I have this query:
>
>     FieldQuery query = site.getQueryFactory().createFieldQuery();
>
>     query.setConstraint(NamespaceEnum.skos + "prefLabel",
>             new TextConstraint(signToFind));
>
>     query.addSelectedField(NamespaceEnum.skos + "related");
>     query.addSelectedField(NamespaceEnum.skos + "narrower");
>     query.addSelectedField(NamespaceEnum.skos + "broader");
>     query.addSelectedField(NamespaceEnum.skos + "inScheme");
>
>     query.setLimit(this.numSuggestions);
>
> When signToFind is a one-word term, e.g. "Apache", I get all compound terms
> that contain this word, e.g. "Apache foundation", "Apache bylaw", etc.
>
> That could be interesting in some cases, but not always.
>
> As I read in your documentation, there is:
> - patternType: one of "wildcard", "regex" or "none" (default is "none")
>
> As I don't define a pattern type, it should be "none", so it should be a
> strict match, right?
>
> So in this case I should only get entities matching the one-word term, or am
> I missing something?
>
> Thanks
> ++

--
| Rupert Westenthaler             [email protected]
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen
