[jira] [Created] (STANBOL-331) The default SolrYard configuration should have support for i18n analyzers (stemming and accents removal)

Olivier Grisel (JIRA) Mon, 26 Sep 2011 08:04:51 -0700

The default SolrYard configuration should have support for i18n analyzers 
(stemming and accents removal)
--------------------------------------------------------------------------------------------------------


                 Key: STANBOL-331
                 URL: https://issues.apache.org/jira/browse/STANBOL-331
             Project: Stanbol
          Issue Type: Bug
            Reporter: Olivier Grisel


For instance, some newspapers use a French spelling of foreign names with 
accents (e.g. Benyamin Nétanyahou) while DBpedia uses an unaccented version 
in the rdfs:label field (with the @fr language tag), e.g. Benyamin Netanyahou.

I think Solr and Lucene provide a variety of analyzers to deal with such 
language-specific variability of the tokens. However, they are currently not 
enabled in the default configuration of the SolrYard, hence the recall can be 
very bad for enhancers able to deal with i18n input.
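
To illustrate the mismatch, here is a minimal Python sketch of accent folding 
applied to both the indexed label and the incoming mention. It is not the 
SolrYard code and not a Solr configuration: the helper names fold_accents and 
analyze are hypothetical, and the folding uses Unicode NFD decomposition from 
the standard library as a rough stand-in for what a filter such as Solr's 
ASCIIFoldingFilterFactory would do inside an analyzer chain.

```python
import unicodedata


def fold_accents(text):
    """Remove combining marks after NFD decomposition (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text):
    """Toy analyzer: whitespace tokenization, accent folding, lowercasing."""
    return [fold_accents(token).lower() for token in text.split()]


label = "Benyamin Netanyahou"    # spelling of the rdfs:label@fr in DBpedia
mention = "Benyamin Nétanyahou"  # spelling found in a French newspaper

# Without folding, the second token differs, so an exact match fails.
print(label.lower().split() == mention.lower().split())

# With the same folding applied on both sides, the tokens are identical.
print(analyze(label) == analyze(mention))
```

The point of the sketch is only that the same normalization has to run at 
index time and at query time. Stemming is not covered here.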

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
