[
https://issues.apache.org/jira/browse/STANBOL-331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Olivier Grisel updated STANBOL-331:
-----------------------------------
Description:
For instance, some French newspapers spell foreign names with accents (e.g.
Benyamin Nétanyahou), while DBpedia uses an unaccented variant in the
rdfs:label field (the literal with the @fr language tag), e.g. Benyamin
Netanyahou.
I think Solr and Lucene provide a variety of analyzers to deal with such
language-specific variability of the tokens. However, they are currently not
enabled in the default configuration of the SolrYard, so recall can be
very bad for enhancers able to deal with i18n input.
was:
For instance some French newspapers use a spelling of foreign names with
accents (e.g. Benyamin Nétanyahou) while DBpedia uses a non accented variant in
the rdfs:label field (with the @fr literal), e.g. Benyamin Netanyahou.
I think Solr and Lucene provide a variety of analyzers to deal with such
language specific variability of the tokens. However they are currently not
enabled in the default configuration of the SolrYard hence the recall is can be
very bad for enhancers able to deal with i18n input.
> The default SolrYard configuration should have support for i18n analyzers
> (stemming and accents removal)
> --------------------------------------------------------------------------------------------------------
>
> Key: STANBOL-331
> URL: https://issues.apache.org/jira/browse/STANBOL-331
> Project: Stanbol
> Issue Type: Bug
> Reporter: Olivier Grisel
>
> For instance, some French newspapers spell foreign names with accents (e.g.
> Benyamin Nétanyahou), while DBpedia uses an unaccented variant in the
> rdfs:label field (the literal with the @fr language tag), e.g. Benyamin
> Netanyahou.
> I think Solr and Lucene provide a variety of analyzers to deal with such
> language-specific variability of the tokens. However, they are currently not
> enabled in the default configuration of the SolrYard, so recall can be
> very bad for enhancers able to deal with i18n input.
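
To make the mismatch described above concrete, here is a minimal Python sketch of
accent folding using only the standard library. It is an illustration of the
idea, not the SolrYard implementation: the function names fold_accents and
tokens are hypothetical, and in Solr itself the comparable step would be done by
an analysis filter in the field type (for example ASCIIFoldingFilterFactory),
which this sketch does not configure. Stemming is not covered here.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Lowercase the text and drop combining marks (accents).

    Hypothetical helper for illustration only; it mimics what an
    accent-folding token filter does at index and query time.
    """
    # NFD splits "é" into "e" followed by a combining acute accent.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only characters that are not combining marks.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


def tokens(text: str) -> list[str]:
    """Whitespace-tokenize the accent-folded text (assumed, simplistic tokenizer)."""
    return fold_accents(text).split()


article_name = "Benyamin Nétanyahou"   # spelling used by some French newspapers
dbpedia_label = "Benyamin Netanyahou"  # rdfs:label@fr as quoted in this issue

# Without folding, the two spellings do not match.
print(article_name == dbpedia_label)

# With the same folding applied to both sides, the tokens are identical.
print(tokens(article_name) == tokens(dbpedia_label))
```

The key point is that the same normalization has to be applied both to the
indexed labels and to the query text; folding only one side would not improve
recall.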