[
https://issues.apache.org/jira/browse/STANBOL-331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13123951#comment-13123951
]
Rupert Westenthaler commented on STANBOL-331:
---------------------------------------------
I have started to look at adding support for additional languages to the
default Solr schema for the Entityhub.
I will also update the modified schema used for indexing DBpedia accordingly.
The work will be based on http://wiki.apache.org/solr/LanguageAnalysis
> The default SolrYard configuration should have support for i18n analyzers
> (stemming and accents removal)
> --------------------------------------------------------------------------------------------------------
>
> Key: STANBOL-331
> URL: https://issues.apache.org/jira/browse/STANBOL-331
> Project: Stanbol
> Issue Type: Bug
> Reporter: Olivier Grisel
>
> For instance, some French newspapers spell foreign names with accents
> (e.g. Benyamin Nétanyahou) while DBpedia uses a non-accented variant in the
> rdfs:label field (with the @fr literal), e.g. Benyamin Netanyahou.
> I think Solr and Lucene provide a variety of analyzers to deal with such
> language-specific variability of the tokens. However, they are currently not
> enabled in the default configuration of the SolrYard, hence recall can be
> very poor for enhancers able to deal with i18n input.
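To illustrate why accent removal matters for the example above, here is a minimal Python sketch of the normalization that an accent-folding filter (such as Lucene's ASCIIFoldingFilter) performs at both index and query time. It is an approximation, not Solr's implementation: it decomposes characters (Unicode NFD) and drops combining marks. The function names `fold_accents` and `analyze` are hypothetical and used only for illustration.

```python
import unicodedata
from typing import List


def fold_accents(text: str) -> str:
    """Approximate accent removal: NFD-decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text: str) -> List[str]:
    """Toy analysis chain (hypothetical): whitespace split, lowercase, fold accents."""
    return [fold_accents(token.lower()) for token in text.split()]


newspaper = "Benyamin Nétanyahou"  # spelling used by the French newspaper
dbpedia = "Benyamin Netanyahou"    # rdfs:label@fr value in DBpedia

# Without folding, the surname tokens differ, so the label would not match.
print(newspaper.lower().split() == dbpedia.lower().split())  # False
# With folding applied to both sides, the token streams are identical.
print(analyze(newspaper) == analyze(dbpedia))  # True
```

The key design point is that the same folding must run on both the indexed labels and the incoming text; applying it on only one side would not improve recall.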
--
This message is automatically generated by JIRA.