[ 
https://issues.apache.org/jira/browse/STANBOL-331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13123951#comment-13123951
 ] 

Rupert Westenthaler commented on STANBOL-331:
---------------------------------------------

I have started to look at adding support for additional languages to the 
default Solr schema for the Entityhub.
I will also update the modified schema used for indexing DBpedia accordingly.

The work will be based on http://wiki.apache.org/solr/LanguageAnalysis
                
> The default SolrYard configuration should have support for i18n analyzers 
> (stemming and accents removal)
> --------------------------------------------------------------------------------------------------------
>
>                 Key: STANBOL-331
>                 URL: https://issues.apache.org/jira/browse/STANBOL-331
>             Project: Stanbol
>          Issue Type: Bug
>            Reporter: Olivier Grisel
>
> For instance, some French newspapers spell foreign names with accents 
> (e.g. Benyamin Nétanyahou), while DBpedia uses an unaccented variant in the 
> rdfs:label field (with the @fr literal), e.g. Benyamin Netanyahou.
> I think Solr and Lucene provide a variety of analyzers to deal with such 
> language-specific variability of the tokens. However, they are currently not 
> enabled in the default configuration of the SolrYard, so recall can be 
> very bad for enhancers able to deal with i18n input.
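
The mismatch described in the issue can be illustrated with a minimal sketch. It is not how Solr or Lucene implement accent folding; it only approximates the effect with Unicode decomposition from the Python standard library. The helper name fold_accents is hypothetical, and the two strings are the spellings quoted in the issue.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Approximate accent folding: decompose, drop combining marks, lowercase.

    Hypothetical helper for illustration only; a real Solr analyzer chain
    is configured in the schema and covers more characters than this does.
    """
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


article_spelling = "Benyamin Nétanyahou"  # spelling used by some French newspapers
dbpedia_label = "Benyamin Netanyahou"     # rdfs:label (@fr) as quoted in the issue

# Without folding, an exact comparison of the two spellings fails.
print(article_spelling == dbpedia_label)

# With the same folding applied to both sides, the spellings compare equal.
print(fold_accents(article_spelling) == fold_accents(dbpedia_label))
```

The folding has to be applied to both the indexed labels and the incoming query text, which is why the fix belongs in the schema's analyzer configuration rather than in each enhancer.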


