[ 
https://issues.apache.org/jira/browse/STANBOL-331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rupert Westenthaler resolved STANBOL-331.
-----------------------------------------

    Resolution: Fixed
      Assignee: Rupert Westenthaler

Fixed with revision #1183014.

* Added support for German ("@de" language prefixes)
* Improved indexing for generic text (languages without dedicated support) by
    * using the ICU Tokenizer (which also tokenizes languages that do not
      use spaces)
    * applying the ASCIIFoldingFilterFactory to fold accented characters
      (e.g. é to e)
* All natural language fields now also support the removal of hyphens
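
To illustrate the intended effect of the folding and hyphen removal on
matching, here is a minimal Python sketch. It is not the Solr/Lucene
implementation and not the actual SolrYard configuration: the helper names
(fold_to_ascii, normalize_label, tokens) are hypothetical, Unicode
decomposition only approximates what ASCIIFoldingFilterFactory does, and
whitespace splitting stands in for the ICU Tokenizer.

```python
import unicodedata


def fold_to_ascii(text):
    """Approximate accent folding: decompose characters, drop combining marks.

    Assumption: this only mimics the effect of Lucene's ASCII folding for
    simple accented Latin letters; it is not the same algorithm.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize_label(text):
    """Lowercase, fold accents, and remove hyphens (hypothetical helper)."""
    return fold_to_ascii(text).lower().replace("-", "")


def tokens(text):
    """Split the normalized text on whitespace (stand-in for a real tokenizer)."""
    return normalize_label(text).split()


# The accented newspaper spelling and the unaccented DBpedia label
# normalize to the same tokens, so an exact token match now succeeds.
newspaper = tokens("Benyamin Nétanyahou")
dbpedia = tokens("Benyamin Netanyahou")
print(newspaper == dbpedia)
```

With both the indexed label and the query normalized this way, the two
spellings from the issue description compare equal, which is the recall
improvement the issue asks for.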
                
> The default SolrYard configuration should have support for i18n analyzers 
> (stemming and accents removal)
> --------------------------------------------------------------------------------------------------------
>
>                 Key: STANBOL-331
>                 URL: https://issues.apache.org/jira/browse/STANBOL-331
>             Project: Stanbol
>          Issue Type: Bug
>            Reporter: Olivier Grisel
>            Assignee: Rupert Westenthaler
>
> For instance, some French newspapers spell foreign names with accents
> (e.g. Benyamin Nétanyahou), while DBpedia uses a non-accented variant
> in the rdfs:label field (with the @fr literal), e.g. Benyamin Netanyahou.
> I think Solr and Lucene provide a variety of analyzers to deal with such
> language-specific variability of the tokens. However, they are currently
> not enabled in the default configuration of the SolrYard, so recall can
> be very poor for enhancers able to deal with i18n input.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

