[ https://issues.apache.org/jira/browse/STANBOL-331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rupert Westenthaler resolved STANBOL-331.
-----------------------------------------
Resolution: Fixed
Assignee: Rupert Westenthaler
Fixed with revision #1183014:
* Added support for German ("@de" language tags)
* Improved indexing for generic text (languages without dedicated support) by
  * using the ICU Tokenizer (which also tokenizes languages that do not use spaces)
  * using the ASCIIFoldingFilterFactory to convert accented characters to their ASCII equivalents (e.g. é to e)
* All natural language fields now also support removal of hyphens
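To illustrate why accent folding improves recall for the example in the original report (Nétanyahou vs. Netanyahou), here is a minimal Python sketch. It is an approximation, not the SolrYard analyzer chain: `fold_accents` and `analyze` are hypothetical helpers, folding is done with Unicode NFKD decomposition rather than Lucene's ASCIIFoldingFilter, and plain whitespace splitting stands in for the ICU Tokenizer.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Approximate ASCII folding: decompose characters (NFKD) and drop the
    combining marks, so "é" becomes "e". Lucene's ASCIIFoldingFilter uses
    its own mapping table and also folds some characters that have no
    Unicode decomposition, so results can differ for those."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text: str) -> list[str]:
    """Hypothetical, simplified analysis chain: split on whitespace
    (stand-in for a real tokenizer), fold accents, then lowercase."""
    return [fold_accents(token).lower() for token in text.split()]


if __name__ == "__main__":
    # Accented spelling from a French newspaper vs. the unaccented DBpedia label.
    query = analyze("Benyamin Nétanyahou")
    label = analyze("Benyamin Netanyahou")
    print(query)           # ['benyamin', 'netanyahou']
    print(query == label)  # True
```

With folding applied at both index and query time, both spellings reduce to the same tokens, so the unaccented rdfs:label matches the accented input.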
> The default SolrYard configuration should have support for i18n analyzers
> (stemming and accents removal)
> --------------------------------------------------------------------------------------------------------
>
> Key: STANBOL-331
> URL: https://issues.apache.org/jira/browse/STANBOL-331
> Project: Stanbol
> Issue Type: Bug
> Reporter: Olivier Grisel
> Assignee: Rupert Westenthaler
>
> For instance, some French newspapers spell foreign names with accents
> (e.g. Benyamin Nétanyahou), while DBpedia uses an unaccented variant in
> the rdfs:label field (in the @fr literal), e.g. Benyamin Netanyahou.
> I think Solr and Lucene provide a variety of analyzers to deal with such
> language-specific variability of tokens. However, they are currently not
> enabled in the default configuration of the SolrYard, hence recall can be
> very poor for enhancers able to deal with i18n input.