[ 
https://issues.apache.org/jira/browse/SOLR-1336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756177#action_12756177
 ] 

Stanislaw Osinski commented on SOLR-1336:
-----------------------------------------

Keeping the Chinese analyzer JAR optional sounds good. As Carrot2 also uses it, 
I'd need to make sure the clustering contrib doesn't fail when the JAR is not 
there and clustering in Chinese is requested (I think I'd simply log a WARN 
saying that the Chinese analyzer JAR is required for best clustering results).
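
A minimal sketch of the check described above, assuming the optional JAR can be detected by probing for its analyzer class by name. The class name OptionalJarCheck, the method names, the "zh" language code, and the use of java.util.logging are all hypothetical choices for illustration; they are not taken from the clustering contrib or from Carrot2, and the analyzer's package name may differ between Lucene versions.

```java
import java.util.logging.Logger;

// Hypothetical helper for illustration; not part of Solr or Carrot2.
class OptionalJarCheck {
    private static final Logger LOG = Logger.getLogger(OptionalJarCheck.class.getName());

    // Assumed fully qualified name of the analyzer class shipped in the smartcn JAR.
    static final String SMARTCN_CLASS =
        "org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer";

    // Returns true if the named class can be located, without initializing it.
    static boolean isClassAvailable(String className) {
        try {
            Class.forName(className, false, OptionalJarCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // The JAR (or one of its dependencies) is not on the classpath.
            return false;
        }
    }

    // Decides whether the Chinese analyzer should be used for the given language.
    // When Chinese is requested but the JAR is missing, this logs a warning and
    // returns false so the caller can continue with a fallback instead of failing.
    static boolean useChineseAnalyzer(String languageCode) {
        if (!"zh".equals(languageCode)) {
            return false;
        }
        if (isClassAvailable(SMARTCN_CLASS)) {
            return true;
        }
        LOG.warning("Chinese analyzer JAR not found; it is required for best "
            + "clustering results in Chinese. Continuing without it.");
        return false;
    }

    public static void main(String[] args) {
        System.out.println("Use Chinese analyzer: " + useChineseAnalyzer("zh"));
    }
}
```

Probing with Class.forName keeps the dependency optional at runtime: no code path references the analyzer class directly unless the probe succeeds, so a missing JAR produces a warning rather than a NoClassDefFoundError.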

> Add support for lucene's SmartChineseAnalyzer
> ---------------------------------------------
>
>                 Key: SOLR-1336
>                 URL: https://issues.apache.org/jira/browse/SOLR-1336
>             Project: Solr
>          Issue Type: New Feature
>          Components: Analysis
>            Reporter: Robert Muir
>         Attachments: SOLR-1336.patch, SOLR-1336.patch, SOLR-1336.patch
>
>
> SmartChineseAnalyzer was contributed to Lucene; it indexes Simplified Chinese 
> text as words.
> If factories for its tokenizer and word token filter are added to Solr, it 
> can be used there, although there should be a sample config or wiki entry 
> showing how to apply the built-in stopwords list.
> This is because the list doesn't contain actual stopwords, but must be used 
> to prevent indexing punctuation.
> Note: we did some refactoring/cleanup on this analyzer recently, so it would 
> be much easier to do this after the next Lucene update.
> It has also been moved out of -analyzers.jar due to size and now builds in 
> its own smartcn JAR file, so that JAR would need to be added if this feature 
> is desired.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
