Hi Vineeth,
I haven't looked at the plugin Bryan has created,
but creating a plugin for special characters gives better performance
than the pattern tokenizer or custom filters.
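For comparison, here is a rough sketch of the pattern-tokenizer route Vineeth mentions, assuming the only goal is to keep '@' and '#' attached to the following word. It is plain Python that imitates what the "pattern" tokenizer does (split on every regex match, drop empty tokens), plus matching index settings. The regex and the tokenizer/analyzer names (keep_at_hash, social_text) are hypothetical, chosen for illustration, and not tested against a cluster.

```python
import re

# Hypothetical illustration: mimic an Elasticsearch "pattern" tokenizer in
# plain Python. The tokenizer splits the input on every match of a regex and
# drops empty strings. Here any run of characters that is NOT a word
# character, '@' or '#' acts as a separator, so "@mentions" and "#hashtags"
# survive as single tokens.
KEEP_AT_AND_HASH = re.compile(r"[^\w@#]+")


def pattern_tokenize(text, pattern=KEEP_AT_AND_HASH):
    """Split text on pattern matches, discarding empty tokens."""
    return [tok for tok in pattern.split(text) if tok]


# Corresponding index settings (tokenizer/analyzer names are hypothetical).
SETTINGS = {
    "settings": {
        "analysis": {
            "tokenizer": {
                "keep_at_hash": {
                    "type": "pattern",
                    "pattern": KEEP_AT_AND_HASH.pattern,
                },
            },
            "analyzer": {
                "social_text": {
                    "type": "custom",
                    "tokenizer": "keep_at_hash",
                    "filter": ["lowercase"],
                },
            },
        },
    },
}

if __name__ == "__main__":
    print(pattern_tokenize("Hi @foo, see #bar!"))  # ['Hi', '@foo', 'see', '#bar']
```

One caveat: Elasticsearch evaluates the pattern with Java regex, where \w matches only ASCII word characters by default, so non-English text may split differently than in this Python sketch. A single regex also drops the rest of the UAX#29 logic: "3.14" becomes "3" and "14" here, while the Standard Tokenizer keeps it as one token.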
Regards,
Raj

On Tuesday, September 9, 2014 9:06:08 AM UTC+5:30, vineeth mohan wrote:

> Hello Bryan,
>
> Congrats on your first plugin. 
> I have a question here: could the whole plugin be implemented using the
> pattern tokenizer
> (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-pattern-tokenizer.html)?
>
> Does your plugin provide any advantage over that approach?
>
> Thanks
>           Vineeth
>
> On Tue, Sep 9, 2014 at 7:56 AM, Bryan Warner <bryan....@gmail.com> wrote:
>
>> Hi all,
>>
>> Recently, I've been working on an extension to Lucene's Standard
>> Tokenizer that allows the user to customize or override the default word
>> boundary break rules for Unicode characters. The Standard Tokenizer
>> implements the word break rules from the Unicode Text Segmentation
>> <http://www.unicode.org/reports/tr29/> algorithm, where most punctuation
>> symbols (except for underscore '_') are treated as hard word breaks (e.g.
>> "@foo" and "#foo" are both tokenized to "foo"). While the Standard Tokenizer
>> works great in most cases, I found that being unable to override the default
>> word break rules was quite limiting, especially since many of these
>> punctuation symbols now carry important meaning on the web (@ for mentions,
>> # for hashtags, etc.).
>>
>> I've wrapped this extension to the Standard Tokenizer in an ElasticSearch
>> plugin, which can be found at:
>> https://github.com/bbguitar77/elasticsearch-analysis-standardext
>> I'm definitely looking for feedback, as this is my first go at an
>> ElasticSearch plugin!
>>
>> I'm hoping other ElasticSearch / Lucene users find this helpful.
>>
>> Cheers!
>> Bryan
>>
>>  -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/929dc7c3-ff99-43a4-a287-1a8f89d86e3f%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
