Hi Vineeth,

I haven't looked at the plugin Bryan has created. However, creating a plugin for special characters gives better performance than using the pattern tokenizer or custom filters.

Regards,
Raj
On Tuesday, September 9, 2014 9:06:08 AM UTC+5:30, vineeth mohan wrote:
> Hello Bryan,
>
> Congrats on your first plugin.
> I have a question: can you implement the whole plugin using the pattern
> tokenizer
> (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-pattern-tokenizer.html)?
>
> Does your plugin provide any advantage over that approach?
>
> Thanks,
> Vineeth
>
> On Tue, Sep 9, 2014 at 7:56 AM, Bryan Warner <bryan....@gmail.com> wrote:
>
>> Hi all,
>>
>> Recently, I've been working on an extension to Lucene's Standard
>> Tokenizer that allows the user to customize / override the default word
>> boundary break rules for Unicode characters. The Standard Tokenizer
>> implements the word break rules from the Unicode Text Segmentation
>> algorithm (http://www.unicode.org/reports/tr29/), where most punctuation
>> symbols (except for underscore '_') are treated as hard word breaks (e.g.
>> "@foo" and "#foo" are both tokenized to "foo"). While the Standard
>> Tokenizer works great in most cases, I found that being unable to override
>> the default word break rules was quite limiting, especially since many of
>> these punctuation symbols now carry important meaning on the web (@ for
>> mentions, # for hashtags, etc.).
>>
>> I've wrapped this extension to the Standard Tokenizer in an ElasticSearch
>> plugin, which can be found at
>> https://github.com/bbguitar77/elasticsearch-analysis-standardext ...
>> I'm definitely looking for feedback, as this is my first go at an
>> ElasticSearch plugin!
>>
>> I'm hoping other ElasticSearch / Lucene users find this helpful.
>>
>> Cheers!
>> Bryan
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to elasticsearc...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/929dc7c3-ff99-43a4-a287-1a8f89d86e3f%40googlegroups.com
>> For more options, visit https://groups.google.com/d/optout.
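To make the question in this thread concrete, here is a minimal Python sketch contrasting the two behaviours being discussed. It is an illustration under stated assumptions, not Bryan's plugin, Lucene's code, or Elasticsearch's API: the function names and regexes are hypothetical, and `approx_standard_tokens` only approximates the Standard Tokenizer's punctuation handling (every non-word character breaks, underscore joins), ignoring the finer UAX #29 rules for things like apostrophes and numbers. `pattern_tokens` mimics what a pattern tokenizer does, splitting on a break pattern, here one that keeps '@' and '#' attached to the following word.

```python
import re
from typing import List

# Simplified stand-in for the Standard Tokenizer's punctuation handling:
# each run of Unicode word characters (letters, digits, underscore) is a token,
# and everything else, including '@' and '#', acts as a word break.
# This is an approximation for illustration only, NOT Lucene's UAX #29 rules.
WORD_RUN = re.compile(r"\w+")

# Hypothetical pattern-tokenizer-style break pattern: split on any run of
# characters that are neither word characters nor '@' / '#', so those
# symbols stay attached to the token they appear in.
KEEP_AT_HASH_BREAKS = re.compile(r"[^\w@#]+")


def approx_standard_tokens(text: str) -> List[str]:
    """Extract word-character runs; all punctuation except '_' breaks tokens."""
    return WORD_RUN.findall(text)


def pattern_tokens(text: str, breaks: "re.Pattern[str]" = KEEP_AT_HASH_BREAKS) -> List[str]:
    """Split text on a break pattern, the way a pattern tokenizer does."""
    # re.split yields empty strings when the text starts or ends with a
    # break (or is empty), so drop them.
    return [tok for tok in breaks.split(text) if tok]


if __name__ == "__main__":
    sample = "Thanks @foo, loving #foo and snake_case!"
    print(approx_standard_tokens(sample))
    print(pattern_tokens(sample))
```

With the sample sentence, the first function loses the '@' and '#' prefixes while the second keeps "@foo" and "#foo" as distinct tokens. One design trade-off of the regex route is that it keeps '@' and '#' anywhere inside a token (e.g. "a@b" stays one token), so whether it fully replaces a rule-based tokenizer depends on the rules you need. Elasticsearch also evaluates pattern tokenizer patterns with Java regular expressions, so character classes may not match Python's `re` exactly.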