Hi Todd,
All of these sound good. Personally, I think analyzers like these
belong in Lucene's contrib/analyzers package, with Solr factory
implementations built on those, but that's your call.
As for Protocol Buffers, I am assuming you mean: http://code.google.com/p/protobuf/
That is under the Apache license, so it is fine to incorporate. It sounds
like it might be a contrib to start, but that's just my take.
They sound like they might be worth using in SolrJ and for distributed
search, but I am interested in how they compare to other similar
technologies. Can you share your use case for them?
-Grant
On Oct 15, 2008, at 2:48 PM, Feak, Todd wrote:
Reposting, as I inadvertently thread hijacked on the first one. My
bad.
Hi all,
I have a handful of custom classes that we've created for our purposes
here. I'd like to share them if you think they have value for the rest
of the community, but I wanted to check here before creating JIRA
tickets and patches.
Here's what I have:
1. DoubleMetaphoneFilter and Factory. This replaces usage of the
PhoneticFilter and Factory, allowing access to set maxCodeLength() on
the DoubleMetaphone encoder and access to the "alternate" encodings
that the encoder provides for some words.
2. JapaneseHalfWidthFilter and Factory. Some Japanese characters (and
the Latin alphabet) exist in both a FullWidth and a HalfWidth form. This
filter normalizes by switching to the FullWidth form for all the
characters. I have seen at least one JIRA ticket about this issue.
This implementation doesn't rely on Java 1.6.
3. JapaneseHiraganaFilter and Factory. Japanese Hiragana can be
mapped directly to Katakana. This filter normalizes to Katakana so that
data and queries can come in either way and still get hits.
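To make items 2 and 3 concrete, here is a minimal sketch of the kind of
character mapping involved. It is not the code from the proposed patches:
the class and method names are made up for illustration, it works on plain
Strings rather than Lucene tokens, and it assumes the standard Unicode
layout (Hiragana sits 0x60 below the matching Katakana, and printable
ASCII sits 0xFEE0 below its FullWidth form). HalfWidth Katakana is left
out, since it needs a lookup table and handling of voiced sound marks.

```java
// Illustrative sketch only; names are hypothetical and not from the patches.
public final class KanaWidthSketch {
    private KanaWidthSketch() {}

    /** Maps Hiragana U+3041..U+3096 and iteration marks U+309D/U+309E to Katakana. */
    public static String hiraganaToKatakana(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            boolean hiragana = (c >= 0x3041 && c <= 0x3096) || c == 0x309D || c == 0x309E;
            // The Katakana block mirrors Hiragana at a fixed offset of 0x60.
            out.append(hiragana ? (char) (c + 0x60) : c);
        }
        return out.toString();
    }

    /** Maps printable ASCII (U+0021..U+007E) and the space to their FullWidth forms. */
    public static String asciiToFullWidth(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == ' ') {
                out.append((char) 0x3000);        // ideographic space
            } else if (c >= 0x21 && c <= 0x7E) {
                out.append((char) (c + 0xFEE0));  // FullWidth ASCII variants start at U+FF01
            } else {
                out.append(c);                    // leave everything else untouched
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hiragana "sushi" (U+3059 U+3057) becomes Katakana (U+30B9 U+30B7).
        System.out.println(hiraganaToKatakana("\u3059\u3057"));
        // ASCII letters and digits become their FullWidth counterparts.
        System.out.println(asciiToFullWidth("Solr 1.3"));
    }
}
```

In a real TokenFilter the same per-character mapping would be applied to
each token's term text, at both index time and query time, so that either
form matches.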
Also, I have been asked to create a prototype that you may be
interested in. I'm to construct a QueryResponseWriter that returns
documents using Google's Protocol Buffers. This would rely on an
existing patch that exposes the OutputStream, but I would like to
start the work soon. Are there license concerns that would block
sharing this with you? Is there any interest in this?
Thanks for your consideration,
Todd Feak
--------------------------
Grant Ingersoll