Shai, I was referring to your #2, which you already indicated in your
reply wasn't part of the discussion.
Itamar.
On 26/9/2010 10:10 AM, Shai Erera wrote:
The mapping is simply about returning the right Analyzer for the given
Locale. You decide up front (as the Factory developer) what Analyzer /
Tokenizer + TokenFilters combination you want to return for each language,
and then when that language is input, you return it. That's it.
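
A minimal sketch of the mapping described above, not the actual factory from anyone's code in this thread. The class name LocaleAnalyzerFactory and its methods are hypothetical, and the type parameter A stands in for Lucene's Analyzer so the example compiles without Lucene on the classpath. It keys on the Locale's language code only, which is one possible choice among several.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch: A is a placeholder for an analyzer type
// (in a Lucene application it would presumably be Analyzer).
final class LocaleAnalyzerFactory<A> {
    // Language code (e.g. "en", "fr") -> supplier of the analyzer chosen for it.
    private final Map<String, Supplier<A>> byLanguage = new HashMap<>();
    // Used when no analyzer was registered for the requested language.
    private final Supplier<A> fallback;

    LocaleAnalyzerFactory(Supplier<A> fallback) {
        this.fallback = fallback;
    }

    // The factory developer decides up front which analyzer each language gets.
    LocaleAnalyzerFactory<A> register(Locale locale, Supplier<A> supplier) {
        byLanguage.put(locale.getLanguage(), supplier);
        return this;
    }

    // Return the analyzer registered for the locale's language, else the fallback.
    A forLocale(Locale locale) {
        return byLanguage.getOrDefault(locale.getLanguage(), fallback).get();
    }
}
```

Because the lookup uses only the language code, a request for Locale.US resolves to whatever was registered for Locale.ENGLISH. This sketch does not address mixed-language content.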
Can you define mi
I may be missing the point here, but how do you define an analyzer <->
language match? What do you do in cases of mixed content, for example?
Itamar.
On 25/9/2010 10:27 PM, Shai Erera wrote:
>
> Shai Erera brought a similar idea up before, to use Locale, but my concern
> is that it would be limited by Java's Locale mechanism... but we can figure this
> out.
>
It really depends how sophisticated you want such an AnalyzerFactory
(that's what I call it in my code) to be. We can
define it to
Robert Muir wrote:
On Fri, Sep 24, 2010 at 9:58 PM, Bill Janssen wrote:
> I thought that since I'm updating UpLib's Lucene code, I should tackle
> the issue of document languages, as well. Right now I'm using an
> off-the-shelf language identifier, textcat, to figure out which language
> a Web page or PDF is (main
I thought that since I'm updating UpLib's Lucene code, I should tackle
the issue of document languages, as well. Right now I'm using an
off-the-shelf language identifier, textcat, to figure out which language
a Web page or PDF is (mainly) written in. I then want to analyze that
document with an a