On Fri, Sep 24, 2010 at 9:58 PM, Bill Janssen wrote:
> I thought that since I'm updating UpLib's Lucene code, I should tackle
> the issue of document languages, as well. Right now I'm using an
> off-the-shelf language identifier, textcat, to figure out which language
> a Web page or PDF is (main
Robert Muir wrote:
> On Fri, Sep 24, 2010 at 9:58 PM, Bill Janssen wrote:
>
> > I thought that since I'm updating UpLib's Lucene code, I should tackle
> > the issue of document languages, as well. Right now I'm using an
> > off-the-shelf language identifier, textcat, to figure out which langua
>
> Shai Erera brought a similar idea up before, to use Locale, but my concerns
> are it would be limited by Java's Locale mechanism... but we can figure this
> out.
>
It really depends on how sophisticated you want such an AnalyzerFactory
(that's what I call it in my code) to be. We can define it to
I may be missing the point here, but how do you define an analyzer <->
language match? What do you do in cases of mixed content, for example?
Itamar.
On 25/9/2010 10:27 PM, Shai Erera wrote:
Shai Erera brought a similar idea up before, to use Locale, but my concerns
are it would be limited by
Hello all,
I have a question about how Lucene flushes indexes.
Below is some pseudocode I took from the book Lucene in Action.
FSDirectory fsDir = FSDirectory.getDirectory("/tmp/index", true);
RAMDirectory ramDir = new RAMDirectory();
IndexWriter fsWriter = new IndexWriter(fsDir,
        new SimpleAnalyzer(), true
You should close the ramwriter before making the addIndexes call. Otherwise
you copy only *committed* changes, and there are none: the index is
initially empty and stays empty for an outside reader (your addIndexes call
is the outside reader) until the ramwriter is closed or committed. :-)
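
To make the visibility rule above concrete, here is a toy model in plain Java. It is only a sketch and does not use Lucene's API: ToyDirectory, ToyWriter and their methods are hypothetical stand-ins, written on the assumption that a writer buffers documents until commit() or close() and that addIndexes reads only what the source directory has committed.

```java
import java.util.ArrayList;
import java.util.List;

public class CommitVisibilityDemo {
    /** Hypothetical stand-in for a Directory: holds only committed documents. */
    static class ToyDirectory {
        final List<String> committed = new ArrayList<>();
    }

    /** Hypothetical stand-in for an IndexWriter: buffers until commit() or close(). */
    static class ToyWriter {
        private final ToyDirectory dir;
        private final List<String> pending = new ArrayList<>();

        ToyWriter(ToyDirectory dir) {
            this.dir = dir;
        }

        void addDocument(String doc) {
            pending.add(doc);
        }

        /** Makes buffered documents visible to outside readers of the directory. */
        void commit() {
            dir.committed.addAll(pending);
            pending.clear();
        }

        /** Closing commits whatever is still buffered. */
        void close() {
            commit();
        }

        /** Acts as an outside reader: sees only the committed state of the source. */
        void addIndexes(ToyDirectory source) {
            pending.addAll(source.committed);
        }
    }

    /** Merges while the RAM writer is still open; returns the merged document count. */
    static int copyWithoutClose() {
        ToyDirectory ramDir = new ToyDirectory();
        ToyDirectory fsDir = new ToyDirectory();
        ToyWriter ramWriter = new ToyWriter(ramDir);
        ToyWriter fsWriter = new ToyWriter(fsDir);
        ramWriter.addDocument("doc1");
        ramWriter.addDocument("doc2");
        fsWriter.addIndexes(ramDir); // nothing committed yet, so nothing is copied
        fsWriter.close();
        return fsDir.committed.size();
    }

    /** Closes the RAM writer first, then merges; returns the merged document count. */
    static int copyAfterClose() {
        ToyDirectory ramDir = new ToyDirectory();
        ToyDirectory fsDir = new ToyDirectory();
        ToyWriter ramWriter = new ToyWriter(ramDir);
        ToyWriter fsWriter = new ToyWriter(fsDir);
        ramWriter.addDocument("doc1");
        ramWriter.addDocument("doc2");
        ramWriter.close();           // buffered documents become visible
        fsWriter.addIndexes(ramDir); // now both documents are copied
        fsWriter.close();
        return fsDir.committed.size();
    }

    public static void main(String[] args) {
        System.out.println("without close: " + copyWithoutClose()); // prints 0
        System.out.println("after close:   " + copyAfterClose());   // prints 2
    }
}
```

In the toy model the only difference between the two runs is the order of close() and addIndexes(), which is the same ordering change suggested above for the real writers.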
---