Re: finding the analyzer for a language...

2010-09-25 Thread Robert Muir
On Fri, Sep 24, 2010 at 9:58 PM, Bill Janssen wrote:
> I thought that since I'm updating UpLib's Lucene code, I should tackle the issue of document languages, as well. Right now I'm using an off-the-shelf language identifier, textcat, to figure out which language a Web page or PDF is (main

Re: finding the analyzer for a language...

2010-09-25 Thread Bill Janssen
Robert Muir wrote:
> On Fri, Sep 24, 2010 at 9:58 PM, Bill Janssen wrote:
> > I thought that since I'm updating UpLib's Lucene code, I should tackle the issue of document languages, as well. Right now I'm using an off-the-shelf language identifier, textcat, to figure out which langua

Re: finding the analyzer for a language...

2010-09-25 Thread Shai Erera
> Shai Erera brought a similar idea up before, to use Locale, but my concerns are it would be limited by Java's Locale mechanism... but we can figure this out.
It really depends how sophisticated you want such an AnalyzerFactory (that's what I call it in my code) to be. We can define it to
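To make the Locale idea in this message concrete, here is a minimal sketch of a Locale-keyed factory. It is not Shai's code and not a Lucene API: the class name `ToyAnalyzerFactory`, its methods, and the use of plain strings as stand-ins for analyzers are all assumptions made for illustration. It matches on the language part of the Locale only and falls back to a default, which is one way to see the limitation Robert mentions.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch: maps a Locale's language code to a supplier of
// "analyzers". The type parameter A stands in for a real analyzer type.
class ToyAnalyzerFactory<A> {
    private final Map<String, Supplier<A>> byLanguage = new HashMap<>();
    private final Supplier<A> fallback;

    ToyAnalyzerFactory(Supplier<A> fallback) {
        this.fallback = fallback;
    }

    // Registers a supplier under the locale's language code (e.g. "fr").
    void register(Locale locale, Supplier<A> supplier) {
        byLanguage.put(locale.getLanguage(), supplier);
    }

    // Looks up by language only; country and variant are ignored.
    A forLocale(Locale locale) {
        return byLanguage.getOrDefault(locale.getLanguage(), fallback).get();
    }

    public static void main(String[] args) {
        // Strings are placeholders for analyzer instances (assumption).
        ToyAnalyzerFactory<String> factory =
                new ToyAnalyzerFactory<>(() -> "default analyzer");
        factory.register(Locale.FRENCH, () -> "French analyzer");

        System.out.println(factory.forLocale(Locale.CANADA_FRENCH)); // French analyzer
        System.out.println(factory.forLocale(Locale.JAPANESE));      // default analyzer
    }
}
```

A lookup keyed on language alone cannot express mixed-content documents or scripts that Locale does not distinguish, so a real factory would likely need a richer key.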

Re: finding the analyzer for a language...

2010-09-25 Thread Itamar Syn-Hershko
I may be missing the point here, but how do you define an analyzer <-> language match? What do you do in cases of mixed content, for example?
Itamar.
On 25/9/2010 10:27 PM, Shai Erera wrote:
> Shai Erera brought a similar idea up before, to use Locale, but my concerns are it would be limited by

flushing index

2010-09-25 Thread Yakob
Hello all, I have a question about flushing indexes in Lucene. Below is pseudocode I took from the book Lucene in Action:
    FSDirectory fsDir = FSDirectory.getDirectory("/tmp/index", true);
    RAMDirectory ramDir = new RAMDirectory();
    IndexWriter fsWriter = new IndexWriter(fsDir, new SimpleAnalyzer(), true

RE: flushing index

2010-09-25 Thread Uwe Schindler
You should close the ramwriter before making the addindexes call. Otherwise you copy only *committed* changes, and there are none: the index is initially empty and stays empty for an outside reader (your addindexes call is the outside reader) until the ramwriter is closed or committed. :-) ---
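The visibility rule described in this reply can be illustrated with a small toy model. This sketch deliberately does not use Lucene: `ToyDirectory`, `ToyWriter`, and `copiedCount` are hypothetical names invented here, and the model only mimics the idea that an outside reader sees committed documents and nothing else.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (not Lucene): shows why copying from a directory whose writer
// has not been closed or committed yields nothing.
class CommitVisibilityDemo {
    // Stand-in for a directory: exposes only committed documents.
    static class ToyDirectory {
        final List<String> committed = new ArrayList<>();
    }

    // Stand-in for a writer: buffers documents until commit() or close().
    static class ToyWriter {
        private final ToyDirectory dir;
        private final List<String> pending = new ArrayList<>();

        ToyWriter(ToyDirectory dir) {
            this.dir = dir;
        }

        void addDocument(String doc) {
            pending.add(doc);
        }

        void commit() {
            dir.committed.addAll(pending);
            pending.clear();
        }

        void close() {
            commit();
        }

        // Acts as an outside reader of the source: sees committed docs only.
        void addIndexes(ToyDirectory source) {
            pending.addAll(source.committed);
        }
    }

    // Returns how many documents end up in the file-system directory.
    static int copiedCount(boolean closeRamWriterFirst) {
        ToyDirectory ramDir = new ToyDirectory();
        ToyDirectory fsDir = new ToyDirectory();
        ToyWriter ramWriter = new ToyWriter(ramDir);
        ToyWriter fsWriter = new ToyWriter(fsDir);

        ramWriter.addDocument("doc-1");
        ramWriter.addDocument("doc-2");
        if (closeRamWriterFirst) {
            ramWriter.close(); // makes the buffered documents visible
        }
        fsWriter.addIndexes(ramDir);
        fsWriter.close();
        return fsDir.committed.size();
    }

    public static void main(String[] args) {
        System.out.println(copiedCount(false)); // 0
        System.out.println(copiedCount(true));  // 2
    }
}
```

In the model, the only difference between the two runs is whether the RAM-side writer is closed before the copy, which is the ordering the reply recommends.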