finding the analyzer for a language...

2010-09-24 Thread Bill Janssen
I thought that since I'm updating UpLib's Lucene code, I should tackle the issue of document languages, as well. Right now I'm using an off-the-shelf language identifier, textcat, to figure out which language a Web page or PDF is (mainly) written in. I then want to analyze that document with an a

RE: In lucene 2.3.2, needs to stop optimization?

2010-09-24 Thread Zhang, Lisheng
Hi, thanks very much. I definitely plan to upgrade Lucene. I did not keep the IndexWriter open, partly because our app has more than 3K independent Lucene directories, so it would be hard to keep them all in memory, but I may cache some of the busiest ones. Best regards, Lisheng -Original Messa

Re: Checksum and transactional safety for lucene indexes

2010-09-24 Thread Yonik Seeley
On Tue, Sep 21, 2010 at 12:53 AM, Lance Norskog wrote: > If an index file is not completely written to disk, it never becomes > available. Lucene has a file describing the current active index segments. > It writes all new files to the disk, and changes the description file > (segments.gen) only af

Re: Checksum and transactional safety for lucene indexes

2010-09-24 Thread Pulkit Singhal
In order to determine the integrity of an index file, I found that the easiest way was to use IndexReader.open(directory) and if there were any problems with the data then catch the exceptions and make a new one. I also see that the API offers IndexReader.indexExists() ... would that be a better a

Re: How to count entries in an index file?

2010-09-24 Thread Pulkit Singhal
Is using IndexReader.numDocs() on the Directory instance the only way to count the indexed entries? On Fri, Sep 24, 2010 at 9:40 AM, Pulkit Singhal wrote: > Hello Everyone, > > I want to load the indexed data from the file system using FSDirectory. > But I also want to be sure if something was a

How to count entries in an index file?

2010-09-24 Thread Pulkit Singhal
Hello Everyone, I want to load the indexed data from the file system using FSDirectory. But I also want to be sure if something was actually loaded or if a new empty directory was created and returned to me. How can I count the # of entries in the Directory object returned to me? Thanks! - Pulkit

Re: ArrayIndexOutOfBoundsException when iterating over TermDocs

2010-09-24 Thread Simon Willnauer
Cool thanks! On Fri, Sep 24, 2010 at 11:07 AM, Shay Banon wrote: > Sure, opened https://issues.apache.org/jira/browse/LUCENE-2666, wanted to > ping the list first to see if someone knows about it. > > On Fri, Sep 24, 2010 at 7:12 AM, Simon Willnauer > wrote: >> >> Shay, >> >> would you mind open

Re: ArrayIndexOutOfBoundsException when iterating over TermDocs

2010-09-24 Thread Shay Banon
Sure, opened https://issues.apache.org/jira/browse/LUCENE-2666, wanted to ping the list first to see if someone knows about it. On Fri, Sep 24, 2010 at 7:12 AM, Simon Willnauer < simon.willna...@googlemail.com> wrote: > Shay, > > would you mind open a jira issue for that? > > simon > > On Fri, Se

Re: In lucene 2.3.2, needs to stop optimization?

2010-09-24 Thread Danil ŢORIN
Is it possible for you to migrate to 2.9.x? Or even 3.x? There are some huge optimizations in 2.9 for reopening indexes that significantly improve search speed. I'm not sure, but I think indexWriter.getReader() for near-real-time search was added in 2.9, so you can keep your writer always open and get v