I thought that since I'm updating UpLib's Lucene code, I should tackle
the issue of document languages, as well. Right now I'm using an
off-the-shelf language identifier, textcat, to figure out which language
a Web page or PDF is (mainly) written in. I then want to analyze that
document with an a
Hi,
Thanks very much. I definitely plan to upgrade Lucene.
I did not keep the IndexWriter open partly because our app has
more than 3K independent Lucene directories, so it would be hard
to keep them all in memory, but I may cache some of the busiest
ones.
Best regards, Lisheng
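
One way to cache only the busiest handles is a small least-recently-used (LRU) cache that closes whatever it evicts. The sketch below is only an illustration and does not use any Lucene API: HandleCache, maxOpen and opener are hypothetical names, and in the setup described above the handle type would presumably be an IndexWriter (or a thin AutoCloseable wrapper around one) keyed by index directory path. It assumes Java 8 or later.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical sketch: keeps at most maxOpen handles open, keyed by index
 * path, and closes the least recently used handle when the limit is exceeded.
 */
class HandleCache<H extends AutoCloseable> {
    private final Function<String, H> opener;
    private final LinkedHashMap<String, H> open;

    HandleCache(final int maxOpen, Function<String, H> opener) {
        this.opener = opener;
        // accessOrder = true: iteration order runs from least to most recently used.
        this.open = new LinkedHashMap<String, H>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, H> eldest) {
                if (size() <= maxOpen) {
                    return false;
                }
                closeQuietly(eldest.getValue()); // release the evicted handle
                return true;
            }
        };
    }

    /** Returns the cached handle for indexPath, opening it on a cache miss. */
    synchronized H get(String indexPath) {
        H handle = open.get(indexPath); // a hit also marks the entry as recently used
        if (handle == null) {
            handle = opener.apply(indexPath);
            open.put(indexPath, handle); // may evict and close the eldest entry
        }
        return handle;
    }

    /** Number of handles currently held open. */
    synchronized int size() {
        return open.size();
    }

    private static void closeQuietly(AutoCloseable c) {
        try {
            c.close();
        } catch (Exception e) {
            // A real application would log this; the sketch ignores it.
        }
    }
}
```

With maxOpen set to a few dozen, only the most recently used directories keep an open handle, and the rest are reopened on demand. The sketch does not protect a handle that is still in use by another thread at the moment it is evicted, so a real implementation would need reference counting or a similar safeguard.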
-Original Messa
On Tue, Sep 21, 2010 at 12:53 AM, Lance Norskog wrote:
> If an index file is not completely written to disk, it never becomes
> available. Lucene has a file describing the current active index segments.
> It writes all new files to the disk, and changes the description file
> (segments.gen) only af
In order to determine the integrity of an index file, I found that the
easiest way was to use IndexReader.open(directory) and if there were
any problems with the data then catch the exceptions and make a new
one.
I also see that the API offers IndexReader.indexExists() ... would
that be a better a
Is using IndexReader.numDocs() on the Directory instance the only way
to count the indexed entries?
On Fri, Sep 24, 2010 at 9:40 AM, Pulkit Singhal wrote:
> Hello Everyone,
>
> I want to load the indexed data from the file system using FSDirectory.
> But I also want to be sure if something was a
Hello Everyone,
I want to load the indexed data from the file system using FSDirectory.
But I also want to be sure if something was actually loaded or if a
new empty directory was created and returned to me.
How can I count the # of entries in the Directory object returned to me?
Thanks!
- Pulkit
Cool thanks!
On Fri, Sep 24, 2010 at 11:07 AM, Shay Banon wrote:
> Sure, opened https://issues.apache.org/jira/browse/LUCENE-2666, wanted to
> ping the list first to see if someone knows about it.
>
> On Fri, Sep 24, 2010 at 7:12 AM, Simon Willnauer
> wrote:
>>
>> Shay,
>>
>> would you mind open
Sure, opened https://issues.apache.org/jira/browse/LUCENE-2666, wanted to
ping the list first to see if someone knows about it.
On Fri, Sep 24, 2010 at 7:12 AM, Simon Willnauer <
simon.willna...@googlemail.com> wrote:
> Shay,
>
> would you mind opening a JIRA issue for that?
>
> simon
>
> On Fri, Se
Is it possible for you to migrate to 2.9.x? Or even 3.x?
There are some huge optimizations in 2.9 for reopening indexes that
significantly improve search speed.
I'm not sure, but I think IndexWriter.getReader() for near-real-time search
was added to 2.9, so you can keep your writer always open and get v