Re: Lucene 4.4.0 mergeSegments OutOfMemoryError

2013-10-07 Thread Michael van Rooyen
part of a forced merge, then it may also be able to happen as part of normal merges as the index grows. I'd be grateful if someone who's grokked the code for segment merges could shed some light on whether I'm worrying unnecessarily... Thanks, Michael. On 2013/09/26 01:43 PM,

Re: Lucene 4.4.0 mergeSegments OutOfMemoryError

2013-09-26 Thread Michael van Rooyen
since around Lucene 3.2, prefers merging segments with many deletions, so forceMerge(1) is not needed. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Michael van Rooyen [mailto:mich...@loot.co.za

Re: Lucene 4.4.0 mergeSegments OutOfMemoryError

2013-09-26 Thread Michael van Rooyen
(1)? You really shouldn't have to use that. If the index grows forever without it then something else is going on which you might wish to report separately. -- Ian. On Wed, Sep 25, 2013 at 12:35 PM, Michael van Rooyen wrote: We've recently upgraded to Lucene 4.4.0 and mergeSegments now

Lucene 4.4.0 mergeSegments OutOfMemoryError

2013-09-25 Thread Michael van Rooyen
We've recently upgraded to Lucene 4.4.0 and mergeSegments now causes an OOM error. As background, our index contains about 14 million documents (growing slowly) and we process about 1 million updates per day. It's about 8GB on disk. I'm not sure if the Lucene segments merge the way they used

Re: Document boosting and native ordering of results

2013-08-28 Thread Michael van Rooyen
chindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Michael van Rooyen [mailto:mich...@loot.co.za] Sent: Monday, August 26, 2013 6:39 PM To: java-user@lucene.apache.org Subject: Re: Document boosting and native ordering of

Altering field info without building index from scratch

2013-08-26 Thread Michael van Rooyen
Hello. We got the error: java.lang.IllegalStateException: field "xxx" was indexed without position data; cannot run PhraseQuery What I suspect is happening is that field xxx was first indexed as a StringField (untokenized), and subsequently changed to TextField (tokenized and analyzed). Ev

Re: Document boosting and native ordering of results

2013-08-26 Thread Michael van Rooyen
Not sure if there are any thoughts on this. It definitely makes sense to assign a rank to each document in the index, so that all else being equal, documents are returned in order of rank. This is exactly what the page rank is in Google's index, and Google would be lost without it. This used

Document boosting and native ordering of results

2013-08-20 Thread Michael van Rooyen
Hello. We've just upgraded to 4.3.1 from 2.9.2 and are having a problem with native ordering of search results. We always want documents returned in order of "rank", which for us is a float value that we assign to each document at index time. Rank depends on whether, for example, the item is

Re: If you could have one feature in Lucene...

2010-02-24 Thread Michael van Rooyen
On 2010/02/24 03:42 PM, Grant Ingersoll wrote: What would it be? Stop words counting when i

Re: java.io.IOException: read past EOF since migration to 2.9.1

2010-02-17 Thread Michael van Rooyen
Toke Eskildsen wrote: On Wed, 2010-02-17 at 15:18 +0100, Michael van Rooyen wrote: I recently upgraded from version 2.3.2 to 2.9.1. [...] Since going live a few days ago, however, we've twice had read past EOF exceptions. The first thing to do is check the Java version. If you're

java.io.IOException: read past EOF since migration to 2.9.1

2010-02-17 Thread Michael van Rooyen
Hello all! We've been using Lucene for a few years and it's worked without a murmur. I recently upgraded from version 2.3.2 to 2.9.1. We didn't need to make any code changes for the upgrade - apart from the deprecation warnings, the code compiled cleanly and 2.9.1 worked fine in testing.

Re: Index missing documents

2006-02-22 Thread Michael van Rooyen
I'm using Lucene 1.4.3, and maxBufferedDocs only appears to be in the new (unreleased?) version of IndexWriter in CVS. Looking at the code though, setMaxBufferedDocs(n) just translates to minMergeDocs = n. My index was constructed using the default minMergeDocs = 10, so somehow this doesn't s

Re: Index missing documents

2006-02-20 Thread Michael van Rooyen
Thanks Otis. All the documents were written using the same IndexWriter, without ever closing it. Is this what could be responsible for the documents not being in the segments file, or is this bad practice? Maybe I should use a writer for a batch of documents (1000 or so maybe?), and then

Index missing documents

2006-02-19 Thread Michael van Rooyen
While building a large index, we had a power outage. Over 2 million documents had been added, each document with up to about 20 fields. The size of the index on disk is ~500MB. When I started the process up again, I noticed that documents that should have been in the index were missing. In