Re: waaaay too many files in the index!

2009-02-04 Thread John Byrne
MergeFactor and MergeDocs are left at default values. The indexing is incremental, i.e. whenever someone adds or modifys a file to in svn repository, the lucene index is updated, and the writer/reader/searcher are refreshed (closed and opened again)., According to the svn logs for the time the

Re: Poor QPS with highlighting

2009-02-04 Thread Michael Stoppelman
Thanks Mark for the explanation. I think your solution would definitely change the tf-idf scoring for documents since your field is now split up over multiple docs. One option to get around the changing scoring would be to to run a completely separate index for highlighting (with the overlapping d

Re: waaaay too many files in the index!

2009-02-04 Thread Michael McCandless
These files are normal Lucene segment files (in compound file format). What's odd is that Lucene is not merging them down to a smaller set of segments. Have you done any advanced things, like customize the deletion or merge policy? When you close you writer, are you using just close()
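Mike's questions above can be illustrated with a minimal sketch against the Lucene 2.4-era API under discussion. The index path is a hypothetical placeholder; the point is that a plain `optimize()` followed by `close()` should merge the compound-file segments down, unless a custom merge or deletion policy is interfering.

```java
// Sketch, assuming the Lucene 2.4 API and a hypothetical index path.
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;

public class MergeDown {
    public static void main(String[] args) throws Exception {
        IndexWriter writer = new IndexWriter(
                FSDirectory.getDirectory("/path/to/index"),  // hypothetical path
                new StandardAnalyzer(),
                false,  // open existing index, don't create
                IndexWriter.MaxFieldLength.UNLIMITED);
        writer.setUseCompoundFile(true);  // produces the .cfs files described above
        writer.optimize();                // merges all segments down to one
        writer.close();                   // commits pending changes and releases the lock
    }
}
```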

Re: Lucene 2.3-2.4 switch: Scoring change

2009-02-04 Thread Grant Ingersoll
How are you using the score? The fact that you want them back to the old way implies to me that you are using them for something other than for sorting the results. On Jan 29, 2009, at 8:21 PM, AlexElba wrote: Hello, I have project which I am trying to switch from lucene 2.3.2 to 2.4 I

Re: TopDocCollector vs Hits: TopDocCollector slowing....

2009-02-04 Thread Grant Ingersoll
I presume they are both now slower, right? Otherwise you wouldn't mind the speedup on the bigger one. Hits did caching and prefetched things, which has it's tradeoffs. Can you describe how you were measuring the queries? How many results were you getting? -Grant On Feb 3, 2009, at 8:
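The two search paths being compared in this thread can be sketched side by side. This is an illustration against the Lucene 2.4 API; the index path and field names are assumptions, not from the original messages.

```java
// Sketch, assuming the Lucene 2.4 API; path and field names are hypothetical.
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocCollector;
import org.apache.lucene.search.TopDocs;

public class CollectorVsHits {
    public static void main(String[] args) throws Exception {
        IndexSearcher searcher = new IndexSearcher("/path/to/index");  // hypothetical path
        Query query = new TermQuery(new Term("body", "lucene"));

        // Deprecated Hits API: lazily fetches and caches documents in pages,
        // re-executing the search as you iterate past what was prefetched.
        Hits hits = searcher.search(query);
        int total = hits.length();

        // TopDocCollector: collects only the top-N scored docs, no caching.
        TopDocCollector collector = new TopDocCollector(10);
        searcher.search(query, collector);
        TopDocs topDocs = collector.topDocs();

        System.out.println(total + " hits; top " + topDocs.scoreDocs.length + " collected");
        searcher.close();
    }
}
```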

Re: waaaay too many files in the index!

2009-02-04 Thread Michael McCandless
OK thanks for bringing closure. Mike John Byrne wrote: No I'm not messing with the delete or merge policy - but I think I know what went wrong though... We have 2 instances of the application, for failover. They are never supposed to be active at the same time, but I just discovered a

Re: MergePolicy$MergeException during IndexWriter.addIndexesNoOptimize

2009-02-04 Thread Michael McCandless
Hmm... this is not in fact considered a fatal error to addIndexesNoOptimize. If you were to optimize(), you would then see an exception thrown. Here's why: when addIndexesNoOptimize runs, it simply appends the imported segment description to the internal SegmentInfos and then asks the MergePolic
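Mike's explanation above can be sketched in a few lines: `addIndexesNoOptimize` only appends segment descriptions, so a problem with an imported segment surfaces later, when a merge is actually attempted. This is an illustration against the Lucene 2.4 API; the directory paths are hypothetical.

```java
// Sketch, assuming the Lucene 2.4 API; directory paths are hypothetical.
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class ImportIndex {
    public static void main(String[] args) throws Exception {
        IndexWriter writer = new IndexWriter(
                FSDirectory.getDirectory("/path/to/index"),   // hypothetical path
                new StandardAnalyzer(),
                IndexWriter.MaxFieldLength.UNLIMITED);

        // Appends the imported segments to the internal SegmentInfos;
        // it does not read or verify the segment contents here.
        Directory imported = FSDirectory.getDirectory("/path/to/other");  // hypothetical path
        writer.addIndexesNoOptimize(new Directory[] { imported });

        // A corrupt or unreadable imported segment would throw here,
        // when the merge actually touches it.
        writer.optimize();
        writer.close();
    }
}
```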

Re: waaaay too many files in the index!

2009-02-04 Thread John Byrne
No I'm not messing with the delete or merge policy - but I think I know what went wrong though... We have 2 instances of the application, for failover. They are never supposed to be active at the same time, but I just discovered a condition that can cause exactly that to happen. When we detec

FieldCache Question

2009-02-04 Thread Todd Benge
Hi, I've been looking into the FieldCache API because of memory problems we've been seeing in our production environment. We use various different sorts so over time the cache builds up and servers stop responding. I decided to apply the patch for JIRA 831: https://issues.apache.org/jira/browse
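The memory behavior Todd describes comes from how sorted searches populate the FieldCache. A minimal sketch against the Lucene 2.4 API (index path and field names are assumptions): each distinct sort field fills a cache array sized to the number of documents in the reader, and those arrays live until the underlying IndexReader is closed.

```java
// Sketch, assuming the Lucene 2.4 API; path and field names are hypothetical.
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopFieldDocs;

public class SortedSearch {
    public static void main(String[] args) throws Exception {
        IndexSearcher searcher = new IndexSearcher("/path/to/index");  // hypothetical path
        Query query = new TermQuery(new Term("body", "lucene"));

        // The first sorted search on a field loads a FieldCache array for the
        // whole reader (one entry per document); subsequent sorts on the same
        // field and reader reuse it. Each new sort field adds another array.
        Sort byDate = new Sort(new SortField("date", SortField.LONG));
        TopFieldDocs docs = searcher.search(query, null, 10, byDate);

        System.out.println(docs.totalHits + " hits sorted by date");
        searcher.close();  // releasing the reader frees its cache entries
    }
}
```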

Re: FieldCache Question

2009-02-04 Thread Mark Miller
Todd Benge wrote: Hi, I've been looking into the FieldCache API because of memory problems we've been seeing in our production environment. We use various different sorts so over time the cache builds up and servers stop responding. I decided to apply the patch for JIRA 831: https://issues.ap

Re: FieldCache Question

2009-02-04 Thread Todd Benge
On Wed, Feb 4, 2009 at 10:01 AM, Mark Miller wrote: > Todd Benge wrote: >> >> Hi, >> >> I've been looking into the FieldCache API because of memory problems >> we've been seeing in our production environment. >> >> We use various different sorts so over time the cache builds up and >> servers stop

Re: FieldCache Question

2009-02-04 Thread Mark Miller
Todd Benge wrote: The intent is to reduce the amount of memory that is held in cache. As it is now, it looks like there is an array of comparators for each index reader. Most of the data in the array appears to be the same for each cache so there is duplication for each type ( string, float).

Re: FieldCache Question

2009-02-04 Thread Todd Benge
On Wed, Feb 4, 2009 at 10:41 AM, Mark Miller wrote: > Todd Benge wrote: > >> >> The intent is to reduce the amount of memory that is held in cache. As it >> is now, it looks like there is an array of comparators for each index >> reader. Most of the data in the array appears to be the same for

How to index correctly taking into account synonyms using WordNet?

2009-02-04 Thread Ariel
Hi every body: I am using wordnet to index my document taking in account the synonyms with wordnet. After I indexed the whole documents collections I made a query with the word "snort" but documents that contain the word bird are retrieved, I don't understand this because snort and bird are not sy

Re: How to index correctly taking into account synonyms using WordNet?

2009-02-04 Thread Erick Erickson
The first thing I'd do is get a copy of luke (google lucene luke) and examine your index to see what's actually there in the document you claim in incorrectly returned. If that doesn't enlighten you, you really have to provide more details and code examples, because your question is unanswerable as

Re: How to index correctly taking into account synonyms using WordNet?

2009-02-04 Thread Ariel
Well, I have the luke 0.8, I opened my index with that tool but there is not any clue of synonyms in the field I have indexed with the synonym analyzer. I don't know how can I see the group of synonyms of each term, sould somebody tell me hot to do that ??? On Wed, Feb 4, 2009 at 5:09 PM, Erick

Re: How to index correctly taking into account synonyms using WordNet?

2009-02-04 Thread Ariel
How can I see the senses of a word with wordnet ??? And How could I select the most populars ??? Is there a way to make queries ignoring the synonyms I have added to the index ??? I hope you can help me. Regards Ariel On Wed, Feb 4, 2009 at 7:46 PM, Manu Konchady wrote: > > > > --- On Wed, 4/

Re: TopDocCollector vs Hits inquiry

2009-02-04 Thread Jay Malaluan
Hi, As I was reading the post "Re: TopDocCollector vs Hits: TopDocCollector slowing", I just got curious on how he explained his change from Hits to TopDocCollector. I'm assuming that the Hits is returned from a call of: Searcher searcher = new Searcher(); searcher.search(xxx, xxx) - that wil