Re: Poor QPS with highlighting

2009-02-03 Thread mark harwood
>>My documents are quite big sometimes up to 300k tokens. You could look at indexing them as separate documents using overlapping sections of text. Erik used this for one of his projects. Cheers Mark - Original Message From: Michael Stoppelman To: java-user@lucene.apache.org Sent: Tu

Re: How to extract Document object after the search?

2009-02-03 Thread Ian Lea
> I have not seen much time difference between when I load the single field & > all the fields of a document. That's fine - sometimes it helps, sometimes it doesn't. Depends on the structure of your documents, maybe your hardware, maybe more. And sometimes a small difference, over many documents

Re: Performance issue

2009-02-03 Thread mittals
Hi, All of the fields are text. We have 3 @IndexedEmbedded objects within an object. Users can search for any string; a few of the cases are morgan, john, orcl, healthcare, pa ma, ma fin. Here is an example of one of our queries: (+(trading.tn:pa^150.0 lastName:pa*^45.0 firstName:pa*^30.0 roleDesc:pa^3.0 rol

Re: How to extract Document object after the search?

2009-02-03 Thread Erick Erickson
Here's a writeup I did a couple of years ago that might help... http://wiki.apache.org/lucene-java/FieldSelectorPerformance?highlight=(fieldselector) Best Erick On Tue, Feb 3, 2009 at 5:06 AM, Ian Lea wrote: > > I have not seen much time difference between when I load the single field > & > >

waaaay too many files in the index!

2009-02-03 Thread John Byrne
Hi, I've got a weird problem with a Lucene index, using 2.3.1. The index contains 6660 files. I don't know how this happened. Maybe someone can tell me something about the files themselves? (examples below) On one day, between 10 and 40 of these files were being created every minute. The index

term frequency normalization

2009-02-03 Thread Jochen Wersdörfer
Hi, I'd like to use the term frequency normalization described in http://wiki.apache.org/lucene-java/TREC%202007%20Million%20Queries%20Track%20-%20IBM%20Haifa%20Team so that the term frequency tf becomes tf(t, d) = log(1 + freq(t, d)) / log(1 + avgFreq(d)). The easiest way to change the tf calcu
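
A minimal sketch of the formula quoted above, in plain Java with no Lucene dependency. The class and method names are hypothetical, and the caller is assumed to supply both the raw term frequency and avgFreq(d); how to plug this into a Lucene Similarity is not shown here.

```java
// Hypothetical helper for illustration only; not part of Lucene.
class NormalizedTf {
    // Computes tf(t, d) = log(1 + freq(t, d)) / log(1 + avgFreq(d)).
    // freq    : raw frequency of term t in document d
    // avgFreq : average term frequency in document d (assumed precomputed by the caller)
    static double tf(double freq, double avgFreq) {
        if (freq < 0.0 || avgFreq <= 0.0) {
            throw new IllegalArgumentException("freq must be >= 0 and avgFreq must be > 0");
        }
        return Math.log(1.0 + freq) / Math.log(1.0 + avgFreq);
    }
}
```

A term that occurs exactly as often as the document average gets tf = 1; rarer terms fall below 1 and more frequent terms rise above it, but only logarithmically.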

Re: waaaay too many files in the index!

2009-02-03 Thread Matthew Hall
Did you optimize your index? If not, depending on your merge factor, this could be a very normal index for you. -Matt John Byrne wrote: Hi, I've got a weird problem with a Lucene index, using 2.3.1. The index contains 6660 files. I don't know how this happened. Maybe someone can tell me som

Re: waaaay too many files in the index!

2009-02-03 Thread Michael Stoppelman
On Tue, Feb 3, 2009 at 7:26 AM, John Byrne wrote: > Hi, > > I've got a weird problem with a Lucene index, using 2.3.1. The index > contains 6660 files. I don't know how this happened. Maybe someone can tell me > something about the files themselves? (examples below) > > On one day, between 10 and 4

Re: waaaay too many files in the index!

2009-02-03 Thread Erick Erickson
What are your IndexWriter MergeFactor and MergeDocs set to? Also, are the dates on all these files indicative of all being created during the same indexing run? Finally, how many documents are you indexing? Best Erick On Tue, Feb 3, 2009 at 10:26 AM, John Byrne wrote: > Hi, > > I've got a weird

Re: Poor QPS with highlighting

2009-02-03 Thread Michael Stoppelman
On Tue, Feb 3, 2009 at 1:14 AM, mark harwood wrote: > >>My documents are quite big sometimes up to 300k tokens. > > You could look at indexing them as separate documents using overlapping > sections of text. Erik used this for one of his projects. > Can you describe this in a little more detail; I

Re: Set a field as required in a MultiFieldQueryParser

2009-02-03 Thread Sylvain
What I wanted to do was to set at least one "SHOULD" field as required in a MFQP. Because of the MUST field I had, even if no SHOULD field matched the query, all the documents which had the corresponding MUST field were returned. I found a very easy solution to my problem: BooleanQuery myQ

RE: Lucene 2.3-2.4 switch: Scoring change

2009-02-03 Thread Chris Hostetter
: To get the normalized scores use: ... : float score = hits[1].score / td.getMaxScore(); Strictly speaking, this code will not return the exact same scores as the deprecated Hits API. The Hits class only normalizes the scores if the max score is greater than 1.0f (yet another one of
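
A minimal sketch of the difference described above, in plain Java with no Lucene dependency. It assumes only what the message states, namely that the old Hits behaviour scaled scores when the max score exceeded 1.0f and left them alone otherwise; the class and method names are hypothetical.

```java
// Hypothetical helper for illustration only; not part of Lucene.
class HitsStyleNormalizer {
    // Divides by maxScore only when maxScore is greater than 1.0f;
    // otherwise the raw score is returned unchanged.
    static float normalize(float rawScore, float maxScore) {
        float scoreNorm = (maxScore > 1.0f) ? 1.0f / maxScore : 1.0f;
        return rawScore * scoreNorm;
    }

    // The unconditional division from the quoted snippet, for comparison.
    static float alwaysDivide(float rawScore, float maxScore) {
        return rawScore / maxScore;
    }
}
```

The two methods agree whenever the top score is above 1.0f and diverge when it is not: with a max score of 0.5f, a raw score of 0.25f stays 0.25f under the conditional rule but becomes 0.5f under unconditional division.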

Re: Poor QPS with highlighting

2009-02-03 Thread markharw00d
Can you describe this in a little more detail; I'm not exactly sure what you mean. Break your large text documents into multiple Lucene documents. Rather than dividing them up into entirely discrete chunks of text consider storing/indexing *overlapping* sections of text with an overlap as
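
The overlapping-sections idea above can be sketched roughly as follows, in plain Java with no Lucene dependency. The class name, the `chunkSize` and `overlap` parameters, and the use of a pre-tokenized list are all assumptions for illustration; the message is cut off before it says how large the overlap should be. Each returned chunk would then be indexed as its own Lucene document.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Lucene.
class OverlapChunker {
    // Splits tokens into windows of at most chunkSize tokens, where each
    // window repeats the last `overlap` tokens of the previous one.
    static List<List<String>> chunk(List<String> tokens, int chunkSize, int overlap) {
        if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
            throw new IllegalArgumentException("require chunkSize > 0 and 0 <= overlap < chunkSize");
        }
        List<List<String>> chunks = new ArrayList<>();
        int step = chunkSize - overlap; // how far each window advances
        for (int start = 0; start < tokens.size(); start += step) {
            int end = Math.min(start + chunkSize, tokens.size());
            chunks.add(new ArrayList<>(tokens.subList(start, end)));
            if (end == tokens.size()) {
                break; // the final window already reaches the end of the text
            }
        }
        return chunks;
    }
}
```

Because each window shares tokens with its neighbour, a phrase that straddles a chunk boundary still appears whole in at least one chunk, provided it is no longer than the overlap plus one token.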

TopDocCollector vs Hits: TopDocCollector slowing....

2009-02-03 Thread AlexElba
Hello, I was using Lucene 2.3.2 with Hits and switched to Lucene 2.4.0, and now I am using TopDocCollector. I have two queries which are running against the same index. One query is returning 80 bytes of information; the other one is returning 2000 bytes. With old Hits the query which was returning smaller d