Re: HighFreqTerms for results set

2011-07-20 Thread Erik Fäßler
I might be mistaken here, but why exactly wouldn't you use the facet approach? I don't know exactly how to do this in core Lucene, but with Solr it works very well, including for multi-valued fields. You could just say "give me the 100 most frequent terms in field X" for each field you're inter

Re: use of FieldInvertState class

2011-07-20 Thread bryant88
OK. I need to use this class to get the length of a given field at index time and pass it as a parameter to the computeNorm method of a custom Similarity. The issue is that I don't know how to get a FieldInvertState object; specifically, I don't understand whether I have to create and populate it myself o

Re: please kick me out off lucene email group

2011-07-20 Thread G.Long
Hi :) You can send an email to java-user-unsubscr...@lucene.apache.org to unsubscribe from the Lucene Java users mailing list :) Regards, Gary On 20/07/2011 03:46, 郭大伟 wrote: Hello, I'm receiving more than 50 e-mails per day, which are sent by java-user-return-50172-kavguodawei=126...

Re: use of FieldInvertState class

2011-07-20 Thread Ian Lea
FieldInvertState is passed to Similarity.computeNorm, as Robert said. So your custom Similarity just has to override computeNorm(String field, FieldInvertState state) and can extract whatever it wants from the FieldInvertState passed in. -- Ian. On Wed, Jul 20, 2011 at 9:14 AM, bryant88 wrote:

Re: use of FieldInvertState class

2011-07-20 Thread bryant88
OK, thanks, I think I understand now. I will try to store the field lengths in a Map in my custom Similarity, read them back with a getLength() method, and then divide the total by the number of docs to get the average length of a field. Thank you, Raffaele Branda
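The bookkeeping described in this message could look roughly like the sketch below. It is library-free on purpose: the class name FieldLengthStats and its methods are hypothetical, and the assumption is that record() would be called from the overridden computeNorm(String field, FieldInvertState state) with the value of state.getLength(). It averages over the documents that actually contain the field, which may differ from the total number of docs in the index.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene: accumulates per-field lengths
// so that an average field length can be derived later.
final class FieldLengthStats {
    private final Map<String, Long> totalLength = new HashMap<>();
    private final Map<String, Long> docCount = new HashMap<>();

    // Record the length of one field in one document.
    // Assumed call site: inside computeNorm, passing state.getLength().
    // Synchronized because several indexing threads may report lengths.
    synchronized void record(String field, int length) {
        totalLength.merge(field, (long) length, Long::sum);
        docCount.merge(field, 1L, Long::sum);
    }

    // Average length over the documents recorded for this field;
    // returns 0.0 if the field was never seen.
    synchronized double averageLength(String field) {
        long docs = docCount.getOrDefault(field, 0L);
        return docs == 0 ? 0.0 : (double) totalLength.get(field) / docs;
    }
}
```

Note that counters kept this way live only in memory, so they would have to be persisted separately if the average is needed after the indexing process exits.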

Re: FW: Indexer Threads Getting Into BLOCKED State While Optimization Taking Place On Large Indexes Of Size > 2GB

2011-07-20 Thread Michael McCandless
Hmm can you double-check your Lucene version? SerialMergeScheduler wasn't added until 2.3, so you are at least at that version. It looks like you are using SerialMergeScheduler, which, by design, can only do one merge at a time (this is why you see the threads BLOCKED). You can try switching to

ignore boolean clauses QueryParser

2011-07-20 Thread Raffaele Branda
Dear Lucene developers, is there a way to make the QueryParser ignore the boolean operators? I need to send the source code of a class as the query, but obviously when the string given as input to the QueryParser contains an "and", "or" or "not", it treats them as boolean operators and it gi

optimize with num segments > 1 index keeps growing

2011-07-20 Thread v . sevel
Hi, I index several million small documents per day. Each day, I remove some of the older documents to keep the index at a stable number of documents. After each purge, I commit and then optimize the index. What I found is that if I keep optimizing with max num segments = 2, then the index keeps

Search within a sentence (revisited)

2011-07-20 Thread Peter Keegan
I have browsed many suggestions on how to implement 'search within a sentence', but all seem to have drawbacks. For example, from http://lucene.472066.n3.nabble.com/Issue-with-sentence-specific-search-td1644352.html#a1645072 Steve Rowe writes: -- One common technique, instead of using a l

Re: Search within a sentence (revisited)

2011-07-20 Thread darren
I just parse the text into sentences and put those in a multi-valued field and then search that. On Wed, 20 Jul 2011 11:27:38 -0400, Peter Keegan wrote: > I have browsed many suggestions on how to implement 'search within a > sentence', but all seem to have drawbacks. For example, from > http:/
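A minimal sketch of the sentence-splitting step, assuming the JDK's java.text.BreakIterator is accurate enough for the text at hand (the message does not say which splitter is used). The class name SentenceSplitter is hypothetical; each returned string would then be added to the document as one value of the multi-valued field.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: splits raw text into trimmed sentences using the JDK's
// built-in sentence boundary rules (an assumption, not the poster's stated method).
final class SentenceSplitter {
    static List<String> split(String text) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.US);
        it.setText(text);
        List<String> sentences = new ArrayList<>();
        int start = it.first();
        // Walk consecutive boundaries; each [start, end) span is one sentence.
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }
}
```

BreakIterator is rule-based and can mis-split around abbreviations, so a dedicated sentence detector may be preferable for noisy text.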

Re: Search within a sentence (revisited)

2011-07-20 Thread Peter Keegan
It seems to me that to constrain the search to a sentence this way, you'd have to override 'getPositionIncrementGap', which would then break phrase searches across the field values (sentences). Peter On Wed, Jul 20, 2011 at 11:33 AM, wrote: > > I just parse the text into sentences and put those

Short circuiting Collector

2011-07-20 Thread Chris Bamford
Hi there, I have my own Collector implementation which I use for searching, something like this skeleton: public class LightweightHitCollector extends Collector { private int maxHits; private int numHits; private int docBase; private boolean collecting; private Scorer scorer

Re: Short circuiting Collector

2011-07-20 Thread Devon H. O'Dell
2011/7/20 Chris Bamford : > Hi there, > > I have my own Collector implementation which I use for searching, something > like this skeleton: [snip] > Question: is there a way to prevent collect() being called after it has > collected its quota (i.e. when collecting becomes false)? On large dat

SweetSpotSimilarity

2011-07-20 Thread Tajti Ákos
Dear List, in our application we index many long documents. Previously we had a problem with Lucene's scoring: some documents got low scores because of their length. Then we started to use SweetSpotSimilarity and it seemed to solve the problem. But now we face another difficulty:

Re: Short circuiting Collector

2011-07-20 Thread Simon Willnauer
You can advance the scorer to NO_MORE_DOCS once you have collected enough documents; this will stop the loop: scorer.advance(Scorer.NO_MORE_DOCS); simon On Wed, Jul 20, 2011 at 6:53 PM, Devon H. O'Dell wrote: > 2011/7/20 Chris Bamford : >> Hi there, >> >> I have my own Collector implementation whi

Re: optimize with num segments > 1 index keeps growing

2011-07-20 Thread Simon Willnauer
On Wed, Jul 20, 2011 at 2:00 PM, wrote: > Hi, > > I index several millions small documents per day. each day, I remove some > of the older documents to keep the index at a stable number of documents. > after each purge, I commit then I optimize the index. what I found is that > if I keep optimizi

Different Index Reader creation method affecting result

2011-07-20 Thread Saurabh Gokhale
Hi All, I am using Lucene 3.1 in the project. *Background for the question:* I am working on an application that starts with 2 threads: one performs indexing and the other performs searching (I create the searcher object from a reader object). Both these threads run periodically and ind

Re: Different Index Reader creation method affecting result

2011-07-20 Thread Simon Willnauer
On Wed, Jul 20, 2011 at 11:50 PM, Saurabh Gokhale wrote: > Hi All, > > I am using Lucene 3.1 in the project. > > *Background for the question:* > I am working on the application which starts with 2 threads, one performs > indexing activity and other performs searching activity (I create searcher >

Re: Search within a sentence (revisited)

2011-07-20 Thread Mark Miller
On Jul 20, 2011, at 11:27 AM, Peter Keegan wrote: > Mark Miller's 'SpanWithinQuery' patch > seems to have the same issue. If I remember right (it's been more than a couple of years), I did index the sentence markers at the same position as the last word in the sentence. And I think the limitation

Re: Search within a sentence (revisited)

2011-07-20 Thread Mark Miller
On Jul 20, 2011, at 7:44 PM, Mark Miller wrote: > > On Jul 20, 2011, at 11:27 AM, Peter Keegan wrote: > >> Mark Miller's 'SpanWithinQuery' patch >> seems to have the same issue. > > If I remember right (It's been more the a couple years), I did index the > sentence markers at the same positio

Re: HighFreqTerms for results set

2011-07-20 Thread Israel Tsadok
This is very interesting. Do you know how query faceting is implemented?