Different hit counts for same query string with same index files

2006-04-03 Thread nirma prasad
Hi All, I am using Lucene for indexing and searching. I am getting different hit counts for the same query string. For example, the first time it gives 10 hits, then the second time it gives 20 hits with the same index files. If anyone has any idea, please let me know. I am adding the code: IndexSearcher

Re: Benchmarkers

2006-04-03 Thread Marvin Humphrey
On Apr 3, 2006, at 5:43 PM, Doug Cutting wrote: Marvin Humphrey wrote: Plucene is a Lucene 1.3 port, so it doesn't have max_buffered_docs -- but I can set merge_factor to 1000. I would not recommend that. With a merge factor that high you may run out of file handles, and, moreover, I do

Re: Benchmarkers

2006-04-03 Thread Doug Cutting
Marvin Humphrey wrote: Plucene is a Lucene 1.3 port, so it doesn't have max_buffered_docs -- but I can set merge_factor to 1000. I would not recommend that. With a merge factor that high you may run out of file handles, and, moreover, I doubt that disks are very efficient when reading from

Re: Benchmarkers

2006-04-03 Thread Marvin Humphrey
On Apr 3, 2006, at 11:11 AM, Doug Cutting wrote: You might still, if you have time, try swapping in something like StopAnalyzer and/or turning off Field.Store.YES. The relative speeds of the various implementations may vary in interesting ways, since these parameters may emphasize differen

Re: Benchmarkers

2006-04-03 Thread Marvin Humphrey
On Apr 3, 2006, at 10:36 AM, Doug Cutting wrote: Marvin Humphrey wrote: IndexWriter writer = new IndexWriter(indexDir, new WhitespaceAnalyzer(), true); Please make sure that analyzers are comparable between the various engines you benchmark. WhitespaceAnalyzer is efficient, but

Re: Benchmarkers

2006-04-03 Thread Marvin Humphrey
On Apr 3, 2006, at 6:57 AM, Yonik Seeley wrote: A couple of points: - Are all the lucene variations using the same index parameters? max buffered docs, index format (compound or not), mergeFactor, etc I personally use non-compound index format, max buffered docs=1000, mergeFactor=10

Re: Semantics of a closed IndexInput

2006-04-03 Thread Doug Cutting
Grant Ingersoll wrote: Should it be the case that you can clone a closed IndexInput and get a valid object that is capable of reading? B/c this is what I am seeing in my Lazy implementation (note, it seems to work fine...) I am just not sure if it should work or if it is a bug. Cloned Index

Semantics of a closed IndexInput

2006-04-03 Thread Grant Ingersoll
Should it be the case that you can clone a closed IndexInput and get a valid object that is capable of reading? B/c this is what I am seeing in my Lazy implementation (note, it seems to work fine...) I am just not sure if it should work or if it is a bug. Also, would it be useful to have a i

Re: [jira] Commented: (LUCENE-413) [PATCH] BooleanScorer2 ArrayIndexOutOfBoundsException + alternative NearSpans

2006-04-03 Thread Paul Elschot
On Monday 03 April 2006 21:15, Yonik Seeley (JIRA) wrote: > [ http://issues.apache.org/jira/browse/LUCENE-413?page=comments#action_12372987 ] > > Yonik Seeley commented on LUCENE-413: > - > > Excellent news! Yes, after all those months :) > > I'll do

Re: [newbie]problem about range query

2006-04-03 Thread Raghavendra Prabhu
I think it is because the range query triggers org.apache.lucene.search.BooleanQuery$TooManyClauses. Try a smaller query range. What kind of field is id? Anyway, I see that the range is small in your case. Sometimes big ranges exceed the maximum that Lucene can handle and the boolean query exceeds t
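
A minimal sketch of why a range that looks small can still expand into many clauses. It assumes the id field holds unpadded decimal strings and that terms are compared as strings (lexicographically), so [1 TO 2] also covers "10", "11", "100" and so on. The class and method names below are hypothetical and the sketch does not call Lucene; the 1024 figure is the default BooleanQuery maximum clause count.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

final class RangeExpansionSketch {
    // Default BooleanQuery clause limit in Lucene.
    static final int DEFAULT_MAX_CLAUSES = 1024;

    // Hypothetical stand-in for the term dictionary of an "id" field:
    // the ids 1..count stored as unpadded decimal strings, in string order.
    static TreeSet<String> numericIds(int count) {
        TreeSet<String> terms = new TreeSet<>();
        for (int i = 1; i <= count; i++) {
            terms.add(Integer.toString(i));
        }
        return terms;
    }

    // Same ids, zero-padded to a fixed width so string order equals numeric order.
    static TreeSet<String> paddedIds(int count, int width) {
        TreeSet<String> terms = new TreeSet<>();
        for (int i = 1; i <= count; i++) {
            terms.add(String.format("%0" + width + "d", i));
        }
        return terms;
    }

    // Inclusive range over the terms, compared as strings.
    static List<String> termsInRange(TreeSet<String> terms, String lower, String upper) {
        return new ArrayList<>(terms.subSet(lower, true, upper, true));
    }

    public static void main(String[] args) {
        int unpadded = termsInRange(numericIds(5000), "1", "2").size();
        int padded = termsInRange(paddedIds(5000, 4), "0001", "0002").size();
        System.out.println("unpadded [1 TO 2] matches " + unpadded + " terms");
        System.out.println("padded [0001 TO 0002] matches " + padded + " terms");
        System.out.println("over default clause limit: " + (unpadded > DEFAULT_MAX_CLAUSES));
    }
}
```

If the ids are indexed zero-padded to a fixed width, the same range covers only the two intended terms, which is one common way to keep such queries under the limit.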

[jira] Commented: (LUCENE-413) [PATCH] BooleanScorer2 ArrayIndexOutOfBoundsException + alternative NearSpans

2006-04-03 Thread Yonik Seeley (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-413?page=comments#action_12372987 ] Yonik Seeley commented on LUCENE-413: - Excellent news! I'll do a quick review of the changes to SpanScorer, clean up the tests, and commit (just so no one is duplicating

[newbie]problem about range query

2006-04-03 Thread Dedian Guo
I used following code to do my range searching: IndexSearcher searcher = new IndexSearcher("index"); String qstr = "id:[1 TO 2]"; Analyzer analyzer = new StandardAnalyzer(); QueryParser qp = new QueryParser("title",analyzer); Query query = qp.parse(qstr);

[jira] Commented: (LUCENE-413) [PATCH] BooleanScorer2 ArrayIndexOutOfBoundsException + alternative NearSpans

2006-04-03 Thread Dallan Quass (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-413?page=comments#action_12372983 ] Dallan Quass commented on LUCENE-413: - That fixed it! I made the patch in DisjunctionSumScorerPath5.txt and used the posted SpanScorer, and I'm no longer experiencing the

Re: Benchmarkers

2006-04-03 Thread Doug Cutting
Doug Cutting wrote: Please make sure that analyzers are comparable between the various engines you benchmark. I just went back and re-read what you're benchmarking, and they're all versions of Lucene, so you're probably already using comparable analyzers! Sorry for not noticing that the firs

[jira] Commented: (LUCENE-413) [PATCH] BooleanScorer2 ArrayIndexOutOfBoundsException + alternative NearSpans

2006-04-03 Thread paul.elschot (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-413?page=comments#action_12372978 ] paul.elschot commented on LUCENE-413: - >> The same documents still match in the tests, only some score values change. > Could you elaborate on this point? Are the old or n

Re: Benchmarkers

2006-04-03 Thread Doug Cutting
Marvin Humphrey wrote: IndexWriter writer = new IndexWriter(indexDir, new WhitespaceAnalyzer(), true); Please make sure that analyzers are comparable between the various engines you benchmark. WhitespaceAnalyzer is efficient, but results in far more tokens and terms than, e.g., Sto

[jira] Commented: (LUCENE-413) [PATCH] BooleanScorer2 ArrayIndexOutOfBoundsException + alternative NearSpans

2006-04-03 Thread Yonik Seeley (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-413?page=comments#action_12372961 ] Yonik Seeley commented on LUCENE-413: - > The same documents still match in the tests, only some score values change. Could you elaborate on this point? Are the old or new

Re: Benchmarkers

2006-04-03 Thread karl wettin
3 apr 2006 kl. 16.50 skrev Grant Ingersoll: And if possible, it would be very interesting to see results using -d64 and -d32. And different platforms. So far I've got best results in descending order on Solaris, OS X and last(!) Linux. Solaris is straight out amazing under heavy load. Might

Re: Benchmarkers

2006-04-03 Thread Grant Ingersoll
karl wettin wrote: And if possible, it would be very interesting to see results using -d64 and -d32. And different platforms. So far I've got best results in descending order on Solaris, OS X and last(!) Linux. Solaris is straight out amazing under heavy load. Might even do the switch next

Re: Benchmarkers

2006-04-03 Thread karl wettin
3 apr 2006 kl. 15.57 skrev Yonik Seeley: - use enough heap so too much time isn't taken in GC I recommend -XX:+AggressiveHeap. And if possible, it would be very interesting to see results using -d64 and -d32. And different platforms. So far I've got best results in descending order on

Re: Benchmarkers

2006-04-03 Thread Yonik Seeley
Hi Marvin, A couple of points: - Are all the lucene variations using the same index parameters? max buffered docs, index format (compound or not), mergeFactor, etc I personally use non-compound index format, max buffered docs=1000, mergeFactor=10 - reading in the file line by line probably