Re: Another way to handle large numeric range queries

2004-06-08 Thread Morus Walter
Don Gilbert writes: I ran into this problem using the current Lucene implementation of rangeQuery applied to genome data (search a chromosome range from 1..20MB). We wanted to use lucene queries like +organism:fruitfly +chromosome:X +location:[100 500] to find all the ge

Another way to handle large numeric range queries

2004-06-08 Thread Don Gilbert
I ran into this problem using the current Lucene implementation of rangeQuery applied to genome data (search a chromosome range from 1..20MB). We wanted to use lucene queries like +organism:fruitfly +chromosome:X +location:[100 500] to find all the genome features (1000s to 100,000s) th

Re: Setting Similarity in IndexWriter and IndexSearcher

2004-06-08 Thread Grant Ingersoll
I do these kinds of things as part of a layer between Lucene and my application, but have often thought it would be nice to have a metadata layer available that wasn't part of the Lucene core, but was packaged w/ Lucene. It could provide the information necessary and have tools for updating with

Re: Performance: compound vs. multi-file index, indexing and searching

2004-06-08 Thread Doug Cutting
Otis Gospodnetic wrote: Can anyone comment on performance differences? I'd expect multi-threaded performance to be a bit worse with the compound format, but single-threaded performance should be nearly identical. Doug

Re: Setting Similarity in IndexWriter and IndexSearcher

2004-06-08 Thread Doug Cutting
David Spencer wrote: Does it ever make sense to set the Similarity obj in either (only one of..) IndexWriter or IndexSearcher? i.e. If I set it in IndexWriter can I avoid setting it in IndexSearcher? Also, can I avoid setting it in IndexWriter and only set it in IndexSearcher? I noticed Nutch s

RE: Does Lucene support UNICODE?

2004-06-08 Thread Eric Isakson
org.apache.lucene.demo.FileDocument.Document(File) is invoked from IndexFiles and does: Reader reader = new BufferedReader(new InputStreamReader(is)); Notice that the InputStreamReader does not specify an encoding so your default encoding is being used. You should probably write your own gl
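To illustrate the point about the missing encoding, here is a minimal sketch that passes a charset name to InputStreamReader instead of relying on the platform default. The class and method names (EncodingAwareReader, open, readAll) are hypothetical and are not part of the Lucene demo; the choice of UTF-8 is also only an assumption about the input files.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the Lucene demo code.
class EncodingAwareReader {

    // Wrap the stream in a reader that decodes with the named charset
    // rather than the JVM's default encoding.
    static Reader open(InputStream is, String charsetName) throws IOException {
        return new BufferedReader(new InputStreamReader(is, charsetName));
    }

    // Drain a reader into a String (only used here to show the result).
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Assumed input: UTF-8 bytes containing non-ASCII characters.
        byte[] utf8 = "Gr\u00fc\u00dfe".getBytes(StandardCharsets.UTF_8);
        try (Reader r = open(new ByteArrayInputStream(utf8), "UTF-8")) {
            System.out.println(readAll(r));
        }
    }
}
```

A reader built this way could then be handed to whatever code creates the document field, so the decoded text no longer depends on the machine the indexer runs on.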

Re: lucene scoring

2004-06-08 Thread Ram Subbaroyan
Uddam answers inline: > - in DefaultSimilarity.queryNorm(float sumOfSquareWeights) : how does it > compute the query weight? To understand Lucene scoring it is easiest if you follow a query with only one term on a searchable. Here is the general flow of control for such a query: -IndexSearcher

Lucene Scoring question

2004-06-08 Thread Ram Subbaroyan
I have been trying to follow Lucene scoring across multiple searchables, and I do not see where the IDF gets normalized between searchables. (Sum DF across searchables in the first half of the query and use it in the second half of query execution to calculate the right IDF across searchables.) Let's say you have one
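The normalization described in the parenthesis can be sketched in plain Java, with no Lucene classes involved. The class GlobalIdf and its methods are hypothetical. The formula log(numDocs / (docFreq + 1)) + 1 is assumed to match what DefaultSimilarity uses, so check it against the version you run.

```java
// Hypothetical sketch: compute one IDF from statistics summed over
// several searchables, instead of one IDF per searchable.
class GlobalIdf {

    // Assumed IDF formula: log(numDocs / (docFreq + 1)) + 1.
    static double idf(long docFreq, long numDocs) {
        return Math.log((double) numDocs / (docFreq + 1)) + 1.0;
    }

    // First half: sum the document frequency and document count across
    // all searchables. Second half: derive a single IDF from the totals.
    static double globalIdf(long[] docFreqs, long[] numDocs) {
        long totalDocFreq = 0;
        long totalDocs = 0;
        for (int i = 0; i < docFreqs.length; i++) {
            totalDocFreq += docFreqs[i];
            totalDocs += numDocs[i];
        }
        return idf(totalDocFreq, totalDocs);
    }

    public static void main(String[] args) {
        // Made-up statistics for two searchables holding the same term.
        long[] docFreqs = {1, 9};
        long[] numDocs = {100, 100};
        System.out.println("searchable 0 alone: " + idf(docFreqs[0], numDocs[0]));
        System.out.println("searchable 1 alone: " + idf(docFreqs[1], numDocs[1]));
        System.out.println("summed statistics:  " + globalIdf(docFreqs, numDocs));
    }
}
```

With per-searchable statistics the same term gets a different IDF in each index, which makes scores hard to compare when results are merged; using the summed totals gives every searchable the same value.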

RE: Performance: compound vs. multi-file index, indexing and searching

2004-06-08 Thread hui
I did the test earlier on 1.3 http://issues.apache.org/eyebrowse/[EMAIL PROTECTED] he.org&msgId=1408808 Regards, Hui -Original Message- From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] Sent: Tuesday, June 08, 2004 5:23 AM To: Lucene Users List Subject: Performance: compound vs. multi-f

Re: out of memory while indexing one single file

2004-06-08 Thread Otis Gospodnetic
Hello, I don't know if the author of CLucene is on this list. You may get better help on the CLucene mailing list or forum on sf.net. Otis --- Yue Sun <[EMAIL PROTECTED]> wrote: > Hi, > > First, I am not sure if I should post my question here, since I am > using > CLucene (C++ port of Lucene) to

out of memory while indexing one single file

2004-06-08 Thread Yue Sun
Hi, First, I am not sure if I should post my question here, since I am using CLucene (C++ port of Lucene) to build indexes. Hope someone here could help me. I am indexing on a Solaris machine with 1 GB of memory. I use a RAM writer and an FS writer, and write into the FS index once in a while. Now I am testing

Re: Performance: compound vs. multi-file index, indexing and searching

2004-06-08 Thread Eric Jain
Can anyone comment on performance differences? I just ran a comparison, indexing about 250'000 small documents. Both the time for indexing (239s) and the final disk space used (16.6MB) were identical. Haven't compared search performance, though I suspect I can save myself the effort...

lucene scoring

2004-06-08 Thread uddam chukmol
Hi all, The way Lucene computes the score is so confusing. I tried to see what happened but am blocked by the mystery of some parameters. - in DefaultSimilarity.queryNorm(float sumOfSquareWeights) : how does it compute the query weight? - How does it compute the weight of each field in the ind

Re: Too many open files error occurs when changing 1.3 final to 1.4 rc2

2004-06-08 Thread Otis Gospodnetic
I am not 100% certain now, but I _think_ there were some changes that required that you re-index your data when upgrading to 1.4rc2. I would check the CHANGES file (link on the site, just look at the complete file). Otis --- juan lu <[EMAIL PROTECTED]> wrote: > I had been using 1.3 final for 1 m

Performance: compound vs. multi-file index, indexing and searching

2004-06-08 Thread Otis Gospodnetic
Hello, I was wondering if anyone can comment on the performance difference of compound versus multi-file indices. I am interested in both indexing and searching performance, and have tried testing indexing performance of both formats. My tests so far show no indexing performance differences betw

Re: Zilverline release candidate 1.0-rc3 available

2004-06-08 Thread Peter Becker
Hi Michael, I wonder if you would be interested in cooperating on the extracting/index management bit. We use Lucene and our own extractor plugins for a Swing-application: http://tockit.sf.net/docco Code can be found here: http://cvs.sourceforge.net/viewcvs.py/toscanaj/docco/ It is BSD-Style l

Re: problems with lucene in multithreaded environment

2004-06-08 Thread Jayant Kumar
--- Doug Cutting <[EMAIL PROTECTED]> wrote: > Jayant Kumar wrote: > > Thanks for the patch. It helped in increasing the > > search speed to a good extent. > > Good. I'll commit it. Thanks for testing it. > > > But when we tried to > > give about 100 queries in 10 seconds, then again > we > > f