Re: BUG ? - lucene multisearcher / sorting

2006-12-01 Thread Daniel Naber
On Friday 01 December 2006 15:16, Kai R. Emde wrote: > When we search "material" as an example, we found 207 hits in the index. When we search this index in the multisearcher, with 3 indexes, there are 206 hits, contiguous and one after the next. OK bookA1, bookA2, bookA3 ... bookA206, bookB1,

Re: an alternative to optimize?

2006-12-01 Thread Yonik Seeley
On 12/1/06, Grant Ingersoll wrote: Are the search statistics the same for the MultiReader? That is, would a search on a MultiReader over several small indexes necessarily return the same ranking as a single IndexReader on an optimized reader? Would they return the same actua

Re: an alternative to optimize?

2006-12-01 Thread Grant Ingersoll
Are the search statistics the same for the MultiReader? That is, would a search on a MultiReader over several small indexes necessarily return the same ranking as a single IndexReader on an optimized reader? Would they return the same actual scores? Just curious, I haven't tried MultiRead

Re: an alternative to optimize?

2006-12-01 Thread Chris Hostetter
: I haven't tried it, but according to http://lucene.apache.org/java/docs/fileformats.html, each segment is a complete sub index. I _wonder_ if you couldn't manage your own merges by using IndexWriter.addIndexes() where you load each segment in separately (this may mean copying the segme

Re: Full text searching on documents saved in database as BLOB

2006-12-01 Thread Marcelo Ochoa
Hi Inderjeet: I am working on a full text searching implementation for Oracle Databases running Lucene on the Oracle JVM. The text searching functionality is ready; you can get the latest code, uploaded on Tuesday. See the attachment text for details of the new functionality included: http://i

Re: Incremental Index and Comparing different Scores from different Index

2006-12-01 Thread Chris Hostetter
: I have some questions about the scoring function and about how different scores can be compared. ... Querying indexday1 gives me some Hits with the best having score a. Querying indexday2 gives me some Hits with the best having score b, and so on. Now how can I compare tho

Re: Full text searching on documents saved in database as BLOB

2006-12-01 Thread Chris Lu
Lucene can work for this case, provided you can extract the data in plain text format. So whether your data is in a BLOB or in a file on disk does not really matter, but you need to detect or tell what type of BLOB content it is, either by filename or binary format. Usually when storing BLOB content, the

Re: an alternative to optimize?

2006-12-01 Thread Grant Ingersoll
I haven't tried it, but according to http://lucene.apache.org/java/docs/fileformats.html, each segment is a complete sub index. I _wonder_ if you couldn't manage your own merges by using IndexWriter.addIndexes() where you load each segment in separately (this may mean copying the segments

Re: 2.1-dev memory leak?

2006-12-01 Thread Michael McCandless
Otis Gospodnetic wrote: Hi Mike, Thanks for looking into this. I think your stress test may match my production environment. I think System.gc() never guarantees anything will happen, it's just a hint. I've got the following in one of my classes now. Maybe you can stick it in your stress te

Re: 2.1-dev memory leak?

2006-12-01 Thread Michael McCandless
Otis Gospodnetic wrote: Yeah, in this case, I'm running out of memory, and open file descriptors are, I think, just an indicator that IndexSearchers are not getting closed properly. I've already increased the open file descriptors limit, but I'm limited to 2GB of RAM on a 32-bit box. I'll tr

AW: BUG ? - lucene multisearcher / sorting

2006-12-01 Thread Kai R. Emde
Thanks for the reply. I'll try to explain what happens. We use Lucene 2.0. It is a little difficult to show a case, but the problem seems to begin when the hits from one index reach 200 or more. At that point, 1, 2, or 3 hits from this index end up at the end of the list, like lost bits. When we

Re: Full text searching on documents saved in database as BLOB

2006-12-01 Thread Erick Erickson
See this thread for an interesting Oracle-based solution. I admit I just skimmed it, so it may not really apply: *Oracle and Lucene Integration*. In general, you can't reach into an Oracle BLOB with Lucene to search it because you haven't created a Lucene index. One approach is to index the BLO

an alternative to optimize?

2006-12-01 Thread Stanislav Jordanov
Guys, I've already asked this question but nobody answered: Suppose we have a relatively big index which is continuously updated - i.e. new docs get added while some of the old docs get deleted. For pragmatic reasons we have a restriction on maxMergeDocs so that segment files don't get enormou

Incremental Index and Comparing different Scores from different Index

2006-12-01 Thread Nils Höller
Hi, I have some questions about the scoring function and about how different scores can be compared. I use Lucene for indexing an archive of the web. /archive/day1/differentsites => indexday1 /archive/day2/differentsites => indexday2 /archive/day3/differentsites => indexday3 /archive/day4/differ

Re: any ides on this type of analyzer?

2006-12-01 Thread Soeren Pekrul
Hello Van, it looks like splitting of compound words. This topic was discussed in the thread "Analysis/tokenization of compound words" (http://www.gossamer-threads.com/lists/lucene/java-user/40164?do=post_view_threaded). The main idea is as follows: You have a corpus (lexicon/dictionary). You