On Friday 01 December 2006 15:16, Kai R. Emde wrote:
> When we search "material" as an example, we found 207 hits in the the
> index. When we search this index in the multisearcher, with 3 index,
> there 206 hits contiguous and one after the next. OK bookA1, bookA2,
> bookA3 ... bookA206, bookB1,
On 12/1/06, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
Are the search statistics the same for the MultiReader? That is,
would a search on a MultiReader over several small indexes
necessarily return the same ranking as a single IndexReader on an
optimized reader? Would they return the same actual scores?
Just curious, I haven't tried MultiReader
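For anyone who wants to check empirically, here is a minimal sketch using
Lucene 2.x-era classes that prints the top scores from a MultiReader over
several sub-indexes next to those from a single merged index. The index
paths and the "contents" field are placeholders, not anything from the
original setup.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiReader;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class CompareScores {
  public static void main(String[] args) throws Exception {
    Query q = new QueryParser("contents", new StandardAnalyzer()).parse("material");

    // One reader spanning several small indexes (paths are placeholders).
    IndexReader multi = new MultiReader(new IndexReader[] {
        IndexReader.open("/indexes/bookA"),
        IndexReader.open("/indexes/bookB"),
        IndexReader.open("/indexes/bookC") });
    print("MultiReader", new IndexSearcher(multi).search(q));

    // One reader over a single merged/optimized copy of the same data.
    print("single", new IndexSearcher(IndexReader.open("/indexes/merged")).search(q));
  }

  static void print(String label, Hits hits) throws Exception {
    for (int i = 0; i < Math.min(10, hits.length()); i++)
      System.out.println(label + " #" + i + " score=" + hits.score(i));
  }
}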
Hi Inderjeet:
I am working on a full-text search implementation for Oracle
databases, running Lucene on the Oracle JVM.
The text searching functionality is now ready; you can get the latest
code, uploaded on Tuesday. See the attached text for details of
the new functionality included:
http://i
: I have some questions about the scoring function and about how different
: scores can be compared.
...
: Querying indexday1 gives me some Hits with the best having score a
: Querying indexday2 gives me some Hits with the best having score b
: and so on .
:
: Now how can I compare those
Lucene can work for this case, as long as you can extract the data as
plain text. So whether your data is in a BLOB or on disk does not
really matter, but you need to detect or specify what type of content
the BLOB holds, either by filename or by binary format.
Usually when storing BLOB content, the
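As a rough sketch of that pipeline, assuming the BLOB already holds plain
text, and with made-up connection details, table and column names:

import java.sql.Blob;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class BlobIndexer {
  public static void main(String[] args) throws Exception {
    Class.forName("oracle.jdbc.driver.OracleDriver");
    Connection con = DriverManager.getConnection(
        "jdbc:oracle:thin:@//dbhost:1521/orcl", "scott", "tiger");  // placeholders
    IndexWriter writer = new IndexWriter("/tmp/blob-index", new StandardAnalyzer(), true);

    ResultSet rs = con.createStatement()
        .executeQuery("SELECT id, content FROM docs");  // made-up table/columns
    while (rs.next()) {
      Blob blob = rs.getBlob("content");
      // Assumes the BLOB already holds plain UTF-8 text; PDFs, Word files
      // etc. would need a text extractor before the text reaches Lucene.
      String text = new String(blob.getBytes(1, (int) blob.length()), "UTF-8");
      Document doc = new Document();
      doc.add(new Field("id", rs.getString("id"), Field.Store.YES, Field.Index.UN_TOKENIZED));
      doc.add(new Field("contents", text, Field.Store.NO, Field.Index.TOKENIZED));
      writer.addDocument(doc);
    }
    writer.optimize();
    writer.close();
    con.close();
  }
}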
I haven't tried it, but according to
http://lucene.apache.org/java/docs/fileformats.html, each segment is a
complete sub-index. I
_wonder_ if you couldn't manage your own merges by using
IndexWriter.addIndexes() where you load each segment in separately
(this may mean copying the segments
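A rough sketch of what such a manual merge could look like with
addIndexes(), using the Lucene 2.x API; the directory names are invented:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class ManualMerge {
  public static void main(String[] args) throws Exception {
    // Target index; 'true' creates it, overwriting anything already there.
    IndexWriter writer = new IndexWriter("/indexes/merged", new StandardAnalyzer(), true);

    // Each source is a complete sub-index (e.g. a copied-out segment).
    Directory[] sources = {
        FSDirectory.getDirectory("/indexes/part1", false),
        FSDirectory.getDirectory("/indexes/part2", false) };

    writer.addIndexes(sources);  // merges the sources into the target
    writer.optimize();
    writer.close();
  }
}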
Otis Gospodnetic wrote:
Hi Mike,
Thanks for looking into this. I think your stress test may match my production
environment.
I think System.gc() never guarantees anything will happen, it's just a hint.
I've got the following in one of my classes now. Maybe you can stick it in
your stress test
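The snippet referred to above is cut off in this excerpt. Purely to
illustrate the "hint, not guarantee" point, a helper of this sort just
asks for a collection and reports heap usage before and after:

public class GcHint {
  // Illustrative only: System.gc() is a request; the VM may ignore it.
  static void gcHint(String label) {
    Runtime rt = Runtime.getRuntime();
    long before = rt.totalMemory() - rt.freeMemory();
    System.gc();
    System.runFinalization();
    long after = rt.totalMemory() - rt.freeMemory();
    System.out.println(label + ": heap used " + before + " -> " + after + " bytes");
  }

  public static void main(String[] args) {
    gcHint("after warm-up");
  }
}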
Otis Gospodnetic wrote:
Yeah, in this case, I'm running out of memory, and open file descriptors are, I
think, just an indicator that IndexSearchers are not getting closed properly.
I've already increased the open file descriptors limit, but I'm limited to 2GB
of RAM on a 32-bit box.
I'll try
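A sketch of the discipline that keeps descriptors from leaking: read what
you need from the Hits, then close the searcher in a finally block. The
"title" field name is a placeholder.

import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class SafeSearch {
  // Collect what you need from the Hits *before* closing: Hits loads
  // documents lazily, so it must not be used after the searcher is closed.
  static List searchTitles(String indexPath, Query q) throws Exception {
    IndexSearcher searcher = new IndexSearcher(indexPath);
    try {
      Hits hits = searcher.search(q);
      List titles = new ArrayList();
      for (int i = 0; i < hits.length(); i++)
        titles.add(hits.doc(i).get("title"));  // placeholder stored field
      return titles;
    } finally {
      searcher.close();  // releases the reader and its file descriptors
    }
  }
}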
Thanks for the reply.
I'll try to explain what happens: we use Lucene 2.0. It is a little
difficult to show a test case, but it seems that the problem began when
the hits for one index reached 200 and above.
At that point, at the end of the result list, a few hits from this
index (1, 2, 3 hits) appear like lost bits.
When we
See this thread for an interesting Oracle-based solution. I admit I just
skimmed it, so it may not really apply:
*Oracle and Lucene Integration*
In general, you can't reach into an Oracle BLOB with Lucene to search it
because you haven't created a Lucene index. One approach is to index the
BLOB
Guys,
I've already asked this question but nobody answered:
Suppose we have a relatively big index which is continuously updated -
i.e. new docs get added while some of the old docs get deleted.
For pragmatic reasons we have a restriction on maxMergeDocs so that
segment files don't get enormous
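For reference, the kind of writer configuration being described might
look like this; the values are arbitrary and the setters are the Lucene
2.x ones:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

public class WriterConfig {
  public static void main(String[] args) throws Exception {
    // 'false' opens an existing index for incremental updates.
    IndexWriter writer = new IndexWriter("/indexes/live", new StandardAnalyzer(), false);
    // Cap the number of documents a merged segment may contain,
    // so individual segment files never grow enormous.
    writer.setMaxMergeDocs(100000);
    // Number of segments merged at a time (default 10).
    writer.setMergeFactor(10);
    // ... add and delete documents here ...
    writer.close();
  }
}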
Hi,
I have some questions about the scoring function and about how different
scores can be compared.
I use Lucene for indexing an archive of the web.
/archive/day1/differentsites => indexday1
/archive/day2/differentsites => indexday2
/archive/day3/differentsites => indexday3
/archive/day4/differ
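One possible way to get a single comparable ranking instead of per-day
scores is to search all the day indexes through one MultiSearcher, which
aggregates document frequencies across its sub-searchers. A sketch with
made-up paths and field names:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MultiSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Searchable;

public class DailyArchiveSearch {
  public static void main(String[] args) throws Exception {
    Searchable[] days = {
        new IndexSearcher("/index/indexday1"),
        new IndexSearcher("/index/indexday2"),
        new IndexSearcher("/index/indexday3"),
        new IndexSearcher("/index/indexday4") };

    // One ranking over all days, computed with combined term statistics.
    MultiSearcher searcher = new MultiSearcher(days);
    Query q = new QueryParser("contents", new StandardAnalyzer()).parse("some query");
    Hits hits = searcher.search(q);
    for (int i = 0; i < Math.min(10, hits.length()); i++)
      System.out.println(hits.score(i) + "  " + hits.doc(i).get("url"));
    searcher.close();
  }
}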
Hello Van,
it looks like a case of compound word splitting. This topic was discussed in
the thread "Analysis/tokenization of compound words"
(http://www.gossamer-threads.com/lists/lucene/java-user/40164?do=post_view_threaded).
The main idea is as follows:
You have a corpus (lexicon/dictionary). You
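A toy sketch of that dictionary-based idea; the word list and the greedy
longest-match strategy are only for illustration, and real decompounders
also deal with linking morphemes and ambiguity:

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class CompoundSplitter {
  // Toy lexicon; in practice this comes from a corpus or dictionary file.
  static final Set LEXICON = new HashSet(Arrays.asList(
      new String[] { "daten", "bank", "datenbank", "verwaltung" }));

  // Greedy left-to-right split: at each position take the longest known word.
  static String[] split(String compound) {
    java.util.List parts = new java.util.ArrayList();
    int pos = 0;
    while (pos < compound.length()) {
      int end = -1;
      for (int i = compound.length(); i > pos; i--)
        if (LEXICON.contains(compound.substring(pos, i))) { end = i; break; }
      if (end == -1) return new String[] { compound };  // unknown part: give up
      parts.add(compound.substring(pos, end));
      pos = end;
    }
    return (String[]) parts.toArray(new String[parts.size()]);
  }

  public static void main(String[] args) {
    // Prints [datenbank, verwaltung]
    System.out.println(Arrays.asList(split("datenbankverwaltung")));
  }
}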