Hi Steve,
Thanks for your reply. I know Farsi is written and read right-to-left.
I am using the RangeQuery class; its rewrite(IndexReader reader) method
decides whether a term is in range via the compareTo method, and that
comparison is based on Unicode code points.
While searching the د-ژ range, the
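This is likely the root of the problem: raw Unicode code-point order does not match Farsi alphabetical order for Persian-specific letters such as ژ. A minimal sketch (not Lucene code; it only demonstrates String.compareTo versus a locale-aware Collator, and how well the JDK's collation tables cover Persian varies by JDK version):

```java
import java.text.Collator;
import java.util.Locale;

public class FarsiOrderDemo {
    public static void main(String[] args) {
        // In the Farsi alphabet, ژ (zhe, U+0698) comes right after ز (U+0632)
        // and BEFORE س (sin, U+0633). By raw code point, however, ژ sorts
        // after س, because U+0698 > U+0633.
        String zhe = "\u0698"; // ژ
        String sin = "\u0633"; // س

        // String.compareTo is effectively what a code-point-based
        // range check uses:
        System.out.println("code-point order says sin < zhe: "
                + (sin.compareTo(zhe) < 0)); // true

        // A locale-aware Collator applies linguistic ordering instead:
        Collator fa = Collator.getInstance(Locale.forLanguageTag("fa"));
        System.out.println("collator says zhe < sin: "
                + (fa.compare(zhe, sin) < 0));
    }
}
```

So a term can fall inside the د-ژ range alphabetically while falling outside it by code point, or vice versa.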
Even if they're in multiple indexes, the doc IDs being ints
will still prevent
it from going past 2Gi unless you wrap your own framework around it.
Hm. Does this mean that a MultiReader has the int limit too?
I thought this limit applied to a single index only...
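It does apply to the composite view: a multi-reader numbers documents by adding per-sub-reader base offsets, so the combined doc count must still fit in one int. A sketch of that arithmetic (the class and method names here are hypothetical, not Lucene's):

```java
import java.util.Arrays;

public class MultiViewDocIds {
    /** Hypothetical composite view: global docid = bases[i] + localDocId. */
    static int[] computeBases(int[] subMaxDocs) {
        int[] bases = new int[subMaxDocs.length];
        int total = 0;
        for (int i = 0; i < subMaxDocs.length; i++) {
            bases[i] = total;
            // Math.addExact throws ArithmeticException if the combined
            // doc count would overflow the int docid space (~2.1 billion).
            total = Math.addExact(total, subMaxDocs[i]);
        }
        return bases;
    }

    public static void main(String[] args) {
        // Two sub-indexes of 1 billion docs each still fit:
        System.out.println(Arrays.toString(
                computeBases(new int[]{1_000_000_000, 1_000_000_000})));
        // Three do not -- the third base would exceed Integer.MAX_VALUE:
        try {
            computeBases(new int[]{1_000_000_000, 1_000_000_000, 1_000_000_000});
        } catch (ArithmeticException e) {
            System.out.println("3 billion docs overflow the int docid space");
        }
    }
}
```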
Hi there,
We're using Lucene with Hibernate Search, and so far we're very happy
with Lucene's performance and usability. However, we have a
specific use case that prevents us from using Lucene alone: spatial
queries. I already sent a mail to this list a while back about the
problem, and we
Hi,
The document's encoding is UTF-8.
I tried the explain() method; the result for the د-ژ range search is:
fieldWeight(keywordIndex:ساب وو�ر in 0), product of:
1.0 = tf(termFreq(keywordIndex:ساب وو�ر)=1)
0.30685282 = idf(docFreq=1)
1.0 = fieldNorm(field=keywordIndex,
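As a side note, the 0.30685282 idf value above is consistent with the classic Lucene DefaultSimilarity formula, idf = 1 + ln(numDocs / (docFreq + 1)), evaluated for docFreq=1 in an index containing a single document (assuming that is the Similarity in use here):

```java
public class IdfCheck {
    // Classic Lucene DefaultSimilarity idf (assumed formula):
    // idf = 1 + ln(numDocs / (docFreq + 1))
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    public static void main(String[] args) {
        // docFreq=1 in a 1-doc index reproduces the explain() value:
        System.out.printf("%.8f%n", idf(1, 1)); // 0.30685282
    }
}
```

So the scoring itself looks normal; the range problem is in term comparison, not in the weight computation.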
The issue here is a general one: performing an efficient join between
an external resource (an RDBMS) and Lucene.
This experiment may be of interest:
http://issues.apache.org/jira/browse/LUCENE-434
KeyMap.java embodies the core service, which translates from Lucene doc IDs to
DB
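A sketch of the kind of translation table such a service provides (this is a hypothetical stand-in, not the actual LUCENE-434 code): because Lucene doc IDs are dense ints in [0, maxDoc), a flat array keyed by doc ID is the cheapest map from hits to database primary keys.

```java
public class DocIdKeyMap {
    // Hypothetical stand-in for the real KeyMap: index position is the
    // Lucene docid, the value is the row's DB primary key.
    private final long[] docToDbKey;

    DocIdKeyMap(long[] docToDbKey) {
        this.docToDbKey = docToDbKey;
    }

    long dbKeyFor(int docId) {
        return docToDbKey[docId];
    }

    public static void main(String[] args) {
        // Pretend docs 0..2 were indexed with primary keys 101, 57, 9000
        // (in a real build this would be loaded from a stored key field).
        DocIdKeyMap map = new DocIdKeyMap(new long[]{101L, 57L, 9000L});

        // Translating a page of hits into DB keys for the SQL side of the join:
        int[] hits = {2, 0};
        for (int doc : hits) {
            System.out.println("doc " + doc + " -> db key " + map.dbKeyFor(doc));
        }
    }
}
```

The catch, of course, is that the table must be rebuilt whenever doc IDs shift (merges, deletes), which is exactly the fragility discussed elsewhere in this thread.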
Hi Esra,
Going back to the original problem statement, I see something that looks
illogical to me - please correct me if I'm wrong:
On Apr 30, 2008, at 3:21 AM, esra wrote:
I am using Lucene's IndexSearcher to search the given XML by
keyword, which contains Farsi text.
while searching
On May 1, 2008, at 4:36 AM, esra wrote:
Hi,
The document's encoding is UTF-8.
I tried the explain() method; the result for the د-ژ range
search is:
fieldWeight(keywordIndex:ساب وو�ر in 0),
product of:
1.0 = tf(termFreq(keywordIndex:ساب وو�ر)=1)
0.30685282 =
On Wed, Apr 30, 2008 at 10:52 PM, Rajesh parab [EMAIL PROTECTED] wrote:
Can we somehow keep the
internal document ID the same after updating (i.e. deleting
and re-inserting) an index document?
No. ParallelReader is not a general solution; it's an expert-level
solution that leaves the task of keeping the
Thanks Yonik.
So, if rebuilding the second index is not an option
due to the large number of documents, then ParallelReader will
not work :-(
And I believe there is no way other than
ParallelReader to search across multiple indexes that
contain related data. Is there any other alternative?
I think,
I am not sure why this is the case; the docid is internal to the sub-index. As
long as the sub-index size is below 2 billion, there is no need for the docid to
be a long. With multiple indexes, I was thinking of having an aggregator which
merges maybe only a page of search results.
Example:
sub index 1: 1 billion
From: John Wang [EMAIL PROTECTED]
[...]
sub index 1: 1 billion docs
sub index 2: 1 billion docs
sub index 3: 1 billion docs
Federating search over these sub-indexes, you represent an index of 3
billion docs, and all internal doc IDs are of type int.
That falls under Daniel's ...unless you
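The aggregator John describes can stay entirely within int doc IDs by addressing hits as (shard, local docid) pairs; no global ID is ever materialized. A minimal sketch under that assumption (the class and record names are hypothetical):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class PageAggregator {
    /** A hit addressed by (shard, local int docid) -- no global id needed. */
    record Hit(int shard, int localDoc, float score) {}

    /** Merge each shard's top page into one global top-k page by score. */
    static List<Hit> mergeTopK(List<List<Hit>> shardPages, int k) {
        // Min-heap of size k: the root is the worst hit currently kept.
        PriorityQueue<Hit> heap =
                new PriorityQueue<>(Comparator.comparingDouble(Hit::score));
        for (List<Hit> page : shardPages) {
            for (Hit h : page) {
                heap.offer(h);
                if (heap.size() > k) {
                    heap.poll(); // evict the current worst
                }
            }
        }
        List<Hit> out = new ArrayList<>(heap);
        out.sort(Comparator.comparingDouble(Hit::score).reversed());
        return out;
    }

    public static void main(String[] args) {
        List<List<Hit>> pages = List.of(
                List.of(new Hit(0, 42, 0.9f), new Hit(0, 7, 0.5f)),
                List.of(new Hit(1, 42, 0.8f)),            // same local id, different shard
                List.of(new Hit(2, 999_999_999, 0.7f)));  // near the int limit, still fine
        for (Hit h : mergeTopK(pages, 3)) {
            System.out.println(h);
        }
    }
}
```

Each shard only ever sorts and scores locally; the aggregator merges one page per shard, so the combined corpus can exceed 2 billion docs without any docid widening.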
That's correct, Rajesh. ParallelReader has its uses, but I guess your case is
not one of them, unless we are all missing some key aspect of PR or a trick to
make it work in your case.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
Right. And the typical answer to that is:
- If your terms are roughly equally distributed across all N indices (e.g. random
doc-to-shard assignment), the relevance scores will roughly match.
- If you have business rules for doc-to-shard distribution, then your
relevance scores will not be
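The distinction above comes down to docFreq: with skewed assignment, the same term gets a different idf on each shard, so scores merged across shards are not on one scale. A small illustration (using the classic idf formula as an assumption about the Similarity in use):

```java
public class ShardIdfSkew {
    // Classic Lucene idf (assumed formula): 1 + ln(numDocs / (docFreq + 1))
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    public static void main(String[] args) {
        int numDocs = 1_000_000; // equally sized shards

        // Random assignment: the term's docFreq is similar on both shards,
        // so per-shard idfs nearly agree.
        System.out.println("random  A: " + idf(1_000, numDocs));
        System.out.println("random  B: " + idf(1_010, numDocs));

        // Rule-based assignment: the term piles up on one shard,
        // so the same match scores very differently per shard.
        System.out.println("skewed  A: " + idf(50_000, numDocs));
        System.out.println("skewed  B: " + idf(10, numDocs));
    }
}
```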
One trick I can think of is somehow keeping the internal
document ID of a Lucene document the same after the document is
updated (i.e. deleted and re-inserted). I am not sure
if we have this capability in Lucene.
Regards,
Rajesh
--- Otis Gospodnetic [EMAIL PROTECTED]
wrote:
That's correct, Rajesh.