RE: lucene farsi problem

2008-05-01 Thread esra
Hi Steve, thanks for your reply. I know Farsi is written and read right-to-left. I am using the RangeQuery class, and its rewrite(IndexReader reader) method decides whether a word is in range using the compareTo method, so the decision is based on Unicode values. While searching for the د-ژ range the
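A minimal sketch of why a compareTo-based range can match unexpected Farsi terms. It compares the UTF-16 code-unit order that String.compareTo uses with Farsi alphabetical order of the first letter. The example term ساب ووفر ("subwoofer") is an assumption: it is taken to be the term shown, mis-encoded, in the explain() output elsewhere in this thread. FARSI_ALPHABET and firstLetterInAlphabetRange are hypothetical helpers, not Lucene API, and a real fix would need proper collation, which this sketch does not attempt.

```java
/**
 * Sketch: inclusive range membership by String.compareTo (UTF-16 code-unit
 * order) versus Farsi alphabetical order of the first letter only.
 * Compile as UTF-8 (javac -encoding UTF-8) because of the Farsi literals.
 */
public class FarsiRangeCheck {

    // Hypothetical helper data: the 32 Farsi letters in dictionary order.
    static final String FARSI_ALPHABET = "ابپتثجچحخدذرزژسشصضطظعغفقکگلمنوهی";

    // Code-unit comparison, as a compareTo-based range check does: lower <= term <= upper.
    static boolean inCodePointRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    // Hypothetical helper: compares only the first letter by its alphabet position.
    // Letters missing from FARSI_ALPHABET (indexOf == -1) are treated as out of range.
    static boolean firstLetterInAlphabetRange(String term, char lower, char upper) {
        int pos = FARSI_ALPHABET.indexOf(term.charAt(0));
        return pos >= 0
            && pos >= FARSI_ALPHABET.indexOf(lower)
            && pos <= FARSI_ALPHABET.indexOf(upper);
    }

    public static void main(String[] args) {
        String term = "ساب ووفر"; // assumed example term
        // U+0633 (س) lies between U+062F (د) and U+0698 (ژ) by code point...
        System.out.println(inCodePointRange(term, "د", "ژ"));           // true
        // ...but in the Farsi alphabet س comes after ژ.
        System.out.println(firstLetterInAlphabetRange(term, 'د', 'ژ')); // false
    }
}
```

Letters such as پ, چ and ژ sit far above the basic Arabic block in Unicode, so code-point ranges and Farsi dictionary ranges diverge for many letter pairs, not only this one.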

RE: Does Lucene Supports Billions of data

2008-05-01 Thread spring
Even if they're in multiple indexes, the doc IDs being ints will still prevent it going past 2Gi unless you wrap your own framework around it. Hm. Does this mean that a MultiReader has the int-limit too? I thought that this limit applies to a single index only...
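A sketch of the offset arithmetic behind the MultiReader question, assuming a MultiReader-style reader that gives each sub-index an int base offset and exposes one combined doc-ID space (in Lucene, IndexReader.maxDoc() and document numbers are ints). computeBases is a hypothetical helper, not Lucene code.

```java
import java.util.Arrays;

/**
 * Sketch: global doc id = base offset of the sub-index + local doc id.
 * With int ids, the combined document count must not exceed Integer.MAX_VALUE.
 */
public class DocIdBases {

    // Returns the starting global id of each sub-index, or throws if the
    // combined count would not fit in an int.
    static int[] computeBases(long[] subIndexSizes) {
        int[] bases = new int[subIndexSizes.length];
        long total = 0;
        for (int i = 0; i < subIndexSizes.length; i++) {
            bases[i] = (int) total; // total <= Integer.MAX_VALUE (checked on the previous pass)
            total += subIndexSizes[i];
            if (total > Integer.MAX_VALUE) {
                throw new IllegalArgumentException(
                    "combined doc count " + total + " exceeds Integer.MAX_VALUE");
            }
        }
        return bases;
    }

    public static void main(String[] args) {
        // Two sub-indexes of 1 billion docs fit: 2,000,000,000 <= 2,147,483,647.
        System.out.println(Arrays.toString(
            computeBases(new long[] {1_000_000_000L, 1_000_000_000L}))); // [0, 1000000000]
        // A third one does not.
        try {
            computeBases(new long[] {1_000_000_000L, 1_000_000_000L, 1_000_000_000L});
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If that assumption holds, the limit applies to the combined reader and not only to each index, so going past 2^31 - 1 documents needs a wrapper that keeps (sub-index, local ID) pairs rather than one global int.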

hybrid query (lucene + db)

2008-05-01 Thread Stephane Nicoll
Hi there, we're using Lucene with Hibernate Search and so far we're very happy with the performance and usability of Lucene. However, we have a specific use case that prevents us from using only Lucene: spatial queries. I already sent a mail on this list a while back about the problem and we

Re: lucene farsi problem

2008-05-01 Thread esra
Hi, the document's encoding is UTF-8. I tried the explain() method and the result for the د-ژ range search is: fieldWeight(keywordIndex:ساب ووفر in 0), product of: 1.0 = tf(termFreq(keywordIndex:ساب ووفر)=1) 0.30685282 = idf(docFreq=1) 1.0 = fieldNorm(field=keywordIndex,
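The numbers in this explain() output can be checked by hand, assuming Lucene's DefaultSimilarity (idf = 1 + ln(numDocs / (docFreq + 1)), tf = sqrt(freq)) and an index holding a single document (numDocs = 1 is an assumption; the output does not show it):

```latex
\begin{aligned}
\mathrm{idf} &= 1 + \ln\frac{\mathit{numDocs}}{\mathit{docFreq} + 1} = 1 + \ln\frac{1}{2} \approx 1 - 0.693147 = 0.306853,\\
\mathrm{fieldWeight} &= \mathrm{tf} \times \mathrm{idf} \times \mathrm{fieldNorm} = \sqrt{1} \times 0.306853 \times 1.0 \approx 0.306853.
\end{aligned}
```

So the document is scored normally once it matches; the open question is only why its term was treated as falling inside the range.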

Re: hybrid query (lucene + db)

2008-05-01 Thread mark harwood
The issue here is a general one of trying to perform an efficient join between an external resource (RDBMS) and Lucene. This experiment may be of interest: http://issues.apache.org/jira/browse/LUCENE-434 KeyMap.java embodies the core service which translates from Lucene doc IDs to DB
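A minimal sketch of the doc-ID-to-key translation idea, assuming a precomputed array from Lucene doc ID to database primary key. The names docIdToKey and docsMatchingKeys are hypothetical, and this is not the code in KeyMap.java. Keys returned by a spatial SQL query become a BitSet of doc IDs that could then restrict the Lucene results, for example through a custom Filter.

```java
import java.util.BitSet;
import java.util.Set;

/**
 * Sketch of a Lucene/DB join: docIdToKey[docId] holds the primary key of the
 * row indexed as that document (hypothetical, built ahead of time).
 */
public class DocIdKeyJoin {

    // Marks every doc id whose DB key appears in the DB query's result set.
    static BitSet docsMatchingKeys(long[] docIdToKey, Set<Long> dbKeys) {
        BitSet bits = new BitSet(docIdToKey.length);
        for (int docId = 0; docId < docIdToKey.length; docId++) {
            if (dbKeys.contains(docIdToKey[docId])) {
                bits.set(docId);
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        long[] docIdToKey = {101L, 205L, 317L, 420L};
        Set<Long> spatialHits = Set.of(205L, 420L, 999L); // 999 is not in the index
        System.out.println(docsMatchingKeys(docIdToKey, spatialHits)); // {1, 3}
    }
}
```

The array has to be rebuilt whenever doc IDs change (after deletes and merges), and the scan costs O(maxDoc); when the DB returns only a few rows, a key-to-doc-ID map is the cheaper direction.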

RE: lucene farsi problem

2008-05-01 Thread Steven A Rowe
Hi Esra, Going back to the original problem statement, I see something that looks illogical to me - please correct me if I'm wrong: On Apr 30, 2008, at 3:21 AM, esra wrote: i am using lucene's IndexSearcher to search the given xml by keyword which contains farsi information. while searching

Re: lucene farsi problem

2008-05-01 Thread Grant Ingersoll
On May 1, 2008, at 4:36 AM, esra wrote: Hi, document's encoding is UTF-8. i tried the explain() method and the result for د-ژ range searching is: fieldWeight(keywordIndex:ساب ووفر in 0), product of: 1.0 = tf(termFreq(keywordIndex:ساب ووفر)=1) 0.30685282 =

Re: ParallelReader and synchronization between indexes

2008-05-01 Thread Yonik Seeley
On Wed, Apr 30, 2008 at 10:52 PM, Rajesh parab [EMAIL PROTECTED] wrote: Can we somehow keep internal document id same after updating (i.e. delete and re-insert) index document? No. ParallelReader is not a general solution, it's an expert-level solution that leaves the task of keeping the
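Yonik's point can be illustrated with a toy model (plain lists, not Lucene code): the list position stands in for the internal doc ID, null marks a deleted document, and an update is a delete plus an append, which is how Lucene performs updates. Once deletions are merged away, the updated index is renumbered while the untouched one is not, so the position-based pairing that ParallelReader relies on breaks.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of two "parallel" indexes whose doc ids must stay aligned. */
public class ParallelAlignment {

    // Update = mark the old doc deleted and append the new version at a new id.
    static void update(List<String> index, int docId, String newValue) {
        index.set(docId, null); // deleted; other ids are unchanged for now
        index.add(newValue);    // re-inserted doc gets the next free id
    }

    // Merging away deletions renumbers every doc that followed a deleted one.
    static List<String> expungeDeletes(List<String> index) {
        List<String> merged = new ArrayList<>();
        for (String v : index) {
            if (v != null) {
                merged.add(v);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String> a = new ArrayList<>(List.of("a0", "a1", "a2"));
        List<String> b = new ArrayList<>(List.of("b0", "b1", "b2"));

        update(a, 1, "a1'");   // only index A is updated
        a = expungeDeletes(a); // A: [a0, a2, a1'], B: [b0, b1, b2]

        for (int id = 0; id < b.size(); id++) {
            System.out.println(id + ": " + a.get(id) + " + " + b.get(id));
        }
        // 0: a0 + b0
        // 1: a2 + b1
        // 2: a1' + b2
    }
}
```

Rows 1 and 2 now pair fields from different logical documents, which is why the second index would have to be updated (or rebuilt) in lockstep.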

Re: ParallelReader and synchronization between indexes

2008-05-01 Thread Rajesh parab
Thanks Yonik. So, if rebuilding the second index is not an option due to the large number of documents, then ParallelReader will not work :-( And I believe there is no way other than ParallelReader to search across multiple indexes that contain related data. Is there any other alternative? I think,

Re: Does Lucene Supports Billions of data

2008-05-01 Thread John Wang
I am not sure why this is the case; the doc ID is internal to the sub-index. As long as each sub-index stays below 2 billion documents, there is no need for the doc ID to be a long. With multiple indexes, I was thinking of having an aggregator which merges perhaps only a page of search results. Example: sub index 1: 1 billion
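A sketch of the aggregator idea, under stated assumptions: each sub-index returns its own hits sorted by descending score, with int doc IDs local to that sub-index, and the aggregator identifies a hit by the pair (shard, localDocId) instead of one global int. Hit and mergePage are hypothetical names, not Lucene API. For page [offset, offset + pageSize) to be correct, every shard must return at least offset + pageSize hits.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical aggregator that merges one page of results from several sub-indexes. */
public class FederatedMerge {

    // A hit identified by (sub-index number, doc id local to that sub-index).
    record Hit(int shard, int localDocId, float score) {}

    static List<Hit> mergePage(List<List<Hit>> perShard, int offset, int pageSize) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> hits : perShard) {
            all.addAll(hits);
        }
        // Highest score first; List.sort is stable, so ties keep shard order.
        all.sort((x, y) -> Float.compare(y.score(), x.score()));
        int from = Math.min(offset, all.size());
        int to = Math.min(offset + pageSize, all.size());
        return new ArrayList<>(all.subList(from, to));
    }

    public static void main(String[] args) {
        List<List<Hit>> perShard = List.of(
            List.of(new Hit(0, 5, 0.9f), new Hit(0, 2, 0.4f)),
            List.of(new Hit(1, 7, 0.8f), new Hit(1, 3, 0.5f)),
            List.of(new Hit(2, 1, 0.95f)));
        for (Hit h : mergePage(perShard, 0, 3)) {
            System.out.println(h);
        }
    }
}
```

Merging by raw score only makes sense if scores are comparable across sub-indexes, which depends on how documents are distributed among them.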

Re: Does Lucene Supports Billions of data

2008-05-01 Thread Toke Eskildsen
From: John Wang [EMAIL PROTECTED] [...] sub index 1: 1 billion docs sub index 2: 1 billion docs sub index 3: 1 billion docs federating search to these subindexes, you represent an index of 3 billion docs, and all internal doc ids are of type int. That falls under Daniel's ...unless you

Re: ParallelReader and synchronization between indexes

2008-05-01 Thread Otis Gospodnetic
That's correct, Rajesh. ParallelReader has its uses, but I guess your case is not one of them, unless we are all missing some key aspect of PR or a trick to make it work in your case. Otis

Re: Does Lucene Supports Billions of data

2008-05-01 Thread Otis Gospodnetic
Right. And the typical answer to that is: - If your terms are roughly equally distributed in all N indices (e.g. random doc-index/shard assignment), the relevance score will roughly match. - If you have business rules for doc-index/shard distribution, then your relevance scores will not be
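A sketch of why document distribution matters for relevance, assuming each shard scores with its own term statistics and the classic DefaultSimilarity idf = 1 + ln(numDocs / (docFreq + 1)) (Lucene computes it in float; double is used here). The shard sizes and term counts are made-up illustration values.

```java
/** Per-shard idf versus the idf a single combined index would use. */
public class ShardIdf {

    // Classic DefaultSimilarity idf formula.
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    public static void main(String[] args) {
        long shardSize = 1_000_000L; // two shards of 1M docs; the term occurs in 1,000 docs overall

        // Random assignment: about 500 matching docs per shard.
        System.out.printf("uniform: shard idf %.3f, global idf %.3f%n",
            idf(500, shardSize), idf(1_000, 2 * shardSize));

        // Rule-based assignment: 990 matching docs in shard 0, 10 in shard 1.
        System.out.printf("skewed:  shard0 idf %.3f, shard1 idf %.3f%n",
            idf(990, shardSize), idf(10, shardSize));
    }
}
```

With uniform assignment the per-shard idf is within about 0.001 of the global value; in the skewed case the same term weighs more than 4 points higher in shard 1, so its hits get inflated scores when results are merged.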

Re: ParallelReader and synchronization between indexes

2008-05-01 Thread Rajesh parab
One trick I can think of is somehow keeping the internal document ID of a Lucene document the same after the document is updated (i.e. deleted and re-inserted). I am not sure if Lucene has this capability. Regards, Rajesh