date:20080501

Re: hybrid query (lucene + db)

2008-05-01 Thread Stephane Nicoll

Well, for the moment we don't. The Lucene index contains only the full-text content (indexed, not stored). We use Lucene to perform full-text and fuzzy searches on the keywords field. Once we have the results, we match them against the geospatial box provided by the user (we use Oracle Spatial for that).

Re: ParalleReader and synchronization between indexes

2008-05-01 Thread Rajesh parab

One trick I can think of is somehow keeping the internal document ID of a Lucene document the same after the document is updated (i.e. deleted and re-inserted). I am not sure whether Lucene has this capability. Regards, Rajesh --- Otis Gospodnetic <[EMAIL PROTECTED]> wrote: > That's correct, Rajesh. Paral

Re: Does Lucene Supports Billions of data

2008-05-01 Thread Otis Gospodnetic

Right. And the typical answer to that is:
- If your terms are roughly equally distributed in all N indices (e.g. random doc->index/shard assignment), the relevance score will roughly match.
- If you have business rules for doc->index/shard distribution, then your relevance scores will not be c
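To illustrate why term distribution across shards matters, here is a minimal sketch. It assumes each shard computes idf locally with the classic formula 1 + ln(numDocs / (docFreq + 1)), which is consistent with the idf(docFreq=1) value of 0.30685282 shown in the explain() output elsewhere on this page. The class name ShardIdf and the shard sizes in the usage note are hypothetical.

```java
public class ShardIdf {
    /**
     * Classic idf computed from one shard's local statistics only:
     * 1 + ln(numDocs / (docFreq + 1)).
     */
    public static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    public static void main(String[] args) {
        // Hypothetical shards of equal size where the same term is rare
        // in one shard and common in the other.
        System.out.println("rare in shard A:   " + idf(10, 1000));
        System.out.println("common in shard B: " + idf(500, 1000));
    }
}
```

With random document assignment, docFreq / numDocs is about the same in every shard, so the two values would be close. With rule-based assignment the ratio can differ widely, and scores from different shards are then not directly comparable.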

Re: ParalleReader and synchronization between indexes

2008-05-01 Thread Otis Gospodnetic

That's correct, Rajesh. ParallelReader has its uses, but I guess your case is not one of them, unless we are all missing some key aspect of PR or a trick to make it work in your case. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: Rajesh

Re: Does Lucene Supports Billions of data

2008-05-01 Thread Toke Eskildsen

From: John Wang <[EMAIL PROTECTED]> [...] > sub index 1: 1 billion docs > sub index 2: 1 billion docs > sub index 3: 1 billion docs > > federating search to these subindexes, you represent an index of 3 > billion docs, and all internal doc ids are of type int. That falls under Daniel's "...unless

Re: Does Lucene Supports Billions of data

2008-05-01 Thread John Wang

I am not sure why this is the case; the docid is internal to the sub index. As long as the sub index size is below 2 billion, there is no need for the docid to be a long. With multiple indexes, I was thinking of having an aggregator that merges maybe only a page of search results. Example: sub index 1: 1 billion
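The aggregator idea above could be sketched roughly as follows. This is an illustration only and not code from the thread: the class ShardMerge, the Hit type, and the mergeTopK method are hypothetical names, and no Lucene API is used. Each hit is identified by the pair (shard, local doc id), so both parts stay within an int even when the combined collection is larger than 2 billion documents.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class ShardMerge {
    /** A hit identified by (shard, local doc id); each part fits in an int. */
    public static final class Hit {
        public final int shard;
        public final int localDocId;
        public final float score;

        public Hit(int shard, int localDocId, float score) {
            this.shard = shard;
            this.localDocId = localDocId;
            this.score = score;
        }
    }

    /** Returns the k best hits across all shards, highest score first. */
    public static List<Hit> mergeTopK(List<List<Hit>> perShard, int k) {
        Comparator<Hit> byScore = Comparator.comparingDouble((Hit h) -> h.score);
        // Min-heap: the weakest of the hits kept so far sits at the head.
        PriorityQueue<Hit> heap = new PriorityQueue<>(byScore);
        for (List<Hit> hits : perShard) {
            for (Hit h : hits) {
                heap.offer(h);
                if (heap.size() > k) {
                    heap.poll(); // drop the current weakest hit
                }
            }
        }
        List<Hit> page = new ArrayList<>(heap);
        page.sort(byScore.reversed());
        return page;
    }
}
```

Each sub index only needs to return its own top k hits for the requested page, so the aggregator handles at most k hits per shard. The merged order is only meaningful if scores are comparable across shards.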

Re: hybrid query (lucene + db)

2008-05-01 Thread Michael Stoppelman

Stephane, Could you describe how you set up the spatial area? Having BooleanQuery with 200 terms in it definitely slows things down (I'm not sure exactly why yet -- it seems like it shouldn't be "that" slow). If you can describe your spatial area in fewer terms you can get much better performance.

Re: ParalleReader and synchronization between indexes

2008-05-01 Thread Rajesh parab

Thanks Yonik. So, if rebuilding the second index is not an option due to the large number of documents, then ParallelReader will not work :-( And I believe there is no way other than ParallelReader to search across multiple indexes that contain related data. Is there any other alternative? I think, Multi

Re: ParalleReader and synchronization between indexes

2008-05-01 Thread Yonik Seeley

On Wed, Apr 30, 2008 at 10:52 PM, Rajesh parab <[EMAIL PROTECTED]> wrote: > Can we somehow keep > internal document id same after updating (i.e. delete > and re-insert) index document? No. ParallelReader is not a general solution, it's an expert-level solution that leaves the task of keeping t

Re: lucene farsi problem

2008-05-01 Thread Grant Ingersoll

On May 1, 2008, at 4:36 AM, esra wrote: Hi, document's encoding is "UTF-8". i tried the explain() method and the result for "د-ژ" range searching is: fieldWeight(keywordIndex:Ø³Ø§Ø¨ ÙˆÙˆÙ�Ø± in 0), product of: 1.0 = tf(termFreq(keywordIndex:Ø³Ø§Ø¨ ÙˆÙˆÙ�Ø±)=1) 0.30685282 = idf(do

RE: lucene farsi problem

2008-05-01 Thread Steven A Rowe

Hi Esra, Going back to the original problem statement, I see something that looks illogical to me - please correct me if I'm wrong: On Apr 30, 2008, at 3:21 AM, esra wrote: > i am using lucene's "IndexSearcher" to search the given xml by > keyword which contains farsi information. > while search

Re: hybrid query (lucene + db)

2008-05-01 Thread mark harwood

The issue here is a general one of trying to perform an efficient join between an external resource (rdbms) and Lucene. This experiment may be of interest: http://issues.apache.org/jira/browse/LUCENE-434 KeyMap.java embodies the core service which translates from lucene doc ids to DB primary

Re: lucene farsi problem

2008-05-01 Thread esra

Hi, document's encoding is "UTF-8". i tried the explain() method and the result for "د-ژ" range searching is: fieldWeight(keywordIndex:Ø³Ø§Ø¨ ÙˆÙˆÙ�Ø± in 0), product of: 1.0 = tf(termFreq(keywordIndex:Ø³Ø§Ø¨ ÙˆÙˆÙ�Ø±)=1) 0.30685282 = idf(docFreq=1) 1.0 = fieldNorm(field=keywordIndex,

hybrid query (lucene + db)

2008-05-01 Thread Stephane Nicoll

Hi there, We're using Lucene with Hibernate Search and we're very happy so far with the performance and the usability of Lucene. We have, however, a specific use case that prevents us from using only Lucene: spatial queries. I already sent a mail on this list a while back about the problem and we starte

RE: Does Lucene Supports Billions of data

2008-05-01 Thread spring

> Even if they're in multiple indexes, the doc IDs being ints > will still prevent > it going past 2Gi unless you wrap your own framework around it. Hm. Does this mean that a MultiReader has the int-limit too? I thought that this limit applies to a single index only...
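The arithmetic behind this question can be sketched as follows, assuming (as the quoted reply suggests) that a combined reader maps every sub-index into a single int doc ID space by adding a start offset per sub-index. The class DocIdSpace and the method starts are hypothetical names for illustration, not Lucene code.

```java
public class DocIdSpace {
    /**
     * Computes the start offset of each sub-index in one combined int
     * doc id space. The last element is the combined document count.
     * Throws ArithmeticException if that count exceeds Integer.MAX_VALUE.
     */
    public static int[] starts(int[] maxDocs) {
        int[] starts = new int[maxDocs.length + 1];
        for (int i = 0; i < maxDocs.length; i++) {
            // addExact fails loudly instead of silently wrapping around.
            starts[i + 1] = Math.addExact(starts[i], maxDocs[i]);
        }
        return starts;
    }
}
```

Two sub-indexes of 1 billion documents each still fit (2,000,000,000 is below Integer.MAX_VALUE, 2,147,483,647), but a third one pushes the total past the limit. Under this assumption the limit applies to the combined doc ID space, not only to each single index.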

RE: lucene farsi problem

2008-05-01 Thread esra

Hi Steve, thanks for your reply. I know Farsi is written and read right-to-left. I am using the RangeQuery class, and its rewrite(IndexReader reader) method decides whether the word is in range or not by the compareTo method, and this decision is made by using Unicode values. While searching for "د-ژ" range the lo
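The comparison described above can be reproduced with plain String.compareTo, which orders strings by UTF-16 code unit value. The sketch below assumes the range check reduces to two compareTo calls, as the message describes. The class FarsiRange and the method inRange are hypothetical names, and the letters are written as Unicode escapes to avoid source-encoding problems.

```java
public class FarsiRange {
    /** True if term lies in [lower, upper] under String.compareTo ordering. */
    public static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        String dal = "\u062F";  // د, lower bound of the range
        String zhe = "\u0698";  // ژ, upper bound of the range
        String seen = "\u0633"; // س, follows ژ in the Farsi alphabet
        String pe = "\u067E";   // پ, precedes د in the Farsi alphabet
        System.out.println(inRange(seen, dal, zhe));
        System.out.println(inRange(pe, dal, zhe));
    }
}
```

The letter ژ (U+0698) is encoded well after the basic Arabic letters, so in code-unit order the range from د (U+062F) to ژ also covers letters such as س (U+0633) and پ (U+067E) that fall outside the range in Farsi alphabetical order.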

16 matches
