We also use a similar technique: breaking indexes into smaller ones and
searching them with ParallelMultiSearcher. We have to do incremental indexing, and the
records older than 6 months or 1 year (based on ageout setting) should be
deleted. Having multiple small indexes is really fast in terms of in
Hi,
Can I remove the filler token _ from the n-gram-tokens that are generated by
a ShingleFilter?
I'm using a chain of filters: ClassicFilter, StopFilter, LowerCaseFilter,
and ShingleFilter to create phrase n-grams. The ShingleFilter inserts
FILLER_TOKENs in place of the stopwords, but I don't w
Thanks for your suggestion!
I tried setting a document boost factor when indexing documents. To
bubble up recent documents' scores, I set the boost of documents from the
last three months to 2 and the boost of all other documents to 0.5. Then I
search the index, sorting by two fields: the Lucene default score and
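
The boost rule described above could be sketched as follows. This is a minimal illustration only: the class and method names are hypothetical, and "last three months" is assumed to mean three calendar months before the indexing date. In Lucene 2.9/3.x the returned value would typically be passed to Document.setBoost(float) at indexing time.

```java
import java.time.LocalDate;

/** Hypothetical helper; not part of Lucene. */
final class RecencyBoostSketch {
    static final float RECENT_BOOST = 2.0f; // documents from the last three months
    static final float OLDER_BOOST = 0.5f;  // all other documents

    /** Picks the boost for a document dated docDate, as seen on indexingDate. */
    static float boostFor(LocalDate docDate, LocalDate indexingDate) {
        // Assumption: the window is three calendar months, cutoff day included.
        LocalDate cutoff = indexingDate.minusMonths(3);
        return docDate.isBefore(cutoff) ? OLDER_BOOST : RECENT_BOOST;
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.of(2011, 5, 10);
        System.out.println(boostFor(LocalDate.of(2011, 4, 1), today));
        System.out.println(boostFor(LocalDate.of(2010, 12, 1), today));
    }
}
```

Because an index-time boost is stored with the document, a document keeps the boost it was given until it is re-indexed.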
: I attach a junit test which shows strange behaviour of the inOrder
: parameter on the SpanNearQuery constructor, using Lucene 2.9.4.
:
: My understanding of this parameter is that true forces the order and
: false doesn't care about the order.
:
: Using true always works. However using false
Since no one else is jumping in, I'll say that I suspect that the span
query code does not bother to check to see if two of the terms are the
same.
I think that would account for the behavior you are seeing, since the
second SpanTermQuery would match the same term the first one did.
Note that I'm
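
To make the suspicion above concrete, here is a toy model of unordered proximity matching. It is not Lucene's span code; the class, the method names, and the slop arithmetic (adjacent terms count as slop 0) are assumptions made for illustration. It only shows how a matcher that never checks for distinct positions lets one occurrence of a term satisfy both clauses.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only; not how Lucene implements SpanNearQuery. */
final class UnorderedNearSketch {

    /** Returns every position at which term occurs in tokens. */
    static List<Integer> positionsOf(String[] tokens, String term) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals(term)) positions.add(i);
        }
        return positions;
    }

    /** Unordered match that does not require the two clauses to use different positions. */
    static boolean naiveNear(String[] tokens, String t1, String t2, int slop) {
        for (int p1 : positionsOf(tokens, t1)) {
            for (int p2 : positionsOf(tokens, t2)) {
                if (Math.abs(p1 - p2) - 1 <= slop) return true;
            }
        }
        return false;
    }

    /** Same check, but the two clauses must match at different positions. */
    static boolean distinctNear(String[] tokens, String t1, String t2, int slop) {
        for (int p1 : positionsOf(tokens, t1)) {
            for (int p2 : positionsOf(tokens, t2)) {
                if (p1 != p2 && Math.abs(p1 - p2) - 1 <= slop) return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] doc = {"a", "b", "c"};
        System.out.println(naiveNear(doc, "a", "a", 0));    // the single "a" matches both clauses
        System.out.println(distinctNear(doc, "a", "a", 0)); // needs two separate occurrences
    }
}
```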
Hi,
In our Lucene 2.9.4 project, there is a requirement to boost some of the
keywords in the document using payloads.
Now while searching, is there a way I can boost the MoreLikeThis result
using the index time payload values?
Or can I merge MoreLikeThis output and PayloadTermQuery output somehow
Hi Samar,
>>Normal queries go fine under 500 ms but when people start searching
>>"anything" some queries take up to > 100 seconds. Don't you think
>>distributing smaller indexes on different machines would reduce the average
>>search time. (Although I have a feeling that search time for smaller
Hi Mike,
*"I think the usual approach is to create multiple mirrored copies (slaves)
rather than sharding"*
This is what caught my eye.
We do have mirrors, and in fact a good number of them. 6 servers are being
used for serving regular queries (2 are for specific queries that do take
time) and e
Down to basics, Lucene searches work by locating terms and resolving
documents from them. For standard term queries, a term is located by a
process akin to binary search. That means that it uses log(n) seeks to
get the term. Let's say you have 10M terms in your corpus. If you stored
that in a si
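
The log(n) figure for the 10M-term example above can be checked with a small sketch. It models the term dictionary as a plain sorted array, which is a simplification and not Lucene's on-disk format; the class and method names are hypothetical.

```java
/** Toy model of a sorted term dictionary; not Lucene's actual term index. */
final class TermLookupSketch {

    /** Binary search over sorted terms; returns how many entries were probed. */
    static int probesToFind(String[] sortedTerms, String target) {
        int lo = 0, hi = sortedTerms.length - 1, probes = 0;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            probes++;
            int cmp = sortedTerms[mid].compareTo(target);
            if (cmp == 0) return probes;
            if (cmp < 0) lo = mid + 1; else hi = mid - 1;
        }
        return probes; // target absent: number of probes before giving up
    }

    /** Worst-case probes for n sorted entries: floor(log2(n)) + 1, or 0 when n is 0. */
    static int worstCaseProbes(long n) {
        return 64 - Long.numberOfLeadingZeros(n);
    }

    public static void main(String[] args) {
        System.out.println(worstCaseProbes(10_000_000L)); // prints 24
    }
}
```

For 10M terms the worst case is 24 probes, so under this model a single term lookup costs a couple of dozen seeks at most.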
Anyone able to help me with the problem below?
Thanks
Greg
-----Original Message-----
From: Gregory Tarr [mailto:gregory.t...@detica.com]
Sent: 09 May 2011 12:33
To: java-user@lucene.apache.org
Subject: RE: SpanNearQuery - inOrder parameter
Attachment didn't work - test below:
import org.ap
A full stack trace dump is always helpful. Are the three instances on
one server with a local index directory, or on different servers
accessing a network drive (how?) or what? If the index is locked it
would be surprising that you could update it from 2 of the instances.
--
Ian.
On Tue, May
Three instances of my application share one Lucene index directory.
Lucene version: 3.1
Lock factory: NativeFSLockFactory
Instance1: 64-bit JDK, 64-bit OS
Instance2: 64-bit JDK, 64-bit OS
Instance3: 32-bit JDK, 32-bit OS
When I try to search the data from the index directory from Instance1
I got
Hi all,
in our Lucene 3.0.3-based web application, when a user clicks on a hit
link, the targeted PDF should open in the browser with highlighted hits.
For this purpose using the Acrobat Highlight File (Parameter xml, see
http://www.pdfbox.org/userguide/highlighting.html and
http://partne
Thanks
to Johannes - I am looking into katta. Seems promising.
to Toke - Great explanation. That's what I was looking for.
I'll come back and share my experience.
Thank you very much.
On Tue, May 10, 2011 at 1:31 PM, Toke Eskildsen wrote:
> On Mon, 2011-05-09 at 13:56 +0200, Samarendra Prata
On Mon, 2011-05-09 at 13:56 +0200, Samarendra Pratap wrote:
> We have an index directory of 30 GB which is divided into 3 subdirectories
> (idx1, idx2, idx3) which are again divided into 21 sub-subdirectories
> (idx1-1, idx1-2, ..., idx2-1, ..., idx3-1, ..., idx3-21).
So each part is about ½ G
On May 10, 2011, at 9:42 AM, Samarendra Pratap wrote:
> Hi,
> Though our total index is 30 GB, the indexes used in 75%-80% of
> searches total 5 GB, and our average search time is around 700 ms
> (yes, we have an optimized index).
>
> Could someone please throw some light on my origin
Hi,
Though our total index is 30 GB, the indexes used in 75%-80% of
searches total 5 GB, and our average search time is around 700 ms
(yes, we have an optimized index).
Could someone please shed some light on my original question?
If I want to keep smaller indexes on different servers
17 matches