We are trying to find any existing implementation for Lucene that detects
duplicates in an index.
Assume we have a set of documents, where a document is a bag of words.
After we create indexes for the same document, we need to know that all
indexes will be unique for a specific document (lexical equivalence).
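For what it's worth, one common approach (just a sketch, not something from this
thread; the class DuplicateCheck, the field name contentHash, and the helper
methods below are made up for illustration) is to store a digest of the
normalized document text in an untokenized field and check for it before adding
a new document:

import java.security.MessageDigest;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;

public class DuplicateCheck {
    // Hypothetical field name used to store the content digest.
    static final String HASH_FIELD = "contentHash";

    // Compute a hex MD5 digest of the (already normalized) document text.
    static String digest(String normalizedText) throws Exception {
        byte[] md5 = MessageDigest.getInstance("MD5").digest(normalizedText.getBytes("UTF-8"));
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < md5.length; i++) {
            String hex = Integer.toHexString(md5[i] & 0xff);
            if (hex.length() == 1) sb.append('0');
            sb.append(hex);
        }
        return sb.toString();
    }

    // True if a document with the same digest is already in the index.
    static boolean isDuplicate(IndexReader reader, String hash) throws Exception {
        TermDocs td = reader.termDocs(new Term(HASH_FIELD, hash));
        try {
            return td.next();
        } finally {
            td.close();
        }
    }

    // Store the digest untokenized so it can be matched exactly later.
    static void addHash(Document doc, String hash) {
        doc.add(new Field(HASH_FIELD, hash, Field.Store.YES, Field.Index.UN_TOKENIZED));
    }
}

Two documents that normalize to the same word sequence then produce the same
digest, which is one way to approximate lexical equivalence.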
Mark,
thank you for this. I will wait for your other responses.
This will keep me going :-)
I didn't know that there is a design restriction in Lucene that the text and the
TokenStream must match exactly (this still seems redundant to me; I will
dive into the Lucene API more).
BR
Lukas
On 7/29/07, Ma
I am going to try and write up some more info for you tomorrow, but
just to point out: I do think there is a bug in the way offsets are
being handled. I don't think this is causing your current problem (what
I mentioned is), but it will probably cause you problems down the road. I
will look into t
Karl,
And by the way, we created one such solution, but we need to have more
embedded interfaces / implementation functions for the PDM Windchill.
(You can find an article about one of the solutions on www.profilesmagazine.com.)
Thanks,
DT,
www.ejinz.com
Search Engine Platform News
- Original Message
Karl,
thanks for help.
I will try to explain the requirements. There is a PDM system - Product Data
Management System - which manages the data related to products, supports
procedures during the product lifecycle, and deals with the development and
production infrastructure.
There are the following design patter
Hey Lukas,
Sorry I haven't gotten back to you on this sooner. I've been meaning to, but
I have been busy. Still am a little, but here is something to get you started:
The token stream you send to the highlighter must match the text you
send to the highlighter.
Your token stream is this:
(example,0,7
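As a rough sketch of what "must match" means in practice (assuming the contrib
Highlighter of that era; the field name "content" and the helper class are
hypothetical), reanalyze the exact stored text and pass that same text to
getBestFragment so the token offsets line up:

import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;

public class HighlightSketch {
    // The text passed to getBestFragment must be the same text the
    // token stream was produced from, or the offsets will not line up.
    static String highlight(Query query, String storedText) throws Exception {
        Analyzer analyzer = new StandardAnalyzer();
        TokenStream tokens = analyzer.tokenStream("content", new StringReader(storedText));
        Highlighter highlighter = new Highlighter(new QueryScorer(query));
        return highlighter.getBestFragment(tokens, storedText);
    }
}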
Hi Lucene experts,
The following is a simple piece of Lucene code which generates a
StringIndexOutOfBoundsException. I am using the Lucene 2.2.0 official
release. Can anyone tell me what is wrong with this code? Is this a bug or
a feature of Lucene? Any comments/hints are highly welcome!
In a nutshell I
Since you say you're new, I'll risk stating the obvious. Do you
know about the BooleanQuery class?
But do note that the TermQuery isn't quite what
you might expect. For instance, making a TermQuery from the
string "this is junk" is different from making three TermQuery
objects, one for each word.
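To make the difference concrete, here is a hedged illustration (the field name
"body" and the class name are placeholders): a single TermQuery on the whole
string only matches if that exact token was indexed, while a BooleanQuery of
three TermQuerys matches the individual words:

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

public class BooleanExample {
    static BooleanQuery wordsQuery() {
        // One clause per word; SHOULD means "match any of these".
        BooleanQuery query = new BooleanQuery();
        query.add(new TermQuery(new Term("body", "this")), BooleanClause.Occur.SHOULD);
        query.add(new TermQuery(new Term("body", "is")), BooleanClause.Occur.SHOULD);
        query.add(new TermQuery(new Term("body", "junk")), BooleanClause.Occur.SHOULD);
        return query;
    }

    static TermQuery wholeStringQuery() {
        // A single term "this is junk": only matches if that exact token was
        // indexed, which a typical analyzer never produces.
        return new TermQuery(new Term("body", "this is junk"));
    }
}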
Why do you believe that it's the GC? I admit I just scanned your
e-mail, but I *do* know that the first search (especially sorts) on
a newly-opened IndexReader incurs a bunch of overhead. Could
that be what you're seeing?
I'm not sure there is a "best practice", but I have seen two
solutions menti
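One way people absorb that first-search cost up front (a sketch under
assumptions, not one of the solutions referred to above; the sort field "date"
is a placeholder) is to run a throwaway warm-up search with the same sort right
after opening the reader:

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.Sort;

public class Warmup {
    // Run a cheap query with the same sort the application uses so the
    // sort caches are populated before real traffic arrives.
    static IndexSearcher openWarmed(String indexDir) throws Exception {
        IndexReader reader = IndexReader.open(indexDir);
        IndexSearcher searcher = new IndexSearcher(reader);
        searcher.search(new MatchAllDocsQuery(), new Sort("date"));
        return searcher;
    }
}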
You might try lazy loading. See
IndexReader.document(int n, FieldSelector fieldSelector),
particularly the FieldSelector. It allows you to selectively load only the
fields you want.
Otherwise, I'm sure if you looked in the unit tests you'd find examples
of how to use FieldCache. If your field
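A minimal sketch of the FieldSelector route (the field name "title" is just a
placeholder) that loads one field eagerly and skips the rest:

import org.apache.lucene.document.Document;
import org.apache.lucene.document.FieldSelector;
import org.apache.lucene.document.FieldSelectorResult;
import org.apache.lucene.index.IndexReader;

public class LazyLoadExample {
    // Only the "title" field is loaded; all other stored fields are skipped.
    static Document titleOnly(IndexReader reader, int docId) throws Exception {
        FieldSelector selector = new FieldSelector() {
            public FieldSelectorResult accept(String fieldName) {
                return "title".equals(fieldName)
                        ? FieldSelectorResult.LOAD
                        : FieldSelectorResult.NO_LOAD;
            }
        };
        return reader.document(docId, selector);
    }
}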
Kent,
I have not seen anyone do this, but I know Kevin Burton of TailRank (BCCed) has
been drooling over the same idea (check his blog(s)). :)
Otis
--
Lucene Consulting -- http://lucene-consulting.com/
- Original Message
From: Kent Fitch <[EMAIL PROTECTED]>
To: java-user@lucene.apache
Very odd...looks to me like one of the subreaders in the subreader array
is null. Very odd because at one point it must not have been null to get
past the MultiReader creation...
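If it helps to rule that out, a small defensive sketch (hypothetical, not from
this thread) that checks the array before building the MultiReader:

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiReader;

public class MultiReaderGuard {
    // Fail fast with a clear message if any subreader is null.
    static MultiReader create(IndexReader[] subReaders) throws Exception {
        for (int i = 0; i < subReaders.length; i++) {
            if (subReaders[i] == null) {
                throw new IllegalArgumentException("subreader " + i + " is null");
            }
        }
        return new MultiReader(subReaders);
    }
}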
testn wrote:
Every once in a while I get the following exception with Lucene 2.2. Do you
have any idea?
Thanks,
j