Detection of index duplicates in Lucene

2007-07-28 Thread Dmitry
We are trying to find out whether there is any implementation for Lucene that detects duplicates in an index. Assume we have a set of documents, and a document is a bunch of words. After we create indexes for the same document, we need to know that all indexes will be unique for a specific document (lexical equivalence).
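One common way to approach the "lexical equivalence" check described above is to fingerprint each document's normalized text and refuse to index a document whose fingerprint has already been seen. The following is only a minimal sketch in plain Java, not a Lucene feature: the class name DuplicateDetector, its methods, and the normalization rule (lowercase, collapse every run of non-letter/non-digit characters to one space) are assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Lucene.
class DuplicateDetector {
    // Maps a content fingerprint to the id of the first document seen with it.
    private final Map<String, String> seen = new HashMap<>();

    /** Assumed normalization: lowercase, then collapse non-letter/digit runs to one space. */
    static String normalize(String text) {
        return text.toLowerCase(Locale.ROOT)
                   .replaceAll("[^\\p{L}\\p{N}]+", " ")
                   .trim();
    }

    /** Hex-encoded SHA-256 of the normalized text. */
    static String fingerprint(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(normalize(text).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    /** Returns the id of an earlier lexically equivalent document, or null if this one is new. */
    String register(String docId, String text) {
        return seen.putIfAbsent(fingerprint(text), docId);
    }
}
```

In a real index the fingerprint could instead be stored in an untokenized field and looked up before each add, so the check survives restarts; the in-memory map here only keeps the sketch self-contained.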

Re: Bug in Lucene 2.2.0 code? Simple code included (StringIndexOutOfBoundsException).

2007-07-28 Thread Lukas Vlcek
Mark, thank you for this. I will wait for your other responses. This will keep me going :-) I didn't know that there is a design restriction in Lucene that the text and TokenStream must be exactly the same (this still seems redundant; I will dive into the Lucene API more). BR Lukas On 7/29/07, Ma

Re: Bug in Lucene 2.2.0 code? Simple code included (StringIndexOutOfBoundsException).

2007-07-28 Thread Mark Miller
I am going to try and write up some more info for you tomorrow, but just to point out: I do think there is a bug in the way offsets are being handled. I don't think this is causing your current problem (what I mentioned is) but it will probably cause you problems down the road. I will look into t

Re: lucene integration with PDM Windchill (Product Data Management System)

2007-07-28 Thread Dmitry
Karl, And by the way, we created one solution, but we need more embedded interfaces / implementation functions for PDM Windchill. (You can find an article on one of the solutions at www.profilesmagazine.com.) Thanks, DT, www.ejinz.com Search Engine Platform News - Origina

Re: lucene integration with PDM Windchill (Product Data Management System)

2007-07-28 Thread Dmitry
Karl, thanks for the help. I will try to explain the requirements. There is a PDM system (Product Data Management System), which manages the data related to products, supports procedures during the product lifecycle, and deals with the development and production infrastructure. There following design pater

Re: Bug in Lucene 2.2.0 code? Simple code included (StringIndexOutOfBoundsException).

2007-07-28 Thread Mark Miller
Hey Lukas, Sorry I haven't gotten back to you on this sooner. Been meaning to, but I have been busy. Still am a little, but here is some to get you started: The token stream you send to the highlighter must match the text you send to the highlighter. Your token stream is this: (example,0,7

Bug in Lucene 2.2.0 code? Simple code included (StringIndexOutOfBoundsException).

2007-07-28 Thread Lukas Vlcek
Hi Lucene experts, The following is some simple Lucene code which generates a StringIndexOutOfBoundsException. I am using the official Lucene 2.2.0 release. Can anyone tell me what is wrong with this code? Is this a bug or a feature of Lucene? Any comments/hints highly welcome! In a nutshell I

Re: Assembling a query from multiple fields

2007-07-28 Thread Erick Erickson
Since you say you're new, I'll risk stating the obvious. Do you know about the BooleanQuery class? But do note that the TermQuery isn't quite what you might expect. For instance, making a TermQuery from the string "this is junk" is different from making three TermQuery objects, one for each word.
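The difference arises because a TermQuery looks up its term exactly as given, with no analysis of the query string, while a tokenized field is indexed as separate terms. The toy model below illustrates that point in plain Java. It deliberately does not use the Lucene API: TermMatchSketch and its methods are hypothetical stand-ins, and the "analysis" is assumed to be simple lowercasing and whitespace splitting.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Toy model only; none of these names are Lucene classes.
class TermMatchSketch {
    /** Stand-in for index-time analysis: lowercase and split on whitespace. */
    static Set<String> indexedTerms(String text) {
        return new HashSet<>(Arrays.asList(text.trim().toLowerCase().split("\\s+")));
    }

    /** Stand-in for a TermQuery: exact lookup of one term, no analysis of the query string. */
    static boolean termMatches(Set<String> terms, String term) {
        return terms.contains(term);
    }

    /** Stand-in for a BooleanQuery in which every clause is required. */
    static boolean allTermsMatch(Set<String> terms, String... required) {
        for (String term : required) {
            if (!termMatches(terms, term)) {
                return false;
            }
        }
        return true;
    }
}
```

With the text "this is junk" indexed this way, a single lookup of the whole string "this is junk" finds nothing, because no such single term exists, whereas requiring the three terms "this", "is" and "junk" succeeds. In Lucene itself the second form would typically be built by adding three TermQuery clauses to a BooleanQuery.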

Re: java gc with a frequently changing index?

2007-07-28 Thread Erick Erickson
Why do you believe that it's the gc? I admit I just scanned your e-mail, but I *do* know that the first search (especially sorts) on a newly-opened IndexReader incurs a bunch of overhead. Could that be what you're seeing? I'm not sure there is a "best practice", but I have seen two solutions menti

Re: FieldCache for Search

2007-07-28 Thread Erick Erickson
You might try lazy loading. See IndexReader.document(int n, FieldSelector fieldSelector), particularly the FieldSelector. It allows you to selectively load only the fields you want. Otherwise, I'm sure if you looked in the unit tests you'd find examples of how to use FieldCache. If your field

Re: Lucene performance using a solid state disk (SSD)

2007-07-28 Thread Otis Gospodnetic
Kent, I have not seen anyone do this, but I know Kevin Burton of TailRank (BCCed) has been drooling over the same idea (check his blog(s)). :) Otis -- Lucene Consulting -- http://lucene-consulting.com/ - Original Message From: Kent Fitch <[EMAIL PROTECTED]> To: java-user@lucene.apache

Re: NPE in MultiReader

2007-07-28 Thread Mark Miller
Very odd...looks to me like one of the subreaders in the subreader array is null. Very odd because at one point it must not have been null to get past the MultiReader creation... testn wrote: Every once in a while I got the following exception with Lucene 2.2. Do you have any idea? Thanks, j