Re: lucene-core-2.2.0.jar broken? CorruptIndexException?

2007-11-28 Thread Bill Janssen
> Hmmm ... how many chunks of "about 50 pages" do you do before hitting this? > Roughly how many docs are in the index when it happens? Oh, gosh, not sure. I'm guessing it's about half done. > Can you describe the docs/fields you're adding? I've got 1735 documents, 18969 pages -- average page s

RE: Lucene highlighting

2007-11-28 Thread Scott Smith
Since what I'm dealing with is well-formed html, I wonder if I could modify the tokenizer to skip the html elements and then use the NullFragmenter. I can probably isolate the html text. Sounds like I have a plan or at least something to try. Thanks From: M

RE: Lucene highlighting

2007-11-28 Thread Scott Smith
xml with embedded xhtml From: Matthijs Bierman [mailto:[EMAIL PROTECTED] Sent: Wed 11/28/2007 3:26 AM To: java-user@lucene.apache.org Subject: Re: Lucene highlighting Hi Scott, The highlighter code does not do this. You need to implement your own highlighter.

Re: lucene-core-2.2.0.jar broken? CorruptIndexException?

2007-11-28 Thread Bill Janssen
> I'm going to run the same software on an > Intel machine and see what happens. So, I ran the same codebase with lucene-core-2.2.0.jar on an Intel Mac Pro, OS X 10.5.0, Java 1.5, and no exception is raised. Different corpus, about 5 pages instead of 2. This is reinforcing my thinking th

CorruptIndexException

2007-11-28 Thread Melanie Langlois
Hi, I used Lucli to optimize my index while my application was stopped. After restarting my application, I could not search my index anymore; I got the following exception: org.apache.lucene.index.CorruptIndexException: Unknown format version: -4 at org.apache.lucene.index.Se

Re: lucene-core-2.2.0.jar broken? CorruptIndexException?

2007-11-28 Thread Bill Janssen
> You are not hitting any other exception before this one right? > > Can you change your test case so that the "catch" clause is run > before the "finally" clause? I wonder if you are hitting some > interesting exception and then trying to optimize, which then > masks the original exception. Yes
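
The masking scenario quoted above can be reproduced without Lucene. The following is a minimal sketch in plain Java, not the poster's code: addDocuments and optimize are hypothetical stand-ins that simply throw, assumed here to play the roles of the indexing loop and the optimize call. It shows how an exception thrown from a finally block replaces the one thrown in the try body, and how running a catch clause first preserves the original.

```java
class MaskedExceptionDemo {

    /** Hypothetical stand-in for the indexing loop; always fails. */
    static void addDocuments() {
        throw new IllegalStateException("original failure while adding documents");
    }

    /** Hypothetical stand-in for the optimize step; also fails. */
    static void optimize() {
        throw new RuntimeException("secondary failure during optimize");
    }

    /** The finally block throws, so the caller only sees the secondary failure. */
    static String maskedMessage() {
        try {
            try {
                addDocuments();
            } finally {
                optimize(); // this exception replaces the one from addDocuments()
            }
        } catch (RuntimeException e) {
            return e.getMessage();
        }
        return "no exception";
    }

    /** Catching first records the original failure before cleanup can replace it. */
    static String reportedMessage() {
        String first = null;
        try {
            try {
                addDocuments();
            } catch (RuntimeException e) {
                first = e.getMessage(); // runs before the finally block
            } finally {
                optimize();
            }
        } catch (RuntimeException e) {
            // Cleanup still fails, but the original cause is already recorded.
        }
        return first;
    }

    public static void main(String[] args) {
        System.out.println("finally only: " + maskedMessage());
        System.out.println("catch first:  " + reportedMessage());
    }
}
```

With only a finally block, the caller sees the optimize failure; with the catch clause in place, the first exception is available for logging before any cleanup runs.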

Re: lucene-core-2.2.0.jar broken? CorruptIndexException?

2007-11-28 Thread Michael McCandless
Hmmm ... how many chunks of "about 50 pages" do you do before hitting this? Roughly how many docs are in the index when it happens? Can you describe the docs/fields you're adding? You are not hitting any other exception before this one right? Can you change your test case so that the "catch" cl

Re: lucene-core-2.2.0.jar broken? CorruptIndexException?

2007-11-28 Thread Bill Janssen
> Are you really sure in your 2.2 test you are starting with no prior > index? I'd ask that too, but yes, I'm really really sure. Building a completely new index each time. Works with 2.0.0. Fails with 2.2.0. Works with 2.2.0 *if* I remove the optimization step. Bill ---

Re: lucene-core-2.2.0.jar broken? CorruptIndexException?

2007-11-28 Thread Michael McCandless
Are you really sure in your 2.2 test you are starting with no prior index? 2.2 should in fact work fine with a 2.0 index, but it's possible there was some latent corruption in the 2.0 index if you are accidentally using it. That exception looks a lot like this dreaded bug: https://issues.apache.

Re: prefix query search problem if a hyphen exist in the search word

2007-11-28 Thread Chris Hostetter
: Search query is like this ttl:co-operative it returns more than 50 results, : but if i convert the query like this ttl:co-operat* it returns no result. : again i entered a query ttl:11-amino it returns some results, then changed : the above query into ttl:11-amino* it will return some more res

Re: lucene-core-2.2.0.jar broken? CorruptIndexException?

2007-11-28 Thread Bill Janssen
I just tried re-indexing with lucene-core-2.0.0.jar and the same indexing code; works great. So what am I doing wrong with 2.2? Bill - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]

Re: lucene-core-2.2.0.jar broken? CorruptIndexException?

2007-11-28 Thread Bill Janssen
Here's the code I'm using: try { // Now add the documents to the index IndexWriter writer = new IndexWriter(index_loc, new StandardAnalyzer(), !index_loc.exists()); writer.setMaxFieldLength(Integer.MAX_VALUE); try { for (in

lucene-core-2.2.0.jar broken? CorruptIndexException?

2007-11-28 Thread Bill Janssen
I've got a DB of about 2 pages which I thought I'd update to Lucene 2.2. I removed the old index (2.0 based) completely, and started re-indexing all the documents. I do this in stages, of about 50 pages at a time, serially, starting a new JVM each time, and reading in the existing index, then

Re: Compute the co-occurrence between a phrase and a word

2007-11-28 Thread bigdoginuk
Hi, thanks for the reply. But can anyone give me some more hints? I have checked SpanQuery, but still haven't found a solution. Thanks. Grant Ingersoll-6 wrote: > > Have a look at SpanQuery and its derivatives. You will need to do > some post-processing as well. > > -Grant > > On

Re: CheckIndex tool issues

2007-11-28 Thread Michael McCandless
Super! Thanks for catching this. Mike "Bogdan Ghidireac" <[EMAIL PROTECTED]> wrote: > Great, everything runs fine now.. Thank you. > > Bogdan > > On 11/27/07, Michael McCandless <[EMAIL PROTECTED]> wrote: > > > > > > OK I opened this JIRA issue to track this: > > > > https://issues.apache.o

Re: CheckIndex tool issues

2007-11-28 Thread Bogdan Ghidireac
Great, everything runs fine now.. Thank you. Bogdan On 11/27/07, Michael McCandless <[EMAIL PROTECTED]> wrote: > > > OK I opened this JIRA issue to track this: > > https://issues.apache.org/jira/browse/LUCENE-1069 > > Mike > > "Michael McCandless" <[EMAIL PROTECTED]> wrote: > > > > Woops! You

Re: Maven Lucene plugin ?

2007-11-28 Thread Patrick
Hi, Take a look at Proximity (http://proximity.abstracthorizon.org/px1/), a Maven proxy that includes Lucene search. Patrick Olivier Dehon wrote: Hello, Has anyone worked on a lucene maven plugin? I am thinking of embedding a lucene index as part of a maven artifact, so that artifact reposi

Maven Lucene plugin ?

2007-11-28 Thread Olivier Dehon
Hello, Has anyone worked on a lucene maven plugin? I am thinking of embedding a lucene index as part of a maven artifact, so that artifact repository managers can do a better job of searching repositories, by exploiting the index that is customized/tailored for every type of artifact. It will al

Re: Lucene highlighting

2007-11-28 Thread Matthijs Bierman
This would only highlight plaintext though, not in the original document as I suspect the TS would like. Matthijs markharw00d wrote: I need to highlight an entire document as it is displayed See NullFragmenter

Re: Compute the co-occurrence between a phrase and a word

2007-11-28 Thread Grant Ingersoll
Have a look at SpanQuery and its derivatives. You will need to do some post-processing as well. -Grant On Nov 28, 2007, at 6:41 AM, bigdoginuk wrote: Hi all, I want to compute the co-occurrence frequency between a word and a phrase (this phrase contains some words, and the words in it sh

Re: Lucene or nutch for indexing web documents

2007-11-28 Thread Grant Ingersoll
Seems reasonable to me, but I guess I wonder what kind of control you have that you don't in Nutch? Maybe worth asking on Nutch. Also, it is fairly easy in Nutch to separate the crawling aspect from the indexing aspect, such that you could use all of Nutch's power for crawling and extract

Compute the co-occurrence between a phrase and a word

2007-11-28 Thread bigdoginuk
Hi all, I want to compute the co-occurrence frequency between a word and a phrase (this phrase contains some words, and the words in it should be successive and in order). It's like a NEAR operation (like setting slop at 3...). Does anyone know how to implement this? Thanks in advance. Rooney

Re: Lucene highlighting

2007-11-28 Thread markharw00d
I need to highlight an entire document as it is displayed See NullFragmenter

Re: Lucene highlighting

2007-11-28 Thread Matthijs Bierman
Hi Scott, The highlighter code does not do this. You need to implement your own highlighter. What kind of documents are you indexing? Matthijs Scott Smith wrote: I've been looking at the highlighter examples. All of them seem to deal with fragments. I need to highlight an entire document