> Hmmm ... how many chunks of "about 50 pages" do you do before hitting this?
> Roughly how many docs are in the index when it happens?
Oh, gosh, not sure. I'm guessing it's about half done.
> Can you describe the docs/fields you're adding?
I've got 1735 documents, 18969 pages -- average page s
Since what I'm dealing with is well-formed html, I wonder if I could modify the
tokenizer to skip the html elements and then use the NullFragmenter. I can
probably isolate the html text. Sounds like I have a plan or at least
something to try.
Thanks
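This is a minimal sketch of the "skip the html elements" idea, written without Lucene. It copies every tag through untouched and wraps matching words in the text between tags. It assumes well-formed markup with no `>` inside attribute values, lowercase query terms, and exact whole-word matching (no analyzer, stemming, or phrase support). `HtmlHighlighter`, `highlight`, and the `<b>` wrapper are hypothetical names; this is not Lucene's `Highlighter` API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch: highlight words in the text of well-formed HTML, leaving tags untouched. */
public class HtmlHighlighter {

    /** Wraps every word whose lowercase form is in {@code terms} with <b>...</b>. */
    public static String highlight(String html, Set<String> terms) {
        StringBuilder out = new StringBuilder(html.length() + 32);
        int i = 0;
        while (i < html.length()) {
            char c = html.charAt(i);
            if (c == '<') {
                // Copy the whole tag verbatim (assumes no '>' inside attribute values).
                int end = html.indexOf('>', i);
                if (end < 0) {
                    end = html.length() - 1;
                }
                out.append(html, i, end + 1);
                i = end + 1;
            } else if (Character.isLetterOrDigit(c)) {
                // Collect one word of text content and highlight it if it matches.
                int start = i;
                while (i < html.length() && Character.isLetterOrDigit(html.charAt(i))) {
                    i++;
                }
                String word = html.substring(start, i);
                if (terms.contains(word.toLowerCase(Locale.ROOT))) {
                    out.append("<b>").append(word).append("</b>");
                } else {
                    out.append(word);
                }
            } else {
                // Whitespace and punctuation are copied as-is.
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Set<String> terms = new HashSet<>(Arrays.asList("lucene"));
        String html = "<p class=\"lucene\">Lucene highlighting in <i>Lucene</i></p>";
        // prints <p class="lucene"><b>Lucene</b> highlighting in <i><b>Lucene</b></i></p>
        System.out.println(highlight(html, terms));
    }
}
```

Entity names such as `&amp;` are treated as ordinary words here, so a fuller version would need to skip them as well.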
From: M
xml with embedded xhtml
From: Matthijs Bierman [mailto:[EMAIL PROTECTED]
Sent: Wed 11/28/2007 3:26 AM
To: java-user@lucene.apache.org
Subject: Re: Lucene highlighting
Hi Scott,
The highlighter code does not do this. You need to implement your own
highlighter.
> I'm going to run the same software on an
> Intel machine and see what happens.
So, I ran the same codebase with lucene-core-2.2.0.jar on an Intel Mac
Pro, OS X 10.5.0, Java 1.5, and no exception is raised. Different
corpus, about 5 pages instead of 2. This is reinforcing my
thinking th
Hi,
I used Lucli to optimize my index while my application was stopped. After
restarting my application, I could no longer search my index; I got the
following exception:
org.apache.lucene.index.CorruptIndexException: Unknown format version: -4
at org.apache.lucene.index.Se
> You are not hitting any other exception before this one right?
>
> Can you change your test case so that the "catch" clause is run
> before the "finally" clause? I wonder if you are hitting some
> interesting exception and then trying to optimize, which then
> masks the original exception.
Yes
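The concern here is that an exception thrown while optimizing in `finally` replaces whatever exception the indexing loop had already thrown. This minimal, stdlib-only sketch shows that effect and one way to keep the original failure visible. `indexDocs` and `optimize` are hypothetical stand-ins for the real add-documents loop and the `optimize()` call. `addSuppressed` needs Java 7 or later; on Java 1.5, printing the exception in the `catch` before the `finally` runs gives the same visibility.

```java
/**
 * Hypothetical sketch: an exception thrown from a finally block replaces
 * (masks) the exception that was already propagating.
 */
public class MaskingDemo {

    // Stand-in for the loop that adds documents; fails partway through.
    static void indexDocs() throws Exception {
        throw new IllegalStateException("original failure while adding documents");
    }

    // Stand-in for writer.optimize(); assumed to fail as well.
    static void optimize() throws Exception {
        throw new RuntimeException("secondary failure in optimize");
    }

    /** Pattern under suspicion: the caller only ever sees the optimize failure. */
    public static void masked() throws Exception {
        try {
            indexDocs();
        } finally {
            optimize();
        }
    }

    /** Rethrows the first failure and attaches any optimize failure as suppressed (Java 7+). */
    public static void unmasked() throws Exception {
        Exception primary = null;
        try {
            indexDocs();
        } catch (Exception e) {
            primary = e;
            throw e;
        } finally {
            try {
                optimize();
            } catch (Exception e) {
                if (primary == null) {
                    throw e; // indexing succeeded, so the optimize failure is the real error
                }
                primary.addSuppressed(e);
            }
        }
    }

    public static void main(String[] args) {
        try {
            masked();
        } catch (Exception e) {
            System.out.println("masked:   " + e.getMessage());
        }
        try {
            unmasked();
        } catch (Exception e) {
            System.out.println("unmasked: " + e.getMessage()
                    + " (suppressed: " + e.getSuppressed()[0].getMessage() + ")");
        }
    }
}
```

The design choice is simply that the first failure wins: the optimize error is still reported, but it cannot hide the exception that caused the trouble.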
Hmmm ... how many chunks of "about 50 pages" do you do before hitting this?
Roughly how many docs are in the index when it happens?
Can you describe the docs/fields you're adding?
You are not hitting any other exception before this one right?
Can you change your test case so that the "catch" cl
> Are you really sure in your 2.2 test you are starting with no prior
> index?
I'd ask that too, but yes, I'm really really sure. Building a
completely new index each time.
Works with 2.0.0. Fails with 2.2.0. Works with 2.2.0 *if* I remove
the optimization step.
Bill
---
Are you really sure in your 2.2 test you are starting with no prior
index?
2.2 should in fact work fine with a 2.0 index but it's possible there
was some latent corruption in the 2.0 index if you are accidentally
using it. That exception looks a lot like this dreaded bug:
https://issues.apache.
: The search query ttl:co-operative returns more than 50 results,
: but if I change it to ttl:co-operat* it returns no results.
: Again, I entered the query ttl:11-amino and it returns some results; when I
: changed it to ttl:11-amino* it returns some more res
I just tried re-indexing with lucene-core-2.0.0.jar and the same
indexing code; works great. So what am I doing wrong with 2.2?
Bill
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
Here's the code I'm using:
try {
    // Now add the documents to the index
    IndexWriter writer = new IndexWriter(index_loc,
            new StandardAnalyzer(), !index_loc.exists());
    writer.setMaxFieldLength(Integer.MAX_VALUE);
    try {
        for (in
I've got a DB of about 2 pages which I thought I'd update to
Lucene 2.2. I removed the old index (2.0 based) completely, and
started re-indexing all the documents. I do this in stages, of about
50 pages at a time, serially, starting a new JVM each time, and reading
in the existing index, then
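This is a minimal sketch of the staged scheme described in this message. Each run takes the next batch of about 50 pages and opens the index in create mode only if the index directory does not exist yet, which is what the `!index_loc.exists()` argument does in the `IndexWriter` call in the posted code. `StagedIndexing`, `batches`, and `createFlag` are hypothetical helpers, and no Lucene classes are used.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helpers for indexing in fixed-size batches, one JVM run per batch. */
public class StagedIndexing {

    /** Splits pages 0..totalPages-1 into {start, endExclusive} ranges of at most batchSize pages. */
    public static List<int[]> batches(int totalPages, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<int[]> result = new ArrayList<>();
        for (int start = 0; start < totalPages; start += batchSize) {
            result.add(new int[] {start, Math.min(start + batchSize, totalPages)});
        }
        return result;
    }

    /** Same decision as the posted code: create a new index only if the directory is absent. */
    public static boolean createFlag(File indexDir) {
        return !indexDir.exists();
    }

    public static void main(String[] args) {
        File indexDir = new File(args.length > 0 ? args[0] : "index");
        // Only the very first run (no directory yet) should create; later runs append.
        System.out.println("create new index: " + createFlag(indexDir));
        for (int[] b : batches(120, 50)) {
            System.out.println("batch: pages " + b[0] + " to " + (b[1] - 1));
        }
    }
}
```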
Hi, thanks for the reply.
But can anyone give me some more hints? I have checked SpanQuery, but still
haven't found a solution.
Thanks.
Grant Ingersoll-6 wrote:
>
> Have a look at SpanQuery and its derivatives. You will need to do
> some post-processing as well.
>
> -Grant
>
> On
Super! Thanks for catching this.
Mike
"Bogdan Ghidireac" <[EMAIL PROTECTED]> wrote:
> Great, everything runs fine now. Thank you.
>
> Bogdan
>
> On 11/27/07, Michael McCandless <[EMAIL PROTECTED]> wrote:
> >
> >
> > OK I opened this JIRA issue to track this:
> >
> > https://issues.apache.o
Great, everything runs fine now. Thank you.
Bogdan
On 11/27/07, Michael McCandless <[EMAIL PROTECTED]> wrote:
>
>
> OK I opened this JIRA issue to track this:
>
> https://issues.apache.org/jira/browse/LUCENE-1069
>
> Mike
>
> "Michael McCandless" <[EMAIL PROTECTED]> wrote:
> >
> > Woops! You
Hi,
Take a look at Proximity (http://proximity.abstracthorizon.org/px1/), a
Maven proxy that includes Lucene search.
Patrick
Olivier Dehon wrote:
Hello,
Has anyone worked on a lucene maven plugin?
I am thinking of embedding a lucene index as part of a maven artifact,
so that artifact reposi
Hello,
Has anyone worked on a lucene maven plugin?
I am thinking of embedding a lucene index as part of a maven artifact,
so that artifact repository managers can do a better job of searching
repositories, by exploiting the index that is customized/tailored for
every type of artifact.
It will al
This would only highlight plain text, though, not the original document,
as I suspect the original poster would like.
Matthijs
markharw00d wrote:
I need to highlight an entire document as it is displayed
See NullFragmenter
Have a look at SpanQuery and its derivatives. You will need to do
some post-processing as well.
-Grant
On Nov 28, 2007, at 6:41 AM, bigdoginuk wrote:
Hi all,
I want to compute the co-occurrence frequency between a word and a
phrase (this phrase contains some words, and the words in it sh
Seems reasonable to me, but I guess I wonder what kind of control you
have that you don't in Nutch? Maybe worth asking on Nutch. Also, it
is fairly easy in Nutch to separate the crawling aspect from the
indexing aspect, such that you could use all of Nutch's power for
crawling and extract
Hi all,
I want to compute the co-occurrence frequency between a word and a phrase
(the phrase contains several words, which must be consecutive and in
order). It's like a NEAR operation (like setting the slop to 3...).
Does anyone know how to implement this?
Thanks in advance.
Rooney
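Grant's reply in this thread points to `SpanQuery` plus post-processing. In Lucene terms, that could plausibly be an ordered `SpanNearQuery` with slop 0 for the phrase, nested in a `SpanNearQuery` with the word. The sketch below illustrates only the counting step, on a plain token list and without Lucene. It finds each consecutive, in-order occurrence of the phrase and counts occurrences of the word separated from the nearest edge of the phrase by at most `maxGap` tokens. The class name, the gap semantics, and counting each (word position, phrase occurrence) pair once are assumptions, not Lucene's exact slop rules.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: count how often a word occurs near an exact, in-order phrase. */
public class NearCooccurrence {

    /** True if tokens[pos .. pos+phrase.size()-1] equals the phrase, in order. */
    static boolean phraseAt(List<String> tokens, List<String> phrase, int pos) {
        if (pos + phrase.size() > tokens.size()) {
            return false;
        }
        for (int k = 0; k < phrase.size(); k++) {
            if (!tokens.get(pos + k).equals(phrase.get(k))) {
                return false;
            }
        }
        return true;
    }

    /** Counts word occurrences with at most maxGap tokens between them and a phrase occurrence. */
    public static int count(List<String> tokens, String word, List<String> phrase, int maxGap) {
        if (phrase.isEmpty()) {
            throw new IllegalArgumentException("phrase must not be empty");
        }
        int total = 0;
        for (int p = 0; p < tokens.size(); p++) {
            if (!phraseAt(tokens, phrase, p)) {
                continue;
            }
            int phraseEnd = p + phrase.size() - 1; // inclusive
            // Look left of the phrase: positions p-1 down to p-1-maxGap.
            for (int i = p - 1; i >= 0 && i >= p - 1 - maxGap; i--) {
                if (tokens.get(i).equals(word)) {
                    total++;
                }
            }
            // Look right of the phrase: positions phraseEnd+1 up to phraseEnd+1+maxGap.
            for (int i = phraseEnd + 1; i < tokens.size() && i <= phraseEnd + 1 + maxGap; i++) {
                if (tokens.get(i).equals(word)) {
                    total++;
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
                "lucene", "supports", "span", "near", "queries", "and", "lucene", "is", "fast");
        // "lucene" appears 1 token left of "span near" and 2 tokens right of it.
        System.out.println(count(tokens, "lucene", Arrays.asList("span", "near"), 3));
    }
}
```

With `maxGap` of 3, both occurrences of "lucene" count; with 0, neither does, because each is separated from the phrase by at least one token.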
I need to highlight an entire document as it is displayed
See NullFragmenter
Hi Scott,
The highlighter code does not do this. You need to implement your own
highlighter. What kind of documents are you indexing?
Matthijs
Scott Smith wrote:
I've been looking at the highlighter examples. All of them seem to deal with
fragments. I need to highlight an entire document