On Jan 28, 2007, at 11:23 PM, maureen tanuwidjaja wrote:
I think so... btw, may I ask your opinion: would it be useful to
optimize, let's say, every 50,000-60,000 documents? I have a total of
660,000 docs...
Lucene automatically merges segments periodically during large
indexing runs. Look at the
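Erik's point about automatic merging can be illustrated without Lucene at all. The toy simulation below (all class and method names are mine, not Lucene's) mimics a mergeFactor-style policy: every flush adds a small segment, and once mergeFactor same-sized segments pile up, they merge into one larger segment, cascading upward:

```java
import java.util.ArrayList;
import java.util.List;

// Toy simulation (not Lucene code) of mergeFactor-style segment merging:
// each flushed batch becomes a segment, and whenever `mergeFactor` segments
// of the same size accumulate, they merge into one larger segment.
public class MergeSim {
    static List<Integer> addSegments(int batches, int mergeFactor) {
        List<Integer> segments = new ArrayList<>();
        for (int i = 0; i < batches; i++) {
            segments.add(1);                       // flush one new segment
            boolean merged = true;
            while (merged) {                       // cascade merges upward
                merged = false;
                int n = segments.size();
                if (n >= mergeFactor) {
                    int size = segments.get(n - 1);
                    // merge only if the last `mergeFactor` segments are same-sized
                    boolean sameLevel = true;
                    for (int j = n - mergeFactor; j < n; j++) {
                        if (!segments.get(j).equals(size)) { sameLevel = false; break; }
                    }
                    if (sameLevel) {
                        for (int j = 0; j < mergeFactor; j++) {
                            segments.remove(segments.size() - 1);
                        }
                        segments.add(size * mergeFactor);
                        merged = true;
                    }
                }
            }
        }
        return segments;
    }
}
```

Under this scheme the segment count stays roughly logarithmic in the number of flushes, which is why explicit optimize() calls in the middle of a large build rarely pay off.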
On Jan 26, 2007, at 5:28 PM, Chris Hostetter wrote:
: LIA2 will happen, but Lucene is undergoing a lot of changes, so
Erik and
: I are going to wait a little more for development to calm down
: (utopia?).
you're waiting for Lucene development to calm down? ... that could
be a
long wait.
On Jan 26, 2007, at 1:56 PM, Bill Taylor wrote:
I notice that the Lucene book offered by Amazon was published in
2004. I saw some mail on the subject of a new edition.
Is the new edition available in any form?
I promise to buy the new edition as soon as it comes out even if I
get some of
I think so... btw, may I ask your opinion: would it be useful to optimize,
let's say, every 50,000-60,000 documents? I have a total of 660,000 docs...
Erik Hatcher <[EMAIL PROTECTED]> wrote:
On Jan 28, 2007, at 9:15 PM, maureen tanuwidjaja wrote:
> OK, this is the printout of the stack trace while failin
On Jan 28, 2007, at 9:15 PM, maureen tanuwidjaja wrote:
OK, this is the printout of the stack trace while failing to
index the 190,000th document
java.io.IOException: There is not enough space on the disk
Can anyone help?
Ummm get more disk space?!
Erik
On Jan 26, 2007, at 2:30 PM, Otis Gospodnetic wrote:
It really all depends... right, Erik?
Ha! Looks like I've earned a tag line around here, eh?! :)
On the hardware you are using, complexity of queries, query
concurrency, query latency you are willing to live with, the size
of the index
OK, this is the printout of the stack trace while failing to index the
190,000th document
Indexing C:\sweetpea\wikipedia_xmlfiles\part-18\491886.xml
Indexing C:\sweetpea\wikipedia_xmlfiles\part-18\491887.xml
Indexing C:\sweetpea\wikipedia_xmlfiles\part-18\491891.xml
Indexin
I do use the NullFragmenter now. I have no interest in the fragments at
the moment, just in showing hits on the source document. It would be
great if I could just show the real hits though. The span approach seems
to work fine for me. I have even tested the highlighting using my
sentence and pa
great suggestion and Eric's also earlier. Thank you.
Felix
"Michael D. Curtin" <[EMAIL PROTECTED]> wrote: Felix Litman wrote:
> We want to be able to return a result regardless of whether users use a colon
> in the query. So a 'work:' and a 'work' query should still return the same result.
>
> With th
Felix Litman wrote:
We want to be able to return a result regardless of whether users use a colon in
the query. So a 'work:' and a 'work' query should still return the same result.
With the current parser, if a user enters 'work:' with a ":", Lucene does not return
anything :-(. It seems to me the
On Jan 28, 2007, at 3:47 PM, Felix Litman wrote:
We want to be able to return a result regardless of whether users use a
colon in the query. So a 'work:' and a 'work' query should
still return the same result.
With the current parser, if a user enters 'work:' with a ":",
Lucene does not return a
We want to be able to return a result regardless of whether users use a colon in
the query. So a 'work:' and a 'work' query should still return the same result.
With the current parser, if a user enters 'work:' with a ":", Lucene does not
return anything :-(. It seems to me to be a Lucene parser issue...
Correction:
We only do the euclidean computation during sorting. For filtering, a simple
bounding box is computed to approximate the radius, and 2 range comparisons
are made to exclude documents. Because these comparisons are done outside of
Lucene as integer comparisons, it is pretty fast. With 13
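A minimal sketch of that filter-then-sort scheme (the names and coordinate types are illustrative, not the poster's code): the integer bounding-box test is cheap and errs only toward false positives, which the exact squared-distance check then removes:

```java
// Sketch of the filter-then-sort approach described above: a cheap integer
// bounding box excludes most points, and the (squared) euclidean distance is
// computed only for the survivors. Squared distances avoid the sqrt entirely.
public class RadiusFilter {
    // True if (x, y) lies inside the axis-aligned box around (cx, cy):
    // two range comparisons per axis, done as plain integer compares.
    static boolean inBoundingBox(int x, int y, int cx, int cy, int radius) {
        return x >= cx - radius && x <= cx + radius
            && y >= cy - radius && y <= cy + radius;
    }

    // Exact check, applied only after the box test passes.
    static boolean inRadius(int x, int y, int cx, int cy, int radius) {
        long dx = x - cx, dy = y - cy;
        return dx * dx + dy * dy <= (long) radius * radius;
    }
}
```

Note how a corner point such as (4, 4) with radius 5 passes the box test but fails the exact check: the box is a superset of the circle, which is exactly why it is safe to use it as a pre-filter.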
>>For what it's worth Mark (Miller), there *is* a need for "just
highlight the query terms without trying to get excerpts" functionality
>>- something a la Google cache (different colours...mmm, nice).
FWIW, the existing highlighter doesn't *have* to fragment - just pass a
NullFragmenter to the
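The "Google-cache" effect the posters describe can be sketched without the contrib Highlighter at all. The class below is not Lucene code, just a regex illustration of marking every query term across the full text with no fragmenting (real query handling, analysis, and HTML escaping are out of scope):

```java
import java.util.regex.Pattern;

// A minimal, Lucene-free sketch of "highlight every query term over the
// whole document, no excerpts" -- the effect NullFragmenter gives you with
// the contrib Highlighter.
public class WholeDocHighlighter {
    static String highlight(String text, String[] terms) {
        String out = text;
        for (String term : terms) {
            // \b word boundaries, case-insensitive, term treated literally
            Pattern p = Pattern.compile("\\b" + Pattern.quote(term) + "\\b",
                                        Pattern.CASE_INSENSITIVE);
            // $0 re-emits the match as typed, preserving its original case
            out = p.matcher(out).replaceAll("<b>$0</b>");
        }
        return out;
    }
}
```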
StandardAnalyzer should not be indexing punctuation, in my
experience... instead, something like old:fart would be indexed as old and
fart. QueryParser will then generate a query of old within 1 of fart for
the query old:fart. This is the case for all punctuation I have run
into. Things like f.b
On 28 Jan 2007, at 05:54, Doron Cohen wrote:
karl wettin <[EMAIL PROTECTED]> wrote on 27/01/2007 13:49:24:
In essence, should I return
index.getDocumentsByNumber().size() -
index.getDeletedDocuments().size() +
unflushedDocuments.size();
or
index.getDocumentsByNumber().size() +
unfl
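The arithmetic karl is asking about can be sketched with plain counters (the accessor names in his snippet belong to his own index code; the method below is only an illustration of the deleted-docs correction, with stand-in values):

```java
// Sketch of the document-count question above: total stored documents,
// minus those marked deleted, plus documents buffered but not yet flushed.
public class DocCount {
    static int numDocs(int storedByNumber, int deleted, int unflushed) {
        // Deleted documents still occupy slots until a merge expunges them,
        // so they must be subtracted from the stored count.
        return storedByNumber - deleted + unflushed;
    }
}
```

For example, 100 stored documents with 10 deletions and 5 buffered adds should report 95 live documents.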
Yes, thank you. That would be a good solution. But we are using Lucene's
Standard Analyzer. It seems to index words with colons ":" and other
punctuation by default. Is there a simple way to have the Analyzer not
index colons specifically, and punctuation in general?
Erick Erickson <[EMAIL
Hi...
I'm sorry, I just found out and realized that it is NOT the 10,000th document
that raises the exception when IndexWriter.add(Document) is called, but the
180,000 + 10,000th document, so the 190,000th document.
Now I am running the program again and put the code to print the
Maureen:
I lost the e-mail where you re-throw the exception. But you'd get a *lot*
more information if you'd print the stacktrace via
catch (Exception e) {
  e.printStackTrace();
  throw e;
}
And that would allow the folks who understand Lucene to give you a LOT more
help ...
Best
Erick
On 1/27/07
I've got to ask why you'd want to search on colons. Why not just index the
words without colons and search without them too? Let's say you index the
word "work:". Do you really want to have a search on "work" fail?
By and large, you're better off indexing and searching without
punctuation
Bes
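Erick's advice amounts to applying one normalization rule on both sides, at index time and at query time. A sketch of just the string rule (not a Lucene Analyzer; in practice this would live in a custom TokenFilter, and the class name is mine):

```java
// Sketch of "index and search without punctuation": the same normalization
// applied to indexed terms and to query terms makes "work:" and "work" meet
// in the middle. Inner punctuation (e.g. old:fart) is deliberately kept.
public class TermNormalizer {
    static String normalize(String term) {
        return term.toLowerCase()
                   .replaceAll("^\\p{Punct}+", "")   // strip leading punctuation
                   .replaceAll("\\p{Punct}+$", "");  // strip trailing punctuation
    }
}
```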
Is there a simple way to turn off field-search syntax in the Lucene parser, and
have Lucene recognize words ending in a colon ":" as search terms instead?
Such words are very common occurrences for our documents (or any plain text),
but Lucene does not seem to find them. :-(
Thank you,
Felix
I finally reran the program and it stopped at exactly the same place. This time
the exception came out. The writer can't add the 1th document to the index...
Indexing C:\sweetpea\wikipedia_xmlfiles\part-18\491886.xml
Indexing C:\sweetpea\wikipedia_xmlfiles\part-18\491887.xml
Indexin