I have a Lucene-related question/problem.
My search results can potentially get very large (200,000+). I want to
categorize my results. So for example if I have an indexed field "type" that
has such things as CDs, books, videos, power drills, or anything else in the
world, I would want to displa
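One common way to categorize a large result set is to count hits per value of the "type" field while collecting, rather than loading all 200,000+ documents. Below is a minimal sketch. It assumes the per-document "type" values are already in an array indexed by document number. In Lucene 2.x, `FieldCache.DEFAULT.getStrings(reader, "type")` can provide such an array for an untokenized field. The class `CategoryCounter` is hypothetical. Its `collect(int doc, float score)` only mirrors the `HitCollector` callback so the example compiles without Lucene.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: counts hits per category value, the way a
// HitCollector's collect(int doc, float score) callback could.
class CategoryCounter {
    // typeByDoc[doc] holds the "type" value of document number doc
    // (assumed to be preloaded, e.g. from a field cache).
    private final String[] typeByDoc;
    private final Map<String, Integer> counts = new TreeMap<>();

    CategoryCounter(String[] typeByDoc) {
        this.typeByDoc = typeByDoc;
    }

    // Called once per matching document; the score is ignored for counting.
    void collect(int doc, float score) {
        String type = typeByDoc[doc];
        if (type == null) {
            type = "(none)"; // documents without a "type" value
        }
        counts.merge(type, 1, Integer::sum);
    }

    Map<String, Integer> counts() {
        return counts;
    }

    public static void main(String[] args) {
        String[] types = {"CD", "book", "book", "video", null, "CD"};
        CategoryCounter counter = new CategoryCounter(types);
        int[] hits = {0, 1, 2, 4, 5}; // doc ids of the matching documents
        for (int doc : hits) {
            counter.collect(doc, 1.0f);
        }
        System.out.println(counter.counts()); // prints {(none)=1, CD=2, book=2}
    }
}
```

Only one integer per distinct category is kept in memory, so the cost of categorizing does not grow with the number of stored fields per hit.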
Thanks for your reply. I tried timing I/O and parsing separately from indexing.
I observed that I/O and parsing for 70 files takes 80 minutes in total, whereas
when I combine this with indexing for a single metadata file it takes nearly 2
to 3 hours. So it looks like IndexWriter
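The two phases can also be timed within a single run by accumulating separate totals around each step. Below is a minimal sketch of that measurement pattern. `readAndParse` and `indexDocument` are hypothetical placeholders for the real XML parsing and the `IndexWriter` work. Only the timing structure is meant to carry over.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical harness for timing "I/O + parsing" separately from "indexing".
class PhaseTimer {
    long parseNanos = 0;
    long indexNanos = 0;
    final List<List<String>> indexed = new ArrayList<>();

    // Placeholder for reading and parsing one metadata file into field values.
    List<String> readAndParse(String rawFile) {
        List<String> fields = new ArrayList<>();
        for (String part : rawFile.split(";")) {
            fields.add(part.trim());
        }
        return fields;
    }

    // Placeholder for building a Document and adding it to the index.
    void indexDocument(List<String> fields) {
        indexed.add(fields);
    }

    void processAll(List<String> rawFiles) {
        for (String raw : rawFiles) {
            long t0 = System.nanoTime();
            List<String> fields = readAndParse(raw);
            long t1 = System.nanoTime();
            indexDocument(fields);
            long t2 = System.nanoTime();
            parseNanos += t1 - t0; // time spent reading and parsing
            indexNanos += t2 - t1; // time spent in the indexing step
        }
    }

    public static void main(String[] args) {
        PhaseTimer timer = new PhaseTimer();
        timer.processAll(Arrays.asList("title=A; type=book", "title=B; type=CD"));
        System.out.printf("parse: %d ms, index: %d ms%n",
                timer.parseNanos / 1_000_000, timer.indexNanos / 1_000_000);
    }
}
```

`System.nanoTime()` is used because it is monotonic, so each per-phase delta is non-negative even if the wall clock is adjusted during a long run.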
Ack! ... this is what happens when i only skim a patch and then write with
my odd mix of authority and childlike speling
: * it creates a single (static) timer thread, which counts the "ticks",
: every couple hundred ms (configurable). It uses a volatile int counter,
: therefore avoiding the
: > this is something anyone using the Lucene API can do as long as they use a
: > HitCollector ... the Nutch impl seems to actually spin up a separate thread
: >
:
: I'm keen to understand the pros and cons of these two approaches.
to clarify, it's really just one approach, with an extension: Nut
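The timer-thread design quoted above can be sketched roughly as follows. It is a hypothetical illustration, not the Nutch code. A single static daemon thread bumps a volatile int every `RESOLUTION_MS` milliseconds, set to 200 here to match "every couple hundred ms". A collector compares that counter to a deadline instead of reading the system clock on every hit. `TickTimer`, `TickLimitedCollector`, and `TimeExceededException` are invented names. `collect(int doc, float score)` only mirrors the `HitCollector` callback.

```java
// Hypothetical sketch: one shared timer thread advances a volatile tick counter.
class TickTimer {
    // Tick length in ms (configurable in the design described above).
    static final int RESOLUTION_MS = 200;

    // Written only by the timer thread, read by any search thread.
    private static volatile int ticks = 0;

    static {
        Thread timer = new Thread(() -> {
            while (true) {
                try {
                    Thread.sleep(RESOLUTION_MS);
                } catch (InterruptedException e) {
                    return; // stop ticking if interrupted
                }
                ticks++; // safe without locking: there is a single writer
            }
        }, "tick-timer");
        timer.setDaemon(true); // do not keep the JVM alive
        timer.start();
    }

    static int now() {
        return ticks;
    }
}

// Hypothetical exception used to abort a search that ran out of time.
class TimeExceededException extends RuntimeException {
    TimeExceededException(int hitsSoFar) {
        super("time limit exceeded after " + hitsSoFar + " hits");
    }
}

// Stand-in for a HitCollector subclass that checks ticks instead of the clock.
class TickLimitedCollector {
    private final int deadlineTicks;
    int collected = 0;

    TickLimitedCollector(long budgetMs) {
        // Convert the budget to whole ticks (at least one).
        int budgetTicks = (int) Math.max(1, budgetMs / TickTimer.RESOLUTION_MS);
        deadlineTicks = TickTimer.now() + budgetTicks;
    }

    void collect(int doc, float score) {
        if (TickTimer.now() > deadlineTicks) { // a volatile read, no clock call
            throw new TimeExceededException(collected);
        }
        collected++;
    }
}
```

The trade-off is precision. Each hit costs only a volatile read, but the deadline is only as accurate as one tick. Because every collector shares the same timer, the number of threads does not grow with the number of concurrent searches.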
markharw00d wrote:
Chris Hostetter wrote:
this is something anyone using the Lucene API can do as long as they use a
HitCollector ... the Nutch impl seems to actually spin up a separate thread
I'm keen to understand the pros and cons of these two approaches.
With the HitCollector approach
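For comparison, the plain HitCollector approach reads the clock on every hit. Below is a minimal sketch. In real Lucene 2.x code the class would extend `HitCollector` and override `collect(int doc, float score)`. It is kept free of Lucene here so it compiles standalone. `ClockCheckingCollector` and `SearchTimeoutException` are hypothetical names.

```java
// Hypothetical exception thrown to abort the search when the budget is spent.
class SearchTimeoutException extends RuntimeException {
    SearchTimeoutException(int hitsSoFar) {
        super("search timed out after " + hitsSoFar + " hits");
    }
}

// Stand-in for a HitCollector subclass: the clock is read on every hit.
class ClockCheckingCollector {
    private final long deadlineMillis;
    int collected = 0;

    ClockCheckingCollector(long budgetMillis) {
        deadlineMillis = System.currentTimeMillis() + budgetMillis;
    }

    void collect(int doc, float score) {
        if (System.currentTimeMillis() > deadlineMillis) {
            throw new SearchTimeoutException(collected);
        }
        collected++; // real code would record doc/score here
    }
}
```

Because the exception is unchecked, it should propagate out of `Searcher.search(Query, HitCollector)`. The caller can then catch it and keep whatever was collected so far. No extra thread is involved, but every hit pays for a clock read.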
See below...
On 3/17/07, Lokeya <[EMAIL PROTECTED]> wrote:
Hi,
I am trying to index the content of XML files that are basically metadata
collected from a website that has a huge collection of documents.
This metadata XML has control characters, which cause errors while trying to
pars
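One common workaround for control characters that break an XML parser is to strip characters outside the XML 1.0 `Char` production before parsing. Allowed are #x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD, and #x10000-#x10FFFF. Below is a minimal sketch that assumes the file has already been decoded into a `String`. `XmlCleaner` is a hypothetical name. It does not handle numeric character references such as `&#1;`, which XML 1.0 also forbids.

```java
// Hypothetical helper: removes characters that are not legal in XML 1.0
// (e.g. raw control characters such as U+0001) before parsing.
class XmlCleaner {
    static boolean isLegalXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    static String stripIllegalXmlChars(String in) {
        StringBuilder out = new StringBuilder(in.length());
        // codePoints() keeps surrogate pairs together; unpaired surrogates
        // come through as values in 0xD800-0xDFFF and are filtered out.
        in.codePoints()
          .filter(XmlCleaner::isLegalXml10)
          .forEach(out::appendCodePoint);
        return out.toString();
    }
}
```

Tabs, newlines, and characters outside the Basic Multilingual Plane survive the filter. Only code points the XML 1.0 grammar rejects are dropped.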
On 17 Mar 2007, at 10:07, markharw00d wrote:
Chris Hostetter wrote:
this is something anyone using the Lucene API can do as long as they use a
HitCollector ... the Nutch impl seems to actually spin up a separate thread
I'm keen to understand the pros and cons of these two approaches.
With t
Chris Hostetter wrote:
this is something anyone using the Lucene API can do as long as they use a
HitCollector ... the Nutch impl seems to actually spin up a separate thread
I'm keen to understand the pros and cons of these two approaches.
With the HitCollector approach is this just engineer