categorizing results

2007-03-17 Thread Dima May
I have a Lucene-related question/problem. My search results can potentially get very large (200,000+), and I want to categorize them. So, for example, if I have an indexed field "type" that holds such things as CDs, books, videos, power drills, or anything else in the world, I would want to displa

Re: Issue while parsing XML files due to control characters, help appreciated.

2007-03-17 Thread Lokeya
Thanks for your reply. I timed the I/O and parsing separately from the indexing. I observed that I/O and parsing for 70 files takes 80 minutes in total, whereas when I combine this with indexing, a single metadata file takes nearly 2 to 3 hours. So it looks like IndexWriter

Re: search timeout

2007-03-17 Thread Chris Hostetter
Ack! ... this is what happens when i only skim a patch and then write with my odd mix of authority and childlike speling : * it creates a single (static) timer thread, which counts the "ticks", : every couple hundred ms (configurable). It uses a volatile int counter, : therefore avoiding the

Re: search timeout

2007-03-17 Thread Chris Hostetter
: > this is something anyone using the Lucene API can do as long as they use a : > HitCollector ... the Nutch impl seems to actually spin up a separate thread : > : : I'm keen to understand the pros and cons of these two approaches. To clarify, it's really just one approach, with an extension: Nut

Re: search timeout

2007-03-17 Thread Andrzej Bialecki
markharw00d wrote: Chris Hostetter wrote: this is something anyone using the Lucene API can do as long as they use a HitCollector ... the Nutch impl seems to actually spin up a separate thread I'm keen to understand the pros and cons of these two approaches. With the HitCollector approach

Re: Issue while parsing XML files due to control characters, help appreciated.

2007-03-17 Thread Erick Erickson
See below... On 3/17/07, Lokeya <[EMAIL PROTECTED]> wrote: Hi, I am trying to index the content from XML files, which are basically the metadata collected from a website that has a huge collection of documents. This metadata XML has control characters, which cause errors while trying to pars

Re: search timeout

2007-03-17 Thread karl wettin
On 17 Mar 2007, at 10:07, markharw00d wrote: Chris Hostetter wrote: this is something anyone using the Lucene API can do as long as they use a HitCollector ... the Nutch impl seems to actually spin up a separate thread I'm keen to understand the pros and cons of these two approaches. With t

Re: search timeout

2007-03-17 Thread markharw00d
Chris Hostetter wrote: this is something anyone using the Lucene API can do as long as they use a HitCollector ... the Nutch impl seems to ctually spin up a seperate thread I'm keen to understand the pros and cons of these two approaches. With the HitCollector approach is this just engineer