Re: Slightly off topic, I need to have luke use my Analyzer

2004-07-22 Thread Rob Jose
Thanks Kannan Rob - Original Message - From: Chellappa, Kannan [EMAIL PROTECTED] To: Lucene Users List [EMAIL PROTECTED] Sent: Wednesday, July 21, 2004 12:19 PM Subject: RE: Slightly off topic, I need to have luke use my Analyzer Sorry typo in the version date in my previous mail -- I

Unnecessary scan with required terms

2004-07-22 Thread John Patterson
Hi, I have been looking at how Lucene operates with queries where all terms are required. I expected that the algorithm would step through each term to confirm that it did exist in the index and as soon as a clause is found that does not occur, the search would be aborted. As far as I can tell
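The behaviour the poster expected (check every required term first, and abort as soon as one is missing) can be sketched on a toy inverted index. This is an illustration only, not Lucene's actual query code; `build_index` and `search_all_required` are hypothetical names introduced for the example.

```python
from typing import Dict, List, Set


def build_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Build a toy inverted index: term -> set of document ids."""
    index: Dict[str, Set[int]] = {}
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index.setdefault(term, set()).add(doc_id)
    return index


def search_all_required(index: Dict[str, Set[int]], terms: List[str]) -> List[int]:
    """Return ids of documents containing every term (hypothetical helper)."""
    postings = []
    for term in terms:
        docs = index.get(term.lower())
        if not docs:
            # A required term is absent from the index, so no document
            # can match: abort before scanning any posting list.
            return []
        postings.append(docs)
    if not postings:
        return []
    # Intersect starting from the rarest term to keep the working set small.
    postings.sort(key=len)
    result = set(postings[0])
    for docs in postings[1:]:
        result &= docs
        if not result:
            break
    return sorted(result)
```

Sorting the posting lists by length before intersecting is a common design choice in this kind of sketch: the rarest term bounds the result size, so the intersection can go empty early.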

rebuild index

2004-07-22 Thread Sergiu Gordea
Hi all, I have a question related to reindexing of documents with Lucene. We want to implement the functionality of rebuilding a Lucene index. That means I want to delete all documents in the index and to add newer versions. All information I need to reindex is kept in the database so that I have a

RE: rebuild index

2004-07-22 Thread Aviran
Why don't you just build a new index in a different location and at the end add the missing documents from the old index to the new one, and then delete the old index? Aviran -Original Message- From: Sergiu Gordea [mailto:[EMAIL PROTECTED] Sent: Thursday, July 22, 2004 10:49 AM To:

Re: rebuild index

2004-07-22 Thread Sergiu Gordea
Because on the other hand I want to have a clean index, without any kind of garbage. This is the requested functionality of the rebuild index function: a clean index and no lost data. I was also thinking that I can delete the index location and create a new index, this may have the same effect
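The two ideas in this thread (build in a different location, and end up with a clean index without losing data) can be combined by building into a fresh directory and swapping it in only once the build succeeds. A minimal sketch, assuming the index is a plain directory on disk; `rebuild_index` and the `build` callback are hypothetical and stand in for whatever code re-adds the documents from the database.

```python
import os
import shutil
import tempfile
from typing import Callable


def rebuild_index(live_dir: str, build: Callable[[str], None]) -> None:
    """Rebuild an index directory from scratch and swap it into place.

    `build` is a hypothetical callback that writes a complete new index
    into the directory it is given.
    """
    live_dir = os.path.abspath(live_dir)
    parent = os.path.dirname(live_dir)
    # Build next to the live index so the final rename stays on one filesystem.
    new_dir = tempfile.mkdtemp(prefix="index-new-", dir=parent)
    try:
        build(new_dir)
    except Exception:
        # The live index is untouched if the rebuild fails.
        shutil.rmtree(new_dir, ignore_errors=True)
        raise
    old_dir = live_dir + ".old"
    if os.path.exists(old_dir):
        shutil.rmtree(old_dir)
    if os.path.exists(live_dir):
        os.rename(live_dir, old_dir)
    os.rename(new_dir, live_dir)
    shutil.rmtree(old_dir, ignore_errors=True)
```

The old index is removed only after the new one is in place, so a failed rebuild leaves the existing data intact. Readers that hold the old directory open during the swap are not handled in this sketch.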

Re: Can I retrieve token offsets from Hits?

2004-07-22 Thread Roy
Hi, Lucene Guru: I wonder if the information in termPositions or termVector can be used to restore token position from indices? Thanks! Roy On Wed, 21 Jul 2004 21:32:10 +0100, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: I need these values for highlighting. I've already looked to

Re: Can I retrieve token offsets from Hits?

2004-07-22 Thread Roy
Thanks for the pointer to Luke. It's a very useful tool. I did more research but don't think Lucene stores the token position in indices. Token position is different from term position. So, for highlighting, the original text has to be retokenized. On Thu, 22 Jul 2004 19:44:14 +0200,
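Re-tokenizing the original text to recover character offsets for highlighting, as described above, might look like the following. This is a sketch in plain Python rather than Lucene code; the `\w+` tokenizer is an assumption and would need to match whatever analyzer built the index, and `token_offsets` and `highlight` are hypothetical names.

```python
import re
from typing import Iterable, List, Tuple


def token_offsets(text: str) -> List[Tuple[str, int, int]]:
    """Re-tokenize text, returning (lowercased token, start, end) triples."""
    return [(m.group().lower(), m.start(), m.end())
            for m in re.finditer(r"\w+", text)]


def highlight(text: str, query_terms: Iterable[str],
              open_tag: str = "<b>", close_tag: str = "</b>") -> str:
    """Wrap each occurrence of a query term in the original text with tags."""
    wanted = {t.lower() for t in query_terms}
    pieces = []
    last = 0
    for term, start, end in token_offsets(text):
        if term in wanted:
            pieces.append(text[last:start])
            pieces.append(open_tag + text[start:end] + close_tag)
            last = end
    pieces.append(text[last:])
    return "".join(pieces)
```

The offsets come from the stored or re-fetched source text, not from the index, which is the cost the poster points out.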

RE: Lucene vs. MySQL Full-Text

2004-07-22 Thread wallen
I also question whether it could handle extreme volume with such good query speed. Has anyone done numbers with 1+ million documents? -Original Message- From: Daniel Naber [mailto:[EMAIL PROTECTED] Sent: Tuesday, July 20, 2004 5:44 PM To: Lucene Users List Subject: Re: Lucene vs. MySQL

Re: Lucene vs. MySQL Full-Text

2004-07-22 Thread John Patterson
I used the MySQL full text search to index about 70K business directory records. It became impossibly slow and I ended up creating my own text search engine similar in concept to Lucene but database driven. It worked much faster than the native MySQL full text search. Other limitations of MySQL

Re: Can I retrieve token offsets from Hits?

2004-07-22 Thread markharw00d
I wonder if the information in termPositions or termVector can be used to restore token position from indices? TermFreqVector gives you term frequencies (not positions). This can be of use in computing document similarities. TermPositions gives you the sequence number, e.g. in the last
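The distinction drawn here between term frequencies and term positions (sequence numbers within the token stream, not character offsets) can be shown on a toy example. This is not the Lucene API; `term_stats` is a hypothetical helper and whitespace splitting is an assumed tokenizer.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def term_stats(text: str) -> Tuple[Counter, Dict[str, List[int]]]:
    """Return (term frequencies, term -> list of sequence positions)."""
    tokens = text.lower().split()
    freqs = Counter(tokens)
    positions: Dict[str, List[int]] = defaultdict(list)
    for seq_no, token in enumerate(tokens):
        # seq_no is the token's ordinal in the stream, not a character offset.
        positions[token].append(seq_no)
    return freqs, dict(positions)
```

Frequencies alone are enough for similarity measures; positions are what phrase matching needs, and neither gives the character offsets that highlighting requires.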

Re: Can I retrieve token offsets from Hits?

2004-07-22 Thread Grant Ingersoll
I am sensing a common theme throughout a variety of threads here: Namely, a need for a pluggable set of Readers and Writers (think Interface) that can write metadata about an Index/Document/Field/Term (which I see the TermVector stuff as being an instance of) and can be given to Lucene from

Re: Very slow IndexReader.open() performance

2004-07-22 Thread Byron Miller
On Thu, 22 Jul 2004 14:19:21 -0400, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: It could also be that your disk space is filling up and the OS runs out of swap room. If you run Fedora you will also need to upgrade your kernel. There is a severe bug with Java crashing on the default kernels. If

Re: speeding up lucene search

2004-07-22 Thread Byron Miller
On Wed, 21 Jul 2004 22:13:32 +1000, Anson Lau [EMAIL PROTECTED] wrote: Has anyone tried splitting up an index into smaller chunks, without putting the different indices on a different physical disk/box? What sort of performance gain do you get from it? The best advantage to this would be

Re: Very slow IndexReader.open() performance

2004-07-22 Thread Ram Subbaroyan
Hi Byron, I am planning on benchmarking Nutch on an Opteron box (2 CPU, 2 TB, 2 Gig RAM) using Fedora Core rc2 and jdk 1.5 beta 2. Are there any issues I should be aware of? Thanks for the help, ram On 7/22/04 4:56 PM, Byron Miller [EMAIL PROTECTED] wrote: On Thu, 22 Jul 2004 14:19:21 -0400,