Re: Serious Index Corruption Error - FileNotFoundException

2008-05-08 Thread Jamie
Hi Mike, Thanks for the suggestions. I've implemented all of them. The main reason why I manually deleted the lock file was that sometimes users kill the server process manually or there is a hard reboot without any warning. In such circumstances, Lucene leaves a lock file lying around as it

Re: Serious Index Corruption Error - FileNotFoundException

2008-05-08 Thread Michael McCandless
OK, that sounds like a legitimate reason to forcibly remove the write lock, but it would be better to do that only on startup of your process rather than in every openIndex() call. If ever you hit LockObtainFailedException in openIndex, even after having deleted the write lock on
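
A minimal sketch of that startup-only approach, assuming the Lucene 2.3-era API (FSDirectory.getDirectory, IndexReader.isLocked/unlock); the class, method, and path names below are placeholders, not code from the thread:

    import java.io.IOException;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class IndexStartup {

        // Called once at process startup, not from every openIndex() call.
        public static IndexWriter openAtStartup(String indexPath) throws IOException {
            Directory dir = FSDirectory.getDirectory(indexPath);

            // A process that was killed may have left the write lock behind.
            if (IndexReader.isLocked(dir)) {
                IndexReader.unlock(dir);
            }

            // If this still throws LockObtainFailedException, another live process
            // holds the lock; per the advice above, do not delete it a second time.
            // false = append to the existing index rather than create a new one.
            return new IndexWriter(dir, new StandardAnalyzer(), false);
        }
    }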

Re: Serious Index Corruption Error - FileNotFoundException

2008-05-08 Thread Jamie
Hi Michael, I had in fact preempted you and moved the delete-lock code to a startup function. However, I found a nice little optimization that seems to force the writer to close when the process is manually killed. I added a JVM shutdown hook (i.e. using
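
For reference, a rough sketch of such a shutdown hook, assuming a long-lived IndexWriter and the Lucene 2.3-era constructors; the index path is a placeholder. As the next reply points out, a hard kill (kill -9) bypasses these hooks.

    import java.io.IOException;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;

    public class WriterShutdownHook {

        // Registers a JVM shutdown hook that closes the writer on normal exit,
        // releasing the write lock and flushing any buffered documents.
        public static void install(final IndexWriter writer) {
            Runtime.getRuntime().addShutdownHook(new Thread() {
                public void run() {
                    try {
                        writer.close();
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
            });
        }

        public static void main(String[] args) throws IOException {
            IndexWriter writer = new IndexWriter("/path/to/index", new StandardAnalyzer(), false);
            install(writer);
            // ... index documents as usual ...
        }
    }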

Using single or multiple indices for searching different entity types

2008-05-08 Thread JP O'Gorman
Hello, Just to provide some background information. We have 4 main entities that we can search over in our product. These entities represent... * Client Information (Has 30+ fields storing client information so a user can search clientName:Ian) * Project Information (Has 20+ fields) * Contact
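
For readers following along, a sketch of what a single-index, type-discriminated document could look like with the Lucene 2.3-era Field API; the field names and helper below are illustrative, not JP's actual schema:

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    public class EntityDocuments {

        // Builds a document for one client entity, tagged with its type.
        public static Document clientDocument(String id, String clientName) {
            Document doc = new Document();
            // "type" plus "id" uniquely identify the entity instance.
            doc.add(new Field("type", "client", Field.Store.YES, Field.Index.UN_TOKENIZED));
            doc.add(new Field("id", id, Field.Store.YES, Field.Index.UN_TOKENIZED));
            // One of the entity-specific fields, searchable as clientName:Ian
            doc.add(new Field("clientName", clientName, Field.Store.YES, Field.Index.TOKENIZED));
            return doc;
        }
    }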

RE: lucene farsi problem

2008-05-08 Thread Vizzini
Dear Steven, Thanks for the reply. I've just checked the link and it was working. Anyway, you are right, but my point is to use the correct term for 3 main reasons: 1. Respect the host language, i.e. English 3. Apparently the Islamic regime in Tehran is against the word ‘Persian’, and we as the

Re: Serious Index Corruption Error - FileNotFoundException

2008-05-08 Thread Michael McCandless
It would make me nervous to have Lucene insert that shutdown hook. E.g., closing the IndexWriter could in general be a time-consuming process. But if it's working for you, that's great. Though, if you explicitly kill the JVM (e.g. kill -9) those shutdown hooks won't run. You should use

Re: lucene farsi problem

2008-05-08 Thread Grant Ingersoll
Point #2 does not belong on this forum. This is a forum for Lucene Java, not for political views. There are plenty of other places for that, so let's close this discussion off on this particular point and simply address the issue at hand with Lucene and LUCENE-1279. Cheers, Grant On

Re: Using single or multiple indices for searching different entity types

2008-05-08 Thread Karl Wettin
JP O'Gorman wrote: All this information is stored within the one index. All the documents have a type and an id field to uniquely identify the entity instance. We allow the user to choose what entity types to search over e.g. Just Client and Project information or all information etc. We use
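
A sketch of how the user's choice of entity types might be enforced at query time, assuming a "type" field like the one above and the Lucene 2.3-era BooleanQuery API; a cached filter per type would also work if the type clause should not influence scoring:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;

    public class TypeRestrictedQuery {

        // Wraps the user's query so only the selected entity types can match.
        public static Query restrict(Query userQuery, String[] selectedTypes) {
            BooleanQuery types = new BooleanQuery();
            for (int i = 0; i < selectedTypes.length; i++) {
                // At least one of the selected types must match.
                types.add(new TermQuery(new Term("type", selectedTypes[i])), BooleanClause.Occur.SHOULD);
            }

            BooleanQuery query = new BooleanQuery();
            query.add(userQuery, BooleanClause.Occur.MUST);
            query.add(types, BooleanClause.Occur.MUST);
            return query;
        }
    }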

Limit of Lucene

2008-05-08 Thread Michael Siu
What is the limit of Lucene on the # of docs per index? In RangeFilter.Bits(), for example, a bitset is initialized to the size of maxDoc from the IndexReader. I wonder what happens if the # of docs is huge, say MaxInt (4G in 32 bit or 2^63 in 64 bit)?
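
As a rough worked example of the memory side of that question, assuming the filter allocates one bit per document up to maxDoc (as a java.util.BitSet sized to maxDoc would):

    public class BitSetCost {
        public static void main(String[] args) {
            long maxDoc = Integer.MAX_VALUE;                  // ~2.1 billion documents
            long bytes = maxDoc / 8;                          // one bit per document
            System.out.println((bytes / (1024 * 1024)) + " MB per filter");  // just under 256 MB
        }
    }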

Re: Limit of Lucene

2008-05-08 Thread Karl Wettin
Michael Siu wrote: What is the limit of Lucene: # of docs per index? Integer.MAX_VALUE. Multiple indices joined in a single MultiWhatNot are still limited to that number. In RangeFilter.Bits(), for example, a bitset is initialized to the size of maxDoc from the IndexReader. I wonder what
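
A sketch of the "MultiWhatNot" idea using the MultiReader available in Lucene 2.3; the index paths are placeholders. Document numbers are Java ints, so the combined view is still capped at Integer.MAX_VALUE:

    import java.io.IOException;

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.MultiReader;

    public class CombinedIndexes {
        public static void main(String[] args) throws IOException {
            IndexReader[] parts = new IndexReader[] {
                IndexReader.open("/indexes/part1"),
                IndexReader.open("/indexes/part2"),
            };
            IndexReader combined = new MultiReader(parts);
            // maxDoc() returns an int, so the combined reader cannot address
            // more than Integer.MAX_VALUE documents.
            System.out.println("maxDoc = " + combined.maxDoc());
            combined.close();
        }
    }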

RE: Limit of Lucene

2008-05-08 Thread Michael Siu
The # of documents that we are going to index could potentially be more than 2G. So I guess I have to split the index into multiple indexes, each containing up to 2G documents. Any other suggestion? Thanks.

Re: Limit of Lucene

2008-05-08 Thread Grant Ingersoll
In practice, you will more than likely have to distribute your index across multiple nodes once you get somewhere in the range of tens of millions of documents, but it all depends on your hardware, documents, throughput needs, etc.
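
One way such a split could be searched as a whole, sketched against the MultiSearcher that shipped with Lucene at the time (shard paths and the query are placeholders); ParallelMultiSearcher is the variant that queries the shards concurrently:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.MultiSearcher;
    import org.apache.lucene.search.Searchable;
    import org.apache.lucene.search.TermQuery;

    public class ShardedSearch {
        public static void main(String[] args) throws Exception {
            // Each shard is an independent index, possibly built on its own node.
            Searchable[] shards = new Searchable[] {
                new IndexSearcher("/indexes/shard1"),
                new IndexSearcher("/indexes/shard2"),
            };

            MultiSearcher searcher = new MultiSearcher(shards);
            Hits hits = searcher.search(new TermQuery(new Term("clientName", "ian")));
            System.out.println(hits.length() + " hits across all shards");
            searcher.close();
        }
    }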