Search while indexing

2009-03-06 Thread sonfon
Dear All, I'm considering building an index for my application with Lucene. However, the document sources I'm going to index contain many duplicates, so before adding a document to an IndexWriter, I'd like to search the index first to see if a copy of the same document has already been ad
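
One common approach, offered here as a sketch rather than something proposed in this thread, is to store a fingerprint of each document's content in an untokenized field and check for it before calling addDocument. In Lucene 2.4, documents still buffered in an open IndexWriter are not visible to an already-open IndexReader until they are committed and the reader is reopened, so duplicates within the current batch also need an in-memory check. The stdlib-only Java below shows only the fingerprint-and-check step. The class name ContentDeduper is hypothetical, and the HashSet stands in for the index lookup (for example, a TermQuery on the fingerprint field).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: decides whether a document's text has been seen before.
class ContentDeduper {
    // Fingerprints of every document accepted so far. In a real setup these
    // would also be stored in an untokenized field of each indexed document.
    private final Set<String> seen = new HashSet<>();

    // Hex-encoded SHA-1 of the text: identical text always yields the same key.
    static String fingerprint(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-1.
            throw new IllegalStateException(e);
        }
    }

    // Returns true (and remembers the fingerprint) only the first time this text is offered.
    boolean shouldAdd(String text) {
        return seen.add(fingerprint(text));
    }
}
```

This only catches exact duplicates; near-duplicates would need some normalization (whitespace, case) before hashing. If "last copy wins" is acceptable instead of "first copy wins", IndexWriter.updateDocument(Term, Document) keyed on such a fingerprint field replaces earlier copies rather than skipping new ones.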

Re: deletion of index-files fails

2009-03-06 Thread rolarenfan
FWIW, +1 from me on all this: when I started poking at my little problem I found as you said that there was really no way to trace the issue (one can use the debugger of course and I did, which is how I found the problem). So, getRefCount() would be good! thanks, Paul -Original Message--

Re: deletion of index-files fails

2009-03-06 Thread Erick Erickson
OK, I understand now. Like I said, anything you deem appropriate. Best, Erick On Fri, Mar 6, 2009 at 5:45 PM, Michael McCandless <luc...@mikemccandless.com> wrote: > If we changed the signature (return value) then on dropping in the JAR > you'd have to recompile your code, which violates our bac

Re: deletion of index-files fails

2009-03-06 Thread Michael McCandless
If we changed the signature (return value), then on dropping in the JAR you'd have to recompile your code, which violates our back compat goals, i.e. "drop in JAR and run". Mike Erick Erickson wrote: Why would it break back compat? They just return void now, so IndexReader.incRef(); should

ZipFile directory implementation

2009-03-06 Thread tsuraan
I wrote a really basic read-only Directory implementation for indices contained in zip files. It's read-only because that's what Java's API supports, and it has no documentation or anything else because I haven't gotten to that yet. It also claims its package is org.apache.lucene.store since that
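
The code itself isn't included in this archive entry. For readers curious about the read-only part, here is a minimal stdlib sketch of the java.util.zip calls such a Directory would sit on: ZipFile can list and read entries but cannot modify an existing archive. The class ZipEntryReader and its methods are hypothetical. Because compressed entries can only be read sequentially, the sketch reads each entry fully into memory; a real Directory would still have to wrap that data in an IndexInput, which is not shown.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical read-only helper over a zipped index.
class ZipEntryReader {
    // Lists entry names, e.g. the files of an index stored in the archive.
    static List<String> listEntries(File zip) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zf = new ZipFile(zip)) {
            Enumeration<? extends ZipEntry> e = zf.entries();
            while (e.hasMoreElements()) {
                names.add(e.nextElement().getName());
            }
        }
        return names;
    }

    // Reads one named entry fully into memory; returns null if it is absent.
    static byte[] readEntry(File zip, String name) throws IOException {
        try (ZipFile zf = new ZipFile(zip)) {
            ZipEntry entry = zf.getEntry(name);
            if (entry == null) {
                return null;
            }
            try (InputStream in = zf.getInputStream(entry)) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                return out.toByteArray();
            }
        }
    }
}
```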

Re: Deadlock in using FSDirectory

2009-03-06 Thread Michael McCandless
MakMak wrote: Hey Mike, thanks for the quick response, I tried passing Directory to IndexReader.open() and there were no deadlocks!! I will get rid of synchronizing on FSDirectory too. Great! However do you think it will be better to modify the docs for FSDirectory and remove the sync part

Re: Deadlock in using FSDirectory

2009-03-06 Thread MakMak
Hey Mike, thanks for the quick response, I tried passing Directory to IndexReader.open() and there were no deadlocks!! I will get rid of synchronizing on FSDirectory too. However do you think it will be better to modify the docs for FSDirectory and remove the sync part of "Directories are cached

Re: deletion of index-files fails

2009-03-06 Thread Erick Erickson
Why would it break back compat? They just return void now, so IndexReader.incRef(); should still compile/run. But that's arguing about angels dancing on pins. My real issue is that by not allowing *some* mechanism to get the refcount, developers don't have any tools for figuring out that it's a r

Re: Marking commit points as deleted does not clean up on IW.close

2009-03-06 Thread Shalin Shekhar Mangar
On Fri, Mar 6, 2009 at 5:01 PM, Michael McCandless < luc...@mikemccandless.com> wrote: > > Shalin, did you ever get to the bottom of this? > No, I'll try to reproduce this and let you know tomorrow. -- Regards, Shalin Shekhar Mangar.

Re: Deadlock in using FSDirectory

2009-03-06 Thread Michael McCandless
It's not safe for you to synchronize externally on the Directory instance returned from FSDirectory.getDirectory -- that's leading to the deadlock here right? It looks like you passed in a File or String to IndexReader.open? One workaround (I think -- not tested) might be to pass Directory inst

Deadlock in using FSDirectory

2009-03-06 Thread MakMak
Hi, I have the following: Thread1: 1. Acquires a lock on FSDirectory.getDirectory (not right, not needed, but should not be harmful anyway). 2. Issues an IndexReader.reopen() to open the reader and search. This call waits on acquiring a MultiSegmentReader lock. Thread2: 1. Issues
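
The deadlock discussed here has the classic lock-ordering shape: one thread holds the external lock on the Directory and waits for a lock inside the reader, while another thread holds the reader's lock and waits for the Directory. Thread2's exact steps are assumed, since the message is cut off. The Java sketch below reproduces that shape with two ReentrantLocks: dirLock stands in for the external synchronized block on the FSDirectory instance, and readerLock stands in for the lock taken inside reopen(). Both names are illustrative, not Lucene internals. Using tryLock with a timeout lets the sketch report the cycle instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative lock-order inversion; returns whether each thread got its second lock.
class LockOrderDemo {
    static boolean[] run() throws InterruptedException {
        ReentrantLock dirLock = new ReentrantLock();    // stand-in: synchronized(directory)
        ReentrantLock readerLock = new ReentrantLock(); // stand-in: lock inside the reader
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        boolean[] gotSecond = new boolean[2];

        // Thread1 takes dirLock then readerLock; Thread2 takes them in the opposite order.
        Thread t1 = new Thread(() -> attempt(dirLock, readerLock, bothHoldFirst, bothTried, gotSecond, 0));
        Thread t2 = new Thread(() -> attempt(readerLock, dirLock, bothHoldFirst, bothTried, gotSecond, 1));
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        return gotSecond;
    }

    // Takes `first`, waits until the other thread holds its first lock, then tries
    // `second` for 200 ms. Neither thread releases its first lock until both
    // attempts are finished, so both attempts fail deterministically.
    private static void attempt(ReentrantLock first, ReentrantLock second,
                                CountDownLatch bothHoldFirst, CountDownLatch bothTried,
                                boolean[] gotSecond, int slot) {
        first.lock();
        try {
            bothHoldFirst.countDown();
            bothHoldFirst.await();
            gotSecond[slot] = second.tryLock(200, TimeUnit.MILLISECONDS);
            if (gotSecond[slot]) {
                second.unlock();
            }
            bothTried.countDown();
            bothTried.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            first.unlock();
        }
    }
}
```

The fix adopted in the thread, not synchronizing on the Directory at all, removes one lock from the cycle; acquiring locks in a single global order would also break it.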

Re: deletion of index-files fails

2009-03-06 Thread Michael McCandless
Yes, ref counts are tricky, though these are expert APIs. I think changing close, incRef, decRef to return the RC would be good, though that breaks back compat. How about exposing getRefCount() instead? Mike Erick Erickson wrote: Hmmm, reference counting is always yucky. I looked the Ind
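
To make the proposal concrete, here is a toy Java model (not IndexReader itself) of the incRef/decRef/close contract being discussed: the opener holds one reference, close() gives up that reference once, and the underlying resources are only released when the count reaches zero. getRefCount() is the accessor proposed in this message; the class RefCountedResource and its isReleased() method are hypothetical.

```java
// Toy model of reference-counted reader semantics (hypothetical, not Lucene's class).
class RefCountedResource {
    private int refCount = 1;         // the opener holds the first reference
    private boolean closed = false;   // close() drops the opener's reference only once
    private boolean released = false; // stands in for "underlying files closed"

    synchronized void incRef() {
        if (refCount <= 0) throw new IllegalStateException("already released");
        refCount++;
    }

    synchronized void decRef() {
        if (refCount <= 0) throw new IllegalStateException("already released");
        refCount--;
        if (refCount == 0) {
            released = true; // only now would files actually be closed
        }
    }

    // Drops the opener's reference; repeated calls are no-ops.
    synchronized void close() {
        if (!closed) {
            closed = true;
            decRef();
        }
    }

    synchronized int getRefCount() { return refCount; }

    synchronized boolean isReleased() { return released; }
}
```

With one stray incRef(), close() leaves the count at 1 and nothing is released, which is the symptom reported elsewhere in this thread; checking getRefCount() right after close() would expose it without a debugger.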

Re: deletion of index-files fails

2009-03-06 Thread Erick Erickson
Hmmm, reference counting is always yucky. I looked the IndexReader javadocs over and there isn't any help there for managing refcounts. You can't find the current refcount, close doesn't indicate the results, etc. Or I missed, for the Nth time, perfectly obvious documentation. What do people thin

Re: deletion of index-files fails

2009-03-06 Thread Michael McCandless
OK, phew! Thanks for bringing closure. Mike rolaren...@earthlink.net wrote: I did just now double/triple-check: the IndexWriter is definitely closed. However (cough), I did have a bogus call to IndexReader.incRef() ... once I removed that, the call to IndexReader.close() actually worked

Re: deletion of index-files fails

2009-03-06 Thread rolarenfan
I did just now double/triple-check: the IndexWriter is definitely closed. However (cough), I did have a bogus call to IndexReader.incRef() ... once I removed that, the call to IndexReader.close() actually worked and then the deletion did so too. Thanks; sorry to trouble you. -Paul -Orig

Re: deletion of index-files fails

2009-03-06 Thread rolarenfan
Right, I should have included these data in my orig. message (sorry): WinXP, R2.4. I do have permissions and the files are definitely part of the index being removed; nothing outside of (my code that uses) Lucene would have a handle on these files. -Paul -Original Message- >From: Ia

Re: Tomcat Threads are BLOCKED after some time

2009-03-06 Thread Yonik Seeley
On Fri, Mar 6, 2009 at 5:43 AM, damu_verse wrote: > We have tried with the Lucene-2.4.0 also (JVM not changed) .. > But still threads are blocking..Not able to find the root cause... What is the *full* thread dump? Some threads blocking is fine and normal - there isn't necessarily anyt

Fuzzy Query with german special characters

2009-03-06 Thread Sertic Mirko, Bedag
Hi all, I'd like to do a fuzzy search with German special characters. For instance, I want to query for "müller", but terms like "mueller" should also be matched, as ü can also be written as ue. How could this be done? At index creation time, I could convert ü to ue, and just use the ue ve
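
Following the idea at the end of this message, one option is to fold both spellings to the same form on both the indexing and the query side, so "müller" and "mueller" become the same term before any fuzzy matching happens. Below is a minimal stdlib sketch of that folding, assuming the usual German transliterations (ä to ae, ö to oe, ü to ue, ß to ss). In Lucene it would live in a custom TokenFilter shared by the indexing and query analyzers, which isn't shown. The class GermanFolding is hypothetical, and the umlauts are written as Unicode escapes so the source compiles regardless of file encoding.

```java
import java.util.Locale;

// Hypothetical folding step: maps umlaut spellings and their two-letter forms to one term.
class GermanFolding {
    static String fold(String term) {
        String s = term.toLowerCase(Locale.GERMAN);
        StringBuilder sb = new StringBuilder(s.length() + 4);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\u00e4': sb.append("ae"); break; // a-umlaut
                case '\u00f6': sb.append("oe"); break; // o-umlaut
                case '\u00fc': sb.append("ue"); break; // u-umlaut
                case '\u00df': sb.append("ss"); break; // sharp s
                default:       sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

QueryParser does not run the analyzer over fuzzy terms (compare the setLowercaseExpandedTerms javadoc quoted elsewhere in this digest), so the query term would need to be folded before the fuzzy query is built.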

Re: Using Lucene for user query parsing

2009-03-06 Thread Erick Erickson
Whatever you do will be wrong. What you're saying is that you have structured data that the user wants to search in an unstructured way, and you want to try to create a system that intuits what the user meant. Good luck. Can you back up a bit and talk about the problem you're trying to solve? If

Re: Questions about analyzer

2009-03-06 Thread Erick Erickson
See below. On Fri, Mar 6, 2009 at 1:44 AM, Ganesh wrote: > Hello all > > 1) > Which is best to use, Snowball analyzer or Lucene contrib analyzer? There is > no inbuilt stop word list for Snowball analyzer? > What is the "Lucene contrib analyzer"? There are 12 of them. And regardless, the answ

Re: Filters - at what stage are they applied?

2009-03-06 Thread Michael McCandless
Prior to 2.4, the search runs first and then the filter, i.e. the search does all the work to produce the docIDs that match it, and then the filter is checked per docID. As of 2.4, they actually sort of play leap-frog, document by document. First, we ask the filter for its first matching docID,
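
The leap-frog loop can be sketched with two sorted docID lists that each support an advance(target) step, in the spirit of DocIdSetIterator.skipTo in 2.4: the filter proposes a docID, the query skips to it, and whichever side falls behind jumps forward to the other. This is a stdlib-only illustration of the intersection pattern under those assumptions, not Lucene's actual code; LeapFrog, SortedDocs and intersect are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class LeapFrog {
    // A sorted docID list with a cursor; advance() moves to the first doc >= target.
    static final class SortedDocs {
        static final int NO_MORE_DOCS = Integer.MAX_VALUE;
        private final int[] docs;
        private int pos = 0;

        SortedDocs(int... docs) { this.docs = docs; }

        int advance(int target) {
            while (pos < docs.length && docs[pos] < target) {
                pos++;
            }
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }
    }

    // Starts from the filter's first doc; each side then jumps forward to the
    // other's current doc until they agree (a hit) or one side is exhausted.
    static List<Integer> intersect(SortedDocs filter, SortedDocs query) {
        List<Integer> hits = new ArrayList<>();
        int f = filter.advance(0);
        while (f != SortedDocs.NO_MORE_DOCS) {
            int q = query.advance(f);
            if (q == SortedDocs.NO_MORE_DOCS) {
                break;
            }
            if (q == f) {
                hits.add(f);
                f = filter.advance(f + 1); // move past the shared doc
            } else {
                f = filter.advance(q);     // query jumped past f; filter catches up
            }
        }
        return hits;
    }
}
```

For example, filter docs 1, 3, 5, 7, 9 and query docs 2, 3, 4, 9, 10 intersect to 3 and 9. Compared with the pre-2.4 order, a sparse filter lets the query skip large runs of docIDs instead of producing every query match first.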

Re: indexing but not tokenizing

2009-03-06 Thread Ian Lea
I don't know how QueryParser works behind the scenes, but it looks like this is at least known behaviour. From the QueryParser javadocs: setLowercaseExpandedTerms: public void setLowercaseExpandedTerms(boolean lowercaseExpandedTerms): Whether terms of wildcard, prefix, fuzzy and range queries

Re: Marking commit points as deleted does not clean up on IW.close

2009-03-06 Thread Michael McCandless
Shalin, did you ever get to the bottom of this? Mike Michael McCandless wrote: You mean on calling IndexWriter.close, with a deletion policy that's functionally equivalent to KeepOnlyLastCommitDeletionPolicy, you somehow see the last 2 commits remaining in the Directory once IndexWrit

Re: Lucene: MultiSearcher

2009-03-06 Thread Michael McCandless
You could look at the docID of each hit, and compare to the .maxDoc() of each underlying reader. MultiSearcher logically "concatenates" the docIDs. However, docIDs are an internal identifier for Lucene, so it's always possible in a new release of Lucene that how docIDs are mapped by Mult
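
The concatenation Mike describes can be sketched as a prefix sum over the sub-readers' maxDoc() values plus a binary search. This Java sketch assumes the sub-searchers are concatenated in the order they were passed to the MultiSearcher. DocIdMapper is a hypothetical name, and, as Mike cautions, this mapping is an internal detail that may change between releases. MultiSearcher also has subSearcher(int) and subDoc(int) methods for this purpose; check the javadocs for your version.

```java
// Hypothetical mapper from a MultiSearcher-style global docID to (sub-index, local docID).
class DocIdMapper {
    private final int[] starts; // starts[i] = first global docID of sub-index i

    // maxDocs[i] is the maxDoc() of the i-th underlying reader, in construction order.
    DocIdMapper(int[] maxDocs) {
        starts = new int[maxDocs.length + 1];
        for (int i = 0; i < maxDocs.length; i++) {
            starts[i + 1] = starts[i] + maxDocs[i];
        }
    }

    // Largest i with starts[i] <= globalDoc; empty sub-indexes are skipped naturally.
    int subIndex(int globalDoc) {
        if (globalDoc < 0 || globalDoc >= starts[starts.length - 1]) {
            throw new IllegalArgumentException("docID out of range: " + globalDoc);
        }
        int lo = 0;
        int hi = starts.length - 2;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (starts[mid] <= globalDoc) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    // Position of the document within its own sub-index.
    int localDoc(int globalDoc) {
        return globalDoc - starts[subIndex(globalDoc)];
    }
}
```

With sub-readers of maxDoc 10, 5 and 20, global docID 14 maps to sub-index 1, local docID 4.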

Re: Instantiating a RAMDirectory from a mutating directory

2009-03-06 Thread Michael McCandless
This is an interesting challenge! Responses below... Kieran Topping wrote: Hello, I would like to be able to instantiate a RAMDirectory from a directory that an IndexWriter in another process might currently be modifying. Ideally, I would like to do this without any synchronizing or

Re: Tomcat Threads are BLOCKED after some time

2009-03-06 Thread damu_verse
Hi Yonik We have tried with the Lucene-2.4.0 also (JVM not changed) .. But still threads are blocking..Not able to find the root cause... thanks & regards -damu Yonik Seeley-2 wrote: > > Hmmm, if this is some sort of deadlock, we may need a thread dump of > all of the threads. > Do

Re: indexing but not tokenizing

2009-03-06 Thread John Marks
Another problem. Using the PerFieldAnalyzerWrapper solves the case where I have a simple query, such as the following: Query query = parser.parse("X"); or Query query = parser.parse("X OR Y"); but if I use a more complex query like the following: Query query = parser.parse("[A TO

Re: Using Lucene for user query parsing

2009-03-06 Thread Vasudevan Comandur
You could have a single index with all the names tagged at indexing time. For the query parsing, you could have a lookup for common trailing words that identify business names (like Corp, Inc, LLC, Ltd, etc.) and common words (like road, avenue, street, lane, etc.) for addresses, and separ
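
Here is a minimal sketch of the marker-word lookup suggested here. It assumes whitespace-separated queries and uses only the example words from the message; a real list would be longer. QueryTypeGuesser and its Kind values are hypothetical names. A query with no markers, or with markers of both kinds, falls back to UNKNOWN so it can be searched against all name types.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical heuristic classifier for free-text name queries.
class QueryTypeGuesser {
    enum Kind { BUSINESS, ADDRESS, UNKNOWN }

    // Marker words taken from the suggestion above.
    private static final Set<String> BUSINESS_MARKERS =
        new HashSet<>(Arrays.asList("corp", "inc", "llc", "ltd"));
    private static final Set<String> ADDRESS_MARKERS =
        new HashSet<>(Arrays.asList("road", "avenue", "street", "lane"));

    static Kind guess(String query) {
        boolean business = false;
        boolean address = false;
        for (String raw : query.toLowerCase(Locale.ROOT).split("\\s+")) {
            String word = raw.replaceAll("[^a-z0-9]", ""); // drop punctuation, e.g. "Inc."
            if (BUSINESS_MARKERS.contains(word)) business = true;
            if (ADDRESS_MARKERS.contains(word)) address = true;
        }
        if (business && !address) return Kind.BUSINESS;
        if (address && !business) return Kind.ADDRESS;
        return Kind.UNKNOWN; // no marker, or conflicting markers
    }
}
```

A bare name like "Kingston" carries no marker at all, so the fallback case is likely to be common.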

Re: deletion of index-files fails

2009-03-06 Thread Michael McCandless
If truly the IndexWriter & all IndexReaders are closed, then they should no longer be holding open files. Maybe triple check that you've indeed closed everything. It's remotely possible that some other process (virus checker, source control clients, etc) has the file open. You could tr

Re: error in code

2009-03-06 Thread Ganesh
Corrected the second line. Please refer to the Javadocs for more help. document.add(new Field("path",textFiles[i].getPath(), Field.Store.NO, Field.Index.ANALYZED)); Regards Ganesh - Original Message - From: "nitin gopi" To: Sent: Friday, March 06, 2009 2:27 PM Subject: Re: error in

Re: Using Lucene for user query parsing

2009-03-06 Thread Ian Lea
Can you not make one index with all three types of name and just search that? Sounds much easier. You might get a few funnies like business Kingston on McDonald's street, but they'd be the exception. -- Ian. On Fri, Mar 6, 2009 at 6:25 AM, Srinivas Bharghav wrote: > I am trying to evaluate as

Re: deletion of index-files fails

2009-03-06 Thread Ian Lea
What OS are you running? What version of lucene? Are you sure that you have privilege to delete the files that it is failing on? That they are part of the index you are trying to remove? That something else doesn't have the files open? It seems likely that you are on Windows and that something

Re: error in code

2009-03-06 Thread nitin gopi
Hi Ganesh, the program still gives an error on the second line. It says *cannot find symbol*. I think we are initializing the object of the Field class twice; that is why we are getting the error. document.add(new Field("content",textReader)); document.add(new Field("path",textFiles[i].getP