Re: Querying wildcard

2008-10-29 Thread Aditi Goyal
Thanks Anshum and Erick. Well, I was looking for something like searching by domain name in the email address etc. How can I reverse the tokens? Can you please explain in a little detail? Thanks, Aditi On Thu, Oct 30, 2008 at 10:58 AM, Anshum <[EMAIL PROTECTED]> wrote: > Hi Aditi, > As Erick mentio

Re: IllegalStateEx thrown when calling close

2008-10-29 Thread Jed Wesley-Smith
Mike, regarding this paragraph: "To work around this, on catching an OOME on any of IndexWriter's methods, you should 1) forcibly remove the write lock (IndexWriter.unlock static method) and then 2) not call any methods on the old writer. Even if the old writer has concurrent merges running, the

Re: Querying wildcard

2008-10-29 Thread Anshum
Hi Aditi, As Erick mentioned, we'd need to know more to provide an apt solution. At the same time, a leading wildcard is very expensive for Lucene because of the way the index is stored/read. Ideally you'd at least want to reverse the tokens as already mentioned. This is because the

problem with highlighter

2008-10-29 Thread Agrawal, Aashish (IT)
Hi, I am using RegexQuery and Highlighter. My query works fine and I get the matches, but nothing is printed out by the highlighter. At the same time, if I use Query, it works fine. Is something wrong with the code below? code -- //line -->input string (ie ".*out") RegexQ

Re: OutOfMemory Problems Lucene 2.4 / Tomcat

2008-10-29 Thread Todd Benge
Thanks Mark. I appreciate the help. I thought our memory may be low but wanted to verify whether there is any way to control memory usage. I think we'll likely upgrade the memory on the machines but that may just delay the inevitable. Wondering if anyone else has encountered similar issues wit

Re: OutOfMemory Problems Lucene 2.4 / Tomcat

2008-10-29 Thread Mark Miller
The term, terminfo, and IndexReader internals are probably on the low end compared to the size of your field caches (needed for sorting). If you are sorting by String I think the space needed is 32 bits x number of docs + an array to hold all of the unique terms. So checking 300 million docs (I kn
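As a rough worked example of the estimate above (32 bits per document plus an array of the unique terms, per String sort field), the arithmetic can be sketched as follows. The 300 million document count comes from the thread; the unique-term count and the average bytes per term are hypothetical placeholders, and the class and method names are illustrative only, not part of Lucene.

```java
// Back-of-the-envelope estimate of String-sort field cache memory,
// following the formula quoted above. Not Lucene code.
class FieldCacheEstimate {

    // One 32-bit (4-byte) ordinal per document.
    static long ordinalBytes(long numDocs) {
        return numDocs * 4L;
    }

    // Space for the array of unique terms; avgBytesPerTerm is an assumption
    // that should include per-String object overhead on your JVM.
    static long termBytes(long uniqueTerms, long avgBytesPerTerm) {
        return uniqueTerms * avgBytesPerTerm;
    }

    // Total estimate for one String sort field.
    static long estimateBytes(long numDocs, long uniqueTerms, long avgBytesPerTerm) {
        return ordinalBytes(numDocs) + termBytes(uniqueTerms, avgBytesPerTerm);
    }

    public static void main(String[] args) {
        long numDocs = 300_000_000L;     // from the thread
        long uniqueTerms = 10_000_000L;  // hypothetical
        long avgBytesPerTerm = 50L;      // hypothetical

        System.out.println("ordinals: " + ordinalBytes(numDocs) + " bytes");
        System.out.println("terms:    " + termBytes(uniqueTerms, avgBytesPerTerm) + " bytes");
        System.out.println("total:    " + estimateBytes(numDocs, uniqueTerms, avgBytesPerTerm) + " bytes");
    }
}
```

Under this formula the ordinals alone come to 1,200,000,000 bytes for 300 million documents, for each field sorted on, before any term data is counted.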

Re: OutOfMemory Problems Lucene 2.4 / Tomcat

2008-10-29 Thread Todd Benge
There are usually only a couple of sort fields and a bunch of terms in the various indices. The terms are user-entered on various media, so the number of terms is very large. Thanks for the help. Todd On 10/29/08, Todd Benge <[EMAIL PROTECTED]> wrote: > Hi, > > I'm the lead engineer for search on a

Re: IllegalStateEx thrown when calling close

2008-10-29 Thread Jed Wesley-Smith
Not in 2.3.2 though. Cheers, Jed. Michael McCandless wrote: Or you can use IndexReader.unlock. Mike Jed Wesley-Smith wrote: Michael McCandless wrote: To work around this, on catching an OOME on any of IndexWriter's methods, you should 1) forcibly remove the write lock (IndexWriter.unlock

Re: OutOfMemory Problems Lucene 2.4 / Tomcat

2008-10-29 Thread Mark Miller
How many fields are you sorting on? Lots of unique terms in those fields? - Mark On Oct 29, 2008, at 6:03 PM, "Todd Benge" <[EMAIL PROTECTED]> wrote: Hi, I'm the lead engineer for search on a large website using Lucene for search. We're indexing about 300M documents in ~ 100 indices.

OutOfMemory Problems Lucene 2.4 / Tomcat

2008-10-29 Thread Todd Benge
Hi, I'm the lead engineer for search on a large website using Lucene for search. We're indexing about 300M documents in ~ 100 indices. The indices add up to ~ 60G. The indices are grouped into 4 different MultiSearchers, with the largest handling ~50G. The code is basically like the following:

Re: Runtime exception when creating IndexSearcher

2008-10-29 Thread Michael McCandless
OK I created this issue: https://issues.apache.org/jira/browse/LUCENE-1430 Mike Mindaugas Žakšauskas wrote: Hi, see my comments between Mike's text: On Wed, Oct 29, 2008 at 4:05 PM, Michael McCandless <[EMAIL PROTECTED]> wrote: Hmm, so somehow your stored fields file is truncated --

Re: Runtime exception when creating IndexSearcher

2008-10-29 Thread Michael McCandless
Actually, compound file defaults to true. One odd thing about your index: it has a single segment with 0 docs. What was the history that led to this index? Did you create an index, and then delete all of its documents, and optimize that? Or... something else? Mike Mindaugas Žakšauska

Re: Runtime exception when creating IndexSearcher

2008-10-29 Thread Mindaugas Žakšauskas
Hi, see my comments between Mike's text: On Wed, Oct 29, 2008 at 4:05 PM, Michael McCandless <[EMAIL PROTECTED]> wrote: > > Hmm, so somehow your stored fields file is truncated -- FieldsReader was > unable to read the first int. > > Are you using compound file format in this index? I'm not calli

Re: Runtime exception when creating IndexSearcher

2008-10-29 Thread Michael McCandless
Hmm, so somehow your stored fields file is truncated -- FieldsReader was unable to read the first int. Are you using compound file format in this index? Do you have any idea how your index may have become corrupt? Do you still have the original corrupt (not yet fixed) index? If so can yo

Re: Runtime exception when creating IndexSearcher

2008-10-29 Thread Mindaugas Žakšauskas
Hi, Following Mike's advice, the actual (non-masked) exception, obtained using the Directory constructor, was as follows: Exception in thread "main" java.io.IOException: read past EOF at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:151) at org.apache.lucene.store

Re: Runtime exception when creating IndexSearcher

2008-10-29 Thread Michael McCandless
I think I see how this exception can happen. I think you are hitting a different exception, which is masked by the exception you're seeing. Can you run CheckIndex on this index? I think that should show the actual root cause. I think another simple way to see the root cause would be to

Re: instantiated index in 2.4

2008-10-29 Thread Karl Wettin
Hi Darren, How large is your corpus? The speed you can expect depends on how much data you load it with. There is a graph in the package level javadocs that shows this: http://lucene.apache.org/java/2_4_0/api/contrib-instantiated/org/apache/lucene/store/instantiated/package-summary.html

Re: Runtime exception when creating IndexSearcher

2008-10-29 Thread Mindaugas Žakšauskas
Hi Erick, Sorry for not providing the context. The problem is that I couldn't work out the exact test case for causing this - I will definitely post one if I find it. There's a possible cause for this but I don't want to speculate as I don't know for sure. Just to answer (some of) your questions, th

Re: Runtime exception when creating IndexSearcher

2008-10-29 Thread Erick Erickson
Well, I'd expect it to throw this error if you tried to close an already-closed FSDirectory. But that's pretty useless since you don't provide much context around your problem. Did this just start occurring? Did you just migrate to 2.4 from a previous version? Are you sure you aren't closing an al

Re: Newbie Question: Query Creation Best Approach

2008-10-29 Thread Grant Ingersoll
Hmm, this strikes me as there being something wrong with the index, but it could be a bug, too. Do you get an error if you just run the BooleanQuery without the filter? How about if you run a simple TermQuery with the Filter? Can you open the index with Luke? Does the CheckIndex tool (i

Runtime exception when creating IndexSearcher

2008-10-29 Thread Mindaugas Žakšauskas
Hi, We're using Lucene 2.4.0 on Linux. Java version is 1.6.0_06. Is there any reason why Lucene would be throwing this error: org.apache.lucene.store.AlreadyClosedException: this Directory is closed at org.apache.lucene.store.Directory.ensureOpen(Directory.java:220) at org.apache

Re: Querying wildcard

2008-10-29 Thread Erick Erickson
Sure, there are many tricks. If you search the mail archives you'll find a bunch of them. One would be to reverse the tokens and make your leading wildcard queries into trailing ones on the reversed field. But without more details about what you're trying to accomplish, there's not much really us
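The reversed-token trick described above can be sketched in plain Java, without any Lucene classes: store each token reversed in a second, sorted term list, then turn a leading-wildcard pattern such as `*@example.com` into a prefix lookup on the reversed terms. The class name, method names, and sample addresses are hypothetical; in a real index the reversal would happen in the analysis chain and the lookup would be a prefix query on the reversed field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Illustrative model of "reverse the tokens, search by prefix". Not Lucene code.
class ReversedWildcardSketch {

    // Reverse a token, e.g. "bob@example.org" -> "gro.elpmaxe@bob".
    static String reverse(String token) {
        return new StringBuilder(token).reverse().toString();
    }

    // Turn a pattern with a single leading '*' into a prefix for the reversed field.
    static String toReversedPrefix(String leadingWildcard) {
        if (!leadingWildcard.startsWith("*") || leadingWildcard.indexOf('*', 1) >= 0) {
            throw new IllegalArgumentException("expected exactly one leading '*': " + leadingWildcard);
        }
        return reverse(leadingWildcard.substring(1));
    }

    // Prefix scan over sorted reversed terms: seek to the prefix, then stop at
    // the first term that no longer starts with it.
    static List<String> search(SortedSet<String> reversedTerms, String leadingWildcard) {
        String prefix = toReversedPrefix(leadingWildcard);
        List<String> matches = new ArrayList<>();
        for (String term : reversedTerms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            matches.add(reverse(term)); // restore the original token
        }
        return matches;
    }

    public static void main(String[] args) {
        SortedSet<String> reversedTerms = new TreeSet<>();
        String[] tokens = {"alice@example.com", "bob@example.org", "carol@example.com"};
        for (String token : tokens) {
            reversedTerms.add(reverse(token));
        }
        // Prints the tokens that end in "@example.com".
        System.out.println(search(reversedTerms, "*@example.com"));
    }
}
```

The point of the reversal is that a sorted term list can seek directly to a prefix, whereas a leading wildcard forces a scan of every term.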

Re: Query Search returns always the same id

2008-10-29 Thread Erick Erickson
Actually, FWIW, just after I posted last night I realized why the ID was always the same, perhaps it'll be useful as an insight into how Lucene works... When you add the same field to a document, all the values are added and retrieved in order. So calling "hits.doc(i).get("id")" returns the *firs
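The behavior described above can be illustrated with a small stand-alone model. This is not Lucene code: the class below is a hypothetical stand-in for a document that keeps fields in insertion order, where `get` returns the first value for a name and `getValues` returns all of them (Lucene's `Document` offers `get` and `getValues` with the same general contract).

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of a document with repeated (multi-valued) fields.
class MultiValuedFieldModel {

    // Fields are stored as (name, value) pairs in the order they were added.
    private final List<String[]> fields = new ArrayList<>();

    void add(String name, String value) {
        fields.add(new String[] {name, value});
    }

    // Returns the first value added under this name, or null if there is none.
    String get(String name) {
        for (String[] field : fields) {
            if (field[0].equals(name)) {
                return field[1];
            }
        }
        return null;
    }

    // Returns every value added under this name, in insertion order.
    List<String> getValues(String name) {
        List<String> values = new ArrayList<>();
        for (String[] field : fields) {
            if (field[0].equals(name)) {
                values.add(field[1]);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        MultiValuedFieldModel doc = new MultiValuedFieldModel();
        doc.add("id", "1");
        doc.add("id", "2"); // same field name added again on the same document
        System.out.println(doc.get("id"));
        System.out.println(doc.getValues("id"));
    }
}
```

If every record is added to one document object instead of a fresh document per record, the first "id" value is what comes back on every lookup.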

Re: Change the merge factor for an existing index?

2008-10-29 Thread Michael McCandless
It's fine to change any of IW's parameters on an existing index. Nothing will break. However, in general, such changes won't be retroactive: they only apply to future actions the IW will take. So, changing maxMergeDocs will only prevent future merges from producing segments larger than

Querying wildcard

2008-10-29 Thread Aditi Goyal
Hi All, I have been wanting to do a wildcard search with * as the first character on an index. Is there an alternative to calling QueryParser's setAllowLeadingWildcard() with true? I have heard it is an expensive operation. Thanks, Aditi

Re: IllegalStateEx thrown when calling close

2008-10-29 Thread Michael McCandless
Jed Wesley-Smith wrote: Yeah, I saw the change to flush(). Trying to work out the correct strategy for our IndexWriter handling now. We probably should not be using autocommit for our writers. autoCommit=true is deprecated as of 2.4.0, and will go away when we finally get to 3.0, so I th

Re: Lucene Index taking a lot of time

2008-10-29 Thread Michael McCandless
It looks like this is using Lucene 2.4.0. Indexing time suddenly increased with respect to what baseline? 2.3? A previous run on 2.4? Mike Birendar Singh Waldiya -X (bwaldiya - TCS at Cisco) wrote: Hi Gurus, We are using Lucene for creating indexes on some database column and suddenly

Re: IllegalStateEx thrown when calling close

2008-10-29 Thread Michael McCandless
Or you can use IndexReader.unlock. Mike Jed Wesley-Smith wrote: Michael McCandless wrote: To work around this, on catching an OOME on any of IndexWriter's methods, you should 1) forcibly remove the write lock (IndexWriter.unlock static method) IndexWriter.unlock(*) is 2.4 only. Use the fo