Re: Lucene query with long strings

2010-03-24 Thread Shashi Kant
Add the common terms such as University, School, Medicine, Institute, etc. to the stopwords list, so you are left with Stanford, Palo Alto, etc. Then use Ahmet's suggestion of using a BooleanQuery with setMinimumNumberShouldMatch() set to (say) 75% of the number of terms in the query string. Finally, if you wish to be very
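Shashi's suggestion can be modelled without a Lucene dependency. The Java below is a toy sketch, not Lucene code: `analyze` is a hypothetical stand-in for an Analyzer with a stop filter, and `matches` mimics a BooleanQuery whose clauses are all SHOULD with a minimum-should-match of 75% of the remaining terms. In Lucene 2.9 the real setting is `BooleanQuery.setMinimumNumberShouldMatch(int)`; the stopword list is an assumption taken from the examples in the message, and rounding up is a choice made here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class MinShouldMatchSketch {

    // Hypothetical stopword list built from the "common" words in the thread.
    static final Set<String> STOPWORDS = new HashSet<>(Arrays.asList(
            "university", "school", "medicine", "institute", "of"));

    /** Lowercases, splits on non-letters and drops stopwords (stand-in for an Analyzer). */
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!t.isEmpty() && !STOPWORDS.contains(t)) {
                terms.add(t);
            }
        }
        return terms;
    }

    /** Optional clauses that must match: ceil(fraction * clauseCount), at least 1. */
    static int minShouldMatch(int clauseCount, double fraction) {
        return Math.max(1, (int) Math.ceil(fraction * clauseCount));
    }

    /** True if the document contains at least minShouldMatch(...) of the query terms. */
    static boolean matches(String query, String document, double fraction) {
        List<String> queryTerms = analyze(query);
        Set<String> docTerms = new HashSet<>(analyze(document));
        int hits = 0;
        for (String t : queryTerms) {
            if (docTerms.contains(t)) {
                hits++;
            }
        }
        return hits >= minShouldMatch(queryTerms.size(), fraction);
    }

    public static void main(String[] args) {
        String query = "Stanford University School of Medicine, Palo Alto";
        System.out.println(analyze(query));
        System.out.println(matches(query, "Stanford Medicine, Palo Alto, CA", 0.75));
        System.out.println(matches(query, "Palo Alto Medical Foundation", 0.75));
    }
}
```

With only three terms left after stopword removal, 75% rounds up to all three, so short queries behave almost like a pure AND; the percentage only starts to tolerate misses on longer strings.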

Re: Lucene query with long strings

2010-03-24 Thread Grant Ingersoll
On Mar 24, 2010, at 9:20 AM, Shashi Kant wrote: "Add the common terms such as University, School, Medicine, Institute etc. to stopwords list, so you are left with Stanford, Palo Alto etc." I don't know if I would remove them, but you might consider using the CommonGram or n-gram approach
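The common-grams idea Grant mentions can be illustrated in a few lines. This is a plain-Java sketch of the index-time behaviour in rough outline: every unigram is kept, and each adjacent pair that contains a common word is also emitted as a single joined token. It is not Solr's `CommonGramsFilter` itself; the `_` joiner and the common-word set are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CommonGramsSketch {

    /** Emits every unigram plus a joined bigram for each adjacent pair containing a common word. */
    static List<String> commonGrams(List<String> tokens, Set<String> common) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            String cur = tokens.get(i);
            out.add(cur); // the unigram is always kept
            if (i + 1 < tokens.size()) {
                String next = tokens.get(i + 1);
                if (common.contains(cur) || common.contains(next)) {
                    out.add(cur + "_" + next); // bigram anchored on a common word
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical common-word set for institution names.
        Set<String> common = new HashSet<>(Arrays.asList("university", "school", "of"));
        List<String> tokens = Arrays.asList("stanford", "university", "school", "of", "medicine");
        System.out.println(commonGrams(tokens, common));
    }
}
```

Because tokens like `school_of` and `of_medicine` are far rarer than `of` on its own, phrase queries over common words stay selective without dropping those words from the index, which is the trade-off against stopword removal that Grant is raising.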

Garbage Collection performance on 2.9.2

2010-03-24 Thread Siraj Haider
We upgraded to 2.9.2 from 2.3.2 and the garbage collection performance deteriorated drastically. The system is going into Full GC cycles with long pauses very frequently. Did something change that we need to account for? Thanks in advance, -siraj

Fields with the same name

2010-03-24 Thread Murdoch, Paul
Hi, I have a quick question. If I have an index where some text values are indexed under the same field name, but some are ANALYZED and some are NOT_ANALYZED, do the last value's flags change the flags for the whole field name? For instance if I index 3 sentences under a field name as

Custom Filter

2010-03-24 Thread Siraj Haider
Hello there, I am getting an exception when running queries with the new getDocIdSet() in my custom filter. Following is the code for my getDocIdSet() function: public DocIdSet getDocIdSet(IndexReader reader) throws IOException { OpenBitSet bitSet = new OpenBitSet(reader.maxDoc()); for

Re: Garbage Collection performance on 2.9.2

2010-03-24 Thread Grant Ingersoll
On Mar 24, 2010, at 2:13 PM, Siraj Haider wrote: "We upgraded to 2.9.2 from 2.3.2 and the garbage collection performance deteriorated drastically. The system is going to Full GC cycles with long pauses very frequently. Did something got changed that we need to account for?" Yes, quite a

Re: Fields with the same name

2010-03-24 Thread Erick Erickson
I don't think so, but a quick way to check would be to look at your index with a copy of Luke and see what the actual tokens are. But I'm not sure it matters; I don't think you *can* make things work out well, because your query-time analysis will be...er...difficult. You only get to specify one analyzer

RE: Fields with the same name

2010-03-24 Thread Murdoch, Paul
It was an unexpected coincidence that the two cases ended up with the same field name. I just changed the one case to index with a different field name and that fixed my problem. I was still curious though. Thanks, Paul
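Erick's point about query-time analysis can be seen in a toy model of one field's term dictionary. The sketch assumes, in line with Erick's "I don't think so", that each value is indexed according to its own flag rather than the last one. The lowercase/whitespace "analysis" is a simplification rather than a real Lucene Analyzer, `MixedFieldSketch` is a hypothetical class, and queries here require every term to be present.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class MixedFieldSketch {
    // All terms indexed under one field name, across every value.
    private final Set<String> terms = new HashSet<>();

    /** Stand-in for an ANALYZED value: lowercase and split on whitespace. */
    void addAnalyzed(String value) {
        for (String t : value.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
    }

    /** Stand-in for a NOT_ANALYZED value: the whole string becomes one term. */
    void addNotAnalyzed(String value) {
        terms.add(value);
    }

    /** Query analyzed the "ANALYZED" way; every resulting term must exist. */
    boolean matchesAnalyzedQuery(String query) {
        for (String t : query.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty() && !terms.contains(t)) {
                return false;
            }
        }
        return true;
    }

    /** Query left unanalyzed: only an exact whole-value term can match. */
    boolean matchesExactTerm(String query) {
        return terms.contains(query);
    }

    public static void main(String[] args) {
        MixedFieldSketch field = new MixedFieldSketch();
        field.addAnalyzed("The quick brown fox");
        field.addNotAnalyzed("Lazy Dog");
        System.out.println(field.matchesAnalyzedQuery("Lazy Dog")); // verbatim value unreachable
        System.out.println(field.matchesExactTerm("Lazy Dog"));     // reachable only as an exact term
    }
}
```

Whichever analysis the query uses, one class of values is effectively unreachable, which is why giving one case its own field name, as Paul did, is the simpler fix.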

Re: Garbage Collection performance on 2.9.2

2010-03-24 Thread Michael McCandless
Is this during indexing or searching? Mike. On Wed, Mar 24, 2010 at 3:45 PM, Grant Ingersoll <gsing...@apache.org> wrote: On Mar 24, 2010, at 2:13 PM, Siraj Haider wrote: We upgraded to 2.9.2 from 2.3.2 and the garbage collection performance deteriorated drastically. The system is going to

Apache Lucene EuroCon Call For Participation: Prague, Czech Republic May 20-21, 2010

2010-03-24 Thread Grant Ingersoll
Apache Lucene EuroCon Call For Participation - Prague, Czech Republic, May 20-21, 2010. All submissions must be received by Tuesday, April 13, 2010, 12 Midnight CET/6 PM US EDT. The first European conference dedicated to Lucene and Solr is coming to Prague from May 18-21, 2010. Apache Lucene

Fwd: Apache Lucene EuroCon Call For Participation: Prague, Czech Republic May 20-21, 2010

2010-03-24 Thread Yonik Seeley
Forwarding to lucene only - the big cross-post caused my gmail filters to file it. -Yonik -- Forwarded message -- From: Grant Ingersoll gsing...@apache.org Date: Wed, Mar 24, 2010 at 8:03 PM Subject: Apache Lucene EuroCon Call For Participation: Prague, Czech Republic May 20-21,

Filters and multiple, per-segment calls to getDocIdSet

2010-03-24 Thread Daniel Noll
Hi all. I notice that Filter.getDocIdSet() is now documented as follows: "Note: This method will be called once per segment in the index during searching. The returned {@link DocIdSet} must refer to document IDs for that segment, not for the top-level reader." If I look at
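The Javadoc Daniel quotes implies that a filter which computes its matches against the whole index has to rebase them per segment: a segment's documents occupy top-level IDs `docBase` through `docBase + maxDoc - 1`, where `docBase` is the sum of the `maxDoc()` values of the segments before it. The sketch below models only that arithmetic, with `java.util.BitSet` standing in for Lucene's `OpenBitSet`/`DocIdSet`; `toPerSegment` and the segment sizes are hypothetical, and in Lucene 2.9 the filter is simply handed each segment reader in turn.

```java
import java.util.BitSet;

class PerSegmentFilterSketch {

    /**
     * Splits a bit set of top-level doc IDs into one bit set per segment,
     * rebased so that each segment's first document has ID 0.
     */
    static BitSet[] toPerSegment(BitSet topLevel, int[] segmentMaxDocs) {
        BitSet[] perSegment = new BitSet[segmentMaxDocs.length];
        int docBase = 0; // top-level ID of the current segment's first document
        for (int s = 0; s < segmentMaxDocs.length; s++) {
            int maxDoc = segmentMaxDocs[s];
            // BitSet.get(from, to) copies bits [from, to) into a new set indexed from 0,
            // which performs the "subtract docBase" rebasing step.
            perSegment[s] = topLevel.get(docBase, docBase + maxDoc);
            docBase += maxDoc;
        }
        return perSegment;
    }

    public static void main(String[] args) {
        BitSet topLevel = new BitSet();
        topLevel.set(1);
        topLevel.set(4);
        topLevel.set(5);
        topLevel.set(9);
        // Three hypothetical segments holding 3, 4 and 5 documents.
        BitSet[] segments = toPerSegment(topLevel, new int[] {3, 4, 5});
        for (BitSet bits : segments) {
            System.out.println(bits); // {1}, then {1, 2}, then {2}
        }
    }
}
```

Returning the top-level set unchanged from every per-segment call would, in this model, mark documents in the wrong segments or past a segment's `maxDoc()`, which is the mismatch the quoted note warns against.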

Is anyone using SOLR in Australia?

2010-03-24 Thread Andrew Bruno
Hi all, I was wondering if anyone is using SOLR successfully in Australia in a high-end, high-transaction system? Cheers, Andrew