Finding a match for an automaton against a FST

2015-01-09 Thread Olivier Binda
Hello. 1) What is the best way to check whether an automaton (from a regex or a string with a wildcard) has at least one match against an FST (from a WFSTCompletionLookup)? Is there a simple way to do that? 2) Also, is there a simple/efficient way to find the lowest and the highest arcs of a FST

Re: Filtering MoreLikeThis results

2015-01-09 Thread Tomoko Uchida
Hi,
> find me the 10 most similar documents
I suppose you mean "mlt.count" supported by MoreLikeThisComponent. https://cwiki.apache.org/confluence/display/solr/MoreLikeThis MLT is ordinary search in Lucene, so you get documents in order of similarity (default scoring criteria) and can limit resu

version 4.10.3 AnalyzingInfixSuggester with multiple contexts

2015-01-09 Thread Greg Huber
Hello, I am trying to use multiple contexts with the org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester, but there is a mistake with CONTEXTS_FIELD_NAME: the BooleanClause.Occur.SHOULD needs to be BooleanClause.Occur.MUST (see << below). I noticed that it's been fix

Re: version 4.10.3 AnalyzingInfixSuggester with multiple contexts

2015-01-09 Thread Michael McCandless
That change (a new feature, to let you control MUST vs SHOULD for each context) was done with https://issues.apache.org/jira/browse/LUCENE-6050 But it's a new feature, not a bug ... and 4.10.x is for bug fixes only, so I don't think we will backport it. Mike McCandless http://blog.mikemccandless

Re: version 4.10.3 AnalyzingInfixSuggester with multiple contexts

2015-01-09 Thread Michael McCandless
Well, this is by design, really. I.e., the original intent here (4.10.3) is to return a suggestion if it has any of the specified contexts. Maybe for 4.10.3 you could subclass AIS and override finishQuery to rewrite the SHOULD to MUST in your case? Mike McCandless http://blog.mikemccandless.com O

index writer closes due to OOM/heap space issue but no recovery after GC

2015-01-09 Thread Tom Burton-West
Hello, I'm testing Solr 4.10.2 with 4GB allocated to the heap. During the indexing process I get an error message that says it is caused by an "already closed indexwriter" due to an OOM (see below). After this occurs it looks like the GC kicks in and there is plenty of heap space (see attached)

Details on setting block parameters for Lucene41PostingsFormat

2015-01-09 Thread Tom Burton-West
Hello all, We have over 3 billion unique terms in our indexes, and with Solr 3.x we set the TermIndexInterval to about 8 times its default value in order to index without OOMs (http://www.hathitrust.org/blogs/large-scale-search/too-many-words-again). We are now working with Solr 4 and running in

RE: index writer closes due to OOM/heap space issue but no recovery after GC

2015-01-09 Thread Ryan, Michael F. (LNG-DAY)
I’m not sure about this particular error, but in general, once the JVM has OOM’d, it is completely unreliable and should be restarted. I’m assuming Lucene catches the OOM just so that it doesn’t get in a state where it will corrupt the index. -Michael From: Tom Burton-West [mailto:tburt...@umi