RE: How best to compare two sentences

2014-12-04 Thread Oliver Christ
Conceptually this use case is similar to what translation memories do. For an open-source TM engine, have a look at http://okapi.opentag.com/, and its default TM engine (Pensieve TM). Cheers, Oli -Original Message- From: Barry Coughlan [mailto:b.coughl...@gmail.com] Sent: Wednesday,

RE: real infix suggester, not AnalyzingInfixSuggester

2014-10-27 Thread Oliver Christ
automaton. Could you give me a hint on how to start? Thank you for your kind support Michael Oliver Christ<mailto:ochr...@ebsco.com> Monday, 27 October 2014 12:47 The hard way may be to use the standard Analyzing Suggester but to add each (analyzed) suffi

RE: real infix suggester, not AnalyzingInfixSuggester

2014-10-27 Thread Oliver Christ
The hard way may be to use the standard Analyzing Suggester but to add each (analyzed) suffix of the surface string (mapping to the full surface form) during automaton generation. I.e. when adding "Donau...", you add all analyzed suffixes "donau...", "onau...", "nau...", ... - all mapping to "
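The suffix idea in this message can be sketched in a few lines. This is only an illustration, not Lucene code: a sorted list with binary search stands in for the suggester's automaton, `analyze` is a hypothetical stand-in for an Analyzer (it only lowercases), and the titles "Danube" and "Rhine" are made-up sample data.

```python
import bisect


def analyze(text):
    # Hypothetical stand-in for a Lucene Analyzer: lowercasing only.
    return text.lower()


def build_suffix_entries(surface_forms):
    # For each surface form, add every suffix of its analyzed form,
    # each one mapping back to the full surface form.
    entries = []
    for surface in surface_forms:
        analyzed = analyze(surface)
        for i in range(len(analyzed)):
            entries.append((analyzed[i:], surface))
    entries.sort()  # sorted order makes prefix lookup a binary search
    return entries


def suggest(entries, query):
    # A prefix match on any stored suffix is an infix match on the original.
    prefix = analyze(query)
    start = bisect.bisect_left(entries, (prefix,))
    hits = set()
    for suffix, surface in entries[start:]:
        if not suffix.startswith(prefix):
            break
        hits.add(surface)
    return sorted(hits)


entries = build_suffix_entries(["Danube", "Rhine"])
print(suggest(entries, "nub"))
print(suggest(entries, "IN"))
```

The cost of this approach is index size: a string of length n contributes n entries, which is presumably part of why the message calls it "the hard way".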

RE: fuzzy/case insensitive AnalyzingSuggester

2014-06-20 Thread Oliver Christ
Hi Clemens, I haven't yet built a suggester which combines all three, and am not aware of one. I'd love to have one though ;-) Case- and diacritics insensitivity is supported out-of-the-box by the analyzing suggesters, including the FuzzySuggester. The logic is in the Analyzer. I haven't yet t

RE: maxDoc/numDocs int fields

2014-03-21 Thread Oliver Christ
Can you split your corpus across multiple Lucene instances? Cheers, Oli -Original Message- From: Artem Gayardo-Matrosov [mailto:ar...@gayardo.com] Sent: Friday, March 21, 2014 12:29 PM To: java-user@lucene.apache.org Subject: maxDoc/numDocs int fields Hi all, I am using lucene to index

Suggesters: payloads and filter predicates

2014-01-08 Thread Oliver Christ
Hi, It's great to see support for payloads in the suggesters - this is really helpful, and pretty much addresses LUCENE-4516. Are there any plans to also support them for WFSTs? We have some cases where we don't need the Analyzer's capabilities (we look up the completion using the payload infor

FST-based suggesters: recent changes, binary compatibility of automata

2013-03-01 Thread Oliver Christ
Hi, I've seen some changes in trunk regarding the data format of Lucene's FST-based suggesters, and wonder whether the automata created by trunk builds/next Lucene version are/will be binary-compatible to the ones created with the current release, or whether any magic versioning is taking plac

RE: Suggesters: circumfix suggestions

2013-01-17 Thread Oliver Christ
.. if you have strong priors / boost (e.g. you have a good source of "popularity" or something) then you could sort by that ... Mike McCandless http://blog.mikemccandless.com On Wed, Jan 16, 2013 at 4:27 PM, Oliver Christ wrote: > Hi, > > > > Has anyone tried to implement circumfix suggeste

Suggesters: circumfix suggestions

2013-01-16 Thread Oliver Christ
Hi, Has anyone tried to implement circumfix suggesters, where the suggestion is a circumfix of the lookup string? E.g. "sox rumor" suggests "boston red sox rumors" (try it on google.com). I think there are several ways to implement this: * Given some multiword term, ad

RE: Alternative for WildcardQuery with leading *

2012-12-07 Thread Oliver Christ
If I remember correctly it was Baeza-Yates or someone in his group at U Santiago who came up with the rotated term indexing. Indexing "abc", you explicitly mark end of string and index all rotations using a data structure which supports prefix search (such as a trie): abc$ bc$a c$ab $abc This
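A minimal sketch of the rotated-term idea, under some assumptions: the snippet above is cut off before it describes the lookup, so the query step here follows the common formulation of this technique (rotate a pattern `head*tail` into the prefix `tail$head`), which may differ in detail from what the message goes on to say. A sorted list with binary search stands in for the trie, the function names are made up, and terms are assumed not to contain "$".

```python
import bisect

END = "$"  # end-of-string marker; assumed not to occur in any term


def rotations(term):
    # All rotations of term + "$", e.g. "abc" -> abc$, bc$a, c$ab, $abc
    marked = term + END
    return [marked[i:] + marked[:i] for i in range(len(marked))]


def build_rotated_index(terms):
    # Sorted (rotation, original term) pairs support prefix search.
    return sorted((rot, term) for term in terms for rot in rotations(term))


def wildcard_lookup(entries, pattern):
    # Supports exactly one "*", including a leading one.
    if pattern.count("*") != 1:
        raise ValueError("pattern must contain exactly one '*'")
    head, _, tail = pattern.partition("*")
    prefix = tail + END + head  # rotate so the wildcard falls at the end
    start = bisect.bisect_left(entries, (prefix,))
    hits = set()
    for rot, term in entries[start:]:
        if not rot.startswith(prefix):
            break
        hits.add(term)
    return sorted(hits)


entries = build_rotated_index(["abc", "xbc", "abd"])
print(rotations("abc"))
print(wildcard_lookup(entries, "*bc"))
```

A leading-wildcard query such as `*bc` thus becomes an ordinary prefix search for `bc$`, at the price of storing one entry per character of each term plus one for the marker.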

RE: WFST/Analyzing Suggesters: foreign keys, user-supplied filter, highlighting

2012-10-31 Thread Oliver Christ
Hi, I've added LUCENE-4516 - Suggesters: allow to associate a user-specified key (int) with a string LUCENE-4517 - Suggesters: allow to pass a user-defined predicate/filter to the completion searcher LUCENE-4518 - Suggesters: highlighting (explicit markup of user-typed portions vs. generated

WFST/Analyzing Suggesters: foreign keys, user-supplied filter, highlighting

2012-10-30 Thread Oliver Christ
Hi, I'm currently researching using a WFST suggester on e.g. book titles. While our basic use cases are well covered, there seem to be at least three which aren't: * The possibility to associate a "foreign key" with a string (rather: final node) in the WFST (in addition to the rank