RE: How best to compare two sentences

2014-12-04 Thread Oliver Christ
Conceptually, this use case is similar to what translation memories do. For an open-source TM engine, have a look at http://okapi.opentag.com/ and its default TM engine (Pensieve TM). Cheers, Oli

RE: real infix suggester, not AnalyzingInfixSuggester

2014-10-27 Thread Oliver Christ
The hard way may be to use the standard AnalyzingSuggester but to add each (analyzed) suffix of the surface string (mapping to the full surface form) during automaton generation. I.e. when adding "Donau...", you add all analyzed suffixes "donau...", "onau...", "nau...", ... - all mapping to
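The suffix-expansion idea can be sketched outside Lucene as follows. This is a minimal illustration, not Lucene's suggester API: `analyze` is a hypothetical stand-in for an Analyzer (it only lowercases), and a sorted list with binary search stands in for the automaton. Every analyzed suffix is stored as a key that maps back to the full surface form, so a plain prefix lookup behaves like an infix lookup.

```python
import bisect


def analyze(text):
    """Hypothetical stand-in for an Analyzer: lowercases only (assumption)."""
    return text.lower()


def suffix_entries(surface):
    """Return (analyzed suffix, full surface form) pairs for one entry."""
    analyzed = analyze(surface)
    return [(analyzed[i:], surface) for i in range(len(analyzed))]


def build_index(surfaces):
    """Sorted list of all suffix entries; stands in for the automaton."""
    return sorted(e for s in surfaces for e in suffix_entries(s))


def suggest(index, typed, limit=5):
    """Prefix lookup over suffix keys, returning full surface forms."""
    key = analyze(typed)
    # (key,) sorts before every (key, surface) pair, so this finds the
    # first entry whose suffix key is >= the typed key.
    start = bisect.bisect_left(index, (key,))
    results = []
    for suffix, surface in index[start:]:
        if not suffix.startswith(key):
            break
        if surface not in results:
            results.append(surface)
        if len(results) == limit:
            break
    return results


index = build_index(["Donau", "Danube"])
print(suggest(index, "nau"))
```

The cost of this approach is index size: an entry of length n contributes n keys, which is presumably why it is described as the hard way.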

RE: real infix suggester, not AnalyzingInfixSuggester

2014-10-27 Thread Oliver Christ
Hi Michael, There may be several entry points, I'm not sure which one still works - the suggester data processing chain has changed quite a bit since I looked at it about two years ago, maybe Mike or Robert can chime in if I'm totally off. One way I experimented with was to implement a custom

RE: fuzzy/case insensitive AnalyzingSuggester

2014-06-20 Thread Oliver Christ
Hi Clemens, I haven't yet built a suggester which combines all three, and am not aware of one. I'd love to have one though ;-) Case- and diacritics insensitivity is supported out-of-the-box by the analyzing suggesters, including the FuzzySuggester. The logic is in the Analyzer. I haven't yet
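To illustrate why case and diacritics insensitivity can live entirely in the analysis step, here is a small sketch in plain Python. It does not use Lucene; `fold` is a hypothetical normalizer that plays the role an Analyzer would play, and the dictionary is only a toy lookup structure. Both the indexed strings and the typed input pass through the same folding, while the original surface form is what gets returned.

```python
import unicodedata


def fold(text):
    """Hypothetical analysis step: strip diacritics, then case-fold."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_marks.casefold()


def build_lookup(surfaces):
    """Map each folded key to the surface forms that produced it."""
    lookup = {}
    for surface in surfaces:
        lookup.setdefault(fold(surface), []).append(surface)
    return lookup


def complete(lookup, typed):
    """Return surface forms whose folded key starts with the folded input."""
    key = fold(typed)
    return sorted(
        surface
        for folded, surfaces in lookup.items()
        if folded.startswith(key)
        for surface in surfaces
    )


lookup = build_lookup(["Zürich", "Zug", "Élan"])
print(complete(lookup, "ZUR"))
```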

RE: maxDoc/numDocs int fields

2014-03-21 Thread Oliver Christ
Can you split your corpus across multiple Lucene instances? Cheers, Oli -Original Message- From: Artem Gayardo-Matrosov [mailto:ar...@gayardo.com] Sent: Friday, March 21, 2014 12:29 PM To: java-user@lucene.apache.org Subject: maxDoc/numDocs int fields Hi all, I am using lucene to

Suggesters: payloads and filter predicates

2014-01-08 Thread Oliver Christ
Hi, It's great to see support for payloads in the suggesters - this is really helpful, and pretty much addresses LUCENE-4516. Are there any plans to also support them for WFSTs? We have some cases where we don't need the Analyzer's capabilities (we look up the completion using the payload

FST-based suggesters: recent changes, binary compatibility of automata

2013-03-01 Thread Oliver Christ
Hi, I've seen some changes in trunk regarding the data format of Lucene's FST-based suggesters, and wonder whether the automata created by trunk builds/next Lucene version are/will be binary-compatible to the ones created with the current release, or whether any magic versioning is taking

RE: Suggesters: circumfix suggestions

2013-01-17 Thread Oliver Christ
Hi, Has anyone tried to implement circumfix suggesters, where the suggestion is a circumfix of the lookup string? E.g. "sox rumor" suggests "boston red sox rumors" (try it on google.com). I think there are several ways

RE: Alternative for WildcardQuery with leading *

2012-12-07 Thread Oliver Christ
If I remember correctly, it was Baeza-Yates or someone in his group at U Santiago who came up with the rotated term indexing. Indexing "abc", you explicitly mark end of string and index all rotations using a data structure which supports prefix search (such as a trie): abc$, bc$a, c$ab, $abc. This
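A minimal sketch of rotated term indexing, under some assumptions: a sorted list with binary search stands in for the trie, `$` is assumed never to occur inside a term, and only patterns with at most one `*` are handled. The function names are illustrative. A pattern `head*tail` is rewritten to the key `tail$head`, which turns the wildcard match (including a leading `*`) into an ordinary prefix search over the rotations.

```python
import bisect

END = "$"  # end-of-string marker; assumed not to occur inside terms


def rotations(term):
    """All rotations of term + END, e.g. abc -> abc$, bc$a, c$ab, $abc."""
    marked = term + END
    return [marked[i:] + marked[:i] for i in range(len(marked))]


def build_index(terms):
    """Sorted (rotation, term) pairs; stands in for a prefix-searchable trie."""
    return sorted((rot, term) for term in terms for rot in rotations(term))


def wildcard_search(index, pattern):
    """Match a pattern containing at most one '*' via a prefix search."""
    if pattern.count("*") > 1:
        raise ValueError("this sketch supports at most one '*'")
    if "*" in pattern:
        head, tail = pattern.split("*")
        key = tail + END + head  # rotate so the wildcard falls at the end
    else:
        key = pattern + END      # exact match
    start = bisect.bisect_left(index, (key,))
    matches = set()
    for rotation, term in index[start:]:
        if not rotation.startswith(key):
            break
        matches.add(term)
    return sorted(matches)


index = build_index(["abc", "xbc", "abd"])
print(wildcard_search(index, "*bc"))
```

The trade-off is space: a term of length n is stored n + 1 times, in exchange for leading-wildcard queries that avoid scanning the whole term dictionary.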

RE: WFST/Analyzing Suggesters: foreign keys, user-supplied filter, highlighting

2012-10-31 Thread Oliver Christ
Hi, I've added
LUCENE-4516 - Suggesters: allow to associate a user-specified key (int) with a string
LUCENE-4517 - Suggesters: allow to pass a user-defined predicate/filter to the completion searcher
LUCENE-4518 - Suggesters: highlighting (explicit markup of user-typed portions vs. generated

WFST/Analyzing Suggesters: foreign keys, user-supplied filter, highlighting

2012-10-30 Thread Oliver Christ
Hi, I'm currently researching using a WFST suggester on e.g. book titles. While our basic use cases are well covered, there seem to be at least three which aren't: * The possibility to associate a foreign key with a string (rather: final node) in the WFST (in addition to the