Hi, i had a similar need to create somethink that acts not like a "filter" or "tokenizer" but only inserts self-generated tokens into the token-stream. (my purpose was to generate all kinds of word-forms for german umlauts...)
the following code-base helped me a lot to create it: http://207.44.206.178/message.jspa?messageID=91989#91991 the synonym-filter also adds tokens into the tokenstream regards, Alex On Wednesday 14 July 2010 01:11:02 Kai Weingärtner wrote: > Hello, > > > I am trying to create a suggest search (search results are displayed while > the user is entering the query) for names, but the search should also give > results if the given name just sounds like an indexed name. However a > perfect match should be ranked higher than a similar sounding match. > > > I looked at the SpellChecker contrib, but this AFAIK cannot handle > incomplete names (edge n-grams). > > > So I came up with this idea and it would be great if anyone could tell me > if that is sensible or if there is a better way: > > > I create an analyzer to be run on the full names, which does the following > - lowercase > - build edge n-grams > put these terms in the field (this would handle correctly spelled input) > > > - run soundex on the n-grams > put there soundexed n-grams in the field as well > > > The incoming query will then also run through this analyzer with an > or-default. So a correct spelling will match the normal n-grams plus the > soundexed n-grams leading to a good score. A missspelled name would still > match the soundexed n-grams, leading to a somewhat lower score. > > > My current problem is that I don't know how to duplicate the tokens in the > analyzer so I can add them as normal n-grams and soundexed n-grams. I > suppose the TeeSinkTokenFilter will get me there, but I could not figure > out how to add all tokens back in one stream. > > > To recap, my questions are: Could this approach work to create a "fuzzy > suggest"? How do I use the TeeSinkTokenFilter to separate and recombine the > tokenstream. > > > I hope that was clear, thanks for your help! > > > > Kai > > > > > Regelung im Bezug auf Paragraph 37a Absatz 4 HGB: WidasConcepts GmbH, > Geschaeftsfuehrer: Thomas Widmann und Christian Kappert, > Gerichtsstand Pforzheim, Registernummer: HRB 511442, > Umsatzsteueridentifikationsnummer: DE205851091 > > Diese E-Mail enthaelt vertrauliche und/oder rechtlich geschuetzte > Informationen. Wenn Sie nicht der richtige Adressat sind oder diese E-Mail > irrtuemlich erhalten haben, informieren Sie bitte sofort den Absender und > vernichten Sie diese Mail. Das unerlaubte Kopieren sowie die unbefugte > Weitergabe dieser Mail sind nicht gestattet. > > This e-mail may contain confidential and/or privileged information. > If you are not the intended recipient (or have received this e-mail in > error) please notify the sender immediately and destroy this e-mail. > Any unauthorized copying, disclosure or distribution of the material in > this e-mail is strictly forbidden. -- Alexander Rothenberg Fotofinder GmbH USt-IdNr. DE812854514 Software Entwicklung Web: http://www.fotofinder.net/ Potsdamer Str. 96 Tel: +49 30 25792890 10785 Berlin Fax: +49 30 257928999 Geschäftsführer: Ali Paczensky Amtsgericht: Berlin Charlottenburg (HRB 73099) Sitz: Berlin --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org