Yes, WDDF creates multiple tokens. But that has nothing to do with the multiValued suggestion.
You can get exactly what you want by
1> setting multiValued="true" in your schema file and re-indexing. Say
   positionIncrementGap is set to 100
2> When you index, add the field for each sentence, so your doc
   looks something like:
   <doc>
     <field name="sentences">i am a sales-manager in here</field>
     <field name="sentences">using asp.net and .net daily</field>
     .....
   </doc>
3> search like "sales manager"~100

Best
Erick

On Wed, Feb 8, 2012 at 3:05 AM, Rob Brown <r...@intelcompute.com> wrote:
> Apologies if things were a little vague.
>
> Given the example snippet to index (numbered to show searches needed to
> match)...
>
> 1: i am a sales-manager in here
> 2: using asp.net and .net daily
> 3: working in design.
> 4: using something called sage 200. and i'm fluent
> 5: german sausages.
> 6: busy A&E dept earning £10,000 annually
>
>
> ... all with newlines in place.
>
> able to match...
>
> 1. sales
> 1. "sales manager"
> 1. sales-manager
> 1. "sales-manager"
> 2. .net
> 2. asp.net
> 3. design
> 4. sage 200
> 6. A&E
> 6. £10,000
>
> But do NOT match "fluent german" from 4 + 5 since there's a newline
> between them when indexed, but not when searched.
>
>
> Do the filters (wdf in this case) not create multiple tokens, so if
> splitting on period in "asp.net" would create tokens for all of "asp",
> "asp.", "asp.net", ".net", "net".
>
>
> Cheers,
> Rob
>
> --
>
> IntelCompute
> Web Design and Online Marketing
>
> http://www.intelcompute.com
>
>
> -----Original Message-----
> From: Chris Hostetter <hossman_luc...@fucit.org>
> Reply-to: solr-user@lucene.apache.org
> To: solr-user@lucene.apache.org
> Subject: Re: Which Tokeniser (and/or filter)
> Date: Tue, 7 Feb 2012 15:02:36 -0800 (PST)
>
> : This all seems a bit too much work for such a real-world scenario?
>
> You haven't really told us what your scenario is.
>
> You said you want to split tokens on whitespace, full-stop (aka:
> period) and comma only, but then in response to some suggestions you added
> comments about other things that you never mentioned previously...
>
> 1) evidently you don't want the "." in foo.net to cause a split in tokens?
> 2) evidently you not only want token splits on newlines, but also
> position gaps to prevent phrases matching across newlines.
>
> ...these are kind of important details that affect suggestions people
> might give you.
>
> can you please provide some concrete examples of the types of data you
> have, the types of queries you want them to match, and the types of
> queries you *don't* want to match?
>
>
> -Hoss
>
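
For reference, step 1 of the reply at the top (a multiValued field with a position increment gap) might look roughly like the schema.xml fragment below. This is only a sketch: the type name "text_sentences" and the whole analyzer chain are assumptions for illustration and are not taken from the thread; only the field name "sentences", multiValued="true" and positionIncrementGap="100" come from the reply.

```xml
<!-- Hypothetical field type; the name and analyzer chain are assumptions. -->
<fieldType name="text_sentences" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer>
    <!-- Split on whitespace only, then let the word delimiter filter
         break up tokens such as "sales-manager" and "asp.net". -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"
            catenateWords="1"
            preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- One value per sentence; the gap is inserted between successive values. -->
<field name="sentences" type="text_sentences"
       indexed="true" stored="true" multiValued="true"/>
```

The gap pushes the first token of each value about 100 positions past the last token of the previous value, so a phrase query with a slop comfortably below the gap should not match "fluent german" across values 4 and 5. How close to the gap the slop can safely go is worth checking against your own index.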