To clarify further: we use StandardTokenizer and StandardFilter in front of the WDF. Already after StandardTokenizer's transformations, "e-tail" is split into two consecutive tokens, "e" and "tail", on consecutive positions.
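
To make the expectation concrete, here is a rough Python sketch of the behaviour described above. It is not Lucene code: tokenize() is a toy stand-in for StandardTokenizer (the lowercasing is an extra assumption, since our real chain has more filters), and ordered_span_near() is a hypothetical helper that only mimics an in-order spanNear with a given slop over token positions.

```python
import re


def tokenize(text):
    """Toy stand-in for StandardTokenizer: split on non-alphanumerics,
    lowercase (an assumption), and give each token the next position."""
    words = re.findall(r"[A-Za-z0-9]+", text)
    return [(pos, word.lower()) for pos, word in enumerate(words)]


def ordered_span_near(tokens, first, second, slop=0):
    """Return True if `second` follows `first` with at most `slop`
    positions between them (an in-order span match)."""
    first_positions = [p for p, t in tokens if t == first]
    second_positions = [p for p, t in tokens if t == second]
    return any(
        0 <= s - f - 1 <= slop
        for f in first_positions
        for s in second_positions
    )


tokens = tokenize("From E-Tail")
print(tokens)                                       # positions grow by one per token
print(ordered_span_near(tokens, "e", "tail"))       # adjacent, in order
print(ordered_span_near(tokens, "e", "commerce"))   # "commerce" is not in the text
```

With tokens on separate, growing positions, a 0-slop ordered match of "e" followed by "commerce" should not succeed on "From E-Tail", which is why the hit looks wrong to us.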

On Mon, Jun 15, 2015 at 10:08 AM, Dmitry Kan <solrexp...@gmail.com> wrote:

> Thanks, Erick. The analysis page shows the positions are growing => there
> are no "glued" words on the same position.
>
> On Sun, Jun 14, 2015 at 6:10 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
>> My guess is that you have WordDelimiterFilterFactory in your
>> analysis chain with parameters that break up E-Tail to both "e" and
>> "tail" _and_ put them in the same position. This assumes that the
>> result fragment you pasted is incomplete and "commerce" is in it
>>
>> From <em>E</em>-Tail <em>commerce</em>
>>
>> or some such. Try the admin/analysis screen with the "verbose" box
>> checked and you'll see the position of each token after analysis to
>> see if my guess is accurate.
>>
>> Best,
>> Erick
>>
>> On Sun, Jun 14, 2015 at 4:34 AM, Dmitry Kan <solrexp...@gmail.com> wrote:
>> > Hi guys,
>> >
>> > We observe a strange bug in Solr 4.10.2, whereby a sloppy query hits
>> > words it should not:
>> >
>> > <lst name="debug">
>> > <str name="rawquerystring">the "e commerce"</str>
>> > <str name="querystring">the "e commerce"</str>
>> > <str name="parsedquery">SpanNearQuery(spanNear([Contents:the, spanNear([Contents:eä, Contents:commerceä], 0, true)], 300, false))</str>
>> > <str name="parsedquery_toString">spanNear([Contents:the, spanNear([Contents:eä, Contents:commerceä], 0, true)], 300, false)</str>
>> >
>> > This query produces hits on words like:
>> >
>> > From <em>E</em>-Tail
>> >
>> > In the inner spanNear query we expect that "e" and "commerce" will
>> > occur within 0 slop, in that order.
>> >
>> > Can somebody shed light on what is going on?

--
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info