Absolutely, but so what? Nothing in any Solr query is going to be based on character position.
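A quick way to see this, as a plain-Python simulation rather than Lucene code: `whitespace_tokenize` below is a hypothetical helper that only approximates what WhitespaceTokenizer does. It reproduces the offsets from the Analysis output quoted further down (office at 10-16 with one space, 12-18 with three) and shows that the token text, which is what matching compares, is the same either way; only the start/end offsets change, because they index into the original string.

```python
import re

def whitespace_tokenize(text):
    """Split on runs of whitespace, keeping (token, start, end) offsets into `text`.

    Hypothetical simplification of Lucene's WhitespaceTokenizer, for illustration only.
    """
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]

one_space = whitespace_tokenize("Microsoft office")
three_spaces = whitespace_tokenize("Microsoft   office")

print(one_space)     # [('Microsoft', 0, 9), ('office', 10, 16)]
print(three_spaces)  # [('Microsoft', 0, 9), ('office', 12, 18)]

# The token text that gets indexed and matched is identical either way...
assert [t for t, _, _ in one_space] == [t for t, _, _ in three_spaces]

# ...only the offsets differ, and they still point into the original string,
# which is what a highlighter relies on.
original = "Microsoft   office"
for token, start, end in three_spaces:
    assert original[start:end] == token
```

Since matching only looks at the token text, the extra whitespace has no effect on which documents match; the offsets only matter for features such as highlighting.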
Also, adding and removing characters in a char filter is a really bad idea if you might want to do highlighting, since the token character positions would not line up with the original source text.

-- Jack Krupansky

On Mon, Mar 7, 2016 at 10:33 AM, G, Rajesh <r...@cebglobal.com> wrote:

> Hi Jack,
>
> Please correct me if I am wrong. I added the char filter for the following reason.
>
> In the Analysis screen of the Solr UI, I entered "Microsoft office" as the Field Value (Index). WhitespaceTokenizerFactory produces the result below, where office starts at 10. If I add extra spaces, say 2 more, office starts at 12. Should it not start at 10?
>
> With a single space:
>
> text       raw_bytes                     start  end  positionLength  type  position
> microsoft  [6d 69 63 72 6f 73 6f 66 74]  0      9    1               word  1
> office     [6f 66 66 69 63 65]           10     16   1               word  2
>
> With 2 extra spaces:
>
> text       raw_bytes                     start  end  positionLength  type  position
> microsoft  [6d 69 63 72 6f 73 6f 66 74]  0      9    1               word  1
> office     [6f 66 66 69 63 65]           12     18   1               word  2
>
> Thanks
> Rajesh
>
> Corporate Executive Board India Private Limited. Registration No: U741040HR2004PTC035324. Registered office: 6th Floor, Tower B, DLF Building No.10 DLF Cyber City, Gurgaon, Haryana-122002, India.
>
> This e-mail and/or its attachments are intended only for the use of the addressee(s) and may contain confidential and legally privileged information belonging to CEB and/or its subsidiaries, including CEB subsidiaries that offer SHL Talent Measurement products and services. If you have received this e-mail in error, please notify the sender and immediately, destroy all copies of this email and its attachments.
> The publication, copying, in whole or in part, or use or dissemination in any other way of this e-mail and attachments by anyone other than the intended person(s) is prohibited.
>
> -----Original Message-----
> From: Jack Krupansky [mailto:jack.krupan...@gmail.com]
> Sent: Monday, March 7, 2016 8:24 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Text search NGram
>
> The charFilter isn't doing anything useful - the whitespace tokenizer will ignore extra white space anyway.
>
> -- Jack Krupansky
>
> On Mon, Mar 7, 2016 at 5:44 AM, G, Rajesh <r...@cebglobal.com> wrote:
>
> > Hi Team,
> >
> > We have the field type below, and we have indexed the values "title": "Microsoft Visual Studio 2006" and "title": "Microsoft Visual Studio 8.0.61205.56 (2005)".
> >
> > When I search for title:(Microsoft Visual AND Studio AND 2005), I get Microsoft Visual Studio 2006 as the first record and Microsoft Visual Studio 8.0.61205.56 (2005) as the second. I want Microsoft Visual Studio 8.0.61205.56 (2005) listed first, since the user searched for Microsoft Visual Studio 2005. Can you please help?
> >
> > We are using NGram so that misspelled or jumbled words are handled (it works as expected), e.g.:
> > searching "Micrs Visual Studio" gets Microsoft Visual Studio
> > searching "Visual Microsoft Studio" gets Microsoft Visual Studio
> >
> > <fieldType name="txt_token" class="solr.TextField" positionIncrementGap="0">
> >   <analyzer type="index">
> >     <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\s+" replacement=" "/>
> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >     <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="800"/>
> >   </analyzer>
> >   <analyzer type="query">
> >     <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\s+" replacement=" "/>
> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >     <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="800"/>
> >   </analyzer>
> > </fieldType>
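To see why NGram matching tolerates misspellings, here is a small plain-Python model of the grams the field type above would produce per token with `minGramSize="2"` and `maxGramSize="800"`. `ngrams` and `shared_grams` are hypothetical helpers, not Solr APIs, and the model ignores positions and scoring entirely. It may also help explain why the 2006 title still matches a query containing 2005: after n-gramming, the two tokens share some grams. How the query parser combines those grams depends on the Solr version and parser settings, so treat this as an illustration of the overlap, not of the ranking.

```python
def ngrams(token, min_gram=2, max_gram=800):
    """All substrings of `token` whose length is in [min_gram, max_gram].

    Simplified model of what an n-gram filter emits for one token;
    returns a set, so duplicates, positions and offsets are ignored.
    """
    grams = set()
    for size in range(min_gram, min(max_gram, len(token)) + 1):
        for start in range(len(token) - size + 1):
            grams.add(token[start:start + size])
    return grams

def shared_grams(query_token, indexed_token):
    # Lowercase both sides, as the LowerCaseFilterFactory in the field type does.
    return ngrams(query_token.lower()) & ngrams(indexed_token.lower())

# A misspelled query token still shares several grams with the indexed one.
print(sorted(shared_grams("Micrs", "Microsoft")))
# ['cr', 'ic', 'icr', 'mi', 'mic', 'micr']

# "2005" overlaps partially with "2006" and fully with the whitespace token "(2005)".
print(sorted(shared_grams("2005", "2006")))  # ['00', '20', '200']
print(len(shared_grams("2005", "(2005)")))   # 6
```

Note that WhitespaceTokenizer keeps the parentheses, so the second title is indexed with the token "(2005)"; its grams include every gram of "2005".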