Hi,
As a solution, I have tried a combination of PatternTokenizerFactory and
PatternReplaceFilterFactory.

In both the index and query analyzers I have written:

<tokenizer class="solr.PatternTokenizerFactory" pattern="\s+"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="([^-\w]+)"
        replacement=" punct " replace="all"/>

What I am trying to do is tokenize on whitespace and then rewrite every run
of special characters as " punct ".

So "A,B" becomes "A punct B".

The problem is that "A punct B" is still one token; the output of the filter
is not tokenized any further.
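To make it concrete, for the input "A,B" I would expect the admin/analysis
page to show roughly this (my reading of the config above, not verified
output):

  PatternTokenizer (\s+):   "A,B"          (one token; no whitespace to split on)
  PatternReplaceFilter:     "A punct B"    (still one token; only the text
                                            inside the token is rewritten)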

Is there a way I can tokenize again after the filter is applied? Please
suggest; I know I am missing something basic.
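One idea I am considering (not tested, and assuming
solr.PatternReplaceCharFilterFactory is the right component for this): do the
replacement in a char filter instead, since char filters run before the
tokenizer, so the tokenizer would see "A punct B" as raw text and split it on
the whitespace. Something like:

<analyzer>
  <!-- hypothetical: rewrite punctuation before tokenization; \s is added to
       the character class so existing whitespace is not turned into " punct " -->
  <charFilter class="solr.PatternReplaceCharFilterFactory"
              pattern="([^-\w\s]+)" replacement=" punct "/>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
</analyzer>

That way "A,B" should end up as the three tokens A, punct and B. Does that
sound like the right direction?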

Thanks,
Abhishek


On Mon, Mar 10, 2014 at 2:06 AM, <abhishek.netj...@gmail.com> wrote:

> Hi
> Oops, my bad. I actually meant:
> while indexing A,B,
> A and B searched individually should give results, but
> "A B" should not give a result.
>
> Also, I will look at the analysis page.
>
> Thanks
> Abhishek
>
>   Original Message
> From: Erick Erickson
> Sent: Monday, 10 March 2014 01:38
> To: abhishek jain
> Subject: Re: Which Tokenizer to use at searching
>
> Then I don't see the problem. StandardTokenizer
> (see the "text_general" fieldType) should do all this
> for you automatically.
>
> Did you look at the analysis page? I really recommend it.
>
> Best,
> Erick
>
> On Sun, Mar 9, 2014 at 3:04 PM, abhishek jain
> <abhishek.netj...@gmail.com> wrote:
> > Hi Erick,
> > Thanks for replying,
> >
> > I want to index A,B (with or without a space after the comma) as separate
> > words, and also want to return results when A and B are searched
> > individually, and also for "A,B".
> >
> > Please let me know your views.
> > Let me know if I still haven't explained it correctly. I will try again.
> >
> > Thanks
> > abhishek
> >
> >
> > On Sun, Mar 9, 2014 at 11:49 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> >>
> >> You've contradicted yourself, so it's hard to say. Or
> >> I'm mis-reading your messages.
> >>
> >> bq: During indexing I want to tokenize on all punctuation, so I can use
> >> StandardTokenizer, but at search time I want to consider punctuation as
> >> part of the text.
> >>
> >> and in your second message:
> >>
> >> bq: when I search for "A,B" it should return a result. [for input "A,B"]
> >>
> >> If, indeed, you "... at search time ... want to consider punctuation as
> >> part of the text", then "A,B" should NOT match the document.
> >>
> >> The admin/analysis page is your friend, I strongly suggest you spend
> >> some time looking at the various transformations performed by
> >> the various analyzers and tokenizers.
> >>
> >> Best,
> >> Erick
> >>
> >> On Sun, Mar 9, 2014 at 1:54 PM, abhishek jain
> >> <abhishek.netj...@gmail.com> wrote:
> >> > hi,
> >> >
> >> > Thanks for replying promptly. An example:
> >> >
> >> > I want to index A,B,
> >> > but when I search A AND B, it should return a result,
> >> > and when I search for "A,B" it should return a result.
> >> >
> >> > Also, ideally, when I search for "A , B" (with spaces) it should return
> >> > a result.
> >> >
> >> >
> >> > Please advise.
> >> > thanks
> >> > abhishek
> >> >
> >> >
> >> > On Sun, Mar 9, 2014 at 9:52 PM, Furkan KAMACI <furkankam...@gmail.com> wrote:
> >> >
> >> >> Hi;
> >> >>
> >> >> Firstly, you have to keep in mind that if you don't index punctuation,
> >> >> it will not be visible for search. On the other hand, you can have
> >> >> different analyzers for index and search. You have to give more detail
> >> >> about your situation. What will be your tokenizer at search time,
> >> >> WhitespaceTokenizer?
> >> >> You can have a look here:
> >> >> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
> >> >>
> >> >> If you can give some examples of what you want for indexing and
> >> >> searching, I can help you combine index and search
> >> >> analyzers/tokenizers/token filters.
> >> >>
> >> >> Thanks;
> >> >> Furkan KAMACI
> >> >>
> >> >>
> >> >> 2014-03-09 18:06 GMT+02:00 abhishek jain <abhishek.netj...@gmail.com>:
> >> >>
> >> >> > Hi Friends,
> >> >> >
> >> >> > I am concerned about the tokenizer; my scenario is:
> >> >> >
> >> >> > During indexing I want to tokenize on all punctuation, so I can use
> >> >> > StandardTokenizer, but at search time I want to consider punctuation
> >> >> > as part of the text.
> >> >> >
> >> >> > I don't store the contents; I only index them.
> >> >> >
> >> >> > What should I use?
> >> >> >
> >> >> > Any advice?
> >> >> >
> >> >> >
> >> >> > --
> >> >> > Thanks and kind Regards,
> >> >> > Abhishek jain
> >> >> >
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> > Thanks and kind Regards,
> >> > Abhishek jain
> >> > +91 9971376767
> >
> >
> >
> >
> > --
> > Thanks and kind Regards,
> > Abhishek jain
> > +91 9971376767
>



-- 
Thanks and kind Regards,
Abhishek jain
+91 9971376767
