Re: Edgengram

Brian Lamb Wed, 01 Jun 2011 07:44:25 -0700

Hi Tomás,

Thank you very much for your suggestion. I took another crack at it using
your recommendation and it worked ideally. The only thing I had to change
was


<analyzer type="query">
  <tokenizer class="solr.KeywordTokenizerFactory" />
</analyzer>

to

<analyzer type="query">
  <tokenizer class="solr.LowerCaseTokenizerFactory" />
</analyzer>

The first did not produce any results but the second worked beautifully.

Thanks!

Brian Lamb

2011/5/31 Tomás Fernández Löbbe <tomasflo...@gmail.com>

> ...or also use the LowerCaseTokenizerFactory at query time for consistency,
> but not the edge ngram filter.
>
> 2011/5/31 Tomás Fernández Löbbe <tomasflo...@gmail.com>
>
> > Hi Brian, I don't know if I understand what you are trying to achieve.
> You
> > want the term query "abcdefg" to have an idf of 1 insead of 7? I think
> using
> > the KeywordTokenizerFilterFactory at query time should work. I would be
> > something like:
> >
> > <fieldType name="edgengram" class="solr.TextField"
> > positionIncrementGap="1000">
> >   <analyzer type="index">
> >
> >     <tokenizer class="solr.LowerCaseTokenizerFactory" />
> >     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
> > maxGramSize="25" side="front" />
> >   </analyzer>
> >   <analyzer type="query">
> >   <tokenizer class="solr.KeywordTokenizerFactory" />
> >   </analyzer>
> > </fieldType>
> >
> > this way, at query time "abcdefg" won't be turned to "a ab abc abcd abcde
> > abcdef abcdefg". At index time it will.
> >
> > Regards,
> > Tomás
> >
> >
> > On Tue, May 31, 2011 at 1:07 PM, Brian Lamb <
> brian.l...@journalexperts.com
> > > wrote:
> >
> >> <fieldType name="edgengram" class="solr.TextField"
> >> positionIncrementGap="1000">
> >>   <analyzer>
> >>     <tokenizer class="solr.LowerCaseTokenizerFactory" />
> >>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
> >> maxGramSize="25" side="front" />
> >>   </analyzer>
> >> </fieldType>
> >>
> >> I believe I used that link when I initially set up the field and it
> worked
> >> great (and I'm still using it in other places). In this particular
> example
> >> however it does not appear to be practical for me. I mentioned that I
> have
> >> a
> >> similarity class that returns 1 for the idf and in the case of an
> >> edgengram,
> >> it returns 1 * length of the search string.
> >>
> >> Thanks,
> >>
> >> Brian Lamb
> >>
> >> On Tue, May 31, 2011 at 11:34 AM, bmdakshinamur...@gmail.com <
> >> bmdakshinamur...@gmail.com> wrote:
> >>
> >> > Can you specify the analyzer you are using for your queries?
> >> >
> >> > May be you could use a KeywordAnalyzer for your queries so you don't
> end
> >> up
> >> > matching parts of your query.
> >> >
> >> >
> >>
> http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/
> >> > This should help you.
> >> >
> >> > On Tue, May 31, 2011 at 8:24 PM, Brian Lamb
> >> > <brian.l...@journalexperts.com>wrote:
> >> >
> >> > > In this particular case, I will be doing a solr search based on user
> >> > > preferences. So I will not be depending on the user to type
> "abcdefg".
> >> > That
> >> > > will be automatically generated based on user selections.
> >> > >
> >> > > The contents of the field do not contain spaces and since I am
> created
> >> > the
> >> > > search parameters, case isn't important either.
> >> > >
> >> > > Thanks,
> >> > >
> >> > > Brian Lamb
> >> > >
> >> > > On Tue, May 31, 2011 at 9:44 AM, Erick Erickson <
> >> erickerick...@gmail.com
> >> > > >wrote:
> >> > >
> >> > > > That'll work for your case, although be aware that string types
> >> aren't
> >> > > > analyzed at all,
> >> > > > so case matters, as do spaces etc.....
> >> > > >
> >> > > > What is the use-case here? If you explain it a bit there might be
> >> > > > better answers....
> >> > > >
> >> > > > Best
> >> > > > Erick
> >> > > >
> >> > > > On Fri, May 27, 2011 at 9:17 AM, Brian Lamb
> >> > > > <brian.l...@journalexperts.com> wrote:
> >> > > > > For this, I ended up just changing it to string and using
> >> "abcdefg*"
> >> > to
> >> > > > > match. That seems to work so far.
> >> > > > >
> >> > > > > Thanks,
> >> > > > >
> >> > > > > Brian Lamb
> >> > > > >
> >> > > > > On Wed, May 25, 2011 at 4:53 PM, Brian Lamb
> >> > > > > <brian.l...@journalexperts.com>wrote:
> >> > > > >
> >> > > > >> Hi all,
> >> > > > >>
> >> > > > >> I'm running into some confusion with the way edgengram works. I
> >> have
> >> > > the
> >> > > > >> field set up as:
> >> > > > >>
> >> > > > >> <fieldType name="edgengram" class="solr.TextField"
> >> > > > >> positionIncrementGap="1000">
> >> > > > >>    <analyzer>
> >> > > > >>      <tokenizer class="solr.LowerCaseTokenizerFactory" />
> >> > > > >>        <filter class="solr.EdgeNGramFilterFactory"
> >> minGramSize="1"
> >> > > > >> maxGramSize="100" side="front" />
> >> > > > >>    </analyzer>
> >> > > > >> </fieldType>
> >> > > > >>
> >> > > > >> I've also set up my own similarity class that returns 1 as the
> >> idf
> >> > > > score.
> >> > > > >> What I've found this does is if I match a string "abcdefg"
> >> against a
> >> > > > field
> >> > > > >> containing "abcdefghijklmnop", then the idf will score that as
> a
> >> 7:
> >> > > > >>
> >> > > > >> 7.0 = idf(myfield: a=51 ab=23 abc=2 abcd=2 abcde=2 abcdef=2
> >> > abcdefg=2)
> >> > > > >>
> >> > > > >> I get why that's happening, but is there a way to avoid that?
> Do
> >> I
> >> > > need
> >> > > > to
> >> > > > >> do a new field type to achieve the desired affect?
> >> > > > >>
> >> > > > >> Thanks,
> >> > > > >>
> >> > > > >> Brian Lamb
> >> > > > >>
> >> > > > >
> >> > > >
> >> > >
> >> >
> >> >
> >> >
> >> > --
> >> > Thanks and Regards,
> >> > DakshinaMurthy BM
> >> >
> >>
> >
> >
>

Re: Edgengram

Reply via email to