Yeah, what does that do anyway: omit both, but not one in particular? And
where has omitTermFreq been all this time? Does it make sense?

Not to me at least, so I never tried it and just overrode the similarity
instead.

M. 
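
To illustrate the effect of overriding how tf is computed, here is a toy sketch in Python. It is not the code used above and not Solr code: in Solr the change would live in a custom Similarity class written in Java. It only compares a square-root tf (the formula used by Lucene's classic TF-IDF similarity) with a binary tf that ignores repeated occurrences. The function names are made up for this example, and idf, norms and boosts are deliberately left out.

```python
import math
from collections import Counter


def sqrt_tf(freq):
    """Square-root term frequency, as in Lucene's classic TF-IDF similarity."""
    return math.sqrt(freq)


def binary_tf(freq):
    """Hypothetical override: a term counts once, however often it occurs."""
    return 1.0 if freq > 0 else 0.0


def tf_sum(query_terms, doc_terms, tf):
    """Sum the tf contribution of each distinct query term (no idf, no norms)."""
    counts = Counter(doc_terms)
    return sum(tf(counts[term]) for term in set(query_terms))


query = ["new", "york"]
short_title = "new york".split()
doubled_title = "new york new york".split()

for name, tf in (("sqrt", sqrt_tf), ("binary", binary_tf)):
    print(name, tf_sum(query, short_title, tf), tf_sum(query, doubled_title, tf))
```

With the binary tf both titles get the same contribution for the query "new york", while the square-root tf favours the doubled title.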
 
-----Original message-----
> From:Alexandre Rafalovitch <arafa...@gmail.com>
> Sent: Thursday 9th February 2017 18:00
> To: solr-user <solr-user@lucene.apache.org>
> Subject: Re: Removing duplicate terms from query
> 
> Would omitTermFreqAndPositions help here? Though that's probably
> overkill, as it disables phrase searches too. I am not sure if it is
> possible to set omitTermFreqAndPositions=true omitPositions=false to
> just skip frequencies.
> 
> Regards,
>    Alex.
> ----
> http://www.solr-start.com/ - Resources for Solr users, new and experienced
> 
> 
> On 9 February 2017 at 11:37, Walter Underwood <wun...@wunderwood.org> wrote:
> > 1. I don’t think this is a good idea. It means that a search for “hey hey 
> > hey” won’t score that document higher.
> >
> > 2. Maybe you want to change how tf is calculated. Ignore multiple 
> > occurrences of a word.
> >
> > I ran into this with the movie title “New York, New York” at Netflix. It 
> > isn’t twice as much about New York, but it needs to be the best match for 
> > the query “new york new york”.
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/  (my blog)
> >
> >
> >> On Feb 9, 2017, at 5:18 AM, Ere Maijala <ere.maij...@helsinki.fi> wrote:
> >>
> >> Thanks Emir.
> >>
> >> I was thinking of something very simple like doing what 
> >> RemoveDuplicatesTokenFilter does but ignoring positions. It would of 
> >> course still be possible to have the same term multiple times, but at 
> >> least the adjacent ones could be deduplicated. The reason I'm not too 
> >> eager to do it in a query preprocessor is that I'd have to essentially 
> >> duplicate functionality of the query analysis chain that contains 
> >> ICUTokenizerFactory, WordDelimiterFilterFactory and whatnot.
> >>
> >> Regards,
> >> Ere
> >>
> >> On 9.2.2017 at 14.52, Emir Arnautovic wrote:
> >>> Hi Ere,
> >>>
> >>> I don't think there is such a filter. Implementing one would require
> >>> looking backward, which violates the streaming approach of token
> >>> filters and makes memory usage unpredictable.
> >>>
> >>> I would do it as part of a query preprocessor, and not necessarily as
> >>> part of Solr.
> >>>
> >>> HTH,
> >>> Emir
> >>>
> >>>
> >>> On 09.02.2017 12:24, Ere Maijala wrote:
> >>>> Hi,
> >>>>
> >>>> I just noticed that while we use RemoveDuplicatesTokenFilter at
> >>>> query time, it takes term positions into account and so does not
> >>>> really do anything if the query is e.g. 'term term term'. As far as I
> >>>> can see, term positions make no difference in a simple non-phrase
> >>>> search. Is there a
> >>>> built-in way to deal with this? I know I can write a filter to do
> >>>> this, but I feel like this would be something quite basic to do for
> >>>> the query. And I don't think it's even anything too weird for normal
> >>>> users to do. Just consider e.g. searching for music by title:
> >>>>
> >>>> Hey, hey, hey ; Shivers of pleasure
> >>>>
> >>>> I also verified, at least according to debugQuery=true and
> >>>> anecdotal evidence, that the search really slows down if you repeat
> >>>> the same term enough.
> >>>>
> >>>> --Ere
> >>>
> >>
> >> --
> >> Ere Maijala
> >> Kansalliskirjasto / The National Library of Finland
> >
> 
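
For reference, the query-preprocessor idea discussed in the thread could look roughly like the sketch below. This is only an illustration under stated assumptions: the function name is made up, and the lowercase-and-split tokenization is a crude stand-in for the real analysis chain (ICUTokenizerFactory, WordDelimiterFilterFactory and so on), which is exactly the duplication concern raised in the thread. It is also only suitable for simple non-phrase queries, since it discards positions and query syntax.

```python
import re


def dedupe_query_terms(query: str) -> str:
    """Drop repeated terms from a free-text query, keeping first occurrences in order.

    Assumption: lowercasing and splitting on non-word characters approximates
    the index-side analysis closely enough to spot duplicates.
    """
    seen = set()
    kept = []
    for token in re.findall(r"\w+", query.lower()):
        if token not in seen:
            seen.add(token)
            kept.append(token)
    return " ".join(kept)


print(dedupe_query_terms("Hey, hey, hey ; Shivers of pleasure"))
# prints: hey shivers of pleasure
```

Note the trade-off mentioned in the thread: after this step, a query such as "new york new york" no longer carries the repetition that could help a title like "New York, New York" rank first.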
