Are you boosting your docs?

2011/8/8 Jason Toy <jason...@gmail.com>

> I am trying to test out and compare different sorts and scoring.
>
>  When I use dismax to search for "indie music"
> with: qf=all_lists_text&q="indie+music"&defType=dismax&rows=100
> I see some results that seem "irrelevant": the top results contain only
> 1 or 2 mentions of "indie music", but further down the list I do see
> other docs that have more occurrences of "indie music".
> So I want to test by comparing the different queries against a
> list of docs ranked specifically by the count of occurrences of the phrase
> "indie music".
>
> On Mon, Aug 8, 2011 at 2:19 PM, Markus Jelsma <markus.jel...@openindex.io> wrote:
>
> >
> > > Dismax queries can. But
> > >
> > > sort=termfreq(all_lists_text,'indie+music')
> > >
> > > is not using dismax.  Apparently the termfreq function cannot? I am not
> > > familiar with the termfreq function.
> >
> > It simply returns the TF of the given _term_ as it is indexed in the
> > current document.
> >
> > Sorting on TF like this seems strange, as by default queries are already
> > sorted that way, since TF plays a big role in the final score.
> >
> > >
> > > To understand why you'd need to reindex, you might want to read up on
> > > how Lucene actually works, to get a basic understanding of how different
> > > indexing choices affect what is possible at query time. Lucene in
> > > Action is a pretty good book.
> > >
> > > On 8/8/2011 5:02 PM, Jason Toy wrote:
> > > > Are Dismax queries not able to search for phrases using the default
> > > > index (which is what I am using)? If I can already do phrase searches,
> > > > I don't understand why I would need to reindex to be able to access
> > > > phrases from a function.
> > > >
> > > > On Mon, Aug 8, 2011 at 1:49 PM, Markus Jelsma <markus.jel...@openindex.io> wrote:
> > > >>> Alexei, thank you, that does seem to work.
> > > >>>
> > > >>> My sort results seem to be totally wrong though. I'm not sure if it's
> > > >>> because of my sort function or something else.
> > > >>>
> > > >>> My query consists of:
> > > >>> sort=termfreq(all_lists_text,'indie+music')+desc&q=*:*&rows=100
> > > >>> And I get back 4571232 hits.
> > > >>
> > > >> That's normal, you issue a catch all query. Sorting should work but..
> > > >>
> > > >>> All the results don't have the phrase "indie music" anywhere in their
> > > >>> data.
> > > >>
> > > >>>   Does termfreq not support phrases?
> > > >>
> > > >> No, it is TERM frequency and "indie music" is not one term. I don't
> > > >> know how this function parses your input, but it might not understand
> > > >> your + escape and think it's one term consisting of exactly that.
> > > >>
> > > >>> If not, how can I sort specifically by termfreq of a phrase?
> > > >>
> > > >> You cannot. What you can do is index multiple terms as one term using
> > > >> the shingle filter. Take care, it can significantly increase your
> > > >> index size and number of unique terms.
> > > >>
> > > >>> On Mon, Aug 8, 2011 at 1:08 PM, Alexei Martchenko <ale...@superdownloads.com.br> wrote:
> > > >>>> You can use the standard query parser and pass q=*:*
> > > >>>>
> > > >>>> 2011/8/8 Jason Toy<jason...@gmail.com>
> > > >>>>
> > > >>>>> I am trying to list some data based on a function I run,
> > > >>>>> specifically termfreq(post_text,'indie music'), and I am unable to
> > > >>>>> do it without passing in data to the q parameter. Is it possible to
> > > >>>>> get a sorted list without searching for any terms?
> > > >>>>
> > > >>>> --
> > > >>>>
> > > >>>> *Alexei Martchenko* | *CEO* | Superdownloads
> > > >>>> ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
> > > >>>> 5083.1018/5080.3535/5080.3533
> >
>
>
>
> --
> - sent from my mobile
> 6176064373
>



-- 

*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533
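
The point made earlier in the thread, that termfreq counts one indexed term and that a shingle filter can index adjacent words as a single term, can be illustrated with a rough sketch. This is plain Python, not Solr: the whitespace tokenizer only stands in for a real analyzer, and the names tokenize, shingles and term_freq, along with the sample text, are hypothetical.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for a real analyzer)."""
    return text.lower().split()


def shingles(tokens, size=2):
    """Join each run of `size` adjacent tokens into one term,
    roughly what a shingle filter would emit at index time."""
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


def term_freq(terms, term):
    """Count how often `term` appears as a single indexed term."""
    return Counter(terms)[term]


# Hypothetical sample document.
doc = "Indie music blogs cover indie music and indie film"
tokens = tokenize(doc)

# With single-word terms, the phrase is never one term, so its count is 0.
print(term_freq(tokens, "indie music"))
# Each individual word is a term and can be counted.
print(term_freq(tokens, "indie"))
# With two-word shingles, the phrase becomes a term of its own.
print(term_freq(shingles(tokens), "indie music"))
```

The sketch also hints at the cost mentioned above: shingling adds a new term for nearly every pair of adjacent words, so the number of unique terms grows quickly.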
