Re: Relevancy, Phrase Boosting, Shingles and Long Tail Curves

2010-09-11 Thread Lance Norskog
t + overallIntersectionPercent) / 2; } *From:* Mark Bennett *To:* dev@lucene.apache.org *Sent:* Fri, 10 September, 2010 18:44:31 *Subject:* Re: Relevancy, Phrase Boosting, Shingles and Long Tail Curves Thanks Mark H, Maybe I'll look at MLT (More Like This) again. I'll also c

Re: Relevancy, Phrase Boosting, Shingles and Long Tail Curves

2010-09-11 Thread mark harwood
tersectionPercent; // so here we take an average of the two: return (termBIntersectionPercent + overallIntersectionPercent) / 2; } From: Mark Bennett To: dev@lucene.apache.org Sent: Fri, 10 September, 2010 18:44:31 Subject: Re: Relevancy
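Only the final averaging line of Mark H's scoring code survives in the snippet. The sketch below shows one way such a score could be computed. It assumes, hypothetically, that both percentages come from the document sets of two terms: termBIntersectionPercent is the share of term B's documents that also contain term A, and overallIntersectionPercent is the share of all documents containing either term that contain both. These definitions, along with the class and method names, are illustrative guesses and are not taken from the thread.

```java
import java.util.HashSet;
import java.util.Set;

public class IntersectionScore {
    // Hypothetical reconstruction: only the final averaging step appears in the thread.
    static float score(Set<Integer> termADocs, Set<Integer> termBDocs) {
        if (termADocs.isEmpty() || termBDocs.isEmpty()) {
            return 0f;
        }
        Set<Integer> intersection = new HashSet<>(termADocs);
        intersection.retainAll(termBDocs);
        Set<Integer> union = new HashSet<>(termADocs);
        union.addAll(termBDocs);
        // Assumed definition: share of term B's docs that also contain term A.
        float termBIntersectionPercent = 100f * intersection.size() / termBDocs.size();
        // Assumed definition: share of docs containing either term that contain both.
        float overallIntersectionPercent = 100f * intersection.size() / union.size();
        // so here we take an average of the two:
        return (termBIntersectionPercent + overallIntersectionPercent) / 2;
    }

    public static void main(String[] args) {
        System.out.println(score(Set.of(1, 2, 3, 4), Set.of(3, 4)));
    }
}
```

With those example sets, every doc for term B also contains term A (100%), but only half of the combined docs contain both (50%), so the averaged score is 75.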

Re: Relevancy, Phrase Boosting, Shingles and Long Tail Curves

2010-09-10 Thread Mark Bennett
's another topic. > BTW, the Luke tool has a "Zipf" plugin that you may find useful in > examining index term distributions in Lucene indexes.. > > Cheers > Mark > > -- > *From:* Mark Bennett > *To:* java-...@lucene.apache.org >
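The Zipf suggestion is about inspecting how term frequencies fall off with rank, which is the "long tail" in the subject line. The sketch below shows the rank/frequency idea in plain Java on a made-up term-to-frequency map. It is not how Luke's plugin works, and it does not read a real index. Collecting the counts from a Lucene index is left out on purpose.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class ZipfTable {
    // Returns frequencies sorted from most to least frequent; index i holds rank i + 1.
    static List<Integer> rankedFrequencies(Map<String, Integer> termFreqs) {
        List<Integer> freqs = new ArrayList<>(termFreqs.values());
        freqs.sort(Comparator.reverseOrder());
        return freqs;
    }

    public static void main(String[] args) {
        // Made-up counts for illustration only.
        Map<String, Integer> tf = Map.of("the", 1000, "of", 500, "lucene", 12, "shingle", 3);
        List<Integer> ranked = rankedFrequencies(tf);
        for (int i = 0; i < ranked.size(); i++) {
            int rank = i + 1;
            // Under Zipf's law, rank * frequency stays roughly constant near the head.
            System.out.printf("%d\t%d\t%d%n", rank, ranked.get(i), rank * ranked.get(i));
        }
    }
}
```

Plotting rank against frequency on log-log axes is the usual way to see whether the content's vocabulary follows a Zipf-like curve, and where the long tail starts.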

Re: Relevancy, Phrase Boosting, Shingles and Long Tail Curves

2010-09-10 Thread mark harwood
Lucene indexes.. Cheers Mark From: Mark Bennett To: java-...@lucene.apache.org Sent: Fri, 10 September, 2010 1:42:11 Subject: Relevancy, Phrase Boosting, Shingles and Long Tail Curves I want to boost the relevancy of some Question and Answer content. I'm using stop words, Dism

Relevancy, Phrase Boosting, Shingles and Long Tail Curves

2010-09-09 Thread Mark Bennett
I want to boost the relevancy of some Question and Answer content. I'm using stop words, Dismax, and I'm already a fan of Phrase Boosting and have cranked that up a bit. But I'm considering using long Shingles to make use of some of the normally stopped out "junk words" in the content to help relev
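The shingle idea is to index overlapping runs of words, stop words included, so that phrases like "how do i" become single matchable terms instead of being dropped. In a Lucene analyzer chain this job is usually done by ShingleFilter. The plain-Java sketch below only illustrates the token-window logic and is not that filter's API. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class Shingles {
    // Builds word n-grams ("shingles") of exactly `size` tokens from already-tokenized text.
    // Stop words are kept deliberately so question phrasing survives as whole terms.
    static List<String> shingles(List<String> tokens, int size) {
        if (size < 1) {
            throw new IllegalArgumentException("shingle size must be at least 1");
        }
        List<String> out = new ArrayList<>();
        for (int start = 0; start + size <= tokens.size(); start++) {
            out.add(String.join(" ", tokens.subList(start, start + size)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("how", "do", "i", "reset", "my", "password");
        System.out.println(shingles(tokens, 3));
    }
}
```

Longer shingles are rarer, so a match on one is strong evidence. The trade-off is a larger index and fewer matches, and that matters for Q&A content where users phrase the same question in many ways.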