Re: Performance problems with extremely common terms in collection (Solr 7.4)

Ash Ramesh Mon, 08 Apr 2019 00:27:57 -0700

Hi Toke,

Thanks for the prompt reply. I'm glad to hear that this is a known
problem. Regarding stop words, I've been thinking about trying that
out. In our business case, most of these terms are keywords related to
stock photography, so it's natural for 'photography' or 'background'
to appear in most documents' keyword lists. It seems unlikely that we can
use the Common Grams solution in our business case.
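
As a rough illustration of why Common Grams helps phrase queries but not
single-keyword queries, here is a toy Python sketch of the index-time idea.
It is not Solr's CommonGramsFilter implementation; the function name
`common_grams`, the `_` separator, and the sample tokens are assumptions
made up for this example.

```python
def common_grams(tokens, common_words, sep="_"):
    """Toy index-time common-grams pass (illustrative only).

    Keeps every original token and additionally emits a joined bigram for
    each adjacent pair in which at least one token is a common word.
    """
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)  # the plain unigram is always kept
        if i + 1 < len(tokens):
            nxt = tokens[i + 1]
            if tok in common_words or nxt in common_words:
                out.append(tok + sep + nxt)  # rarer, more selective term
    return out


# Hypothetical high-frequency terms and a hypothetical keyword sequence.
common = {"photography", "background"}
grams = common_grams(["sunset", "photography", "background", "blur"], common)
print(grams)
```

A phrase query such as "sunset photography" could be answered from the much
rarer joined term, but a query for the single term 'photography' still has
to match the plain unigram, which remains in nearly every document.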


Regards,

Ash

On Mon, Apr 8, 2019 at 5:01 PM Toke Eskildsen <t...@kb.dk> wrote:

> On Mon, 2019-04-08 at 09:58 +1000, Ash Ramesh wrote:
> > We have a corpus of 50+ million documents in our collection. I've
> > noticed that some queries with specific keywords tend to be extremely
> > slow.
> > E.g. q='photography' or q='background'. After digging into the
> > raw documents, I could see that these two terms appear in more
> > than 90% of all documents, which means Solr has to score each of
> > those documents.
>
> That is known behaviour, which can be remedied somewhat. Stop words are
> a common approach, but your samples do not seem to fit well with
> that. Instead you can look at Common Grams, where your high-frequency
> words get concatenated with surrounding words. This only works with
> phrases though. There's a nice article at
>
>
> https://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-2
>
> - Toke Eskildsen, Royal Danish Library
>
>
>

