Re: Improving performance for use-case where large (200) number of phrase queries are used?

Peter Keegan Wed, 24 Oct 2012 10:21:00 -0700

Could you index your 'phrase tags' as single tokens? Then your phrase
queries become simple TermQuery lookups.
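
A minimal sketch of this idea, using a toy in-memory inverted index rather
than Lucene itself. The helper names (phrase_to_token, build_index,
term_lookup) and the underscore-joined, lowercased normalization are
assumptions for illustration, not anything from the poster's schema. In
Solr this would roughly correspond to indexing each tag in an untokenized
field, so the whole tag is one term.

```python
import re


def phrase_to_token(phrase: str) -> str:
    """Collapse a multi-word phrase tag into one token.

    Hypothetical normalization: lowercase, split on non-word
    characters, join the pieces with underscores.
    """
    words = re.findall(r"\w+", phrase.lower())
    return "_".join(words)


def build_index(docs: dict) -> dict:
    """Build a toy inverted index: single token -> set of document ids."""
    index = {}
    for doc_id, phrases in docs.items():
        for phrase in phrases:
            index.setdefault(phrase_to_token(phrase), set()).add(doc_id)
    return index


def term_lookup(index: dict, phrase: str) -> set:
    """Exact single-term lookup; no positional (phrase) matching is needed."""
    return index.get(phrase_to_token(phrase), set())


if __name__ == "__main__":
    docs = {1: ["indie rock", "lo-fi"], 2: ["Indie Rock"], 3: ["rock"]}
    index = build_index(docs)
    print(term_lookup(index, "indie rock"))
```

The trade-off is that a tag only matches as a whole: a search for "rock"
will not find documents tagged "indie rock" in this field.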


On Wed, Oct 24, 2012 at 12:26 PM, Robert Muir <rcm...@gmail.com> wrote:

> On Wed, Oct 24, 2012 at 11:09 AM, Aaron Daubman <daub...@gmail.com> wrote:
> > Greetings,
> >
> > We have a solr instance in use that gets some perhaps atypical queries
> > and suffers from poor (>2 second) QTimes.
> >
> > Documents (~2,350,000) in this instance are mainly comprised of
> > various "descriptive fields", such as multi-word (phrase) tags - an
> > average document contains 200-400 phrases like this across several
> > different multi-valued field types.
> >
> > A custom QueryComponent has been built that functions somewhat like a
> > very specific MoreLikeThis. A seed document is specified via the
> > incoming query, its terms are retrieved, boosted both by query
> > parameters as well as fields within the document that specify term
> > weighting, sorted by this custom boosting, and then a second query is
> > crafted by taking the top 200 (sorted by the custom boosting)
> > resulting field values paired with their fields and searching for
> > documents matching these 200 values.
>
> a few more ideas:
> * use shingles e.g. to turn two-word phrases into single terms (how
> long is your average phrase?).
> * in addition to the above, maybe for phrases with > 2 terms, consider
> just a boolean conjunction of the shingled phrases instead of a "real"
> phrase query: e.g. "more like this" -> (more_like AND like_this). This
> would have some false positives.
> * use a more aggressive stopwords list for your "MorePhrasesLikeThis".
> * reduce this number (200), and instead work harder to pick out which
> phrases are the "most descriptive" from the seed document, e.g. based
> on some heuristics like their frequency or location within that seed
> document, so your query isn't so massive.
>
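
The shingle suggestions above can be sketched in plain Python, assuming
two-word shingles joined with an underscore as in the "more_like" example.
The function names here are hypothetical, and a real setup would more
likely build the shingles at analysis time (Lucene ships a ShingleFilter
for this) than in application code.

```python
def shingles(phrase: str, size: int = 2, sep: str = "_") -> list:
    """Return the word n-grams (shingles) of a phrase as single terms."""
    words = phrase.lower().split()
    if len(words) < size:
        # Too short to shingle: fall back to the words joined as one term.
        return [sep.join(words)] if words else []
    return [sep.join(words[i:i + size]) for i in range(len(words) - size + 1)]


def shingle_conjunction(phrase: str) -> str:
    """Build a boolean AND of shingle terms in place of a phrase query."""
    terms = shingles(phrase)
    if not terms:
        return ""
    if len(terms) == 1:
        return terms[0]
    return "(" + " AND ".join(terms) + ")"


def matches_conjunction(doc_text: str, phrase: str) -> bool:
    """True if every shingle of the phrase occurs somewhere in the document."""
    return set(shingles(phrase)) <= set(shingles(doc_text))


if __name__ == "__main__":
    print(shingle_conjunction("more like this"))
    # A false positive: both shingles occur, but the phrase itself does not.
    print(matches_conjunction("like this more like", "more like this"))
```

The second call illustrates the false positives mentioned above: the
conjunction only checks that each shingle occurs somewhere, not that the
shingles are adjacent and in order.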
