Re: Related Search

Trey Grainger Wed, 26 Oct 2016 12:15:39 -0700

Yeah, the approach listed by Grant and Markus is a common approach. I've
worked on systems that mined query logs like this, and it's a good approach
if you have sufficient query logs to pull it off.

There are a lot of linguistic nuances you'll encounter along the way,
including how you disambiguate homonyms and their related terms, how you
identify synonyms/acronyms as having the same underlying meaning, how you
parse and handle unknown phrases, how you remove noise present in the query
logs, and even how you weight the strength of the relationship between
related queries. I gave
a presentation on this topic at Lucene/Solr Revolution in 2015 if you're
interested in learning more about how to build such a system (
http://www.treygrainger.com/posts/presentations/leveraging-lucene-solr-as-a-knowledge-graph-and-intent-engine/
).
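For anyone who wants a concrete starting point, here is a minimal sketch of the query-log approach. It is not taken from any of the systems mentioned in this thread: the session-based log format, the function names, the Jaccard weighting, and the min_count noise threshold are all assumptions chosen for illustration, and the normalization is deliberately naive (no synonym or homonym handling).

```python
from collections import Counter, defaultdict
from itertools import combinations


def normalize(query):
    """Lowercase and collapse whitespace (real systems need far more)."""
    return " ".join(query.lower().split())


def related_queries(sessions, min_count=2):
    """Map each query to the queries it co-occurs with, strongest first.

    sessions: iterable of lists of raw query strings, one list per
    user session (a hypothetical log format).
    """
    query_count = Counter()  # number of sessions containing each query
    pair_count = Counter()   # number of sessions containing both queries

    for session in sessions:
        unique = sorted({normalize(q) for q in session if q.strip()})
        query_count.update(unique)
        pair_count.update(combinations(unique, 2))

    related = defaultdict(list)
    for (a, b), together in pair_count.items():
        if together < min_count:
            continue  # treat rare co-occurrences as noise
        # Jaccard similarity over sessions as the relationship weight.
        score = together / (query_count[a] + query_count[b] - together)
        related[a].append((b, score))
        related[b].append((a, score))

    for candidates in related.values():
        candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return dict(related)
```

The resulting mapping could then be indexed into a separate collection and searched at query time, as described earlier in the thread.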

Another approach (also referenced in the above presentation), for those
with more of a cold-start problem with query logs, is to mine related terms
and phrases out of the underlying content in the search engine (inverted
index) itself. The Semantic Knowledge Graph, which was recently open sourced
by CareerBuilder and contributed back to Solr, enables such a capability
(disclaimer: I worked on it; it's available both as a Solr plugin and as a
patch, but it's not ready to be committed into Solr yet). See
https://issues.apache.org/jira/browse/SOLR-9480 for the most current patch.

It is a request handler that can take in any query and discover the other
terms most related to that entire query from the inverted index, sorted by
strength of relationship to that query (it can also traverse from those
terms across fields/relationships to other terms, but that's probably
overkill for the basic related searches use case). Think of it as a way to
run a query and find the most relevant other keywords, as opposed to
finding the most relevant documents.
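To make the "query in, related keywords out" idea concrete, here is a toy sketch over an in-memory list of documents. It is not the Semantic Knowledge Graph's code and does not claim to reproduce its scoring: the function name, the whitespace tokenization, the single-term query, and the z-score comparison of the matching documents (foreground) against the whole collection (background) are all assumptions made for illustration.

```python
import math
from collections import Counter


def related_terms(docs, query_term, top_n=5, min_fg_count=2):
    """Rank terms that are over-represented in docs matching query_term.

    docs: list of plain-text strings standing in for an index.
    Returns a list of (term, z_score) pairs, strongest first.
    """
    doc_terms = [set(d.lower().split()) for d in docs]
    total = len(doc_terms)

    # Background: document frequency of every term in the collection.
    background = Counter()
    for terms in doc_terms:
        background.update(terms)

    # Foreground: the documents that match the query.
    fg_docs = [terms for terms in doc_terms if query_term in terms]
    fg_size = len(fg_docs)
    if fg_size == 0:
        return []

    foreground = Counter()
    for terms in fg_docs:
        foreground.update(terms)

    scored = []
    for term, fg_count in foreground.items():
        if term == query_term or fg_count < min_fg_count:
            continue
        p_bg = background[term] / total
        expected = fg_size * p_bg
        std = math.sqrt(fg_size * p_bg * (1 - p_bg))
        if std == 0:
            continue  # term occurs in every document, so it carries no signal
        scored.append((term, (fg_count - expected) / std))

    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]
```

Terms that appear in the matching documents no more often than chance would predict score near zero, which is what separates this from simply counting the most frequent terms in the result set.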

Using this, you can then either return the related keywords as your related
searches, or you can modify your query to include them and power a
conceptual/semantic search instead of the pure text-based search you
started with. It's effectively a (better) way to implement More Like This,
where instead of taking a document and using tf-idf to extract out the
globally-interesting terms from the document (like MLT), you can instead
use a query to find contextually-relevant keywords across many documents,
score them based upon their similarity to the original query, and then turn
around and use the most semantically-relevant terms as your related
search(es).
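As a rough sketch of the second option (folding the related keywords back into the query), the helper below builds a query string using the standard Lucene/Solr boost syntax (term^boost). The function name, the max_boost cap, and the choice to scale boosts relative to the strongest term are assumptions for illustration, not part of the Semantic Knowledge Graph.

```python
def expand_query(original, related, max_terms=3, max_boost=0.5):
    """Append boosted related terms to a query string.

    related: list of (term, score) pairs with higher scores meaning
    stronger relationships (hypothetical input, e.g. from a related
    terms lookup). Boosts are scaled so the strongest related term
    gets max_boost and the original query keeps the default boost.
    """
    positive = [(t, s) for t, s in related if s > 0]
    top = sorted(positive, key=lambda pair: -pair[1])[:max_terms]
    if not top:
        return original

    best = top[0][1]
    clauses = []
    for term, score in top:
        boost = max_boost * score / best
        # Quote each term as a phrase, escaping backslashes and quotes.
        escaped = term.replace("\\", "\\\\").replace('"', '\\"')
        clauses.append(f'"{escaped}"^{boost:.2f}')

    return f"({original}) OR ({' OR '.join(clauses)})"
```

Keeping the related terms at a lower boost than the original query is a design choice here: it lets them broaden recall without outranking documents that match what the user actually typed.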

I don't have near-term plans to expose the semantic knowledge graph as a
search component (it's a request handler right now), but once it's finished
that could certainly be done. Just wanted to mention it as another approach
to solve this specific problem.

-Trey Grainger
SVP of Engineering @ Lucidworks
Co-author, Solr in Action

On Wed, Oct 26, 2016 at 1:59 PM, Markus Jelsma <markus.jel...@openindex.io>
wrote:

> Indeed, we have similar processes running of which one generates a
> 'related query collection' which just contains a (normalized) query and its
> related queries. I would not know how this is even possible without
> continuously processing query and click logs.
>
> M.
>
>
> -----Original message-----
> > From:Grant Ingersoll <gsing...@apache.org>
> > Sent: Tuesday 25th October 2016 23:51
> > To: solr-user@lucene.apache.org
> > Subject: Re: Related Search
> >
> > Hi Rick,
> >
> > I typically do this stuff just by searching a different collection that I
> > > create offline by analyzing query logs and then indexing them and
> > > searching.
> >
> > On Mon, Oct 24, 2016 at 8:32 PM Rick Leir <rl...@leirtech.com> wrote:
> >
> > > Hi all,
> > >
> > > There is an issue 'Create a Related Search Component' which has been
> > > open for some years now.
> > >
> > > It has a priority: major.
> > >
> > > https://issues.apache.org/jira/browse/SOLR-2080
> > >
> > >
> > > > I discovered it linked from Lucidworks' very useful blog on ecommerce:
> > >
> > >
> > > > https://lucidworks.com/blog/2011/01/25/implementing-the-ecommerce-checklist-with-apache-solr-and-lucidworks/
> > >
> > >
> > > Did people find a better way to accomplish Related Search? Perhaps MLT
> > > http://wiki.apache.org/solr/MoreLikeThis ?
> > >
> > > cheers -- Rick
> > >
> > >
> > >
> >
>
