The MLT search component is enabled using &mlt=true and works on any normal Solr query. It returns one batch of similar documents for each result of the original query. The &mlt.count=n parameter controls how many similar documents are returned for each original query result.
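
As a rough sketch of the component-style request described above, the following Python builds such a URL without sending it. The host and port are copied from the example URLs in this thread, and mlt.fl is the field-list parameter that appears in the original question. The function name, the default values, and the field name "title" are assumptions made up for illustration.

```python
from urllib.parse import urlencode


def mlt_component_url(solr_base, query, rows=3, mlt_count=5, mlt_fields=("title",)):
    """Build a normal Solr query URL with the MLT search component switched on."""
    params = [
        ("q", query),
        ("rows", rows),                    # n: number of original results
        ("mlt", "true"),                   # enable the MLT search component
        ("mlt.count", mlt_count),          # k: similar documents per original result
        ("mlt.fl", ",".join(mlt_fields)),  # fields to compare; "title" is a placeholder
    ]
    return solr_base.rstrip("/") + "/select?" + urlencode(params)


if __name__ == "__main__":
    print(mlt_component_url("http://10.0.0.1:8080/solr", "shoes"))
```

With rows=3 and mlt.count=5 you would expect up to 3 original results, each with its own batch of up to 5 similar documents.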

The MLT request handler is a standalone request handler that does a query, takes the first result, and then returns one batch of documents that are similar to that one document. You have to configure the handler yourself, but typically it would have the name "/mlt", so you would write:

http://10.0.0.1:8080/solr/mlt/?q=shoes&rows=3

It will show you both the single document from the original query and then the batch of documents that are most similar to the top terms from that one original document.

Add &debugQuery=true or &debug=query or &debug=results to see the terms that are used in the secondary queries that find the similar documents.

There are a bunch of parameters that you have to tune for either approach.

-- Jack Krupansky

-----Original Message-----
From: David Parks
Sent: Thursday, January 03, 2013 4:11 AM
To: solr-user@lucene.apache.org
Subject: RE: MoreLikeThis supporting multiple document IDs as input?

I'm not seeing the results I would expect. In the previous email below it's
stated that the "MLT search component" returns N results and K similar
documents per EACH of the N results.

If I'm not mistaken I access the "MLT search component" via a query to
/solr/select/?qt=mlt, such as this:

http://10.0.0.1:8080/solr/select/?qt=mlt&terms=true&q=shoes&rows=3

The query above for a simple term such as "shoes" can return many documents.
But I limited the results to 3, and I see 3 results, and the results don't
appear to me any different than doing this query:

http://107.23.102.164:8080/solr/select/?q=shoes&rows=3

So that suggests to me that solr maybe isn't handing things off to the MLT
component as expected (I don't know what results to expect so it's hard for
me to know where I'm trying to get to).

So I added a debugQuery=on parameter and I see this, which may be a useful
reference:

<str name="QParser">LuceneQParser</str>

It also appears that the MoreLikeThisComponent did indeed run

<lst name="org.apache.solr.handler.component.MoreLikeThisComponent">

So maybe I should ask exactly what results I should be expecting here?

Thanks very much!
David


-----Original Message-----
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Friday, December 28, 2012 8:13 PM
To: solr-user@lucene.apache.org
Subject: Re: MoreLikeThis supporting multiple document IDs as input?

Try a query that returns multiple results and you will see the difference.

MLT search component: n results, k similar documents per EACH of the n
results

MLT request handler: only FIRST result is examined, so only k similar
documents for that ONE (first) TOP search result.
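
The difference can be shown with a toy model in Python. This is not Solr code: find_similar is a made-up stand-in for the similarity lookup, the document IDs are hypothetical, and the "match" and "similar" keys are names chosen only for this sketch.

```python
def find_similar(doc_id, k):
    """Stand-in for the similarity lookup; returns k made-up IDs."""
    return [f"{doc_id}-sim{i}" for i in range(1, k + 1)]


def mlt_component(results, k):
    """One batch of k similar documents for EACH of the n results."""
    return {doc_id: find_similar(doc_id, k) for doc_id in results}


def mlt_handler(results, k):
    """One batch of k similar documents for the FIRST result only."""
    first = results[0]
    return {"match": first, "similar": find_similar(first, k)}


if __name__ == "__main__":
    results = ["doc1", "doc2", "doc3"]  # n = 3 hypothetical result IDs
    print(mlt_component(results, k=2))  # three batches, keyed by base document
    print(mlt_handler(results, k=2))    # a single batch, for doc1 only
```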

Are you really saying that you don't comprehend what the difference is, or
simply that you don't LIKE the difference?! Or, maybe that you are wondering
WHY they are different? That latter question I don't have the answer to.

-- Jack Krupansky

-----Original Message-----
From: David Parks
Sent: Friday, December 28, 2012 2:48 AM
To: solr-user@lucene.apache.org
Subject: RE: MoreLikeThis supporting multiple document IDs as input?

So the Search Components are executed in series on _every_ request. I
presume then that they look at the request parameters and decide whether,
and how, to take action.

So in the case of the MLT component this was said:

The MLT search component returns similar documents for each of the
documents in the search results, but processes each search result base
document one at a time and keeps its similar documents segregated by
each of the base documents.

So what I think I understand is that the Query Component (presumably this
guy: org.apache.solr.handler.component.QueryComponent) takes the input from
the "q" parameter and returns a result (the "q=id:123456" ensures that the
Query Component will return just this one document).

The MltComponent then looks at the result from the QueryComponent and
generates its results.

The part that is still confusing is understanding the difference between
these two comments:

- The MLT search component returns similar documents for each of the
documents in the search results
- The MLT handler returns similar documents only for the first document that
the query matches.



-----Original Message-----
From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com]
Sent: Friday, December 28, 2012 1:26 PM
To: solr-user@lucene.apache.org
Subject: RE: MoreLikeThis supporting multiple document IDs as input?

Hi Dave,

Think of search components as a chain of Java classes that get executed
during each search request. If you open solrconfig.xml you will see how they
are defined and used.

HTH

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Dec 28, 2012 12:06 AM, "David Parks" <davidpark...@yahoo.com> wrote:

I'm somewhat new to Solr (it's running, I've been through the books,
but I'm no master). What I hear you say is that MLT *can* accept, say,
5 documents and provide results, but the results would essentially be
the same as running the query 5 times, once for each document?

If that's the case, I might accept it. I would just have to merge them
together at the end (perhaps I'd take the top 2 of each result, for
example).
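
A client-side merge of that kind might look like the Python sketch below. It assumes the per-document batches have already been pulled out of the response into a dict mapping each base document ID to an ordered list of similar IDs; that shape, the function name, and the IDs are all made up for illustration.

```python
def merge_top_similar(batches, per_doc=2, exclude=()):
    """Take up to `per_doc` not-yet-seen IDs from each batch, in batch order."""
    seen = set(exclude)  # e.g. the articles the user is already reading
    merged = []
    for similar_ids in batches.values():
        taken = 0
        for doc_id in similar_ids:
            if taken == per_doc:
                break
            if doc_id in seen:
                continue  # skip duplicates and excluded IDs, look further down
            seen.add(doc_id)
            merged.append(doc_id)
            taken += 1
    return merged


if __name__ == "__main__":
    batches = {"a": ["x", "y", "z"], "b": ["y", "w", "v"]}
    print(merge_top_similar(batches, per_doc=2, exclude=("a", "b")))
```

This simply concatenates the batches; it makes no attempt to re-rank the merged list by score.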

Being somewhat new I'm a little confused by the difference between a
"Search Component" and a "Handler". I've got the /mlt handler working
and I'm using that. But how's that different from a "Search
Component"? Is that referring to the default /solr/select?q="..."
style query?

And if what I said about multiple documents above is correct, what's
the syntax to try that out?

Thanks very much for the great help!
Dave


-----Original Message-----
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Wednesday, December 26, 2012 12:07 PM
To: solr-user@lucene.apache.org
Subject: Re: MoreLikeThis supporting multiple document IDs as input?

MLT has both a request handler and a search component.

The MLT handler returns similar documents only for the first document
that the query matches.

The MLT search component returns similar documents for each of the
documents in the search results, but processes each search result base
document one at a time and keeps its similar documents segregated by
each of the base documents.

It sounds like you wanted to merge the base search results and then
find documents similar to that merged super-document. Is that what you
were really seeking, as opposed to what the MLT component does?
Unfortunately, you can't do that with the components as they are.

You would have to manually merge the values from the base documents
and then you could POST that text back to the MLT handler and find
similar documents using the posted text rather than a query. Kind of
messy, but in theory that should work.
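
A minimal Python sketch of that manual merge follows. It only builds the request and does not send it. The handler name "/mlt", the field names, and the sample documents are assumptions; whether your handler accepts a posted text body, and how it applies mlt.fl to that text, depends on your Solr version and configuration and should be checked against the Solr documentation.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_mlt_post(solr_base, base_docs, text_fields, mlt_fields, rows=10):
    """Merge field values from several documents and wrap them in a POST request."""
    merged_text = "\n".join(
        str(doc[field])
        for doc in base_docs
        for field in text_fields
        if doc.get(field)  # skip missing or empty fields
    )
    params = urlencode({"mlt.fl": ",".join(mlt_fields), "rows": rows})
    return Request(
        solr_base.rstrip("/") + "/mlt?" + params,  # "/mlt" is the assumed handler name
        data=merged_text.encode("utf-8"),          # a body makes urllib use POST
        headers={"Content-Type": "text/plain; charset=utf-8"},
    )


if __name__ == "__main__":
    docs = [{"title": "red shoes"}, {"title": "blue boots", "body": ""}]
    req = build_mlt_post("http://localhost:8080/solr", docs, ["title", "body"], ["title"])
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing the returned object to urllib.request.urlopen would send it to the server.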

-- Jack Krupansky

-----Original Message-----
From: David Parks
Sent: Tuesday, December 25, 2012 5:04 AM
To: solr-user@lucene.apache.org
Subject: MoreLikeThis supporting multiple document IDs as input?

I'm unclear on this point from the documentation. Is it possible to
give Solr X # of document IDs and tell it that I want documents
similar to those X documents?

Example:

  - The user is browsing 5 different articles
  - I send Solr the IDs of these 5 articles so I can present the user
other similar articles

I see this example for sending it 1 document ID:
http://localhost:8080/solr/select/?qt=mlt&q=id:[document id]&mlt.fl=[field1],[field2],[field3]&fl=id&rows=10

But can I send it 2+ document IDs as the query?

