Implementing MoreLikeThis support in DismaxRequestHandler
---------------------------------------------------------

                 Key: SOLR-295
                 URL: https://issues.apache.org/jira/browse/SOLR-295
             Project: Solr
          Issue Type: Improvement
          Components: search
    Affects Versions: 1.3
            Reporter: Pieter Berkel
            Priority: Minor


There's nothing too clever about this initial patch, which will be uploaded 
shortly: I have simply extracted the MLT code from the StandardRequestHandler 
and inserted it into the DismaxRequestHandler.  However, there are some broader 
MLT issues that I'd also like to address in the near future:

1) (trivial) No "This response format is experimental" warning when MLT is used 
with StandardRequestHandler (or DismaxRequestHandler).  Not really a big deal, 
but the warning at least makes developers aware of the possibility of future 
changes.

2) (trivial) "org.apache.solr.common.util.MoreLikeThisParams" should perhaps be 
moved to the more appropriate package "org.apache.solr.common.params".

3) (non-trivial) The ability to specify the list of fields that should be 
returned when MLT is invoked from an external handler (e.g. 
StandardRequestHandler).  Currently the field list (FL) parameter is inherited 
from the main query, but I can envisage cases where it would be desirable to 
return more or fewer fields in the MLT query than in the main query.  One 
complication is that "mlt.fl" is already used to specify the fields used for 
similarity.  Perhaps "mlt.fl" is not the best name for that parameter and 
should be renamed to avoid potential conflict / confusion?

4) (fairly-trivial) On a similar note to 3, there is currently no way to 
specify a "start" value for the rows returned when MLT is invoked from an 
external handler (e.g. StandardRequestHandler); it is hard-coded to 0 (i.e. the 
first "mlt.count" documents matched).  While I can see the logic in naming the 
parameter "mlt.count", it does seem a little inconsistent and perhaps it would 
be better to rename (or at least alias) it to "mlt.rows" to be consistent with 
the CommonQueryParameters.  Note that "mlt.start" is fundamentally different to 
the "mlt.match.offset" parameter, as the latter deals with documents *matching* 
the initial MLT query while the former deals with documents *returned* by the 
MLT query (hope that makes sense).

I have created a patch that implements "mlt.start" (to specify the start doc) 
and adds "mlt.rows", which can be used interchangeably with "mlt.count" (though 
I would prefer to remove "mlt.count" altogether).  Since it involves changing 
the method definition of MoreLikeThisHelper.getMoreLikeThese(), I wanted to get 
some opinions before submitting it.

5) (non-trivial) Interesting Terms - the ability to return interesting term 
information using the "mlt.interestingTerms" parameter when MLT is invoked from 
an external handler.  This is perhaps the most useful feature I am looking to 
implement: I can see great benefit in being able to provide a list of 
interesting terms or "keywords" for each document returned in a standard or 
dismax query.  Currently this is only available from the MLT request handler, 
so perhaps the best approach would be to re-factor the "interestingTerms" code 
in the MoreLikeThisHandler class and put it somewhere in MoreLikeThisHelper so 
it is available to all handlers?  Again, I would appreciate any comments or 
suggestions.
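
To make the parameter handling proposed in point 4 concrete, here is a minimal 
sketch in plain Java.  It is not taken from the patch and does not use any Solr 
classes: the class MltPagingSketch, its Window type, and the resolve() method 
are hypothetical names, request parameters are modelled as a simple map, and 
the default of 5 rows is an assumption.  It only illustrates "mlt.rows" acting 
as an alias that takes precedence over "mlt.count", and "mlt.start" defaulting 
to 0.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the proposed MLT paging parameters; not Solr code. */
public class MltPagingSketch {

    /** Start offset and row count for the MLT results of one document. */
    public static final class Window {
        public final int start;
        public final int rows;

        Window(int start, int rows) {
            this.start = start;
            this.rows = rows;
        }
    }

    // Assumed default, used when neither "mlt.rows" nor "mlt.count" is given.
    static final int DEFAULT_ROWS = 5;

    /** Resolves the result window from raw request parameters. */
    public static Window resolve(Map<String, String> params) {
        // "mlt.start" is the proposed offset into the returned documents.
        int start = parseInt(params.get("mlt.start"), 0);

        // "mlt.rows" wins if present; otherwise fall back to "mlt.count".
        String rows = params.containsKey("mlt.rows")
                ? params.get("mlt.rows")
                : params.get("mlt.count");

        // Negative values are clamped to 0 rather than rejected.
        return new Window(Math.max(0, start),
                          Math.max(0, parseInt(rows, DEFAULT_ROWS)));
    }

    private static int parseInt(String value, int fallback) {
        return value == null ? fallback : Integer.parseInt(value.trim());
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("mlt.start", "10");
        params.put("mlt.count", "3");
        Window w = resolve(params);
        System.out.println(w.start + " " + w.rows); // prints "10 3"
    }
}
```

If "mlt.count" were removed altogether, as suggested above, the fallback 
branch in resolve() would simply disappear.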

I've also noted the MLT features suggested by Tristan [ 
http://www.nabble.com/MoreLikeThis-with-DisMax-boost-query---functions-tf4047187.html
 ] which could quite possibly be rolled together with the above points -- I'm 
not sure whether it is better to have a single ticket tracking several related 
issues or to create individual tickets for each issue; however, I will be happy 
to comply with the Solr issue tracking policy on advice from the core developers.

regards,
Pieter


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
