I'm having some problems with distributed MLT. On 4.4, it seems
completely broken. Searches that work on 4.2.1 return an exception on
4.4.0. This Stack Overflow post shows the same
EarlyTerminatingCollectorException that I'm getting:
http://stackoverflow.com/questions/17866313/earlyterminatingcollectorexception-in-mlt-component-of-solr-4-4
In the example query URL below, the tag_id field is my uniqueKey and
XXXXX is a valid document id. The ncmain core has the shards parameter
in its handler definition. The catchall field is indexed, not stored,
and has termvectors. It is a copyField destination for several other
fields, one of which is completely ignored:
/solr/ncmain/select?q=tag_id:XXXXX&mlt=true&mlt.fl=catchall&mlt.count=100&q.op=OR

I'm also having some other problems with distributed MLT on 4.2.1:

If the termvectors for the source document contain a parenthesis or
other special query parser character, that character is put into the
distributed queries unescaped, which breaks the query parser. The
resulting exception causes the parent query to return a
NullPointerException.
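
To illustrate the kind of escaping I would expect before terms go into
the distributed queries, here is a minimal Python sketch. It is not what
Solr does internally. The names escape_term and build_mlt_query, the
sample terms, and the exact list of special characters are my own
assumptions for illustration. SolrJ's ClientUtils.escapeQueryChars does
something similar on the Java side.

```python
import re

# Characters treated as syntax by the Lucene/Solr standard query parser.
# This list is an assumption; check it against your Solr version.
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/&|])')


def escape_term(term):
    """Backslash-escape query parser special characters in one term."""
    return _SPECIAL.sub(r"\\\1", term)


def build_mlt_query(field, terms):
    """Join escaped terms into a simple field-qualified query string."""
    return " ".join(f"{field}:{escape_term(t)}" for t in terms)


if __name__ == "__main__":
    # Hypothetical terms pulled from a source document's termvectors.
    print(build_mlt_query("catchall", ["photo", "(c)", "a:b"]))
```

With escaping like this, a term such as "(c)" reaches the shards as
\(c\) instead of opening an unbalanced group in the parser.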

Even a successful distributed MLT query takes a really long time. With
a very small source document, it took longer than my load balancer's
30-second timeout. An immediate browser refresh then returned the
result in only a few seconds. One test that I did with a larger
document took over five minutes to complete, and only worked when I
bypassed the load balancer.

If q.op=AND, which is how my setup is configured, distributed MLT
appears to not work at all. It took me a really long time to figure
this one out. This happens because the q.op parameter is also sent with
the distributed queries (along with an exclusion for the original
document), so nothing matches. I can work around this problem by using
q.op=OR with the standard parser or switching to the edismax parser.
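
For anyone who wants to try the two workarounds, here is a small Python
sketch that builds the request URL either way. The function name mlt_url
is hypothetical. The core, field names, and parameters are the ones from
my example URL above, plus Solr's defType parameter for selecting
edismax.

```python
from urllib.parse import urlencode


def mlt_url(doc_id, q_op="OR", def_type=None):
    """Build an MLT select URL for the ncmain core (path only)."""
    params = [
        ("q", f"tag_id:{doc_id}"),
        ("mlt", "true"),
        ("mlt.fl", "catchall"),
        ("mlt.count", "100"),
        ("q.op", q_op),
    ]
    if def_type:
        # Workaround 2: switch query parsers, e.g. defType=edismax.
        params.append(("defType", def_type))
    return "/solr/ncmain/select?" + urlencode(params)


if __name__ == "__main__":
    # Workaround 1: standard parser with q.op=OR.
    print(mlt_url("XXXXX"))
    # Workaround 2: keep q.op=AND but use the edismax parser.
    print(mlt_url("XXXXX", q_op="AND", def_type="edismax"))
```

Note that urlencode percent-encodes the colon in the q parameter, which
Solr decodes back to tag_id:XXXXX.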

I intend to file bugs on these problems, unless someone thinks I'm
having these problems due to something I've misconfigured. I know I've
not included logs or config info, so if there's anything you'd like to
see, let me know.

Thanks,
Shawn