I'm having some problems with distributed MoreLikeThis (MLT) in Solr. On 4.4.0, it seems completely broken: searches that work on 4.2.1 return an exception. This Stack Overflow post shows the EarlyTerminatingCollectorException I'm getting:

http://stackoverflow.com/questions/17866313/earlyterminatingcollectorexception-in-mlt-component-of-solr-4-4

In the example query URL below, the tag_id field is my uniqueKey and XXXXX is a valid document id. The ncmain core has the shards parameter in its handler definition. The catchall field is indexed, not stored, and has term vectors enabled. It is a copyField destination for several other fields, one of which is completely ignored:

/solr/ncmain/select?q=tag_id:XXXXX&mlt=true&mlt.fl=catchall&mlt.count=100&q.op=OR



I'm also having some other problems with distributed MLT on 4.2.1:

If the term vectors for the source document contain a parenthesis or another character that is special to the query parser, that character gets put into the distributed queries as-is, which breaks the query parser. That exception results in the parent query returning a NullPointerException.
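
To illustrate the kind of escaping that would avoid this, here is a minimal sketch in Python. It is not the MLT component's actual code; the helper names (escape_term, build_mlt_query) and the sample terms are hypothetical, and the character list is the set of syntax characters in the classic Lucene query parser as I understand it. SolrJ offers ClientUtils.escapeQueryChars for the same purpose.

```python
# Characters the classic Lucene/Solr query parser treats as syntax
# (assumed list; whitespace is handled separately below).
LUCENE_SPECIAL_CHARS = set('\\+-!():^[]"{}~*?|&;/')


def escape_term(term):
    """Backslash-escape parser syntax characters and whitespace in one term."""
    return "".join(
        "\\" + ch if ch in LUCENE_SPECIAL_CHARS or ch.isspace() else ch
        for ch in term
    )


def build_mlt_query(field, terms):
    """Join escaped terms into an OR query on one field (hypothetical helper)."""
    return " OR ".join(f"{field}:{escape_term(t)}" for t in terms)


if __name__ == "__main__":
    # A term containing parentheses no longer breaks the parser once escaped.
    print(build_mlt_query("catchall", ["foo(bar)", "plain"]))
```

If the interesting terms were escaped like this before being placed into the per-shard queries, a parenthesis in a term vector would be treated as literal text rather than as grouping syntax.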

Even a successful distributed MLT query takes a very long time. With a very small source document, it took longer than my load balancer's 30-second timeout, although an immediate browser refresh returned the result in only a few seconds. One test that I did with a larger document took over five minutes to complete, and only worked when I bypassed the load balancer.

If q.op=AND, which is the default in my setup, distributed MLT appears to simply not work at all. It took me a long time to figure this one out. It happens because the q.op parameter is also sent with the distributed queries (along with an exclusion for the original document), so nothing matches. I can work around this problem by using q.op=OR with the standard parser or by switching to the edismax parser.
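
The two workarounds can be sketched as request URLs. This is only an illustration in Python: the mlt_url helper is hypothetical, the path and field names are taken from the example URL earlier in this message, and defType=edismax is the usual request parameter for selecting the edismax parser.

```python
from urllib.parse import urlencode


def mlt_url(doc_id, workaround="q.op"):
    """Build the MLT request path with one of the two workarounds applied."""
    params = [
        ("q", f"tag_id:{doc_id}"),   # tag_id is the uniqueKey field
        ("mlt", "true"),
        ("mlt.fl", "catchall"),
        ("mlt.count", "100"),
    ]
    if workaround == "q.op":
        # Override the AND default so the distributed queries can match.
        params.append(("q.op", "OR"))
    elif workaround == "edismax":
        # Switch parsers instead of changing the default operator.
        params.append(("defType", "edismax"))
    else:
        raise ValueError(f"unknown workaround: {workaround}")
    # urlencode percent-encodes reserved characters such as ':' in the query.
    return "/solr/ncmain/select?" + urlencode(params)


if __name__ == "__main__":
    print(mlt_url("XXXXX", "q.op"))
    print(mlt_url("XXXXX", "edismax"))
```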

I intend to file bugs on these problems, unless someone thinks I'm having these problems due to something I've misconfigured. I know I've not included logs or config info, so if there's anything you'd like to see, let me know.

Thanks,
Shawn
