Hi all,

I've had a rotten day today because of Solr. I want to share my experience
and perhaps see if we can do something to fix this particular situation in
the future.

Solr currently has two ways to get grouped results (so far!). You can
either use Result Grouping or you can use the Collapsing Query Parser.
Result Grouping seems like the obvious way to go. It's well documented, the
parameters are clear, it doesn't use a bunch of weird syntax (e.g.,
{!collapse blah=foo}), and it uses the feature name from SQL (so it comes
up in Google).

OTOH, if you use faceting with result grouping, which I imagine many people
do, you get terrible performance. In our case it went from subsecond to
10-120 seconds for big queries. Insanely bad.

Collapsing Query Parser looks like a good way forward for us, and we'll be
investigating that, but it uses the Expand component that our library
doesn't support, to say nothing of the truly bizarre syntax. So this will
be a fair amount of effort to switch.
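
To make the syntax difference concrete, here is a minimal sketch of the request parameters each approach needs. The helper names and the `cluster_id` field are hypothetical, chosen only for illustration; the Solr parameters themselves (`group`, `group.field`, `fq={!collapse field=...}`, `expand`) are the standard ones, but this is not a tested client for any particular library.

```python
from urllib.parse import urlencode


def grouping_params(query, field):
    """Result Grouping: ordinary request parameters."""
    return {"q": query, "group": "true", "group.field": field}


def collapse_params(query, field):
    """Collapsing Query Parser: a filter query written in local-params
    syntax, with the Expand component switched on to return the
    collapsed group members."""
    return {
        "q": query,
        "fq": "{!collapse field=%s}" % field,
        "expand": "true",
    }


if __name__ == "__main__":
    # "cluster_id" is a made-up field name for this example.
    print(urlencode(grouping_params("*:*", "cluster_id")))
    print(urlencode(collapse_params("*:*", "cluster_id")))
```

The two requests ask for roughly the same thing, but the response shapes differ, which is why a client library has to support the Expand component separately.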

I'm curious if there is anything we can do to clean up this situation. What
I'd really like to do is:

1. Put a HUGE warning on the Result Grouping docs directing people away
from the feature if they plan to use faceting (or perhaps directing them
away no matter what?)

2. Work towards eliminating one or the other of these features. They're
nearly equivalent in functionality, differing mainly in syntax and
performance. The Collapsing Query Parser apparently was only written
because Result Grouping had such bad performance -- in other words, it
doesn't exist to provide unique features, it exists to be faster than the
old way. Maybe we can get rid of one or the other of these, taking the best
parts from each (syntax from Result Grouping, and performance from the
Collapsing Query Parser)?

Thanks,

Mike

PS -- For some extra context, I want to share some other reasons this is
frustrating:

1. I just spent a week upgrading a third-party library so it would support
grouped results, and another week implementing the feature in our code with
tests and everything. That was a waste.
2. It's hard to notice performance issues until after you deploy to a big
data environment. This creates a bad situation for users until you detect
it and revert the new features.
3. The documentation *could* say something about the fact that a new
feature was developed to provide better performance for grouping. It could
say that using facets with groups is an anti-feature. It says neither.

I only mention these because, like others, I've had a really rough time with
Solr (again), and these are the kinds of seemingly small things that could
have made all the difference.
