Originally, collapsing was designed with a very small feature set and one
goal in mind: high-performance collapsing on high-cardinality fields. To
avoid having to compromise on that goal, it was developed as a separate
feature.

The trick in combining grouping and collapsing into one feature is to do
it in a way that does not hurt the original performance goal of collapse.
Otherwise we'll be back to just having slow grouping.

Perhaps the new APIs that are being worked on could have a facade over
grouping and collapsing so they would share the same API.
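
To make the facade idea concrete, here is a rough client-side sketch, not
an existing Solr API. The function name, its arguments, and the field name
in the usage line are hypothetical. The request parameters themselves
(group, group.field, group.limit, the {!collapse} filter query, expand,
expand.rows) are the ones the two features already accept. One call picks
which feature to use and returns the matching parameters:

```python
def grouped_query_params(q, field, members_per_group=1, use_collapse=True):
    """Hypothetical facade: build Solr params for either grouping backend."""
    params = {"q": q}
    if use_collapse:
        # Collapsing Query Parser: a filter query keeps one doc per group.
        params["fq"] = "{!collapse field=%s}" % field
        if members_per_group > 1:
            # Expand component returns further members of each group.
            # Assumption: members_per_group is passed straight through as
            # the per-group row limit; adjust if the group head is counted.
            params["expand"] = "true"
            params["expand.rows"] = str(members_per_group)
    else:
        # Result Grouping: same intent expressed through group.* params.
        params["group"] = "true"
        params["group.field"] = field
        params["group.limit"] = str(members_per_group)
    return params


# Same call shape, two different backends ("court_id" is a made-up field).
collapse_params = grouped_query_params("solr", "court_id", 3)
grouping_params = grouped_query_params("solr", "court_id", 3, use_collapse=False)
```

The two features return differently shaped responses, so a real facade
would also have to normalize the results, which this sketch leaves out.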

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Oct 19, 2016 at 6:51 PM, Mike Lissner <
mliss...@michaeljaylissner.com> wrote:

> Hi all,
>
> I've had a rotten day today because of Solr. I want to share my experience
> and perhaps see if we can do something to fix this particular situation in
> the future.
>
> Solr currently has two ways to get grouped results (so far!). You can
> either use Result Grouping or you can use the Collapsing Query Parser.
> Result grouping seems like the obvious way to go. It's well documented, the
> parameters are clear, it doesn't use a bunch of weird syntax (i.e.,
> {!collapse blah=foo}), and it uses the feature name from SQL (so it comes
> up in Google).
>
> OTOH, if you use faceting with result grouping, which I imagine many people
> do, you get terrible performance. In our case it went from subsecond to
> 10-120 seconds for big queries. Insanely bad.
>
> Collapsing Query Parser looks like a good way forward for us, and we'll be
> investigating that, but it uses the Expand component that our library
> doesn't support, to say nothing of the truly bizarre syntax. So this will
> be a fair amount of effort to switch.
>
> I'm curious if there is anything we can do to clean up this situation. What
> I'd really like to do is:
>
> 1. Put a HUGE warning on the Result Grouping docs directing people away
> from the feature if they plan to use faceting (or perhaps directing them
> away no matter what?)
>
> 2. Work towards eliminating one or the other of these features. They're
> nearly completely compatible, except for their syntax and performance. The
> collapsing query parser apparently was only written because the result
> grouping had such bad performance -- In other words, it doesn't exist to
> provide unique features, it exists to be faster than the old way. Maybe we
> can get rid of one or the other of these, taking the best parts from each
> (syntax from Result Grouping, and performance from Collapse Query Parser)?
>
> Thanks,
>
> Mike
>
> PS -- For some extra context, I want to share some other reasons this is
> frustrating:
>
> 1. I just spent a week upgrading a third-party library so it would support
> grouped results, and another week implementing the feature in our code with
> tests and everything. That was a waste.
> 2. It's hard to notice performance issues until after you deploy to a big
> data environment. This creates a bad situation for users until you detect
> it and revert the new features.
> 3. The documentation *could* say something about the fact that a new
> feature was developed to provide better performance for grouping. It could
> say that using facets with groups is an anti-feature. It says neither.
>
> I only mention these because, like others, I've had a real rough time with
> Solr (again), and these are the kinds of seemingly small things that could
> have made all the difference.
>
