Originally, collapsing was designed with a very small feature set and one goal in mind: high-performance collapsing on high-cardinality fields. To avoid having to compromise on that goal, it was developed as a separate feature.
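For reference, here is a minimal sketch of how differently the two features are requested today. It only builds the query strings and does not contact a server. The field name `cluster_id` and the helper function names are hypothetical; the parameters themselves (`group`, `group.field`, `{!collapse field=...}`, `expand`) are the ones I understand each feature to use, so check them against the reference guide for your Solr version.

```python
from urllib.parse import urlencode


def grouping_params(query, field):
    """Result Grouping: plain request parameters (hypothetical helper)."""
    return {"q": query, "group": "true", "group.field": field}


def collapse_params(query, field):
    """Collapsing Query Parser: a filter query in local-params syntax,
    with the Expand component switched on to return the collapsed docs
    (hypothetical helper)."""
    return {"q": query, "fq": "{!collapse field=%s}" % field, "expand": "true"}


# "cluster_id" is an assumed field name, for illustration only.
print(urlencode(grouping_params("*:*", "cluster_id")))
print(urlencode(collapse_params("*:*", "cluster_id")))
```

Both requests ask for one representative document per `cluster_id`, but the shape of the response differs, which is why client libraries need separate support for each.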
The trick in combining grouping and collapsing into one feature is to do it in a way that does not hurt the original performance goal of collapse. Otherwise we'll be back to just having slow grouping. Perhaps the new APIs that are being worked on could have a facade over grouping and collapsing so they would share the same API.

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Oct 19, 2016 at 6:51 PM, Mike Lissner <mliss...@michaeljaylissner.com> wrote:

> Hi all,
>
> I've had a rotten day today because of Solr. I want to share my experience
> and perhaps see if we can do something to fix this particular situation in
> the future.
>
> Solr currently has two ways to get grouped results (so far!). You can
> either use Result Grouping or you can use the Collapsing Query Parser.
> Result grouping seems like the obvious way to go. It's well documented, the
> parameters are clear, it doesn't use a bunch of weird syntax (ie,
> {!collapse blah=foo}), and it uses the feature name from SQL (so it comes
> up in Google).
>
> OTOH, if you use faceting with result grouping, which I imagine many people
> do, you get terrible performance. In our case it went from subsecond to
> 10-120 seconds for big queries. Insanely bad.
>
> Collapsing Query Parser looks like a good way forward for us, and we'll be
> investigating that, but it uses the Expand component that our library
> doesn't support, to say nothing of the truly bizarre syntax. So this will
> be a fair amount of effort to switch.
>
> I'm curious if there is anything we can do to clean up this situation. What
> I'd really like to do is:
>
> 1. Put a HUGE warning on the Result Grouping docs directing people away
> from the feature if they plan to use faceting (or perhaps directing them
> away no matter what?)
>
> 2. Work towards eliminating one or the other of these features. They're
> nearly completely compatible, except for their syntax and performance. The
> collapsing query parser apparently was only written because the result
> grouping had such bad performance -- In other words, it doesn't exist to
> provide unique features, it exists to be faster than the old way. Maybe we
> can get rid of one or the other of these, taking the best parts from each
> (syntax from Result Grouping, and performance from Collapse Query Parser)?
>
> Thanks,
>
> Mike
>
> PS -- For some extra context, I want to share some other reasons this is
> frustrating:
>
> 1. I just spent a week upgrading a third-party library so it would support
> grouped results, and another week implementing the feature in our code with
> tests and everything. That was a waste.
> 2. It's hard to notice performance issues until after you deploy to a big
> data environment. This creates a bad situation for users until you detect
> it and revert the new features.
> 3. The documentation *could* say something about the fact that a new
> feature was developed to provide better performance for grouping. It could
> say that using facets with groups is an anti-feature. It says neither.
>
> I only mention these because, like others, I've had a real rough time with
> solr (again), and these are the kinds of seemingly small things that could
> have made all the difference.
>