It's time to start working on the next major evolution of Solr (much as we did years ago for the SolrCloud effort). To kick things off, I've started a project on GitHub and implemented "off-heap" filters as a first step toward taking performance to the next level.
For a number of reasons, we felt it best to incubate this project at GitHub, where we could have a community dedicated solely to its advancement. The plan is to bring it back to the ASF once it has stabilized and gained enough traction.

Off-Heap Filters:
JVMs have never been good at dealing with large heaps. A large heap means the JVM needs to do a lot of garbage collection work, and it often means some pretty long stop-the-world GC pauses. Filters (Solr DocSets) stored in the filterCache are now allocated off-heap and reference counted, so they can be freed as soon as they are no longer needed. The JVM no longer needs to waste time copying around these potentially long-lived blocks of memory. This should both help eliminate the long GC pauses and increase request throughput.

Performance Results:
I'm still putting together a blog post on the results, but they look good! It was pretty trivial to reproduce >1s stop-the-world GC pauses with a 4GB heap, and then see those pauses go away completely when I switched to off-heap filters. Throughput also increased, since much less time was spent doing GC.

Next major feature: Native Code Optimizations
In addition to moving more large data structures off-heap (like UnInvertedField?), I am planning to implement native code optimizations for certain hotspots. Native code faceting would be an obvious first choice since it can often be a CPU bottleneck.

Project resources:
https://github.com/Heliosearch/heliosearch
https://groups.google.com/forum/#!forum/heliosearch
https://groups.google.com/forum/#!forum/heliosearch-dev
Freenode IRC: #heliosearch #heliosearch-dev

-Yonik
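
P.S. For anyone wondering what "allocated off-heap and reference counted" means in practice, here is a rough Java sketch of the idea. It is not the actual Heliosearch code: the class name OffHeapBitSet and its methods are made up for illustration, and it assumes a direct ByteBuffer as a stand-in for raw native memory. A direct buffer is only reclaimed after the buffer object itself becomes unreachable, so the sketch shows the reference-counting protocol rather than an immediate free.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical illustration only; not a Heliosearch or Solr class. */
final class OffHeapBitSet {
    // A direct buffer stores its contents outside the Java heap,
    // so the garbage collector never has to copy the bits around.
    private ByteBuffer words;

    // Starts at 1: whoever creates the set holds the first reference.
    private final AtomicInteger refCount = new AtomicInteger(1);

    OffHeapBitSet(int numBits) {
        int numWords = (numBits + 63) >>> 6;   // 64 bits per long
        // allocateDirect returns zero-filled memory, so all bits start cleared.
        words = ByteBuffer.allocateDirect(numWords * Long.BYTES)
                          .order(ByteOrder.nativeOrder());
    }

    /** Mark a document id as matching. Not thread-safe; build before sharing. */
    void set(int docId) {
        int pos = (docId >>> 6) * Long.BYTES;
        words.putLong(pos, words.getLong(pos) | (1L << (docId & 63)));
    }

    boolean get(int docId) {
        int pos = (docId >>> 6) * Long.BYTES;
        return (words.getLong(pos) & (1L << (docId & 63))) != 0;
    }

    /** Take another reference, e.g. when a request pulls the filter from a cache. */
    OffHeapBitSet incRef() {
        int c;
        do {
            c = refCount.get();
            if (c <= 0) throw new IllegalStateException("already released");
        } while (!refCount.compareAndSet(c, c + 1));
        return this;
    }

    /** Drop a reference. Returns true if this call released the memory. */
    boolean decRef() {
        int c = refCount.decrementAndGet();
        if (c < 0) throw new IllegalStateException("released too many times");
        if (c == 0) {
            // Assumption: a real off-heap implementation would explicitly
            // free native memory here; this sketch only drops the buffer.
            words = null;
            return true;
        }
        return false;
    }

    int refCount() {
        return refCount.get();
    }

    public static void main(String[] args) {
        OffHeapBitSet filter = new OffHeapBitSet(1000);
        filter.set(42);
        OffHeapBitSet inUse = filter.incRef();  // a request starts using the filter
        filter.decRef();                        // the cache evicts its own reference
        System.out.println(inUse.get(42));      // the request can still read it
        System.out.println(inUse.decRef());     // the last holder releases it
    }
}
```

The point of the counter is that a cache eviction and an in-flight request can each let go of the filter independently, and the memory goes away exactly when the last of them is done with it, without waiting for a GC cycle to notice.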