In lieu of today's meeting, this is an email update: The 1.8 release process is underway, and it includes a few performance related changes:
- Parallel reads for the v0 API have been extended to all other v0 read only endpoints (e.g. /state-summary, /roles, etc). Whereas in 1.7.0, only /state had the parallel read support. Also, requests are de-duplicated by principal so that we don't perform redundant construction of responses if we know they will be the same. - The allocator performance has improved significantly when quota is in use, benchmarking shows allocation cycle time reduced ~40% for a small size cluster and up to ~70% for larger clusters. - A per-framework (and per-role) override of the global --min_allocatable_resources filter has been introduced. This lets frameworks specify the minimum size of offers they want to see for their roles, and improves scheduling efficiency by reducing the number of offers declined for having insufficient resource quantities. In the resource management area, we're currently working on the following near term items: - Investigating whether we can make some additional performance improvements to the sorters (e.g. incremental sorting). - Finishing the quota limits work, which will allow setting of limits separate from guarantees. - Adding an UPDATE_FRAMEWORK call to allow multi-role frameworks to change their roles without re-subscribing. - Exposing quota consumption via the API and UI (note that we currently expose "allocation", but reservations are also considered against quota consumption!) There's lots more in the medium term, but I'll omit them here unless folks are curious. In the performance area, the following seem like the most pressing short term items to me: - Bring the v0 parallel read functionality to the v1 read-only calls. - Bring v1 endpoint performance closer to v0. Please chime in if there are any questions or comments, Ben