Thanks Ben, that's very useful. My main reason for using ZooKeeper was simplicity (one less component to worry about in the stack).
From what you've said, that was fool's gold, so I can see why you've dropped that option.
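
For a rough sense of scale on point (1) in Ben's reply quoted below, here is a back-of-the-envelope sketch in Python. The 200 bytes per slave record is purely an assumed figure for illustration, not a measurement of what the registry actually stores. The function names are made up, and the default znode limit is approximated as 1024 * 1024 bytes.

```python
# Approximation of ZooKeeper's default znode size limit (about 1MB).
ZNODE_LIMIT_BYTES = 1024 * 1024


def max_slaves_in_one_znode(bytes_per_slave, limit=ZNODE_LIMIT_BYTES):
    """Return how many fixed-size slave records fit in a single znode.

    bytes_per_slave is a hypothetical serialized size per slave record.
    """
    if bytes_per_slave <= 0:
        raise ValueError("bytes_per_slave must be positive")
    return limit // bytes_per_slave


def fits_in_znode(slave_count, bytes_per_slave, limit=ZNODE_LIMIT_BYTES):
    """Return True if the whole slave list still fits under the limit."""
    return slave_count * bytes_per_slave <= limit


if __name__ == "__main__":
    assumed_record_size = 200  # assumption, not a measured value
    print(max_slaves_in_one_znode(assumed_record_size))  # 5242
    print(fits_in_znode(10000, assumed_record_size))     # False
```

With those assumed numbers, a single uncompressed znode tops out at a few thousand slaves. Compression would move the ceiling but not remove it.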

On 13 June 2014 20:15, Benjamin Mahler <benjamin.mah...@gmail.com> wrote:
> Dick: Excellent question. The ZooKeeper backed registry was dropped for a
> few reasons:
>
> (1) Znodes by default have a size limit of 1MB. This means if your cluster
> grows organically and the set of slaves surpasses 1MB, all subsequent
> storage operations will fail. You would not be able to add slaves to your
> cluster past this point. Compression helps, but does not solve it.
>
> (2) To implement a scalable ZooKeeper backed storage layer, we need to be
> able to partition our data across znodes and perform atomic writes.
>   (a) Partitioning is non-trivial and we don't know of any C++ libraries
>   that do this already.
>   (b) To my knowledge, before 3.4.x transactional support was missing and
>   applications had to implement two-phase commit [1]. Complex! Even in
>   3.4.x the transactional support seems to limit total transaction data
>   to 1MB, from the NOTE in [2].
>
> (3) Alternatively, one can live with a simple, but operationally unfortunate
> implementation outlined in (1). But that means we would at least need to
> provide some tooling to make moving between state backends simple. Doable,
> but implies more work and support.
>
> (4) ZooKeeper is currently the largest source of disruptions to our system
> availability, so becoming more reliant on it as a permanent storage backend
> was a bit worrisome. At Twitter we have had a lot more operational
> experience and confidence with the replicated log as a permanent storage
> backend.
>
> To be clear, there's nothing stopping anyone from wiring up the existing
> ZooKeeper storage implementation in Mesos and providing it as an alternative
> to the replicated log. As soon as we provide two we should have tooling to
> allow people to move between them.
>
> I hope this clarifies things!
>
> [1]
> http://zookeeper.apache.org/doc/trunk/recipes.html#sc_recipes_twoPhasedCommit
>
> [2]
> http://zookeeper.apache.org/doc/r3.4.3/api/org/apache/zookeeper/ZooKeeper.html#multi%28java.lang.Iterable%29
>
>
> On Fri, Jun 13, 2014 at 11:04 AM, Jie Yu <yujie....@gmail.com> wrote:
>>>
>>> Largely because of a requirement to bring everything back up in a certain
>>> order
>>
>>
>> I don't think they need to be brought back up in a certain order. You just
>> need to restart all of them. The only requirement is that all masters should
>> be running at 0.19.0.
>>
>>> I'd also be very interested in a zookeeper implementation
>>
>>
>> I think there is an issue with ZK impl. Ben Mahler probably can expand
>> here.
>>
>> - Jie
>>
>>
>> On Fri, Jun 13, 2014 at 12:32 AM, Tom Arnfeld <t...@duedil.com> wrote:
>>>
>>> Hey Dave (and the group),
>>>
>>> I have to say for me it was a little fiddly to upgrade a 0.18.2
>>> cluster to 0.19.0. Largely because of a requirement to bring
>>> everything back up in a certain order (I had to lower the quorum count
>>> to 1) otherwise mesos failed to get a majority vote to initialise the
>>> log (I had 3 masters).
>>>
>>> I'd also be very interested in a zookeeper implementation - and
>>> perhaps some improved documentation around the log.
>>>
>>> Cheers,
>>>
>>> Tom.
>>>
>>> > On 13 Jun 2014, at 08:17, Dick Davies <d...@hellooperator.net> wrote:
>>> >
>>> > I thought I read that there was going to be a registry implementation
>>> > backed by zookeeper;
>>> > does anyone know why that was dropped?
>>> >
>>> > Really excited to see the containerizer features rolling in, but the
>>> > quorum looks at first glance to make Mesos a little harder to operate
>>> > ("This means adding or removing masters must be done carefully!") - I
>>> > understand the benefits but was hoping we could get by with the
>>> > zookeeper registry.
>>> >
>>> >
>>> >> On 13 June 2014 03:49, Dave Lester <daveles...@gmail.com> wrote:
>>> >> Hi All,
>>> >>
>>> >> Below is a blog post that Ben Mahler wrote as release manager for Mesos
>>> >> 0.19.0; it was published on the Mesos site today.
>>> >>
>>> >> I know that not everyone follows @ApacheMesos Twitter (even though you
>>> >> should!), so I wanted to make sure it was also shared on the user@ list.
>>> >>
>>> >> Cheers,
>>> >> Dave
>>> >>
>>> >>
>>> >> Apache Mesos 0.19.0 Released
>>> >>
>>> >> The latest Mesos release, 0.19.0, is now available for download. This new
>>> >> version includes the following features and improvements:
>>> >>
>>> >> - The master now persists the list of registered slaves in a durable
>>> >>   replicated manner using the Registrar and the replicated log.
>>> >> - Alpha support for custom container technologies has been added with the
>>> >>   ExternalContainerizer.
>>> >> - Metrics reporting has been overhauled and is now exposed on
>>> >>   <ip:port>/metrics/snapshot.
>>> >> - Slave Authentication: optionally, only authenticated slaves can register
>>> >>   with the master.
>>> >> - Numerous bug fixes and stability improvements.
>>> >>
>>> >> Full release notes are available on JIRA.
>>> >>
>>> >> Registrar
>>> >>
>>> >> Mesos 0.19.0 introduces the “Registrar”: the master now persists the list
>>> >> of registered slaves in a durable replicated manner. The previous lack of
>>> >> durable state was an intentional design decision that simplified failover
>>> >> and allowed masters to be run and migrated with ease. However, the
>>> >> stateless design had issues:
>>> >>
>>> >> - In the event of a dual failure (slave fails while master is down), no
>>> >>   lost task notifications are sent. This leads to a task running
>>> >>   according to the framework but unknown to Mesos.
>>> >> - When a new master is elected, we may allow rogue slaves to re-register
>>> >>   with the master.
>>> >>   This leads to tasks running on the slave that are not known to the
>>> >>   framework.
>>> >>
>>> >> Persisting the list of registered slaves allows failed over masters to
>>> >> detect slaves that do not re-register, and notify frameworks accordingly.
>>> >> It also allows us to prevent rogue slaves from re-registering,
>>> >> terminating the rogue tasks in the process.
>>> >>
>>> >> The state is persisted using the replicated log (available since 0.9.0).
>>> >>
>>> >> External Containerization
>>> >>
>>> >> As alluded to during the containerization / isolation refactor in 0.18.0,
>>> >> the ExternalContainerizer has landed in this release. This provides alpha
>>> >> level support for custom containerization.
>>> >>
>>> >> Developers can implement their own external containerizers to provide
>>> >> support for custom container technologies. Initial Docker support is now
>>> >> available through some community driven external containerizers: Docker
>>> >> Containerizer for Mesos by Tom Arnfeld and Deimos by Jason Dusek. Please
>>> >> reach out on the mailing lists with questions!
>>> >>
>>> >> Metrics
>>> >>
>>> >> Previously, Mesos components had to use custom metrics code and custom
>>> >> HTTP endpoints for exposing metrics. This made it difficult to expose
>>> >> additional system metrics and often required having an endpoint for each
>>> >> libprocess Process (Actor) for which metrics were desired. Having metrics
>>> >> spread across endpoints was operationally complex.
>>> >>
>>> >> We needed a consistent, simple, and global way to expose metrics, which
>>> >> led to the creation of a metrics library within libprocess. All metrics
>>> >> are now exposed via /metrics/snapshot. The /stats.json endpoint remains
>>> >> for backwards compatibility.
>>> >>
>>> >> Upgrading
>>> >>
>>> >> For backwards compatibility, the “Registrar” will be enabled in a phased
>>> >> manner. By default, the “Registrar” is write-only in 0.19.0 and will be
>>> >> read/write in 0.20.0.
>>> >>
>>> >> If running in high-availability mode with ZooKeeper, operators must now
>>> >> specify the --work_dir for the master, along with the --quorum size of
>>> >> the ensemble of masters. This means adding or removing masters must be
>>> >> done carefully! The best practice is to only ever add or remove a single
>>> >> master at a time and to allow a small amount of time for the replicated
>>> >> log to catch up on the new master. Maintenance documentation will be
>>> >> added to reflect this.
>>> >>
>>> >> Please refer to the upgrades document, which details how to perform an
>>> >> upgrade from 0.18.x.
>>> >>
>>> >> Future Work
>>> >>
>>> >> Thanks to the Registrar, reconciliation primitives can now be provided to
>>> >> ensure that the state of tasks between Mesos and frameworks is kept
>>> >> consistent. This will remove the need for frameworks to implement
>>> >> out-of-band task reconciliation to inspect the state of slaves.
>>> >> Reconciliation work is being tracked at MESOS-1407.
>>> >>
>>> >> The addition of state through the Registrar opens up a rich set of
>>> >> possible features that were previously not possible due to the lack of
>>> >> persistent state in the master. These include:
>>> >>
>>> >> - Cluster maintenance primitives (MESOS-1474)
>>> >> - Repair automation (MESOS-695)
>>> >> - Global resource reservations
>>> >>
>>> >> Getting Involved
>>> >>
>>> >> We encourage you to try out this release, and let us know what you think
>>> >> and if you hit any issues on the user mailing list. You can also get in
>>> >> touch with us via @ApacheMesos or via mailing lists and IRC.
>>> >>
>>> >> Thanks
>>> >>
>>> >> Thanks to the 32 contributors who made 0.19.0 possible:
>>> >>
>>> >> Ashutosh Jain, Adam B, Alexandra Sava, Anton Lindström, Archana kumari,
>>> >> Benjamin Hindman, Benjamin Mahler, Bernardo Gomez Palacio, Bernd Mathiske,
>>> >> Charlie Carson, Chengwei Yang, Chi Zhang, Dave Lester, Dominic Hamon, Ian
>>> >> Downes, Isabel Jimenez, Jake Farrell, Jameel Al-Aziz, Jiang Yan Xu, Jie Yu,
>>> >> Nikita Vetoshkin, Niklas Q. Nielsen, Ritwik Yadav, Sam Taha, Steven Phung,
>>> >> Till Toenshoff, Timothy St. Clair, Tobi Knaup, Tom Arnfeld, Tom Galloway,
>>> >> Vinod Kone, Vinson Lee
>>
>>
>