Tom,

Thank you for bringing this up! I'll create documentation for the replicated log ASAP. Progress is tracked in this ticket: MESOS-1471
> and I think aurora uses the mesos log too??

That's correct. In other words, it's battle tested :)

> Am I correct in saying frameworks are able to interact with the log to
> store state too?

Yes! Mesos also has a State
<https://github.com/apache/mesos/blob/master/src/state/state.hpp>
abstraction which can be backed by many storage options. The replicated log
is one of them.

- Jie

On Fri, Jun 13, 2014 at 12:49 PM, Tom Arnfeld <t...@duedil.com> wrote:

> No worries at all. I think once there’s a more solid base for mesos
> documentation in general, it’ll be easier for committers to add new docs
> for new features. Fair enough about the launch ordering – I was probably
> just a little surprised to see a bunch of warnings about an uninitialised
> log and didn’t think about booting them all up (some upgrade notes would
> have been useful here).
>
> Regarding zookeeper, those are some interesting points. Personally, it
> doesn’t bother me that mesos has its own mechanism for this (and I think
> aurora uses the mesos log too??). I think the documentation could go a long
> way in exposing that the log exists, and why it’s used for the registry. Am
> I correct in saying frameworks are able to interact with the log to store
> state too?
>
> Tom.
>
> On 13 Jun 2014, at 20:23, Benjamin Mahler <benjamin.mah...@gmail.com>
> wrote:
>
> Tom: Agreed that there needs to be replicated log documentation, I've
> chatted with Jie and we'll be working to create some. We'll also work to
> create some maintenance related documentation for the masters as it
> pertains to the log replicas.
>
> As Jie mentioned, there is no requirement on bringing masters back up in a
> certain order. There is a safety mechanism built in to the replicated log
> that ensures that if the majority of your replica state is lost, writes are
> prevented. This is why when you first upgrade to the replicated log, all of
> the masters in your ensemble need to be up with 0.19.0 to have the replicas
> initialize.
>
> I apologize for all of the tribal knowledge here, we will get some
> documentation out there.
>
>
> On Fri, Jun 13, 2014 at 12:15 PM, Benjamin Mahler <
> benjamin.mah...@gmail.com> wrote:
>
>> Dick: Excellent question, the zookeeper backed registry was dropped for a
>> few reasons:
>>
>> (1) Znodes by default have a size limit of 1MB. This means if your
>> cluster grows organically and the set of slaves surpasses 1MB, all
>> subsequent storage operations will fail. You would not be able to add
>> slaves to your cluster past this point. Compression helps, but does not
>> solve it.
>>
>> (2) To implement a scalable ZooKeeper backed storage layer, we need to be
>> able to partition our data across znodes and perform atomic writes.
>> (a) Partitioning is non-trivial and we don't know of any C++ libraries
>> that do this already.
>> (b) To my knowledge, before 3.4.x transactional support was missing and
>> applications had to implement two-phase commit [1]. Complex! Even in 3.4.x
>> the transactional support seems to limit total transaction data to 1MB,
>> from the NOTE in [2].
>>
>> (3) Alternatively, one can live with a simple, but operationally
>> unfortunate implementation outlined in (1). But that means we would at
>> least need to provide some tooling to make moving between state backends
>> simple. Doable, but implies more work and support.
>>
>> (4) ZooKeeper is currently the largest source of disruptions to our
>> system availability, so becoming more reliant on it as a permanent storage
>> backend was a bit worrisome. At Twitter we have had a lot more operational
>> experience and confidence with the replicated log as a *permanent*
>> storage backend.
>>
>> To be clear, there's nothing stopping anyone from wiring up the existing
>> ZooKeeper storage implementation in Mesos and providing it as an
>> alternative to the replicated log. As soon as we provide two we should have
>> tooling to allow people to move between them.
>>
>> I hope this clarifies things!
>>
>> [1] http://zookeeper.apache.org/doc/trunk/recipes.html#sc_recipes_twoPhasedCommit
>>
>> [2] http://zookeeper.apache.org/doc/r3.4.3/api/org/apache/zookeeper/ZooKeeper.html#multi%28java.lang.Iterable%29
>>
>>
>> On Fri, Jun 13, 2014 at 11:04 AM, Jie Yu <yujie....@gmail.com> wrote:
>>
>>>> Largely because of a requirement to bring everything back up in a
>>>> certain order
>>>
>>> I don't think they need to be brought back up in a certain order. You
>>> just need to restart all of them. The only requirement is that all masters
>>> should be running at 0.19.0.
>>>
>>>> I'd also be very interested in a zookeeper implementation
>>>
>>> I think there is an issue with ZK impl. Ben Mahler probably can expand
>>> here.
>>>
>>> - Jie
>>>
>>>
>>> On Fri, Jun 13, 2014 at 12:32 AM, Tom Arnfeld <t...@duedil.com> wrote:
>>>
>>>> Hey Dave (and the group),
>>>>
>>>> I have to say for me it was a little fiddly to upgrade a 0.18.2
>>>> cluster to 0.19.0. Largely because of a requirement to bring
>>>> everything back up in a certain order (I had to lower the quorum count
>>>> to 1) otherwise mesos failed to get a majority vote to initialise the
>>>> log (I had 3 masters).
>>>>
>>>> I'd also be very interested in a zookeeper implementation - and
>>>> perhaps some improved documentation around the log.
>>>>
>>>> Cheers,
>>>>
>>>> Tom.
>>>>
>>>> > On 13 Jun 2014, at 08:17, Dick Davies <d...@hellooperator.net> wrote:
>>>> >
>>>> > I thought I read that there was going to be a registry implementation
>>>> > backed by zookeeper; does anyone know why that was dropped?
>>>> >
>>>> > Really excited to see the containerizer features rolling in, but the
>>>> > quorum looks at first glance to make Mesos a little harder to operate
>>>> > ("This means adding or removing masters must be done carefully!") - I
>>>> > understand the benefits but was hoping we could get by with the
>>>> > zookeeper registry.
>>>> >
>>>> >
>>>> >> On 13 June 2014 03:49, Dave Lester <daveles...@gmail.com> wrote:
>>>> >> Hi All,
>>>> >>
>>>> >> Below is a blog post that Ben Mahler wrote as release manager for Mesos
>>>> >> 0.19.0; it was published on the Mesos site today.
>>>> >>
>>>> >> I know that not everyone follows @ApacheMesos on Twitter (even though
>>>> >> you should!), so I wanted to make sure it was also shared on the user@
>>>> >> list.
>>>> >>
>>>> >> Cheers,
>>>> >> Dave
>>>> >>
>>>> >>
>>>> >> Apache Mesos 0.19.0 Released
>>>> >>
>>>> >> The latest Mesos release, 0.19.0, is now available for download. This
>>>> >> new version includes the following features and improvements:
>>>> >>
>>>> >> The master now persists the list of registered slaves in a durable
>>>> >> replicated manner using the Registrar and the replicated log.
>>>> >>
>>>> >> Alpha support for custom container technologies has been added with
>>>> >> the ExternalContainerizer.
>>>> >>
>>>> >> Metrics reporting has been overhauled and is now exposed on
>>>> >> <ip:port>/metrics/snapshot.
>>>> >>
>>>> >> Slave Authentication: optionally, only authenticated slaves can
>>>> >> register with the master.
>>>> >>
>>>> >> Numerous bug fixes and stability improvements.
>>>> >>
>>>> >> Full release notes are available on JIRA.
>>>> >>
>>>> >> Registrar
>>>> >>
>>>> >> Mesos 0.19.0 introduces the “Registrar”: the master now persists the
>>>> >> list of registered slaves in a durable replicated manner. The previous
>>>> >> lack of durable state was an intentional design decision that
>>>> >> simplified failover and allowed masters to be run and migrated with
>>>> >> ease. However, the stateless design had issues:
>>>> >>
>>>> >> In the event of a dual failure (slave fails while master is down), no
>>>> >> lost task notifications are sent. This leads to a task running
>>>> >> according to the framework but unknown to Mesos.
>>>> >>
>>>> >> When a new master is elected, we may allow rogue slaves to re-register
>>>> >> with the master. This leads to tasks running on the slave that are not
>>>> >> known to the framework.
>>>> >>
>>>> >> Persisting the list of registered slaves allows failed over masters to
>>>> >> detect slaves that do not re-register, and notify frameworks
>>>> >> accordingly. It also allows us to prevent rogue slaves from
>>>> >> re-registering; terminating the rogue tasks in the process.
>>>> >>
>>>> >> The state is persisted using the replicated log (available since
>>>> >> 0.9.0).
>>>> >>
>>>> >> External Containerization
>>>> >>
>>>> >> As alluded to during the containerization / isolation refactor in
>>>> >> 0.18.0, the ExternalContainerizer has landed in this release. This
>>>> >> provides alpha level support for custom containerization.
>>>> >>
>>>> >> Developers can implement their own external containerizers to provide
>>>> >> support for custom container technologies. Initial Docker support is
>>>> >> now available through some community driven external containerizers:
>>>> >> Docker Containerizer for Mesos by Tom Arnfeld and Deimos by Jason
>>>> >> Dusek. Please reach out on the mailing lists with questions!
>>>> >>
>>>> >> Metrics
>>>> >>
>>>> >> Previously, Mesos components had to use custom metrics code and custom
>>>> >> HTTP endpoints for exposing metrics. This made it difficult to expose
>>>> >> additional system metrics and often required having an endpoint for
>>>> >> each libprocess Process (Actor) for which metrics were desired. Having
>>>> >> metrics spread across endpoints was operationally complex.
>>>> >>
>>>> >> We needed a consistent, simple, and global way to expose metrics, which
>>>> >> led to the creation of a metrics library within libprocess. All metrics
>>>> >> are now exposed via /metrics/snapshot. The /stats.json endpoint remains
>>>> >> for backwards compatibility.
>>>> >>
>>>> >> Upgrading
>>>> >>
>>>> >> For backwards compatibility, the “Registrar” will be enabled in a
>>>> >> phased manner. By default, the “Registrar” is write-only in 0.19.0 and
>>>> >> will be read/write in 0.20.0.
>>>> >>
>>>> >> If running in high-availability mode with ZooKeeper, operators must now
>>>> >> specify the --work_dir for the master, along with the --quorum size of
>>>> >> the ensemble of masters. This means adding or removing masters must be
>>>> >> done carefully! The best practice is to only ever add or remove a
>>>> >> single master at a time and to allow a small amount of time for the
>>>> >> replicated log to catch up on the new master. Maintenance documentation
>>>> >> will be added to reflect this.
>>>> >>
>>>> >> Please refer to the upgrades document, which details how to perform an
>>>> >> upgrade from 0.18.x.
>>>> >>
>>>> >> Future Work
>>>> >>
>>>> >> Thanks to the Registrar, reconciliation primitives can now be provided
>>>> >> to ensure that the state of tasks between Mesos and frameworks is kept
>>>> >> consistent. This will remove the need for frameworks to implement
>>>> >> out-of-band task reconciliation to inspect the state of slaves.
>>>> >> Reconciliation work is being tracked at MESOS-1407.
>>>> >>
>>>> >> The addition of state through the Registrar opens up a rich set of
>>>> >> possible features that were previously not possible due to the lack of
>>>> >> persistent state in the master. These include:
>>>> >>
>>>> >> Cluster maintenance primitives (MESOS-1474)
>>>> >> Repair automation (MESOS-695)
>>>> >> Global resource reservations
>>>> >>
>>>> >> Getting Involved
>>>> >>
>>>> >> We encourage you to try out this release, and let us know what you
>>>> >> think and if you hit any issues on the user mailing list. You can also
>>>> >> get in touch with us via @ApacheMesos or via mailing lists and IRC.
>>>> >>
>>>> >> Thanks
>>>> >>
>>>> >> Thanks to the 32 contributors who made 0.19.0 possible:
>>>> >>
>>>> >> Ashutosh Jain, Adam B, Alexandra Sava, Anton Lindström, Archana kumari,
>>>> >> Benjamin Hindman, Benjamin Mahler, Bernardo Gomez Palacio, Bernd
>>>> >> Mathiske, Charlie Carson, Chengwei Yang, Chi Zhang, Dave Lester,
>>>> >> Dominic Hamon, Ian Downes, Isabel Jimenez, Jake Farrell, Jameel
>>>> >> Al-Aziz, Jiang Yan Xu, Jie Yu, Nikita Vetoshkin, Niklas Q. Nielsen,
>>>> >> Ritwik Yadav, Sam Taha, Steven Phung, Till Toenshoff, Timothy St.
>>>> >> Clair, Tobi Knaup, Tom Arnfeld, Tom Galloway, Vinod Kone, Vinson Lee
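
For anyone following the quorum discussion in this thread (three masters, the --quorum flag, and writes being prevented when a majority of replicas is unavailable), here is a minimal Python sketch of the majority arithmetic. It assumes the usual rule that a quorum is a strict majority of the masters. The names quorum_size and can_write are hypothetical helpers for illustration only, not part of Mesos, and the sketch does not model first-time initialization, which per Ben's note needs all masters up on 0.19.0.

```python
def quorum_size(num_masters: int) -> int:
    """Smallest strict majority of num_masters replicas (assumed rule)."""
    if num_masters < 1:
        raise ValueError("need at least one master")
    return num_masters // 2 + 1


def can_write(replicas_up: int, quorum: int) -> bool:
    """Simplified check: a write needs at least `quorum` replicas reachable."""
    return replicas_up >= quorum


if __name__ == "__main__":
    # Show the quorum and the number of tolerated failures for common sizes.
    for n in (1, 3, 5):
        q = quorum_size(n)
        print(f"{n} masters: quorum {q}, tolerates {n - q} failure(s)")
```

Under this assumption, a 3-master ensemble would use a quorum of 2, so a single replica on its own cannot accept writes.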