Thanks Ben, that's very useful.

My main reason for using Zookeeper was simplicity (one less component
to worry about in the stack).
From what you've said, that was fool's gold, so I can see why you've
dropped that option.


On 13 June 2014 20:15, Benjamin Mahler <benjamin.mah...@gmail.com> wrote:
> Dick: Excellent question, the zookeeper backed registry was dropped for a
> few reasons:
>
> (1) Znodes by default have a size limit of 1MB. This means if your cluster
> grows organically and the set of slaves surpasses 1MB, all subsequent
> storage operations will fail. You would not be able to add slaves to your
> cluster past this point. Compression helps, but does not solve it.
>
> (2) To implement a scalable ZooKeeper backed storage layer, we need to be
> able to partition our data across znodes and perform atomic writes.
>   (a) Partitioning is non-trivial and we don't know of any C++ libraries
> that do this already.
>   (b) To my knowledge, before 3.4.x transactional support was missing and
> applications had to implement two-phase commit [1]. Complex! Even in 3.4.x
> the transactional support seems to limit total transaction data to 1MB, from
> the NOTE in [2].
>
> (3) Alternatively, one can live with a simple, but operationally unfortunate
> implementation outlined in (1). But that means we would at least need to
> provide some tooling to make moving between state backends simple. Doable,
> but implies more work and support.
>
> (4) ZooKeeper is currently the largest source of disruptions to our system
> availability; becoming more reliant on it as a permanent storage backend
> was a bit worrisome. At Twitter we have had a lot more operational
> experience and confidence with the replicated log as a permanent storage
> backend.
>
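To put rough numbers on point (1), here is a minimal sketch. It assumes a made-up average serialized size per slave record; the 200-byte figure and the function names are illustrative, not Mesos measurements, and the limit is simply taken as 1024 * 1024 bytes.

```python
# Rough capacity estimate for a registry stored in a single znode.
# Assumption: every slave record serializes to about the same size.
ZNODE_LIMIT_BYTES = 1024 * 1024  # the ~1MB default limit mentioned in (1)


def max_slaves_in_one_znode(bytes_per_slave, limit=ZNODE_LIMIT_BYTES):
    """Return how many slave records of the given size fit under the limit."""
    if bytes_per_slave <= 0:
        raise ValueError("bytes_per_slave must be positive")
    return limit // bytes_per_slave


def fits_in_one_znode(num_slaves, bytes_per_slave, limit=ZNODE_LIMIT_BYTES):
    """True if the whole slave list would still fit in a single znode."""
    return num_slaves * bytes_per_slave <= limit


if __name__ == "__main__":
    # Hypothetical 200 bytes per record.
    print(max_slaves_in_one_znode(200))
    print(fits_in_one_znode(10000, 200))
```

With a guess like this, the ceiling arrives at a few thousand slaves, which is the "grows organically and then all writes fail" case described in (1).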
> To be clear, there's nothing stopping anyone from wiring up the existing
> ZooKeeper storage implementation in Mesos and providing it as an alternative
> to the replicated log. As soon as we provide two we should have tooling to
> allow people to move between them.
>
> I hope this clarifies things!
>
> [1]
> http://zookeeper.apache.org/doc/trunk/recipes.html#sc_recipes_twoPhasedCommit
>
> [2]
> http://zookeeper.apache.org/doc/r3.4.3/api/org/apache/zookeeper/ZooKeeper.html#multi%28java.lang.Iterable%29
>
>
> On Fri, Jun 13, 2014 at 11:04 AM, Jie Yu <yujie....@gmail.com> wrote:
>>>
>>> Largely because of a requirement to bring everything back up in a certain
>>> order
>>
>>
>> I don't think they need to be brought back up in a certain order. You just
>> need to restart all of them. The only requirement is that all masters should
>> be running at 0.19.0.
>>
>>> I'd also be very interested in a zookeeper implementation
>>
>>
>> I think there is an issue with ZK impl. Ben Mahler probably can expand
>> here.
>>
>> - Jie
>>
>>
>> On Fri, Jun 13, 2014 at 12:32 AM, Tom Arnfeld <t...@duedil.com> wrote:
>>>
>>> Hey Dave (and the group),
>>>
>>> I have to say for me it was a little fiddly to upgrade a 0.18.2
>>> cluster to 0.19.0. Largely because of a requirement to bring
>>> everything back up in a certain order (I had to lower the quorum count
>>> to 1) otherwise mesos failed to get a majority vote to initialise the
>>> log (I had 3 masters).
>>>
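For anyone else sizing --quorum, a minimal sketch, assuming the value is meant to be a strict majority of the masters (which matches the "majority vote" behaviour described above). The function name is mine, not part of Mesos.

```python
def majority_quorum(num_masters):
    """Smallest number of masters that forms a strict majority."""
    if num_masters < 1:
        raise ValueError("need at least one master")
    return num_masters // 2 + 1


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(n, "masters -> quorum", majority_quorum(n))
```

Under that assumption, 3 masters need a quorum of 2, so at least two of them have to be up and running the new version before the log can initialise.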
>>> I'd also be very interested in a zookeeper implementation - and
>>> perhaps some improved documentation around the log.
>>>
>>> Cheers,
>>>
>>> Tom.
>>>
>>> > On 13 Jun 2014, at 08:17, Dick Davies <d...@hellooperator.net> wrote:
>>> >
>>> > I thought I read that there was going to be a registry implementation
>>> > backed by zookeeper;
>>> > does anyone know why that was dropped?
>>> >
>>> > Really excited to see the containerizer features rolling in, but the
>>> > quorum looks at first glance
>>> > to make Mesos a little harder to operate
>>> > ("This means adding or removing masters must be done carefully! ") - I
>>> > understand the
>>> > benefits but was hoping we could get by with the zookeeper registry.
>>> >
>>> >
>>> >> On 13 June 2014 03:49, Dave Lester <daveles...@gmail.com> wrote:
>>> >> Hi All,
>>> >>
>>> >> Below is a blog post that Ben Mahler wrote as release manager for
>>> >> Mesos
>>> >> 0.19.0; it was published on the Mesos site today.
>>> >>
>>> >> I know that not everyone follows @ApacheMesos Twitter (even though you
>>> >> should!), so I wanted to make sure it was also shared on the user@ list.
>>> >>
>>> >> Cheers,
>>> >> Dave
>>> >>
>>> >>
>>> >> Apache Mesos 0.19.0 Released
>>> >>
>>> >> The latest Mesos release, 0.19.0, is now available for download. This
>>> >> new
>>> >> version includes the following features and improvements:
>>> >>
>>> >> * The master now persists the list of registered slaves in a durable
>>> >>   replicated manner using the Registrar and the replicated log.
>>> >> * Alpha support for custom container technologies has been added with
>>> >>   the ExternalContainerizer.
>>> >> * Metrics reporting has been overhauled and is now exposed on
>>> >>   <ip:port>/metrics/snapshot.
>>> >> * Slave Authentication: optionally, only authenticated slaves can
>>> >>   register with the master.
>>> >> * Numerous bug fixes and stability improvements.
>>> >>
>>> >> Full release notes are available on JIRA.
>>> >>
>>> >> Registrar
>>> >>
>>> >> Mesos 0.19.0 introduces the “Registrar”: the master now persists the
>>> >> list of
>>> >> registered slaves in a durable replicated manner. The previous lack of
>>> >> durable state was an intentional design decision that simplified
>>> >> failover
>>> >> and allowed masters to be run and migrated with ease. However, the
>>> >> stateless
>>> >> design had issues:
>>> >>
>>> >> * In the event of a dual failure (slave fails while master is down), no
>>> >>   lost task notifications are sent. This leads to a task running
>>> >>   according to the framework but unknown to Mesos.
>>> >> * When a new master is elected, we may allow rogue slaves to re-register
>>> >>   with the master. This leads to tasks running on the slave that are not
>>> >>   known to the framework.
>>> >>
>>> >> Persisting the list of registered slaves allows failed over masters to
>>> >> detect slaves that do not re-register, and notify frameworks
>>> >> accordingly. It
>>> >> also allows us to prevent rogue slaves from re-registering,
>>> >> terminating the
>>> >> rogue tasks in the process.
>>> >>
>>> >> The state is persisted using the replicated log (available since
>>> >> 0.9.0).
>>> >>
>>> >> External Containerization
>>> >>
>>> >> As alluded to during the containerization / isolation refactor in
>>> >> 0.18.0,
>>> >> the ExternalContainerizer has landed in this release. This provides
>>> >> alpha
>>> >> level support for custom containerization.
>>> >>
>>> >> Developers can implement their own external containerizers to provide
>>> >> support for custom container technologies. Initial Docker support is
>>> >> now
>>> >> available through some community driven external containerizers:
>>> >> Docker
>>> >> Containerizer for Mesos by Tom Arnfeld and Deimos by Jason Dusek.
>>> >> Please
>>> >> reach out on the mailing lists with questions!
>>> >>
>>> >> Metrics
>>> >>
>>> >> Previously, Mesos components had to use custom metrics code and custom
>>> >> HTTP
>>> >> endpoints for exposing metrics. This made it difficult to expose
>>> >> additional
>>> >> system metrics and often required having an endpoint for each
>>> >> libprocess
>>> >> Process (Actor) for which metrics were desired. Having metrics spread
>>> >> across
>>> >> endpoints was operationally complex.
>>> >>
>>> >> We needed a consistent, simple, and global way to expose metrics,
>>> >> which led
>>> >> to the creation of a metrics library within libprocess. All metrics
>>> >> are now
>>> >> exposed via /metrics/snapshot. The /stats.json endpoint remains for
>>> >> backwards compatibility.
>>> >>
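A minimal sketch of consuming the new endpoint, assuming /metrics/snapshot returns a flat JSON object that maps "component/metric" names to numbers. The sample payload and its metric names are illustrative only, not captured from a real master, and the helper name is hypothetical.

```python
import json
from collections import defaultdict


def group_metrics(snapshot_json):
    """Group a flat {"component/metric": value} snapshot by component."""
    metrics = json.loads(snapshot_json)
    grouped = defaultdict(dict)
    for name, value in metrics.items():
        component, sep, metric = name.partition("/")
        if not sep:
            # No slash in the name: file it under an empty component.
            component, metric = "", name
        grouped[component][metric] = value
    return dict(grouped)


# Illustrative payload; the real keys depend on the Mesos version.
SAMPLE = (
    '{"master/uptime_secs": 120.5, '
    '"master/slaves_active": 3, '
    '"system/load_1min": 0.4}'
)

if __name__ == "__main__":
    for component, values in sorted(group_metrics(SAMPLE).items()):
        print(component, values)
```

In practice the JSON string would come from an HTTP GET against <ip:port>/metrics/snapshot on the master or slave; the fetch is left out here so the sketch runs without a cluster.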
>>> >> Upgrading
>>> >>
>>> >> For backwards compatibility, the “Registrar” will be enabled in a
>>> >> phased
>>> >> manner. By default, the “Registrar” is write-only in 0.19.0 and will
>>> >> be
>>> >> read/write in 0.20.0.
>>> >>
>>> >> If running in high-availability mode with ZooKeeper, operators must
>>> >> now
>>> >> specify the --work_dir for the master, along with the --quorum size of
>>> >> the
>>> >> ensemble of masters. This means adding or removing masters must be
>>> >> done
>>> >> carefully! The best practice is to only ever add or remove a single
>>> >> master
>>> >> at a time and to allow a small amount of time for the replicated log
>>> >> to
>>> >> catch up on the new master. Maintenance documentation will be added to
>>> >> reflect this.
>>> >>
>>> >> Please refer to the upgrades document, which details how to perform an
>>> >> upgrade from 0.18.x.
>>> >>
>>> >> Future Work
>>> >>
>>> >> Thanks to the Registrar, reconciliation primitives can now be provided
>>> >> to
>>> >> ensure that the state of tasks between Mesos and frameworks is kept
>>> >> consistent. This will remove the need for frameworks to implement
>>> >> out-of-band task reconciliation to inspect the state of slaves.
>>> >> Reconciliation work is being tracked at MESOS-1407.
>>> >>
>>> >> The addition of state through the Registrar opens up a rich set of
>>> >> possible
>>> >> features that were previously not possible due to the lack of
>>> >> persistent
>>> >> state in the master. These include:
>>> >>
>>> >> Cluster maintenance primitives (MESOS-1474)
>>> >> Repair automation (MESOS-695)
>>> >> Global resource reservations
>>> >>
>>> >> Getting Involved
>>> >>
>>> >> We encourage you to try out this release, and let us know what you
>>> >> think and
>>> >> if you hit any issues on the user mailing list. You can also get in
>>> >> touch
>>> >> with us via @ApacheMesos or via mailing lists and IRC.
>>> >>
>>> >> Thanks
>>> >>
>>> >> Thanks to the 32 contributors who made 0.19.0 possible:
>>> >>
>>> >> Ashutosh Jain, Adam B, Alexandra Sava, Anton Lindström, Archana
>>> >> kumari,
>>> >> Benjamin Hindman, Benjamin Mahler, Bernardo Gomez Palacio, Bernd
>>> >> Mathiske,
>>> >> Charlie Carson, Chengwei Yang, Chi Zhang, Dave Lester, Dominic Hamon,
>>> >> Ian
>>> >> Downes, Isabel Jimenez, Jake Farrell, Jameel Al-Aziz, Jiang Yan Xu,
>>> >> Jie Yu,
>>> >> Nikita Vetoshkin, Niklas Q. Nielsen, Ritwik Yadav, Sam Taha, Steven
>>> >> Phung,
>>> >> Till Toenshoff, Timothy St. Clair, Tobi Knaup, Tom Arnfeld, Tom
>>> >> Galloway,
>>> >> Vinod Kone, Vinson Lee
>>
>>
>
