Tom,

Thank you for bringing this up! I'll create documentation for the replicated
log as soon as possible. Progress is tracked in MESOS-1471.

> and I think aurora uses the mesos log too??


That's correct. In other words, it's battle-tested :)

> Am I correct in saying frameworks are able to interact with the log to
> store state too?


Yes! Mesos also has a State
<https://github.com/apache/mesos/blob/master/src/state/state.hpp>
abstraction, which can be backed by several storage options. The replicated
log is one of them.

- Jie


On Fri, Jun 13, 2014 at 12:49 PM, Tom Arnfeld <t...@duedil.com> wrote:

> No worries at all. I think once there’s a more solid base for mesos
> documentation in general, it’ll be easier for committers to add new docs
> for new features. Fair enough about the launch ordering – I was probably
> just a little surprised to see a bunch of warnings about an uninitialised
> log and didn’t think about booting them all up (some upgrade notes would
> have been useful here).
>
> Regarding zookeeper, those are some interesting points. Personally, it
> doesn’t bother me that mesos has its own mechanism for this (and I think
> aurora uses the mesos log too??). I think the documentation could go a long
> way in exposing that the log exists, and why it’s used for the registry. Am
> I correct in saying frameworks are able to interact with the log to store
> state too?
>
> Tom.
>
> On 13 Jun 2014, at 20:23, Benjamin Mahler <benjamin.mah...@gmail.com>
> wrote:
>
> Tom: Agreed that there needs to be replicated log documentation. I've
> chatted with Jie and we'll be working to create some. We'll also work to
> create some maintenance related documentation for the masters as it
> pertains to the log replicas.
>
> As Jie mentioned, there is no requirement on bringing masters back up in a
> certain order. There is a safety mechanism built in to the replicated log
> that ensures that if the majority of your replica state is lost, writes are
> prevented. This is why when you first upgrade to the replicated log, all of
> the masters in your ensemble need to be up with 0.19.0 to have the replicas
> initialize.
>
> I apologize for all of the tribal knowledge here; we will get some
> documentation out there.
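
The majority requirement described above can be illustrated with a little arithmetic. This is only a sketch: the helper names `majority_quorum` and `tolerated_failures` are hypothetical and not part of Mesos. It assumes the `--quorum` value is chosen as the smallest strict majority of the master ensemble.

```python
def majority_quorum(num_masters: int) -> int:
    """Smallest replica count that is a strict majority of num_masters."""
    if num_masters < 1:
        raise ValueError("need at least one master")
    return num_masters // 2 + 1


def tolerated_failures(num_masters: int) -> int:
    """How many replicas can be lost while a majority still remains."""
    return num_masters - majority_quorum(num_masters)


# Print quorum size and tolerated failures for a few ensemble sizes.
for n in (1, 3, 5):
    print(n, majority_quorum(n), tolerated_failures(n))
```

With 3 masters this gives a quorum of 2, so a single missing replica should still leave a majority once the replicas have been initialized.
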
>
>
> On Fri, Jun 13, 2014 at 12:15 PM, Benjamin Mahler <
> benjamin.mah...@gmail.com> wrote:
>
>> Dick: Excellent question. The ZooKeeper-backed registry was dropped for a
>> few reasons:
>>
>> (1) Znodes by default have a size limit of 1MB. This means if your
>> cluster grows organically and the set of slaves surpasses 1MB, all
>> subsequent storage operations will fail. You would not be able to add
>> slaves to your cluster past this point. Compression helps, but does not
>> solve it.
>>
>> (2) To implement a scalable ZooKeeper backed storage layer, we need to be
>> able to partition our data across znodes and perform atomic writes.
>>   (a) Partitioning is non-trivial and we don't know of any C++ libraries
>> that do this already.
>>   (b) To my knowledge, before 3.4.x transactional support was missing and
>> applications had to implement two-phase commit [1]. Complex! Even in 3.4.x
>> the transactional support seems to limit total transaction data to 1MB,
>> from the NOTE in [2].
>>
>> (3) Alternatively, one can live with a simple, but operationally
>> unfortunate implementation outlined in (1). But that means we would at
>> least need to provide some tooling to make moving between state backends
>> simple. Doable, but implies more work and support.
>>
>> (4) ZooKeeper is currently the largest source of disruptions to our
>> system availability, so becoming more reliant on it as a permanent storage
>> backend was a bit worrisome. At Twitter we have had a lot more operational
>> experience and confidence with the replicated log as a *permanent*
>> storage backend.
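
To make the size limit in point (1) concrete, here is a rough sketch of the arithmetic. The per-slave record size is a made-up illustrative number, not a measured Mesos value, and the helper names are hypothetical. The 1MB figure is the default limit cited above.

```python
ZNODE_LIMIT_BYTES = 1024 * 1024  # 1MB default znode limit cited in the thread


def fits_in_znode(payload: bytes, limit: int = ZNODE_LIMIT_BYTES) -> bool:
    """True if a serialized registry blob would fit in a single znode."""
    return len(payload) <= limit


def max_slaves(bytes_per_slave: int, limit: int = ZNODE_LIMIT_BYTES) -> int:
    """Upper bound on slaves storable in one znode at a given record size."""
    if bytes_per_slave <= 0:
        raise ValueError("bytes_per_slave must be positive")
    return limit // bytes_per_slave


# Hypothetical record size of 512 bytes per slave.
print(max_slaves(512))
```

Compression lowers the bytes-per-slave figure, which raises the ceiling but does not remove it.
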
>>
>> To be clear, there's nothing stopping anyone from wiring up the existing
>> ZooKeeper storage implementation in Mesos and providing it as an
>> alternative to the replicated log. As soon as we provide two we should have
>> tooling to allow people to move between them.
>>
>> I hope this clarifies things!
>>
>> [1]
>> http://zookeeper.apache.org/doc/trunk/recipes.html#sc_recipes_twoPhasedCommit
>>
>> [2]
>> http://zookeeper.apache.org/doc/r3.4.3/api/org/apache/zookeeper/ZooKeeper.html#multi%28java.lang.Iterable%29
>>
>>
>> On Fri, Jun 13, 2014 at 11:04 AM, Jie Yu <yujie....@gmail.com> wrote:
>>
>>>> Largely because of a requirement to bring everything back up in a
>>>> certain order
>>>
>>>
>>> I don't think they need to be brought back up in a certain order. You
>>> just need to restart all of them. The only requirement is that all masters
>>> should be running at 0.19.0.
>>>
>>>> I'd also be very interested in a zookeeper implementation
>>>
>>>
>>> I think there is an issue with ZK impl. Ben Mahler probably can expand
>>> here.
>>>
>>> - Jie
>>>
>>>
>>> On Fri, Jun 13, 2014 at 12:32 AM, Tom Arnfeld <t...@duedil.com> wrote:
>>>
>>>> Hey Dave (and the group),
>>>>
>>>> I have to say for me it was a little fiddly to upgrade a 0.18.2
>>>> cluster to 0.19.0. Largely because of a requirement to bring
>>>> everything back up in a certain order (I had to lower the quorum count
>>>> to 1) otherwise mesos failed to get a majority vote to initialise the
>>>> log (I had 3 masters).
>>>>
>>>> I'd also be very interested in a zookeeper implementation - and
>>>> perhaps some improved documentation around the log.
>>>>
>>>> Cheers,
>>>>
>>>> Tom.
>>>>
>>>> > On 13 Jun 2014, at 08:17, Dick Davies <d...@hellooperator.net> wrote:
>>>> >
>>>> > I thought I read that there was going to be a registry implementation
>>>> > backed by zookeeper;
>>>> > does anyone know why that was dropped?
>>>> >
>>>> > Really excited to see the containerizer features rolling in, but the
>>>> > quorum looks at first glance
>>>> > to make Mesos a little harder to operate
>>>> > ("This means adding or removing masters must be done carefully! ") - I
>>>> > understand the
>>>> > benefits but was hoping we could get by with the zookeeper registry.
>>>> >
>>>> >
>>>> >> On 13 June 2014 03:49, Dave Lester <daveles...@gmail.com> wrote:
>>>> >> Hi All,
>>>> >>
>>>> >> Below is a blog post that Ben Mahler wrote as release manager for
>>>> Mesos
>>>> >> 0.19.0; it was published on the Mesos site today.
>>>> >>
>>>> >> I know that not everyone follows @ApacheMesos Twitter (even though
>>>> you
>>>> >> should!), so I wanted to make sure it was also shared on the user@
>>>> list.
>>>> >>
>>>> >> Cheers,
>>>> >> Dave
>>>> >>
>>>> >>
>>>> >> Apache Mesos 0.19.0 Released
>>>> >>
>>>> >> The latest Mesos release, 0.19.0, is now available for download. This
>>>> new
>>>> >> version includes the following features and improvements:
>>>> >>
>>>> >> - The master now persists the list of registered slaves in a durable
>>>> >>   replicated manner using the Registrar and the replicated log.
>>>> >> - Alpha support for custom container technologies has been added with
>>>> >>   the ExternalContainerizer.
>>>> >> - Metrics reporting has been overhauled and is now exposed on
>>>> >>   <ip:port>/metrics/snapshot.
>>>> >> - Slave Authentication: optionally, only authenticated slaves can
>>>> >>   register with the master.
>>>> >> - Numerous bug fixes and stability improvements.
>>>> >>
>>>> >> Full release notes are available on JIRA.
>>>> >>
>>>> >> Registrar
>>>> >>
>>>> >> Mesos 0.19.0 introduces the “Registrar”: the master now persists the
>>>> list of
>>>> >> registered slaves in a durable replicated manner. The previous lack
>>>> of
>>>> >> durable state was an intentional design decision that simplified
>>>> failover
>>>> >> and allowed masters to be run and migrated with ease. However, the
>>>> stateless
>>>> >> design had issues:
>>>> >>
>>>> >> - In the event of a dual failure (slave fails while master is down),
>>>> >>   no lost task notifications are sent. This leads to a task running
>>>> >>   according to the framework but unknown to Mesos.
>>>> >> - When a new master is elected, we may allow rogue slaves to
>>>> >>   re-register with the master. This leads to tasks running on the
>>>> >>   slave that are not known to the framework.
>>>> >>
>>>> >> Persisting the list of registered slaves allows failed over masters
>>>> to
>>>> >> detect slaves that do not re-register, and notify frameworks
>>>> accordingly. It
>>>> >> also allows us to prevent rogue slaves from re-registering,
>>>> terminating the
>>>> >> rogue tasks in the process.
>>>> >>
>>>> >> The state is persisted using the replicated log (available since
>>>> 0.9.0).
>>>> >>
>>>> >> External Containerization
>>>> >>
>>>> >> As alluded to during the containerization / isolation refactor in
>>>> 0.18.0,
>>>> >> the ExternalContainerizer has landed in this release. This provides
>>>> alpha
>>>> >> level support for custom containerization.
>>>> >>
>>>> >> Developers can implement their own external containerizers to provide
>>>> >> support for custom container technologies. Initial Docker support is
>>>> now
>>>> >> available through some community driven external containerizers:
>>>> Docker
>>>> >> Containerizer for Mesos by Tom Arnfeld and Deimos by Jason Dusek.
>>>> Please
>>>> >> reach out on the mailing lists with questions!
>>>> >>
>>>> >> Metrics
>>>> >>
>>>> >> Previously, Mesos components had to use custom metrics code and
>>>> custom HTTP
>>>> >> endpoints for exposing metrics. This made it difficult to expose
>>>> additional
>>>> >> system metrics and often required having an endpoint for each
>>>> libprocess
>>>> >> Process (Actor) for which metrics were desired. Having metrics
>>>> spread across
>>>> >> endpoints was operationally complex.
>>>> >>
>>>> >> We needed a consistent, simple, and global way to expose metrics,
>>>> which led
>>>> >> to the creation of a metrics library within libprocess. All metrics
>>>> are now
>>>> >> exposed via /metrics/snapshot. The /stats.json endpoint remains for
>>>> >> backwards compatibility.
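
As a rough illustration of consuming the endpoint mentioned above, the sketch below fetches `/metrics/snapshot` and filters the result by name prefix. It assumes the response is a flat JSON object mapping metric names to numbers. The function names, the base URL you pass in, and the sample metric keys are all hypothetical and only for illustration.

```python
import json
from urllib.request import urlopen


def fetch_snapshot(base_url: str, timeout: float = 5.0) -> dict:
    """GET <base_url>/metrics/snapshot and decode the JSON body.

    base_url is whatever <ip:port> your master or slave listens on.
    """
    with urlopen(f"{base_url}/metrics/snapshot", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def filter_metrics(snapshot: dict, prefix: str) -> dict:
    """Keep only the metrics whose name starts with the given prefix."""
    return {k: v for k, v in snapshot.items() if k.startswith(prefix)}


# Offline example with made-up metric names (not real Mesos metric keys).
sample = json.loads('{"master/example_gauge": 3.0, "system/example_load": 0.5}')
print(filter_metrics(sample, "master/"))
```
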
>>>> >>
>>>> >> Upgrading
>>>> >>
>>>> >> For backwards compatibility, the “Registrar” will be enabled in a
>>>> phased
>>>> >> manner. By default, the “Registrar” is write-only in 0.19.0 and will
>>>> be
>>>> >> read/write in 0.20.0.
>>>> >>
>>>> >> If running in high-availability mode with ZooKeeper, operators must
>>>> now
>>>> >> specify the --work_dir for the master, along with the --quorum size
>>>> of the
>>>> >> ensemble of masters. This means adding or removing masters must be
>>>> done
>>>> >> carefully! The best practice is to only ever add or remove a single
>>>> master
>>>> >> at a time and to allow a small amount of time for the replicated log
>>>> to
>>>> >> catch up on the new master. Maintenance documentation will be added
>>>> to
>>>> >> reflect this.
>>>> >>
>>>> >> Please refer to the upgrades document, which details how to perform
>>>> an
>>>> >> upgrade from 0.18.x.
>>>> >>
>>>> >> Future Work
>>>> >>
>>>> >> Thanks to the Registrar, reconciliation primitives can now be
>>>> provided to
>>>> >> ensure that the state of tasks between Mesos and frameworks is kept
>>>> >> consistent. This will remove the need for frameworks to implement
>>>> >> out-of-band task reconciliation to inspect the state of slaves.
>>>> >> Reconciliation work is being tracked at MESOS-1407.
>>>> >>
>>>> >> The addition of state through the Registrar opens up a rich set of
>>>> possible
>>>> >> features that were previously not possible due to the lack of
>>>> persistent
>>>> >> state in the master. These include:
>>>> >>
>>>> >> Cluster maintenance primitives (MESOS-1474)
>>>> >> Repair automation (MESOS-695)
>>>> >> Global resource reservations
>>>> >>
>>>> >> Getting Involved
>>>> >>
>>>> >> We encourage you to try out this release, and let us know what you
>>>> think and
>>>> >> if you hit any issues on the user mailing list. You can also get in
>>>> touch
>>>> >> with us via @ApacheMesos or via mailing lists and IRC.
>>>> >>
>>>> >> Thanks
>>>> >>
>>>> >> Thanks to the 32 contributors who made 0.19.0 possible:
>>>> >>
>>>> >> Ashutosh Jain, Adam B, Alexandra Sava, Anton Lindström, Archana
>>>> kumari,
>>>> >> Benjamin Hindman, Benjamin Mahler, Bernardo Gomez Palacio, Bernd
>>>> Mathiske,
>>>> >> Charlie Carson, Chengwei Yang, Chi Zhang, Dave Lester, Dominic
>>>> Hamon, Ian
>>>> >> Downes, Isabel Jimenez, Jake Farrell, Jameel Al-Aziz, Jiang Yan Xu,
>>>> Jie Yu,
>>>> >> Nikita Vetoshkin, Niklas Q. Nielsen, Ritwik Yadav, Sam Taha, Steven
>>>> Phung,
>>>> >> Till Toenshoff, Timothy St. Clair, Tobi Knaup, Tom Arnfeld, Tom
>>>> Galloway,
>>>> >> Vinod Kone, Vinson Lee
>>>>
>>>
>>>
>>
>
>
