@David please reach out to me/us if you hit any issues with the new functionality.
On Thu, Mar 11, 2021 at 5:12 PM David Martin <dav...@qoritek.com> wrote:
> Many thanks for the advice Clebert.
>
> I've just had to deal with journal corruption headaches with other
> messaging middleware in the past.
>
> It does seem like an edge case for Artemis, and with mirroring now
> available I'll prioritise the DR solution. AMQP is already the protocol
> used throughout.
>
> Dave
>
> On Thu, Mar 11, 2021, 9:48 PM Clebert Suconic <clebert.suco...@gmail.com> wrote:
> >
> > If you are that concerned with losing the journal (which I believe
> > would be pretty hard to happen), I would recommend using the Mirror.
> >
> > Note: the Mirror sends the message as AMQP, so if you send Core, the
> > message will be converted to AMQP on the wire (AMQP connection).
> >
> > I have been thinking of embedding CoreMessage as AMQP. It would still
> > have some inefficiency crossing the protocol, but it would avoid
> > conversion issues.
> >
> > On Thu, Mar 11, 2021 at 1:31 PM Clebert Suconic
> > <clebert.suco...@gmail.com> wrote:
> > >
> > > The journal getting corrupted could happen in two situations:
> > >
> > > - The file system is damaged by the infrastructure (hardware
> > >   failures, kernel issues, etc.)
> > >   * If you have a reliable file system here, I'm not sure how
> > >     concerned you should be.
> > >
> > > - Some invalid data in the journal makes the broker fail upon
> > >   restart.
> > >
> > > I have seen only a handful of issues raised like this and, as with
> > > any bug, we fix them when reported. I am not aware of any at the
> > > moment.
> > >
> > > So I think it would be reasonably safe to reconnect the pod.
> > >
> > > Damage to the file system or journal after a failure is IMO a
> > > disaster situation, and for that I can only think of the Mirror to
> > > mitigate any of that.
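[For reference, the Mirror recommended above is configured via the AMQP broker-connection feature introduced in Artemis 2.16. A minimal sketch of the relevant `broker.xml` fragment follows; the URI, connection name, and retry interval are illustrative placeholders, not values from this thread:]

```xml
<!-- broker.xml (inside <core>): a sketch of mirroring to a DR broker
     over an AMQP broker connection (Artemis 2.16+). The uri and name
     below are hypothetical placeholders. -->
<broker-connections>
   <amqp-connection uri="tcp://dr-broker.example.com:61616"
                    name="dr-mirror"
                    retry-interval="100">
      <!-- Forwards message sends, acknowledgements, and queue events
           to the target broker as AMQP -->
      <mirror/>
   </amqp-connection>
</broker-connections>
```

[As Clebert notes, the mirror link itself is AMQP, so messages sent with the Core protocol are converted to AMQP on the wire.]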
> > > On Thu, Mar 11, 2021 at 8:53 AM David Martin <dav...@qoritek.com> wrote:
> > >>
> > >> Hi,
> > >>
> > >> Looking to host an Artemis cluster in Kubernetes and am not sure how
> > >> to achieve full local resilience. (Clusters for DR and remote
> > >> distribution will be added later using the mirroring feature
> > >> introduced with v2.16.)
> > >>
> > >> It is configured as 3 active cluster members using static discovery,
> > >> because the particular cloud provider does not officially support UDP
> > >> on its managed Kubernetes service network.
> > >>
> > >> There are no backup brokers (active/passive) because the stateful set
> > >> takes care of restarting failed pods immediately.
> > >>
> > >> Each broker has its own networked storage, so it is resilient in
> > >> terms of local state.
> > >>
> > >> Message redistribution is ON_DEMAND. Publishing is to topics and
> > >> consuming is from durable topic subscription queues.
> > >>
> > >> Publishers and consumers are connecting round-robin with client IP
> > >> affinity/stickiness.
> > >>
> > >> What I'm concerned about is the possibility of journal corruption on
> > >> one broker. Publishers and consumers will failover to either of the
> > >> remaining 2 brokers, which is fine, but some data could be lost
> > >> permanently as follows.
> > >>
> > >> Hypothetically, consider that Publisher 1 is publishing to Broker 1
> > >> and Publisher 2 is publishing to Broker 3. Consumer 1 is consuming
> > >> from Broker 2 and Consumer 2 is consuming from Broker 1. There are
> > >> more consumers and publishers, but using 2 of each just to
> > >> illustrate.
> > >>
> > >> Publisher 1 -> Broker 1 -> Broker 2 -> Consumer 1
> > >> Publisher 2 -> Broker 3 -> Broker 2 -> Consumer 1
> > >> Publisher 1 -> Broker 1 -> Consumer 2
> > >> Publisher 2 -> Broker 3 -> Broker 1 -> Consumer 2
> > >>
> > >> This all works very well with full data integrity and good
> > >> performance :)
> > >>
> > >> However, if, say, Broker 1's journal got corrupted and it went down
> > >> permanently as a result, any data from Publisher 1 which hadn't yet
> > >> been distributed to Consumer 1 (via Broker 2) or *particularly*
> > >> Consumer 2 (directly) would be lost (unless the journal could be
> > >> recovered).
> > >>
> > >> Is there some straightforward configuration to avoid or reduce this
> > >> possibility? Perhaps a 4-broker cluster could have affinity for
> > >> publishers on 2 brokers and affinity for consumers on the other 2,
> > >> somehow?
> > >>
> > >> Thanks for any advice you can offer.
> > >>
> > >> Dave Martin.
> > >
> > > --
> > > Clebert Suconic
> >
> > --
> > Clebert Suconic

--
Clebert Suconic
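[For reference, the static-discovery cluster David describes (no UDP multicast, ON_DEMAND redistribution) is typically configured along these lines in `broker.xml`. The connector names and StatefulSet-style pod hostnames are illustrative assumptions, not values from this thread:]

```xml
<!-- broker.xml (inside <core>) on the first broker: a sketch of a
     3-node cluster using static discovery instead of UDP multicast.
     Hostnames assume a hypothetical StatefulSet named "artemis". -->
<connectors>
   <connector name="self">tcp://artemis-0.artemis.default.svc.cluster.local:61616</connector>
   <connector name="artemis-1">tcp://artemis-1.artemis.default.svc.cluster.local:61616</connector>
   <connector name="artemis-2">tcp://artemis-2.artemis.default.svc.cluster.local:61616</connector>
</connectors>

<cluster-connections>
   <cluster-connection name="artemis-cluster">
      <connector-ref>self</connector-ref>
      <!-- Redistribute messages only when a remote node has a consumer,
           matching the ON_DEMAND behaviour described above -->
      <message-load-balancing>ON_DEMAND</message-load-balancing>
      <static-connectors>
         <connector-ref>artemis-1</connector-ref>
         <connector-ref>artemis-2</connector-ref>
      </static-connectors>
   </cluster-connection>
</cluster-connections>
```

[Each of the other two brokers would carry the equivalent configuration pointing at its peers.]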