You make some great cases for your architecture. To be clear - I've been
proselytizing for Kafka since I joined this company last year. I think my
largest issue is rethinking some preexisting notions about streaming to
make them work in the kstream universe.

On Fri, Mar 24, 2017 at 6:07 AM, Michael Noll <mich...@confluent.io> wrote:

> > If I understand this correctly: assuming I have a simple aggregator
> > distributed across n-docker instances each instance will _also_ need to
> > support some sort of communications process for allowing access to its
> > statestore (last param from KStream.groupby.aggregate).
>
> Yes.
>
> See
> http://docs.confluent.io/current/streams/developer-guide.html#your-application-and-interactive-queries
>
> > - The tombstoning facilities of redis or C* would lend themselves well to
> > implementing a 'true' rolling aggregation
>
> What is a 'true' rolling aggregation, and how could Redis or C* help with
> that in a way that Kafka can't?  (Honest question.)
>
>
> > I get that RocksDB has a small footprint but given the choice of
> > implementing my own RPC / gossip-like process for data sharing and using a
> > well tested one (ala C* or redis) I would almost always opt for the latter.
> > [...]
> > Just my $0.02. I would love to hear why I'm missing the 'big picture'. The
> > kstreams architecture seems rife with potential.
>
> One question is, for example:  can the remote/central DB of your choice
> (Redis, C*) handle as many qps as Kafka/Kafka Streams/RocksDB can handle?
> Over the network?  At the same low latency?  Also, what happens if the
> remote DB is unavailable?  Do you wait and retry?  Discard?  Accept the
> fact that your app's processing latency will now go through the roof?  I
> wrote about some such scenarios at
> https://www.confluent.io/blog/distributed-real-time-joins-and-aggregations-on-user-activity-events-using-kafka-streams/
>
> One big advantage (for many use cases, not for all) with Kafka/Kafka Streams is
> that you can leverage fault-tolerant *local* state that may also be
> distributed across app instances.  Local state is much more efficient and
> faster when doing stateful processing such as joins or aggregations.  You
> don't need to worry about an external system, whether it's up and running,
> whether its version is still compatible with your app, whether it can scale
> as much as your app/Kafka Streams/Kafka/the volume of your input data.
>
> Also, note that some users have actually opted to run hybrid setups:  Some
> processing output is sent to a remote data store like Cassandra (e.g. via
> Kafka Connect), some processing output is exposed directly through
> interactive queries.  It's not like you're forced to pick only one approach.
>
>
> > - Typical microservices would separate storing / retrieving data
>
> I'd rather argue that for microservices you'd oftentimes prefer to *not*
> use a remote DB, and rather do whatever the microservice needs to do inside
> the microservice itself (perhaps we could relax this to "do everything
> in a way that your microservice is in full, exclusive control", i.e. it
> doesn't necessarily need to be *inside*, but arguably it would be better if
> it actually is).
> See e.g. the article
> https://www.confluent.io/blog/data-dichotomy-rethinking-the-way-we-treat-data-and-services/
> that lists some of the reasoning behind this school of thinking.  Again,
> YMMV.
>
> Personally, I think there's no simple true/false here.  The decisions
> depend on what you need, what your context is, etc.  Anyways, since you
> already have some opinions for the one side, I wanted to share some food
> for thought for the other side of the argument. :-)
>
> Best,
> Michael
>
>
>
>
>
> On Fri, Mar 24, 2017 at 1:25 PM, Jon Yeargers <jon.yearg...@cedexis.com>
> wrote:
>
> > If I understand this correctly: assuming I have a simple aggregator
> > distributed across n-docker instances each instance will _also_ need to
> > support some sort of communications process for allowing access to its
> > statestore (last param from KStream.groupby.aggregate).
> >
> > How would one go about substituting a separated db (EG redis) for the
> > statestore?
> >
> > Some advantages to decoupling:
> > - It would seem like having a centralized process like this would alleviate
> > the need to execute multiple requests for a given kv pair (IE "who has this
> > data?" and subsequent requests to retrieve it).
> > - it would take some pressure off of each node to maintain a large disk
> > store
> > - Typical microservices would separate storing / retrieving data
> > - It would raise some eyebrows if a spec called for a mysql/nosql instance
> > to be installed with every docker container
> > - The tombstoning facilities of redis or C* would lend themselves well to
> > implementing a 'true' rolling aggregation
> >
> > I get that RocksDB has a small footprint but given the choice of
> > implementing my own RPC / gossip-like process for data sharing and using a
> > well tested one (ala C* or redis) I would almost always opt for the latter.
> > (Footnote: Our implementations already heavily use redis/memcached for
> > deduplication of kafka messages so it would seem a small step to use the
> > same to store aggregation results.)
> >
> > Just my $0.02. I would love to hear why I'm missing the 'big picture'. The
> > kstreams architecture seems rife with potential.
> >
> > On Thu, Mar 23, 2017 at 3:17 PM, Matthias J. Sax <matth...@confluent.io>
> > wrote:
> >
> > > The config does not "do" anything. It's metadata that gets broadcast
> > > to other Streams instances for the IQ (interactive queries) feature.
> > >
> > > See this blog post for more details:
> > > https://www.confluent.io/blog/unifying-stream-processing-and-interactive-queries-in-apache-kafka/
> > >
> > > Happy to answer any follow-up questions.
> > >
> > >
> > > -Matthias
> > >
> > > On 3/23/17 11:51 AM, Jon Yeargers wrote:
> > > > What does this config param do?
> > > >
> > > > I see it referenced / used in some samples and here (
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-67%3A+Queryable+state+for+Kafka+Streams
> > > > )
> > > >
> > >
> > >
> >
>
