what value the abstraction overhead of
> the current model brings.
>
> Garry
>
> -Original Message-
> From: Yan Fang [mailto:yanfang...@gmail.com]
> Sent: 13 July 2015 19:58
> To: dev@samza.apache.org
> Subject: Re: Thoughts and obesrvations on Samza
ngs.
>
> Garry
>
> -----Original Message-
> From: Yan Fang [mailto:yanfang...@gmail.com]
> Sent: 13 July 2015 19:58
> To: dev@samza.apache.org
> Subject: Re: Thoughts and obesrvations on Samza
>
> I am leaning to Jay's fifth approach. It is not radical and gives us some
From peanut gallery..
I like Yi's proposal in re-scoping the Samza project / code-base as "Stream
Processing as a Service" that will potentially include:
1. A service manager with some REST / Web UI to accept stream processing
jobs in terms of tgz / configs and schedule them as for:
a. partitio
I am leaning to Jay's fifth approach. It is not radical and gives us some
time to see the outcome.
In addition, I would suggest:
1) Keep the SystemConsumer/SystemProducer API, because the current
SystemConsumer/SystemProducer API satisfies the usage (from Jordan's, and
even Garry's, feedback) and is no
Jay,
I think doing this iteratively in smaller chunks is a better way to go as
new issues arise. As Navina said Kafka is a "stream system" and Samza is a
"stream processor" and those two ideas should be mutually exclusive.
-Jordan
On Mon, Jul 13, 2015 at 10:06 AM, Jay Kreps wrote:
> Hmm, though
Hmm, thought about this more. Maybe this is just too much too quick.
Overall I think there is some enthusiasm for the proposal but it's not
really unanimous enough to make any kind of change this big cleanly. The
board doesn't really like the merging stuff, users are concerned about
compatibility,
Hey Chris,
Yeah, I'm obviously in favor of this.
The sub-project approach seems the ideal way to take a graceful step in
this direction, so I will ping the board folks and see why they are
discouraged, it would be good to understand that. If we go that route we
would need to do a similar discussi
Hey guys,
There seems to be some confusion in the last few emails: there is no plan
whatsoever to remove YARN support. The change suggested was to move the
partition management out of the YARN app master and rely on Kafka's
partition management. The advantage of this would be to make the vast
majo
Yi,
What you just summarized makes a whole lot more sense to me. Shamelessly
I am looking at this shift as a customer with a production workflow riding
on it so I am looking for some kind of consistency into the future of
Samza. This makes me feel a lot better about it.
Thank you!
On Sun, Ju
I'm afraid I don't agree that we're anywhere near coming to a
consensus, or even that we're all agreeing on what we're discussing.
(I do totally agree that the discussion itself has been awesome both
in tone and content, though).
As Tim brought up and I mentioned, the Board is not big on subprojec
Just to make it explicitly clear what I am proposing, here is a
more detailed description:
The fourth option (in addition to what Jakob summarized) we are proposing
is:
- Recharter Samza to “stream processing as a service”
- The current Samza core (the basic transformation API w/ basi
Hi, Chris,
Thanks for sending out this concrete set of points here. I agree w/ all but
have a slightly different point of view on 8).
My view on this is: instead of sunsetting Samza as a TLP, can we re-charter the
scope of Samza to be the home for "running streaming process as a service"?
My main motivatio
Hey all, just want to chime in before it's too late. Been following Samza
for a long time, and using it in production for the past 6 months or so.
In no particular order the things I like most about Samza are:
- Yarn support, resiliency of my deployment is paramount. This is why I
use Samza ove
On Sun, Jul 12, 2015 at 8:54 PM, Chris Riccomini wrote:
> Hey all,
>
> I want to start by saying that I'm absolutely thrilled to be a part of this
> community. The amount of level-headed, thoughtful, educated discussion
> that's gone on over the past ~10 days is overwhelming. Wonderful.
>
> It see
That was meant to be "thread" not "threat". lol. :)
On Sun, Jul 12, 2015 at 5:54 PM, Chris Riccomini
wrote:
> Hey all,
>
> I want to start by saying that I'm absolutely thrilled to be a part of
> this community. The amount of level-headed, thoughtful, educated discussion
> that's gone on over th
Hey all,
I want to start by saying that I'm absolutely thrilled to be a part of this
community. The amount of level-headed, thoughtful, educated discussion
that's gone on over the past ~10 days is overwhelming. Wonderful.
It seems like discussion is waning a bit, and we've reached some
conclusion
> > > > >> > > > > >> > repo,
> > > > >> > > > > >> > > but I'm actually not really sure (I can't find a
> > > > definition
> > > > >> > of a
> > > > >
>> > > > > >> > > streaming" or "kafka streams" or something like that
> > > would
> > > >> > > > actually
> > > >> > > > > >> do a
> > > >> > > > > >> >
>> > >
> >> >> > > > > >> > > I think if we did that they having naming or branding
> >> like
> >> >> > > "kafka
> >> >> > > > > >> > > streaming" or "kafka streams"
> >> > > Fwiw we actually considered this model originally when
>> open
>> >> > > > sourcing
>> >> > > > > >> > Samza,
>> >> > > > > >> > > however at that time Kafka was
> >> > > > not
> > >> > > > > >> to
> > >> > > > > >> > do
> > >> > > > > >> > > it since we felt it would be limiting. From my point of
> > view
> > >> > the
> > wrote:
> >> > > > > >> > >
> >> > > > > >> > >> Hi all,
> >> > > > > >> > >>
> >> > > > > >> > >> Lots of good thoughts here.
> >> > >
aking Samza
>> > > fully
>> > > > > >> > dependent
>> > > > > >> > >> on Kafka acknowledges that the system-independence was
>> never
>> > as
>> > > > > real
>> > > > > >>
> > >> > >> with large amounts of state, I think SAMZA-617 would be a
> big
> > > > boon,
> > > > > >> > since
> > > > > >> > >> restoring state off the changelog on every single restart
> is
> > >
> > > > >> > run
> > > > >> > >> on each container that is part of the job (in which case, how
> > > does
> > > > >> the
> > > > >> > job
> > > > >> > >> submission to the cluster
>> > >> tight coupling between different Apache projects (e.g. Curator
> > and
> > > >> > >> Zookeeper, or Slider and YARN), so I think remaining separate
> > would
> > > >> be
> > > >> > ok.
> > > >> > >
Some thoughts from the peanut gallery...
On Thu, Jul 9, 2015 at 5:14 PM, Martin Kleppmann wrote:
> Thanks Julian for calling out the principle of community over code, which is
> super important. If it was just a matter of code, the Kafka project could
> simply pull in the Samza code (or write a
be
> >>>>>>>>>> (a)
> >>>>>>>>>>>> actually get better alignment in user experience, and (b)
> >>>> express
> >>>>>>>>>> this in
> >>>>>>>>>>>
>>>>>>> 3. Unify the programming experience so the client and Samza
>>>> api
>>>>>>>> share
>>>>>>>>>>>> config/monitoring/naming/packaging/etc.
>>>>>>>>>>>>
>>>>
> >>> > >> > > processing, (2) we learned that abstracting out the stream
>> > well is
>> > >>> > >> > > basically impossible, (3) we learned it is really hard to
>> > keep the
>> > >>> > two
> > >>> > nobody
> > >>> > >> but
> > >>> > >> > >> Kafka actually implements it. (Databus is perhaps an
> > exception,
> > >>> but
> > >>> > >> it
> > >>> > >>
ers won't use anyway feels like it
could be a win.
-Tommy
From: Jay Kreps [j...@confluent.io]
Sent: Tuesday, July 07, 2015 2:35 PM
To: dev@samza.apache.org
Subject: Re: Thoughts and obesrvations on Samza
Hey Roger,
I couldn't agree more. We spe
> >> > denominator.
> >>> > >> > >> For example, would host affinity (SAMZA-617) still be
> possible?
> >>> For
> >>> > >> jobs
> >>> > >> > >> with large amounts of state, I think SAMZA-617 would be a big
>
job to a
>>> > cluster,
>>> > >> is
>>> > >> > the
>>> > >> > >> idea that the instantiation code runs on a client somewhere,
>>> which
>>> > >> then
>>> > >> > >> pokes the necessary endpoi
en different Apache projects (e.g. Curator
>> and
>> > >> > >> Zookeeper, or Slider and YARN), so I think remaining separate
>> would
>> > >> be
>> > >> > ok.
>> > >> > >> Even if Samza is fully dependent on Kafka, there is e
>> > >>
> > >> > >> From a project management perspective, I guess the "new Samza"
> > would
> > >> > have
> > >> > >> to be developed on a branch alongside ongoing maintenance of the
> > >> current
> > >
t more about
> this
> >> if
> >> > >>> you'd be interested. I think Chris and I started with the idea of
> >> "what
> >> > >>> would it take to make Samza a kick-ass ingestion tool" but
> >> ultimately
> >
at's super frustrating. I'd be happy to chat more about
> this
> >> if
> >> > >>> you'd be interested. I think Chris and I started with the idea of
> >> "what
> >> > >>> would it take to make Samza a kick-ass
>>>
>> > >>> With regard to your point about slider, I don't necessarily
>> disagree.
>> > >> But I
>> > >>> think getting good YARN support is quite doable and I think we can
>> make
>> > >>> that work w
nologies
> > people
> > >>> want to use (Docker, Kubernetes, various cloud-specific deploy tools,
> > >> etc)
> > >>> I really think it is important to get this right.
> > >>>
> > >>> -Jay
> > >>>
> > >>>
v1 release but I'm not sure it feels
> >> right to
> >>>> launch a v1 then immediately plan to deprecate most of it.
> >>>>
> >>>> From a selfish perspective I have some guys who have started working
> >> with
> >>>> Samza
> > > On Thu, Jul 2, 2015 at 4:17 AM, Garry Turkington <
>> > > > g.turking...@improvedigital.com> wrote:
>> > > >
>> > > >> Hi all,
>> > > >>
>> > > >> I think the question below re does Samza become
> > > >>
>> > > >> I think the question below re does Samza become a sub-project of
>> Kafka
>> > > >> highlights the broader point around migration. Chris mentions
>> Samza's
>> > > >> maturity is heading towards a v1
> > >> Samza and building some new consumers/producers was next up. Sounds
> > like
> > > >> that is absolutely not the direction to go. I need to look into the
> > KIP
> > > in
> > > >> more detail but for me the attractiveness of adding new S
, Elasticsearch is doing fine as a separate project :)
> From: Martin Kleppmann
> Sent: July 6, 2015 1:18:29pm PDT
> To: dev@samza.apache.org
> Subject: Re: Thoughts and obesrvations on Samza
>
> Ok, thanks for the clarifications. Just a few follow-up comments.
>
> - I
>> the heavy lifting re scale and reliability done for me then it gives
> me
> > all
> > >> the pushing new consumers/producers would. If not then it complicates
> my
> > >> operational deployments.
> > >>
> > >> Which is simila
nts. If there is a generic Kafka
>>>> ingress/egress layer that I can plug a new connector into and have a
>> lot of
>>>> the heavy lifting re scale and reliability done for me then it gives me
>> all
>>>> the pushing new consumers/producers would
p and get a reliable production deployment may still
> >> dominate mailing list traffic, if for different reasons than today.
> >>
> >> Don't get me wrong -- I'm comfortable with making the Samza dependency
> on
> >> Kafka much more explicit and I abso
We may make it much easier for a newcomer to get something running but
> >> having them step up and get a reliable production deployment may still
> >> dominate mailing list traffic, if for different reasons than today.
> >>
> >> Don't get me wrong -- I
Hi, Guozhang,
{quote}
but I think if we decide to go this
route we'd better do it now than later as the protocol is not officially
"released" yet. This may delay the first release of the new consumer.
{quote}
I totally agree. Given the potentially heavy migration cost later, I think
that a slight d
Hi, Gianmarco,
{quote}
However, I think the fundamental operation that Samza, Copycat, and Kafka
consumers should agree upon is "how can I specify in a simple and
transparent way which partitions I want to consume, and how?".
{quote}
I agree that some basic partition distribution mechanism can be
't get me wrong -- I'm comfortable with making the Samza dependency on
>> Kafka much more explicit and I absolutely see the benefits in the
>> reduction of duplication and clashing terminologies/abstractions that
>> Chris/Jay describe. Samza as a library would likely be a very
1. I am neutral on modifying the consumer rebalance protocol to make the
logic pluggable on the client side, but I think if we decide to go this
route we'd better do it now rather than later as the protocol is not officially
"released" yet. This may delay the first release of the new consumer.
2. I like
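To make the idea in point 1 a bit more concrete, here is a minimal sketch of what client-side pluggable partition assignment could look like. It is only an illustration under stated assumptions: `AssignmentStrategy`, `RoundRobinStrategy` and `AssignmentDemo` are hypothetical names invented for this example and are not part of the Kafka consumer or Samza APIs, and the real rebalance protocol involves coordination that is not modeled here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical plug-in point (an assumption for illustration only;
// not part of the Kafka or Samza APIs).
interface AssignmentStrategy {
    // Maps each consumer id to the list of partitions it should own.
    Map<String, List<Integer>> assign(List<String> consumerIds, int partitionCount);
}

// Deals partitions out one at a time, so every partition has exactly one owner.
final class RoundRobinStrategy implements AssignmentStrategy {
    @Override
    public Map<String, List<Integer>> assign(List<String> consumerIds, int partitionCount) {
        if (consumerIds.isEmpty()) {
            throw new IllegalArgumentException("at least one consumer is required");
        }
        // Sort and de-duplicate the ids so every client computes the same result.
        List<String> sorted = new ArrayList<>(new TreeSet<>(consumerIds));
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String id : sorted) {
            assignment.put(id, new ArrayList<>());
        }
        for (int partition = 0; partition < partitionCount; partition++) {
            String owner = sorted.get(partition % sorted.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }
}

final class AssignmentDemo {
    public static void main(String[] args) {
        AssignmentStrategy strategy = new RoundRobinStrategy();
        // Prints {consumer-a=[0, 2, 4], consumer-b=[1, 3]}
        System.out.println(strategy.assign(Arrays.asList("consumer-b", "consumer-a"), 5));
    }
}
```

Because the strategy sits behind an interface, a Samza-style or copycat-style assignment could in principle be swapped in without touching the consumer itself, which is the kind of pluggability being discussed.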
Hey Gianmarco,
To your broader point, I agree that having a close alignment with Kafka
would be a great thing in terms of adoption/discoverability/etc. The
areas where I think this matters a lot are:
1. Website and docs: ideally when reading about Kafka you should be able to
find out about Samza
> > > > > > > With regard to your point about slider, I don't necessarily
> > > > > disagree.
> > > > > > > > But I
> > > > > > > > > think getting good YARN support is quite doable and I think
> > we
> > >
Hi Jay,
Thanks for your answer.
> However a few things have changed since that original design:
> 1. We now have the additional use cases of copycat and Samza
> 2. We now realize that the assignment strategies don't actually necessarily
> ensure each partition is assigned to only one consumer--t
> > > > for
> > > > > > each
> > > > > > > > and they are all a little different so testing is really
> hard.
> > In
> > > > the
> > > > > > > > absence of this we have been stuck with j
ker, Kubernetes, various cloud-specific deploy
> > > tools,
> > > > > > etc)
> > > > > > > I really think it is important to get this right.
> > > > > > >
> > > > > > > -Jay
> > > > > > >
of work being put in to slider, marathon,
> > aws
> > >> > > > tooling, not to mention the umpteen related packaging
> technologies
> > >> > people
> > >> > > > want to use (Docker, Kubernetes, various cloud-specific deploy
> > >>
com> wrote:
> > > > > >
> > > > > >> Hi all,
> > > > > >>
> > > > > >> I think the question below re does Samza become a sub-project of
> > > Kafka
> > > > > >> highlights the broader point around migrat
king...@improvedigital.com> wrote:
> >> > > >
> >> > > >> Hi all,
> >> > > >>
> >> > > >> I think the question below re does Samza become a sub-project of
> >> Kafka
> >> > > >> highligh
>>
>> > > >> From a selfish perspective I have some guys who have started
>> working
>> > > with
>> > > >> Samza and building some new consumers/producers was next up. Sounds
>> > like
>> > > >> that is absolutely not the di
building some new consumers/producers was next up.
> Sounds
> > > like
> > > > >> that is absolutely not the direction to go. I need to look into
> the
> > > KIP
> > > > in
> > > > >> more detail but for me the attractiveness of adding ne
> ingress/egress layer that I can plug a new connector into and have a
> > > lot of
> > > >> the heavy lifting re scale and reliability done for me then it gives
> > me
> > > all
> > > >> the pushing new consumers/producers would. If not then it
> c
complicates
> my
> > >> operational deployments.
> > >>
> > >> Which is similar to my other question with the proposal -- if we
> build a
> > >> fully available/stand-alone Samza plus the requisite shims to
> integrate
> > >> with Slider etc
tion deployment may still
> >> dominate mailing list traffic, if for different reasons than today.
> >>
> >> Don't get me wrong -- I'm comfortable with making the Samza dependency
> on
> >> Kafka much more explicit and I absolutely see the benefits in the
is/Jay describe. Samza as a library would likely be a very nice tool to
>> add to the Kafka ecosystem. I just have the concerns above re the
>> operational side.
>>
>> Garry
>>
>> -Original Message-
>> From: Gianmarco De Francisci Morales [mailto:g
that
> Chris/Jay describe. Samza as a library would likely be a very nice tool to
> add to the Kafka ecosystem. I just have the concerns above re the
> operational side.
>
> Garry
>
> -Original Message-
> From: Gianmarco De Francisci Morales [mailto:g...@apach
Hey Gianmarco,
I agree that most people view Samza as a compute layer on top of Kafka and
that is not actually a bad thing. We have kind of built things as if they
were totally separate which kind of makes things harder for people which
is, I think, the important thing to correct.
As to your ques
Hey Yan,
I think Chris and I are proposing the same thing. I'm not really saying that
we should literally make Samza a Kafka client, but rather that
philosophically what we want to have is closer to a fancy client than it is
to map/reduce (but current Samza is the reverse).
To answer your questions
Guozhang,
Yeah I agree. Being able to run in YARN/Mesos is definitely doable and
perhaps easier. Having a generic command line run script should be possible
too but the question is how the wiring would work (e.g. how the config maps
to instantiated java objects). The current mechanism is pretty ha
cheers
Kartik
From: Jay Kreps [j...@confluent.io]
Sent: Tuesday, June 30, 2015 11:33 PM
To: dev@samza.apache.org
Subject: Re: Thoughts and obesrvations on Samza
Looks like gmail mangled the code example, it was supposed to look like
this:
Properties props = new Properties();
props.put("
to add to the Kafka
ecosystem. I just have the concerns above re the operational side.
Garry
-Original Message-
From: Gianmarco De Francisci Morales [mailto:g...@apache.org]
Sent: 02 July 2015 12:56
To: dev@samza.apache.org
Subject: Re: Thoughts and obesrvations on Samza
Very interesting thoug
Very interesting thoughts.
From outside, I have always perceived Samza as a computing layer over Kafka.
The question, maybe a bit provocative, is "should Samza be a sub-project of
Kafka then?"
Or does it make sense to keep it as a separate project with a separate
governance?
Cheers,
--
Gianmarc
Overall, I agree with coupling more tightly with Kafka, because Samza is de
facto based on Kafka and should leverage what Kafka has. At the same time,
Kafka does not need to reinvent what Samza already has. I also like the
idea of separating the ingestion and transformation.
But it is a little dif
Read through the code example and it looks good to me. A few thoughts
regarding deployment:
Today Samza deploys as an executable, run like:
deploy/samza/bin/run-job.sh --config-factory=... --config-path=file://...
And this proposal advocates deploying Samza more as embedded libraries
in user
Looks like gmail mangled the code example, it was supposed to look like
this:
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:4242");
StreamingConfig config = new StreamingConfig(props);
config.subscribe("test-topic-1", "test-topic-2");
config.processor(ExampleStream
Hey guys,
This came out of some conversations Chris and I were having around whether
it would make sense to use Samza as a kind of data ingestion framework for
Kafka (which ultimately led to KIP-26 "copycat"). This kind of combined
with complaints around config and YARN and the discussion around
Hey all,
I have had some discussions with Samza engineers at LinkedIn and Confluent
and we came up with a few observations and would like to propose some
changes.
We've observed some things that I want to call out about Samza's design,
and I'd like to propose some changes.
* Samza is dependent u