Re: Thoughts and obesrvations on Samza

2015-07-13 Thread Yi Pan
> what value the abstraction overhead of the current model brings. Garry -----Original Message----- From: Yan Fang [mailto:yanfang...@gmail.com] Sent: 13 July 2015 19:58 To: dev@samza.apache.org Subject: Re: Thoughts and obesrvations on Samza

Re: Thoughts and obesrvations on Samza

2015-07-13 Thread Yi Pan
> ngs. Garry -----Original Message----- From: Yan Fang [mailto:yanfang...@gmail.com] Sent: 13 July 2015 19:58 To: dev@samza.apache.org Subject: Re: Thoughts and obesrvations on Samza I am leaning to Jay's fifth approach. It is not radical and gives us some

Re: Thoughts and obesrvations on Samza

2015-07-13 Thread Guozhang Wang
From the peanut gallery.. I like Yi's proposal of re-scoping the Samza project / code-base as "Stream Processing as a Service" that will potentially include: 1. A service manager with some REST / Web UI to accept stream processing jobs in terms of tgz / configs and schedule them as for: a. partitio

Re: Thoughts and obesrvations on Samza

2015-07-13 Thread Yan Fang
I am leaning toward Jay's fifth approach. It is not radical and gives us some time to see the outcome. In addition, I would suggest: 1) Keep the SystemConsumer/SystemProducer API, because the current SystemConsumer/SystemProducer API satisfies the usage (from Jordan's, and even Garry's, feedback) and is no

Re: Thoughts and obesrvations on Samza

2015-07-13 Thread Jordan Shaw
Jay, I think doing this iteratively in smaller chunks is a better way to go as new issues arise. As Navina said Kafka is a "stream system" and Samza is a "stream processor" and those two ideas should be mutually exclusive. -Jordan On Mon, Jul 13, 2015 at 10:06 AM, Jay Kreps wrote: > Hmm, though

Re: Thoughts and obesrvations on Samza

2015-07-13 Thread Jay Kreps
Hmm, thought about this more. Maybe this is just too much too quick. Overall I think there is some enthusiasm for the proposal but it's not really unanimous enough to make any kind of change this big cleanly. The board doesn't really like the merging stuff, users are concerned about compatibility,

Re: Thoughts and obesrvations on Samza

2015-07-12 Thread Jay Kreps
Hey Chris, Yeah, I'm obviously in favor of this. The sub-project approach seems the ideal way to take a graceful step in this direction, so I will ping the board folks and see why they are discouraged, it would be good to understand that. If we go that route we would need to do a similar discussi

Re: Thoughts and obesrvations on Samza

2015-07-12 Thread Jay Kreps
Hey guys, There seems to be some confusion in the last few emails: there is no plan whatsoever to remove YARN support. The change suggested was to move the partition management out of the YARN app master and rely on Kafka's partition management. The advantage of this would be to make the vast majo

Re: Thoughts and obesrvations on Samza

2015-07-12 Thread Garrett Barton
Yi, What you just summarized makes a whole lot more sense to me. Shamelessly I am looking at this shift as a customer with a production workflow riding on it so I am looking for some kind of consistency into the future of Samza. This makes me feel a lot better about it. Thank you! On Sun, Ju

Re: Thoughts and obesrvations on Samza

2015-07-12 Thread Jakob Homan
I'm afraid I don't agree that we're anywhere near coming to a consensus, or even that we're all agreeing on what we're discussing. (I do totally agree that the discussion itself has been awesome both in tone and content, though). As Tim brought up and I mentioned, the Board is not big on subprojec

Re: Thoughts and obesrvations on Samza

2015-07-12 Thread Yi Pan
Just to make it explicitly clear what I am proposing, here is a version of more detailed description: The fourth option (in addition to what Jakob summarized) we are proposing is: - Recharter Samza to “stream processing as a service” - The current Samza core (the basic transformation API w/ basi

Re: Thoughts and obesrvations on Samza

2015-07-12 Thread Yi Pan
Hi, Chris, Thanks for sending out this concrete set of points here. I agree w/ all but have a slightly different point of view on 8). My view on this is: instead of sunsetting Samza as a TLP, can we re-charter the scope of Samza to be the home for "running streaming process as a service"? My main motivatio

Re: Thoughts and obesrvations on Samza

2015-07-12 Thread Garrett Barton
Hey all, just want to chime in before it's too late. Been following Samza for a long time, and using it in production for the past 6 months or so. In no particular order the things I like most about Samza are: - YARN support, resiliency of my deployment is paramount. This is why I use Samza ove

Re: Thoughts and obesrvations on Samza

2015-07-12 Thread Tim Williams
On Sun, Jul 12, 2015 at 8:54 PM, Chris Riccomini wrote: > Hey all, I want to start by saying that I'm absolutely thrilled to be a part of this community. The amount of level-headed, thoughtful, educated discussion that's gone on over the past ~10 days is overwhelming. Wonderful. It see

Re: Thoughts and obesrvations on Samza

2015-07-12 Thread Chris Riccomini
That was meant to be "thread" not "threat". lol. :) On Sun, Jul 12, 2015 at 5:54 PM, Chris Riccomini wrote: > Hey all, I want to start by saying that I'm absolutely thrilled to be a part of this community. The amount of level-headed, thoughtful, educated discussion that's gone on over th

Re: Thoughts and obesrvations on Samza

2015-07-12 Thread Chris Riccomini
Hey all, I want to start by saying that I'm absolutely thrilled to be a part of this community. The amount of level-headed, thoughtful, educated discussion that's gone on over the past ~10 days is overwhelming. Wonderful. It seems like discussion is waning a bit, and we've reached some conclusion

Re: Thoughts and obesrvations on Samza

2015-07-10 Thread Yan Fang
> repo, but I'm actually not really sure (I can't find a definition of a

Re: Thoughts and obesrvations on Samza

2015-07-10 Thread Jay Kreps
> streaming" or "kafka streams" or something like that would actually do a

Re: Thoughts and obesrvations on Samza

2015-07-10 Thread Jay Kreps
> I think if we did that they having naming or branding like "kafka streaming" or "kafka streams"

Re: Thoughts and obesrvations on Samza

2015-07-10 Thread Julian Hyde
> Fwiw we actually considered this model originally when open sourcing Samza, however at that time Kafka was

Re: Thoughts and obesrvations on Samza

2015-07-10 Thread Yan Fang
> not to do it since we felt it would be limiting. From my point of view the

Re: Thoughts and obesrvations on Samza

2015-07-10 Thread Jay Kreps
> wrote: Hi all, Lots of good thoughts here.

Re: Thoughts and obesrvations on Samza

2015-07-10 Thread Jakob Homan
> aking Samza fully dependent on Kafka acknowledges that the system-independence was never as real

Re: Thoughts and obesrvations on Samza

2015-07-10 Thread Roger Hoover
> with large amounts of state, I think SAMZA-617 would be a big boon, since restoring state off the changelog on every single restart is

Re: Thoughts and obesrvations on Samza

2015-07-10 Thread Jay Kreps
> run on each container that is part of the job (in which case, how does the job submission to the cluster

Re: Thoughts and obesrvations on Samza

2015-07-10 Thread Roger Hoover
> tight coupling between different Apache projects (e.g. Curator and Zookeeper, or Slider and YARN), so I think remaining separate would be ok.

Re: Thoughts and obesrvations on Samza

2015-07-09 Thread Tim Williams
Some thoughts from the peanut gallery... On Thu, Jul 9, 2015 at 5:14 PM, Martin Kleppmann wrote: > Thanks Julian for calling out the principle of community over code, which is super important. If it was just a matter of code, the Kafka project could simply pull in the Samza code (or write a

Re: Thoughts and obesrvations on Samza

2015-07-09 Thread Yi Pan
> be (a) actually get better alignment in user experience, and (b) express this in

Re: Thoughts and obesrvations on Samza

2015-07-09 Thread Martin Kleppmann
> 3. Unify the programming experience so the client and Samza api share config/monitoring/naming/packaging/etc.

Re: Thoughts and obesrvations on Samza

2015-07-09 Thread Julian Hyde
> processing, (2) we learned that abstracting out the stream well is basically impossible, (3) we learned it is really hard to keep the two

Re: Thoughts and obesrvations on Samza

2015-07-08 Thread Jordan Shaw
> nobody but Kafka actually implements it. (Databus is perhaps an exception, but it

RE: Thoughts and obesrvations on Samza

2015-07-08 Thread Thomas Becker
ers won't use anyway feels like it could be a win. -Tommy From: Jay Kreps [j...@confluent.io] Sent: Tuesday, July 07, 2015 2:35 PM To: dev@samza.apache.org Subject: Re: Thoughts and obesrvations on Samza Hey Roger, I couldn't agree more. We spe

Re: Thoughts and obesrvations on Samza

2015-07-08 Thread Jay Kreps
> denominator. For example, would host affinity (SAMZA-617) still be possible? For jobs with large amounts of state, I think SAMZA-617 would be a big

Re: Thoughts and obesrvations on Samza

2015-07-08 Thread Jakob Homan
> job to a cluster, is the idea that the instantiation code runs on a client somewhere, which then pokes the necessary endpoi

Re: Thoughts and obesrvations on Samza

2015-07-08 Thread Ben Kirwin
> en different Apache projects (e.g. Curator and Zookeeper, or Slider and YARN), so I think remaining separate would be ok. Even if Samza is fully dependent on Kafka, there is e

Re: Thoughts and obesrvations on Samza

2015-07-07 Thread Jay Kreps
> From a project management perspective, I guess the "new Samza" would have to be developed on a branch alongside ongoing maintenance of the current

Re: Thoughts and obesrvations on Samza

2015-07-07 Thread Roger Hoover
> t more about this if you'd be interested. I think Chris and I started with the idea of "what would it take to make Samza a kick-ass ingestion tool" but ultimately

Re: Thoughts and obesrvations on Samza

2015-07-07 Thread Jay Kreps
> at's super frustrating. I'd be happy to chat more about this if you'd be interested. I think Chris and I started with the idea of "what would it take to make Samza a kick-ass

Re: Thoughts and obesrvations on Samza

2015-07-07 Thread Gianmarco De Francisci Morales
> With regard to your point about slider, I don't necessarily disagree. But I think getting good YARN support is quite doable and I think we can make that work w

Re: Thoughts and obesrvations on Samza

2015-07-07 Thread Gianmarco De Francisci Morales
> nologies people want to use (Docker, Kubernetes, various cloud-specific deploy tools, etc) I really think it is important to get this right. -Jay

Re: Thoughts and obesrvations on Samza

2015-07-06 Thread Jay Kreps
> v1 release but I'm not sure it feels right to launch a v1 then immediately plan to deprecate most of it. From a selfish perspective I have some guys who have started working with Samza

Re: Thoughts and obesrvations on Samza

2015-07-06 Thread Yi Pan
> On Thu, Jul 2, 2015 at 4:17 AM, Garry Turkington <g.turking...@improvedigital.com> wrote: Hi all, I think the question below re does Samza become

Re: Thoughts and obesrvations on Samza

2015-07-06 Thread Timothy Chen
> I think the question below re does Samza become a sub-project of Kafka highlights the broader point around migration. Chris mentions Samza's maturity is heading towards a v1

Re: Thoughts and obesrvations on Samza

2015-07-06 Thread Yi Pan
> Samza and building some new consumers/producers was next up. Sounds like that is absolutely not the direction to go. I need to look into the KIP in more detail but for me the attractiveness of adding new S

RE: Thoughts and obesrvations on Samza

2015-07-06 Thread Ken Krugler
, Elasticsearch is doing fine as a separate project :) > From: Martin Kleppmann Sent: July 6, 2015 1:18:29pm PDT To: dev@samza.apache.org Subject: Re: Thoughts and obesrvations on Samza Ok, thanks for the clarifications. Just a few follow-up comments. - I

Re: Thoughts and obesrvations on Samza

2015-07-06 Thread Jay Kreps
> the heavy lifting re scale and reliability done for me then it gives me all the pushing new consumers/producers would. If not then it complicates my operational deployments. Which is simila

Re: Thoughts and obesrvations on Samza

2015-07-06 Thread Martin Kleppmann
> nts. If there is a generic Kafka ingress/egress layer that I can plug a new connector into and have a lot of the heavy lifting re scale and reliability done for me then it gives me all the pushing new consumers/producers would

Re: Thoughts and obesrvations on Samza

2015-07-06 Thread Yi Pan
> p and get a reliable production deployment may still dominate mailing list traffic, if for different reasons than today. Don't get me wrong -- I'm comfortable with making the Samza dependency on Kafka much more explicit and I abso

Re: Thoughts and obesrvations on Samza

2015-07-06 Thread Jay Kreps
> We may make it much easier for a newcomer to get something running but having them step up and get a reliable production deployment may still dominate mailing list traffic, if for different reasons than today. Don't get me wrong -- I

Re: Thoughts and obesrvations on Samza

2015-07-06 Thread Yi Pan
Hi, Guozhang, {quote} but I think if we decide to go this route we'd better do it now than later as the protocol is not officially "released" yet. This may delay the first release of the new consumer. {quote} I totally agree. Given that potential heavy migration cost later, I think that a slight d

Re: Thoughts and obesrvations on Samza

2015-07-06 Thread Yi Pan
Hi, Gianmarco, {quote} However, I think the fundamental operation that Samza, Copycat, and Kafka consumers should agree upon is "how can I specify in a simple and transparent way which partitions I want to consume, and how?". {quote} I agree that some basic partition distribution mechanism can be

Re: Thoughts and obesrvations on Samza

2015-07-06 Thread Martin Kleppmann
> 't get me wrong -- I'm comfortable with making the Samza dependency on Kafka much more explicit and I absolutely see the benefits in the reduction of duplication and clashing terminologies/abstractions that Chris/Jay describe. Samza as a library would likely be a very

Re: Thoughts and obesrvations on Samza

2015-07-05 Thread Guozhang Wang
1. I am neutral on modifying the consumer rebalance protocol to make the logic pluggable on the client side, but I think if we decide to go this route we had better do it now rather than later, as the protocol is not officially "released" yet. This may delay the first release of the new consumer. 2. I like
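
For readers skimming the index, a minimal sketch of what "pluggable assignment logic on the client side" could look like. This is only an illustration under stated assumptions: the `PartitionAssignor` interface, the `RoundRobinAssignor` class, and the consumer names are hypothetical and are not taken from Kafka's or Samza's actual code or from the rebalance protocol discussed in the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AssignmentSketch {
    // Hypothetical plug-in point: a client supplies its own strategy for
    // mapping partitions to consumers. Not a real Kafka or Samza interface.
    interface PartitionAssignor {
        Map<String, List<Integer>> assign(List<String> consumers, int partitionCount);
    }

    // One possible strategy: hand out partitions 0..n-1 in round-robin order,
    // so each partition ends up with exactly one consumer.
    static class RoundRobinAssignor implements PartitionAssignor {
        @Override
        public Map<String, List<Integer>> assign(List<String> consumers, int partitionCount) {
            if (consumers.isEmpty()) {
                throw new IllegalArgumentException("at least one consumer is required");
            }
            Map<String, List<Integer>> result = new LinkedHashMap<>();
            for (String consumer : consumers) {
                result.put(consumer, new ArrayList<>());
            }
            for (int partition = 0; partition < partitionCount; partition++) {
                String owner = consumers.get(partition % consumers.size());
                result.get(owner).add(partition);
            }
            return result;
        }
    }

    public static void main(String[] args) {
        PartitionAssignor assignor = new RoundRobinAssignor();
        // Prints the partition list owned by each (hypothetical) consumer.
        System.out.println(assignor.assign(Arrays.asList("c1", "c2"), 5));
    }
}
```

Swapping in a different `PartitionAssignor` implementation (for example, one that co-locates partitions with local state) would be the kind of customization the client-side approach is meant to allow.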

Re: Thoughts and obesrvations on Samza

2015-07-03 Thread Jay Kreps
Hey Gianmarco, To your broader point, I agree that having a close alignment with Kafka would be a great thing in terms of adoption/discoverability/etc. The areas where I think this matters a lot are: 1. Website and docs: ideally when reading about Kafka you should be able to find out about Samza

Re: Thoughts and obesrvations on Samza

2015-07-03 Thread Jay Kreps
> With regard to your point about slider, I don't necessarily disagree. But I think getting good YARN support is quite doable and I think we

Re: Thoughts and obesrvations on Samza

2015-07-03 Thread Gianmarco De Francisci Morales
Hi Jay, Thanks for your answer. > However a few things have changed since that original design: > 1. We now have the additional use cases of copycat and Samza > 2. We now realize that the assignment strategies don't actually necessarily > ensure each partition is assigned to only one consumer--t

Re: Thoughts and obesrvations on Samza

2015-07-02 Thread Yi Pan
> for each and they are all a little different so testing is really hard. In the absence of this we have been stuck with j

Re: Thoughts and obesrvations on Samza

2015-07-02 Thread Jay Kreps
> ker, Kubernetes, various cloud-specific deploy tools, etc) I really think it is important to get this right. -Jay

Re: Thoughts and obesrvations on Samza

2015-07-02 Thread Yi Pan
> of work being put in to slider, marathon, aws tooling, not to mention the umpteen related packaging technologies people want to use (Docker, Kubernetes, various cloud-specific deploy

Re: Thoughts and obesrvations on Samza

2015-07-02 Thread Yi Pan
> com> wrote: Hi all, I think the question below re does Samza become a sub-project of Kafka highlights the broader point around migrat

Re: Thoughts and obesrvations on Samza

2015-07-02 Thread Jay Kreps
> king...@improvedigital.com> wrote: Hi all, I think the question below re does Samza become a sub-project of Kafka highligh

Re: Thoughts and obesrvations on Samza

2015-07-02 Thread Yi Pan
> From a selfish perspective I have some guys who have started working with Samza and building some new consumers/producers was next up. Sounds like that is absolutely not the di

Re: Thoughts and obesrvations on Samza

2015-07-02 Thread Guozhang Wang
> building some new consumers/producers was next up. Sounds like that is absolutely not the direction to go. I need to look into the KIP in more detail but for me the attractiveness of adding ne

Re: Thoughts and obesrvations on Samza

2015-07-02 Thread Yi Pan
> ingress/egress layer that I can plug a new connector into and have a lot of the heavy lifting re scale and reliability done for me then it gives me all the pushing new consumers/producers would. If not then it c

Re: Thoughts and obesrvations on Samza

2015-07-02 Thread Jay Kreps
> complicates my operational deployments. Which is similar to my other question with the proposal -- if we build a fully available/stand-alone Samza plus the requisite shims to integrate with Slider etc

Re: Thoughts and obesrvations on Samza

2015-07-02 Thread Yi Pan
> tion deployment may still dominate mailing list traffic, if for different reasons than today. Don't get me wrong -- I'm comfortable with making the Samza dependency on Kafka much more explicit and I absolutely see the benefits in the

Re: Thoughts and obesrvations on Samza

2015-07-02 Thread Sriram
> is/Jay describe. Samza as a library would likely be a very nice tool to add to the Kafka ecosystem. I just have the concerns above re the operational side. Garry -----Original Message----- From: Gianmarco De Francisci Morales [mailto:g

Re: Thoughts and obesrvations on Samza

2015-07-02 Thread Jay Kreps
> that Chris/Jay describe. Samza as a library would likely be a very nice tool to add to the Kafka ecosystem. I just have the concerns above re the operational side. Garry -----Original Message----- From: Gianmarco De Francisci Morales [mailto:g...@apach

Re: Thoughts and obesrvations on Samza

2015-07-02 Thread Jay Kreps
Hey Gianmarco, I agree that most people view Samza as a compute layer on top of Kafka and that is not actually a bad thing. We have kind of built things as if they were totally separate which kind of makes things harder for people which is, I think, the important thing to correct. As to your ques

Re: Thoughts and obesrvations on Samza

2015-07-02 Thread Jay Kreps
Hey Yan, I think Chris and I are proposing the same thing. I am not really saying that we should literally make Samza a Kafka client, but rather that philosophically what we want to have is closer to a fancy client than it is to map/reduce (but current Samza is the reverse). To answer your questions

Re: Thoughts and obesrvations on Samza

2015-07-02 Thread Jay Kreps
Guozhang, Yeah I agree. Being able to run in YARN/Mesos is definitely doable and perhaps easier. Having a generic command line run script should be possible too but the question is how the wiring would work (e.g. how the config maps to instantiated java objects). The current mechanism is pretty ha

RE: Thoughts and obesrvations on Samza

2015-07-02 Thread Kartik Paramasivam
cheers Kartik From: Jay Kreps [j...@confluent.io] Sent: Tuesday, June 30, 2015 11:33 PM To: dev@samza.apache.org Subject: Re: Thoughts and obesrvations on Samza Looks like gmail mangled the code example, it was supposed to look like this: Properties props = new Properties(); props.put("

RE: Thoughts and obesrvations on Samza

2015-07-02 Thread Garry Turkington
to add to the Kafka ecosystem. I just have the concerns above re the operational side. Garry -----Original Message----- From: Gianmarco De Francisci Morales [mailto:g...@apache.org] Sent: 02 July 2015 12:56 To: dev@samza.apache.org Subject: Re: Thoughts and obesrvations on Samza Very interesting thoug

Re: Thoughts and obesrvations on Samza

2015-07-02 Thread Gianmarco De Francisci Morales
Very interesting thoughts. From outside, I have always perceived Samza as a computing layer over Kafka. The question, maybe a bit provocative, is "should Samza be a sub-project of Kafka then?" Or does it make sense to keep it as a separate project with a separate governance? Cheers, -- Gianmarc

Re: Thoughts and obesrvations on Samza

2015-07-01 Thread Yan Fang
Overall, I agree with coupling to Kafka more tightly, because Samza is de facto based on Kafka and should leverage what Kafka has. At the same time, Kafka does not need to reinvent what Samza already has. I also like the idea of separating the ingestion and transformation. But it is a little dif

Re: Thoughts and obesrvations on Samza

2015-07-01 Thread Guozhang Wang
Read through the code example and it looks good to me. A few thoughts regarding deployment: Today Samza deploys as an executable, run like: deploy/samza/bin/run-job.sh --config-factory=... --config-path=file://... And this proposal advocates deploying Samza more as embedded libraries in user

Re: Thoughts and obesrvations on Samza

2015-06-30 Thread Jay Kreps
Looks like gmail mangled the code example, it was supposed to look like this: Properties props = new Properties(); props.put("bootstrap.servers", "localhost:4242"); StreamingConfig config = new StreamingConfig(props); config.subscribe("test-topic-1", "test-topic-2"); config.processor(ExampleStream
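
The snippet above is cut off in this listing, so here is a self-contained sketch of the same configuration style laid out as compilable Java. It rests on loud assumptions: `StreamingConfig` is only the name used in the message (a proposed API at the time), and the stand-in class below, the `RecordProcessor` interface, and `UpperCaseProcessor` are hypothetical helpers written purely so the example runs. The truncated processor class name from the message is deliberately not reconstructed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

class StreamingConfigSketch {
    // Mirrors the calls shown in the message: properties, subscribe, processor.
    static StreamingConfig build() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:4242");
        StreamingConfig config = new StreamingConfig(props);
        config.subscribe("test-topic-1", "test-topic-2");
        config.processor(UpperCaseProcessor.class); // hypothetical processor class
        return config;
    }

    public static void main(String[] args) {
        StreamingConfig config = build();
        RecordProcessor processor = config.newProcessor();
        for (String topic : config.topics()) {
            System.out.println(processor.process(topic, "hello"));
        }
    }
}

// Hypothetical callback interface; not a real Kafka or Samza type.
interface RecordProcessor {
    String process(String topic, String value);
}

// Hypothetical stand-in for the StreamingConfig named in the message.
class StreamingConfig {
    private final Properties props;
    private final List<String> topics = new ArrayList<>();
    private Class<? extends RecordProcessor> processorClass;

    StreamingConfig(Properties props) { this.props = props; }

    void subscribe(String... topicNames) { topics.addAll(Arrays.asList(topicNames)); }

    void processor(Class<? extends RecordProcessor> cls) { processorClass = cls; }

    List<String> topics() { return topics; }

    String bootstrapServers() { return props.getProperty("bootstrap.servers"); }

    // Instantiates the registered processor through its no-argument constructor.
    RecordProcessor newProcessor() {
        try {
            return processorClass.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate processor", e);
        }
    }
}

// Hypothetical processor: tags each value with its topic and upper-cases it.
class UpperCaseProcessor implements RecordProcessor {
    @Override
    public String process(String topic, String value) {
        return topic + ":" + value.toUpperCase();
    }
}
```

The point of the style is that the job is wired up in ordinary code (properties, topic subscription, a processor class) rather than through a separate config-driven launcher.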

Re: Thoughts and obesrvations on Samza

2015-06-30 Thread Jay Kreps
Hey guys, This came out of some conversations Chris and I were having around whether it would make sense to use Samza as a kind of data ingestion framework for Kafka (which ultimately led to KIP-26 "copycat"). This kind of combined with complaints around config and YARN and the discussion around

Thoughts and obesrvations on Samza

2015-06-30 Thread Chris Riccomini
Hey all, I have had some discussions with Samza engineers at LinkedIn and Confluent and we came up with a few observations and would like to propose some changes. We've observed some things that I want to call out about Samza's design, and I'd like to propose some changes. * Samza is dependent u