With regard to KIP-130: From the KIP-130 thread:
> About subtopologies and tasks. We do have the concept of subtopologies
> already in KIP-120. It's only missing an ID that allows to link a subtopology
> to a task.
>
> IMHO, adding a simple variable to `Subtopology` that provides the ID should be
> sufficient. We can simply document in the JavaDocs how Subtopology and
> TaskMetadata can be linked to each other.

I updated KIP-120 to include a field for this.

-Matthias

On 3/27/17 4:27 PM, Matthias J. Sax wrote:
> Hi,
>
> I would like to trigger this discussion again. It seems that the naming
> question is rather subjective and both main alternatives (w/ or w/o the
> word "Topology" in the name) have pros/cons.
>
> If you have any further thought, please share it. At the moment I still
> propose `StreamsBuilder` in the KIP.
>
> I also want to point out, that the VOTE thread was already started. So
> if you like the current KIP, please cast your vote there.
>
> Thanks a lot!
>
> -Matthias
>
> On 3/23/17 3:38 PM, Matthias J. Sax wrote:
>> Jay,
>>
>> about the naming schema:
>>
>>>> 1. "kstreams" - the DSL
>>>> 2. "processor api" - the lower level callback/topology api
>>>> 3. KStream/KTable - entities in the kstreams dsl
>>>> 4. "Kafka Streams" - General name for stream processing stuff in Kafka,
>>>> including both kstreams and the processor API plus the underlying
>>>> implementation.
>>
>> I think this terminology has some issues... To me, `kstreams` was
>> always not more than an abbreviation for `Kafka Streams` -- thus (1) and
>> (4) kinda collide here. Following questions on the mailing list etc I
>> often see people using kstreams or kstream exactly as an abbr. for "Kafka
>> Streams"
>>
>>> I think referring to the dsl as "kstreams" is cute and mnemonic and not
>>> particularly confusing.
>>
>> I disagree here. It's a very subtle difference between `kstreams` and
>> `KStream` -- just singular/plural, thus (1) and (3) also "collide" --
>> it's just too close to each other.
>>
>> Thus, I really think it's a good idea to get a new name for the DSL to
>> get a better separation of the 4 concepts.
>>
>> Furthermore, we use the term "Streams API". Thus, I think
>> `StreamsBuilder` (or `StreamsTopologyBuilder`) are both very good names.
>>
>> Thus, I prefer to keep the KIP as is (suggesting `StreamsBuilder`).
>>
>> I will start a VOTE thread. Of course, we can still discuss the naming
>> issue. :)
>>
>> -Matthias
>>
>> On 3/22/17 8:53 PM, Jay Kreps wrote:
>>> I don't feel strongly on this, so I'm happy with whatever everyone else
>>> wants.
>>>
>>> Michael, I'm not arguing that people don't need to understand topologies, I
>>> just think it is like RocksDB, you need to understand it when
>>> debugging/operating but not in the initial coding since the metaphor we're
>>> providing at this layer isn't a topology of processors but rather something
>>> like the collections api. Anyhow it won't hurt people to have it there.
>>>
>>> For the original KStreamBuilder thing, I think that came from the naming we
>>> discussed originally:
>>>
>>> 1. "kstreams" - the DSL
>>> 2. "processor api" - the lower level callback/topology api
>>> 3. KStream/KTable - entities in the kstreams dsl
>>> 4. "Kafka Streams" - General name for stream processing stuff in Kafka,
>>> including both kstreams and the processor API plus the underlying
>>> implementation.
>>>
>>> I think referring to the dsl as "kstreams" is cute and mnemonic and not
>>> particularly confusing. Just like referring to the "java collections
>>> library" isn't confusing even though it contains the Iterator interface
>>> which is not actually itself a collection.
>>>
>>> So I think KStreamBuilder should technically have been KstreamsBuilder and
>>> is intended not to be a builder of a KStream but rather the builder for the
>>> kstreams DSL. Okay, yes, that *is* slightly confusing.
>>> :-)
>>>
>>> -Jay
>>>
>>> On Wed, Mar 22, 2017 at 11:25 AM, Guozhang Wang <wangg...@gmail.com> wrote:
>>>
>>>> Regarding the naming of `StreamsTopologyBuilder` v.s. `StreamsBuilder` that
>>>> are going to be used in DSL, I agree both have their arguments:
>>>>
>>>> 1. On one side, people using the DSL layer probably do not need to be aware
>>>> (or rather, "learn about") of the "topology" concept, although this concept
>>>> is a publicly exposed one in Kafka Streams.
>>>>
>>>> 2. On the other side, StreamsBuilder#build() returning a Topology object
>>>> sounds a little weird, at least to me (admittedly subjective matter).
>>>>
>>>> Since the second bullet point seems to be more "subjective" and many people
>>>> are not worried about it, I'm OK to go with the other option.
>>>>
>>>> Guozhang
>>>>
>>>> On Wed, Mar 22, 2017 at 8:58 AM, Michael Noll <mich...@confluent.io>
>>>> wrote:
>>>>
>>>>> Forwarding to kafka-user.
>>>>>
>>>>> ---------- Forwarded message ----------
>>>>> From: Michael Noll <mich...@confluent.io>
>>>>> Date: Wed, Mar 22, 2017 at 8:48 AM
>>>>> Subject: Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API
>>>>> To: d...@kafka.apache.org
>>>>>
>>>>> Matthias,
>>>>>
>>>>>> @Michael:
>>>>>>
>>>>>> You seemed to agree with Jay about not exposing the `Topology` concept
>>>>>> in our main entry class (ie, current KStreamBuilder), thus, I
>>>>>> interpreted that you do not want `Topology` in the name either (I am a
>>>>>> little surprised by your last response, that goes the opposite
>>>>>> direction).
>>>>>
>>>>> Oh, sorry for not being clear.
>>>>>
>>>>> What I wanted to say in my earlier email was the following: Yes, I do
>>>>> agree with most of Jay's reasoning, notably about carefully deciding how
>>>>> much and which parts of the API/concept "surface" we expose to users of
>>>>> the DSL.
>>>>> However, and this is perhaps where I wasn't very clear, I disagree on
>>>>> the particular opinion about not exposing the topology concept to DSL
>>>>> users. Instead, I think the concept of a topology is important to
>>>>> understand even for DSL users -- particularly because of the way the DSL is
>>>>> currently wiring your processing logic via the builder pattern. (As I
>>>>> noted, e.g. Akka uses a different approach where you might be able to get
>>>>> away with not exposing the "topology" concept, but even in Akka there's the
>>>>> notion of graphs and flows.)
>>>>>
>>>>>>> StreamsBuilder builder = new StreamsBuilder();
>>>>>>>
>>>>>>> // And here you'd define your...well, what actually?
>>>>>>> // Ah right, you are composing a topology here, though you are not aware of it.
>>>>>>
>>>>>> Yes. You are not aware of it -- that's the whole point about it -- don't
>>>>>> put the Topology concept in the focus...
>>>>>
>>>>> Let me turn this around, because that was my point: it's confusing to have
>>>>> a name "StreamsBuilder" if that thing isn't building streams, and it is
>>>>> not.
>>>>>
>>>>> As I mentioned before, I do think it is a benefit to make it clear to DSL
>>>>> users that there are two aspects at play: (1) defining the logic/plan of
>>>>> your processing, and (2) the execution of that plan. I have a less strong
>>>>> opinion whether or not having "topology" in the names would help to
>>>>> communicate this separation as well as combination of (1) and (2) to make
>>>>> your app work as expected.
>>>>>
>>>>> If we stick with `KafkaStreams` for (2) *and* don't like having "topology"
>>>>> in the name, then perhaps we should rename `KStreamBuilder` to
>>>>> `KafkaStreamsBuilder`. That at least gives some illusion of a combo of (1)
>>>>> and (2). IMHO, `KafkaStreamsBuilder` highlights better that "it is a
>>>>> builder/helper for the Kafka Streams API", rather than "a builder for
>>>>> streams".
>>>>>
>>>>> Also, I think some of the naming challenges we're discussing here are
>>>>> caused by having this builder pattern in the first place. If the Streams
>>>>> API was implemented in Scala, for example, we could use implicits for
>>>>> helping us to "stitch streams/tables together to build the full topology",
>>>>> thus using a different (better?) approach to composing your topologies than
>>>>> through a builder pattern. So: perhaps there's a better way than the
>>>>> builder, and that way would also be clearer on terminology? That said,
>>>>> this might take this KIP off-scope.
>>>>>
>>>>> -Michael
>>>>>
>>>>> On Wed, Mar 22, 2017 at 12:33 AM, Matthias J. Sax <matth...@confluent.io>
>>>>> wrote:
>>>>>
>>>>>> @Guozhang:
>>>>>>
>>>>>> I recognized that you want to have `Topology` in the name. But it seems
>>>>>> that more people preferred to not have it (Jay, Ram, Michael [?],
>>>>>> myself).
>>>>>>
>>>>>> @Michael:
>>>>>>
>>>>>> You seemed to agree with Jay about not exposing the `Topology` concept
>>>>>> in our main entry class (ie, current KStreamBuilder), thus, I
>>>>>> interpreted that you do not want `Topology` in the name either (I am a
>>>>>> little surprised by your last response, that goes the opposite
>>>>>> direction).
>>>>>>
>>>>>>> StreamsBuilder builder = new StreamsBuilder();
>>>>>>>
>>>>>>> // And here you'd define your...well, what actually?
>>>>>>> // Ah right, you are composing a topology here, though you are not aware of it.
>>>>>>
>>>>>> Yes. You are not aware of it -- that's the whole point about it -- don't
>>>>>> put the Topology concept in the focus...
>>>>>>
>>>>>> Furthermore,
>>>>>>
>>>>>>>>> So what are you building here with StreamsBuilder? Streams (hint: No)?
>>>>>>>>> And what about tables -- is there a TableBuilder (hint: No)?
>>>>>>
>>>>>> I am not sure, if this is too much a concern.
>>>>>> In contrast to `KStreamBuilder` (singular) that contains `KStream` and thus puts
>>>>>> KStream concept in focus and thus degrades `KTable`, `StreamsBuilder`
>>>>>> (plural) focuses on "Streams API". IMHO, it does not put focus on
>>>>>> KStream. It's just a builder from the Streams API -- you don't need to
>>>>>> worry what you are building -- and you don't need to think about the
>>>>>> `Topology` concept (of course, you see that .build() returns a Topology).
>>>>>>
>>>>>> Personally, I see pros and cons for both `StreamsBuilder` and
>>>>>> `StreamsTopologyBuilder` and thus, I am fine either way. Maybe Jay and
>>>>>> Ram can follow up and share their thoughts?
>>>>>>
>>>>>> It would also help a lot if other people put their vote for a name, too.
>>>>>>
>>>>>> -Matthias
>>>>>>
>>>>>> On 3/21/17 2:11 PM, Guozhang Wang wrote:
>>>>>>> Just to clarify, I did want to have the term `Topology` as part of the
>>>>>>> class name, for the reasons above. I'm not too worried about being
>>>>>>> consistent with the previous names, but I feel the `XXTopologyBuilder` is
>>>>>>> better than `XXStreamsBuilder` since its build() function returns a
>>>>>>> Topology object.
>>>>>>>
>>>>>>> Guozhang
>>>>>>>
>>>>>>> On Mon, Mar 20, 2017 at 12:53 PM, Michael Noll <mich...@confluent.io>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hmm, I must admit I don't like this last update all too much.
>>>>>>>>
>>>>>>>> Basically we would have:
>>>>>>>>
>>>>>>>> StreamsBuilder builder = new StreamsBuilder();
>>>>>>>>
>>>>>>>> // And here you'd define your...well, what actually?
>>>>>>>> // Ah right, you are composing a topology here, though you are not aware of it.
>>>>>>>>
>>>>>>>> KafkaStreams streams = new KafkaStreams(builder.build(),
>>>>>>>> streamsConfiguration);
>>>>>>>>
>>>>>>>> So what are you building here with StreamsBuilder? Streams (hint: No)?
>>>>>>>> And what about tables -- is there a TableBuilder (hint: No)?
>>>>>>>>
>>>>>>>> I also interpret Guozhang's last response as that he'd prefer to have
>>>>>>>> "Topology" in the class/interface names. I am aware that we shouldn't
>>>>>>>> necessarily use the status quo to make decisions about future changes, but
>>>>>>>> the very first concept we explain in the Kafka Streams documentation is
>>>>>>>> "Stream Processing Topology":
>>>>>>>> https://kafka.apache.org/0102/documentation/streams#streams_concepts
>>>>>>>>
>>>>>>>> -Michael
>>>>>>>>
>>>>>>>> On Mon, Mar 20, 2017 at 7:55 PM, Matthias J. Sax <matth...@confluent.io>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> \cc users list
>>>>>>>>>
>>>>>>>>> -------- Forwarded Message --------
>>>>>>>>> Subject: Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API
>>>>>>>>> Date: Mon, 20 Mar 2017 11:51:01 -0700
>>>>>>>>> From: Matthias J. Sax <matth...@confluent.io>
>>>>>>>>> Organization: Confluent Inc
>>>>>>>>> To: d...@kafka.apache.org
>>>>>>>>>
>>>>>>>>> I want to push this discussion further.
>>>>>>>>>
>>>>>>>>> Guozhang's argument about "exposing" the Topology class is valid. It's a
>>>>>>>>> public class anyway, so it's not an issue. However, I think the question
>>>>>>>>> is not too much about exposing but about "advertising" (ie, putting it
>>>>>>>>> into the focus) or not at DSL level.
>>>>>>>>>
>>>>>>>>> If I interpret the last replies correctly, it seems that we could agree
>>>>>>>>> on "StreamsBuilder" as name. I did update the KIP accordingly. Please
>>>>>>>>> correct me, if I got this wrong.
>>>>>>>>>
>>>>>>>>> If there are no other objections -- this naming discussion was the last
>>>>>>>>> open point so far -- I would like to start the VOTE thread.
>>>>>>>>>
>>>>>>>>> -Matthias
>>>>>>>>>
>>>>>>>>> On 3/14/17 2:37 PM, Guozhang Wang wrote:
>>>>>>>>>> I'd like to keep the term "Topology" inside the builder class since, as
>>>>>>>>>> Matthias mentioned, this builder#build() function returns a "Topology"
>>>>>>>>>> object, whose type is a public class anyways. Although you can argue to let
>>>>>>>>>> users always call
>>>>>>>>>>
>>>>>>>>>> "new KafkaStreams(builder.build())"
>>>>>>>>>>
>>>>>>>>>> I think it is still more benefit to expose this concept.
>>>>>>>>>>
>>>>>>>>>> Guozhang
>>>>>>>>>>
>>>>>>>>>> On Tue, Mar 14, 2017 at 10:43 AM, Matthias J. Sax <matth...@confluent.io>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks for your input Michael.
>>>>>>>>>>>
>>>>>>>>>>>>> - KafkaStreams as the new name for the builder that creates the logical
>>>>>>>>>>>>> plan, with e.g. `KafkaStreams.stream("input-topic")` and
>>>>>>>>>>>>> `KafkaStreams.table("input-topic")`.
>>>>>>>>>>>
>>>>>>>>>>> I don't think this is a good idea, for multiple reasons:
>>>>>>>>>>>
>>>>>>>>>>> (1) We would reuse a name for a completely different purpose. The same
>>>>>>>>>>> argument for not renaming KStreamBuilder to TopologyBuilder. The
>>>>>>>>>>> confusion would just be too large.
>>>>>>>>>>>
>>>>>>>>>>> So if we would start from scratch, it might be ok to do so, but now we
>>>>>>>>>>> cannot make this move, IMHO.
>>>>>>>>>>>
>>>>>>>>>>> Also a clarification question: do you suggest to have static methods
>>>>>>>>>>> #stream and #table -- I am not sure if this would work?
>>>>>>>>>>> (or was your code snippet just simplification?)
>>>>>>>>>>>
>>>>>>>>>>> (2) Kafka Streams is basically a "processing client" next to consumer
>>>>>>>>>>> and producer client. Thus, the name KafkaStreams aligns to the naming
>>>>>>>>>>> schema of KafkaConsumer and KafkaProducer.
>>>>>>>>>>> I am not sure if it would be a good choice to "break" this naming scheme.
>>>>>>>>>>>
>>>>>>>>>>> Btw: this is also the reason, why we have KafkaStreams#close() -- and
>>>>>>>>>>> not KafkaStreams#stop() -- because #close() aligns with consumer and
>>>>>>>>>>> producer client.
>>>>>>>>>>>
>>>>>>>>>>> (3) One more argument against using KafkaStreams as DSL entry class would
>>>>>>>>>>> be, that it would need to create a Topology that can be given to the
>>>>>>>>>>> "runner/processing-client". Thus the pattern would be
>>>>>>>>>>>
>>>>>>>>>>>> Topology topology = streams.build();
>>>>>>>>>>>> KafkaStreamsRunner runner = new KafkaStreamsRunner(..., topology)
>>>>>>>>>>>
>>>>>>>>>>> (or of course as a one liner).
>>>>>>>>>>>
>>>>>>>>>>> On the other hand, there was the idea (that we intentionally excluded
>>>>>>>>>>> from the KIP), to change the "client instantiation" pattern.
>>>>>>>>>>>
>>>>>>>>>>> Right now, a new client is actively instantiated (ie, by calling "new")
>>>>>>>>>>> and the topology is provided as a constructor argument. However,
>>>>>>>>>>> especially for DSL (not sure if it would make sense for PAPI), the DSL
>>>>>>>>>>> builder could create the client for the user.
>>>>>>>>>>>
>>>>>>>>>>> Something like this:
>>>>>>>>>>>
>>>>>>>>>>>> KStreamBuilder builder = new KStreamBuilder();
>>>>>>>>>>>> builder.whatever() // use the builder
>>>>>>>>>>>>
>>>>>>>>>>>> StreamsConfig config = ....
>>>>>>>>>>>> KafkaStreams streams = builder.getKafkaStreams(config);
>>>>>>>>>>>
>>>>>>>>>>> If we change the pattern like this, the notion of the "DSL builder" would
>>>>>>>>>>> change, as it does not create a topology anymore, but it creates the
>>>>>>>>>>> "processing client".
>>>>>>>>>>> This would address Jay's concern about "not exposing concepts
>>>>>>>>>>> users don't need to understand" and would not require
>>>>>>>>>>> to include the word "Topology" in the DSL builder class name, because
>>>>>>>>>>> the builder does not build a Topology anymore.
>>>>>>>>>>>
>>>>>>>>>>> I just put some names that came to my mind first hand -- did not think
>>>>>>>>>>> about good names. It's just to discuss the pattern.
>>>>>>>>>>>
>>>>>>>>>>> -Matthias
>>>>>>>>>>>
>>>>>>>>>>> On 3/14/17 3:36 AM, Michael Noll wrote:
>>>>>>>>>>>> I see Jay's point, and I agree with much of it -- notably about being
>>>>>>>>>>>> careful which concepts we do and do not expose, depending on which user
>>>>>>>>>>>> group / user type is affected. That said, I'm not sure yet whether or not
>>>>>>>>>>>> we should get rid of "Topology" (or a similar term) in the DSL.
>>>>>>>>>>>>
>>>>>>>>>>>> For what it's worth, here's how related technologies define/name their
>>>>>>>>>>>> "topologies" and "builders". Note that, in all cases, it's about
>>>>>>>>>>>> constructing a logical processing plan, which then is being executed/run.
>>>>>>>>>>>>
>>>>>>>>>>>> - `Pipeline` (Google Dataflow/Apache Beam)
>>>>>>>>>>>>     - To add a source you first instantiate the Source (e.g.
>>>>>>>>>>>>       `TextIO.Read.from("gs://some/inputData.txt")`),
>>>>>>>>>>>>       then attach it to your processing plan via `Pipeline#apply(<source>)`.
>>>>>>>>>>>>       This setup is a bit different to our DSL because in our DSL the
>>>>>>>>>>>>       builder does both, i.e. instantiating + auto-attaching to itself.
>>>>>>>>>>>>     - To execute the processing plan you call `Pipeline#execute()`.
>>>>>>>>>>>> - `StreamingContext` (Spark): This setup is similar to our DSL.
>>>>>>>>>>>>     - To add a source you call e.g.
>>>>>>>>>>>>       `StreamingContext#socketTextStream("localhost", 9999)`.
>>>>>>>>>>>>     - To execute the processing plan you call `StreamingContext#execute()`.
>>>>>>>>>>>> - `StreamExecutionEnvironment` (Flink): This setup is similar to our DSL.
>>>>>>>>>>>>     - To add a source you call e.g.
>>>>>>>>>>>>       `StreamExecutionEnvironment#socketTextStream("localhost", 9999)`.
>>>>>>>>>>>>     - To execute the processing plan you call
>>>>>>>>>>>>       `StreamExecutionEnvironment#execute()`.
>>>>>>>>>>>> - `Graph`/`Flow` (Akka Streams), as a result of composing Sources (~
>>>>>>>>>>>>   `KStreamBuilder.stream()`) and Sinks (~ `KStream#to()`)
>>>>>>>>>>>>   into Flows, which are [Runnable]Graphs.
>>>>>>>>>>>>     - You instantiate a Source directly, and then compose the Source with
>>>>>>>>>>>>       Sinks to create a RunnableGraph:
>>>>>>>>>>>>       see signature `Source#to[Mat2](sink: Graph[SinkShape[Out], Mat2]):
>>>>>>>>>>>>       RunnableGraph[Mat]`.
>>>>>>>>>>>>     - To execute the processing plan you call `Flow#run()`.
>>>>>>>>>>>>
>>>>>>>>>>>> In our DSL, in comparison, we do:
>>>>>>>>>>>>
>>>>>>>>>>>> - `KStreamBuilder` (Kafka Streams API)
>>>>>>>>>>>>     - To add a source you call e.g. `KStreamBuilder#stream("input-topic")`.
>>>>>>>>>>>>     - To execute the processing plan you create a `KafkaStreams` instance
>>>>>>>>>>>>       from `KStreamBuilder`
>>>>>>>>>>>>       (where the builder will instantiate the topology = processing plan to
>>>>>>>>>>>>       be executed), and then
>>>>>>>>>>>>       call `KafkaStreams#start()`. Think of `KafkaStreams` as our runner.
>>>>>>>>>>>>
>>>>>>>>>>>> First, I agree with the sentiment that the current name of `KStreamBuilder`
>>>>>>>>>>>> isn't great (which is why we're having this discussion). Also, that
>>>>>>>>>>>> finding a good name is tricky.
>>>>>>>>>>>> ;-)
>>>>>>>>>>>>
>>>>>>>>>>>> Second, even though I agree with many of Jay's points I'm not sure whether
>>>>>>>>>>>> I like the `StreamsBuilder` suggestion (i.e. any name that does not include
>>>>>>>>>>>> "topology" or a similar term) that much more. It still doesn't describe
>>>>>>>>>>>> what that class actually does, and what the difference to `KafkaStreams`
>>>>>>>>>>>> is. IMHO, the point of `KStreamBuilder` is that it lets you build a
>>>>>>>>>>>> logical plan (what we call "topology"), and `KafkaStreams` is the thing
>>>>>>>>>>>> that executes that plan. I'm not yet convinced that abstracting these two
>>>>>>>>>>>> points away from the user is a good idea if the argument is that it's
>>>>>>>>>>>> potentially confusing to beginners (a claim which I am not sure is actually
>>>>>>>>>>>> true).
>>>>>>>>>>>>
>>>>>>>>>>>> That said, if we rather favor "good-sounding but perhaps less technically
>>>>>>>>>>>> correct names", I'd argue we should not even use something like "Builder".
>>>>>>>>>>>> We could, for example, also pick the following names:
>>>>>>>>>>>>
>>>>>>>>>>>> - KafkaStreams as the new name for the builder that creates the logical
>>>>>>>>>>>>   plan, with e.g. `KafkaStreams.stream("input-topic")` and
>>>>>>>>>>>>   `KafkaStreams.table("input-topic")`.
>>>>>>>>>>>> - KafkaStreamsRunner as the new name for the executor of the plan, with
>>>>>>>>>>>>   `KafkaStreamsRunner(KafkaStreams).run()`.
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Mar 14, 2017 at 5:56 AM, Sriram Subramanian <r...@confluent.io>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> StreamsBuilder would be my vote.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mar 13, 2017, at 9:42 PM, Jay Kreps <j...@confluent.io> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hey Matthias,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Makes sense, I'm more advocating for removing the word topology than any
>>>>>>>>>>>>>> particular new replacement.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -Jay
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Mar 13, 2017 at 12:30 PM, Matthias J. Sax <matth...@confluent.io>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Jay,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> thanks for your feedback
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> What if instead we called it KStreamsBuilder?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> That's the current name and I personally think it's not the best one.
>>>>>>>>>>>>>>> The main reason why I don't like KStreamsBuilder is, that we have the
>>>>>>>>>>>>>>> concepts of KStreams and KTables, and the builder creates both. However,
>>>>>>>>>>>>>>> the name puts the focus on KStream and devalues KTable.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I understand your argument, and I am personally open to remove the
>>>>>>>>>>>>>>> "Topology" part, and name it "StreamsBuilder". Not sure what others
>>>>>>>>>>>>>>> think about this.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> About Processor API: I like the idea in general, but I think it's out
>>>>>>>>>>>>>>> of scope for this KIP. KIP-120 has the focus on removing leaking
>>>>>>>>>>>>>>> internal APIs and do some cleanup how our API reflects some concepts.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> However, I added your idea to API discussion Wiki page and we take it
>>>>>>>>>>>>>>> from there:
>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Discussions
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 3/13/17 11:52 AM, Jay Kreps wrote:
>>>>>>>>>>>>>>>> Two things:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1. This is a minor thing but the proposed new name for KStreamBuilder
>>>>>>>>>>>>>>>>    is StreamsTopologyBuilder. I actually think we should not put topology in
>>>>>>>>>>>>>>>>    the name as topology is not a concept you need to understand at the
>>>>>>>>>>>>>>>>    kstreams layer right now. I'd think of three categories of concepts: (1)
>>>>>>>>>>>>>>>>    concepts you need to understand to get going even for a simple example, (2)
>>>>>>>>>>>>>>>>    concepts you need to understand to operate and debug a real production app,
>>>>>>>>>>>>>>>>    (3) concepts we truly abstract and you don't need to ever understand. I
>>>>>>>>>>>>>>>>    think in the kstream layer topologies are currently category (2), and this
>>>>>>>>>>>>>>>>    is where they belong. By introducing the name in even the simplest example
>>>>>>>>>>>>>>>>    it means the user has to go read about topologies to really understand even
>>>>>>>>>>>>>>>>    this simple snippet. What if instead we called it KStreamsBuilder?
>>>>>>>>>>>>>>>> 2. For the processor api, I think this api is mostly not for end users.
>>>>>>>>>>>>>>>>    However there are a couple cases where it might make sense to expose
>>>>>>>>>>>>>>>>    it.
>>>>>>>>>>>>>>>>    I think users coming from Samza, or JMS's MessageListener (
>>>>>>>>>>>>>>>>    https://docs.oracle.com/javaee/7/api/javax/jms/MessageListener.html)
>>>>>>>>>>>>>>>>    understand a simple callback interface for message processing. In fact,
>>>>>>>>>>>>>>>>    people often ask why Kafka's consumer doesn't provide such an interface.
>>>>>>>>>>>>>>>>    I'd argue we do, it's KafkaStreams. The only issue is that the processor
>>>>>>>>>>>>>>>>    API documentation is a bit scary for a person implementing this type of
>>>>>>>>>>>>>>>>    api. My observation is that people using this style of API don't do a lot
>>>>>>>>>>>>>>>>    of cross-message operations, they just do single message operations and use
>>>>>>>>>>>>>>>>    a database for anything that spans messages. They also don't factor their
>>>>>>>>>>>>>>>>    code into many MessageListeners and compose them, they just have one
>>>>>>>>>>>>>>>>    listener that has the complete handling logic. Say I am a user who wants to
>>>>>>>>>>>>>>>>    implement a single Processor in this style. Do we have an easy way to do
>>>>>>>>>>>>>>>>    that today (either with the .transform/.process methods in kstreams or with
>>>>>>>>>>>>>>>>    the topology apis)? Is there anything we can do in the way of trivial
>>>>>>>>>>>>>>>>    helper code to make this better? Also, how can we explain that pattern to
>>>>>>>>>>>>>>>>    people?
>>>>>>>>>>>>>>>>    I think currently we have pretty in-depth docs on our apis but I
>>>>>>>>>>>>>>>>    suspect a person trying to figure out how to implement a simple callback
>>>>>>>>>>>>>>>>    might get a bit lost trying to figure out how to wire it up. A simple five
>>>>>>>>>>>>>>>>    line example in the docs would probably help a lot. Not sure if this is
>>>>>>>>>>>>>>>>    best addressed in this KIP or is a side comment.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -Jay
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, Feb 3, 2017 at 3:33 PM, Matthias J. Sax <matth...@confluent.io>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I did prepare a KIP to do some cleanup of Kafka's Streaming API.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Please have a look here:
>>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-120%3A+Cleanup+Kafka+Streams+builder+API
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Looking forward to your feedback!
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> -Matthias
>>>>
>>>> --
>>>> -- Guozhang
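
For reference, the two client instantiation patterns weighed in the quoted discussion can be sketched in plain Java without any Kafka dependency. This is only an illustration under stated assumptions: `Topology`, `Client`, `DslBuilder`, and `buildClient()` below are hypothetical stand-ins (roughly for the logical plan, for `KafkaStreams`, and for the DSL builder), not the actual Kafka Streams classes or the API proposed in KIP-120.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BuilderPatterns {

    /** Hypothetical stand-in for the logical plan ("topology"). */
    static final class Topology {
        final List<String> sourceTopics;

        Topology(List<String> sourceTopics) {
            // Defensive copy so later builder calls cannot change a built plan.
            this.sourceTopics = Collections.unmodifiableList(new ArrayList<>(sourceTopics));
        }
    }

    /** Hypothetical stand-in for the processing client that executes a plan. */
    static final class Client {
        final Topology topology;
        private boolean running;

        Client(Topology topology) { this.topology = topology; }

        void start() { running = true; }
        void close() { running = false; }
        boolean isRunning() { return running; }
    }

    /** Hypothetical DSL entry class that supports both patterns. */
    static final class DslBuilder {
        private final List<String> sourceTopics = new ArrayList<>();

        DslBuilder stream(String topic) {
            sourceTopics.add(topic);
            return this;
        }

        // Pattern A: the builder returns the plan; the caller creates the client.
        Topology build() { return new Topology(sourceTopics); }

        // Pattern B: the builder creates the client; the plan stays out of sight.
        Client buildClient() { return new Client(build()); }
    }

    public static void main(String[] args) {
        // Pattern A: the topology is visible at the call site.
        Client a = new Client(new DslBuilder().stream("input-topic").build());
        a.start();

        // Pattern B: the caller never names a topology.
        Client b = new DslBuilder().stream("input-topic").buildClient();
        b.start();

        System.out.println(a.topology.sourceTopics + " " + b.topology.sourceTopics);
        a.close();
        b.close();
    }
}
```

In pattern A the word "topology" shows up in even the simplest program, which is the concern raised about the DSL; in pattern B the builder's name no longer has to say what it builds, at the cost of changing how the client is instantiated.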