Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

Matthias J. Sax Tue, 28 Mar 2017 19:02:18 -0700

With regard to KIP-130:

From the KIP-130 thread:


> About subtopologies and tasks. We do have the concept of subtopologies 
> already in KIP-120. It's only missing an ID that allows linking a subtopology 
> to a task.
> 
> IMHO, adding a simple variable to `Subtopology` that provides the ID should be 
> sufficient. We can simply document in the JavaDocs how Subtopology and 
> TaskMetadata can be linked to each other.

I updated KIP-120 to include a field for this.
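
To illustrate the linking idea, here is a minimal, self-contained sketch. It does not use the real Kafka Streams classes: `Subtopology` and `TaskMetadata` below are hypothetical stand-ins, and the task ID format `<subtopologyId>_<partition>` is an assumption made for this example only. The actual field and method names are whatever the KIP ends up specifying.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SubtopologyTaskLink {

    // Hypothetical stand-in for a subtopology description that exposes an ID.
    static final class Subtopology {
        final int id;
        final List<String> nodeNames;

        Subtopology(int id, List<String> nodeNames) {
            this.id = id;
            this.nodeNames = nodeNames;
        }
    }

    // Hypothetical stand-in for task metadata. The ID format
    // "<subtopologyId>_<partition>" is an assumption of this sketch.
    static final class TaskMetadata {
        final String taskId;

        TaskMetadata(String taskId) {
            this.taskId = taskId;
        }

        // Extracts the subtopology part of the task ID.
        int subtopologyId() {
            return Integer.parseInt(taskId.substring(0, taskId.indexOf('_')));
        }
    }

    // Groups task IDs under the ID of the subtopology they belong to.
    // Tasks that reference an unknown subtopology are ignored.
    static Map<Integer, List<String>> tasksBySubtopology(
            List<Subtopology> subtopologies, List<TaskMetadata> tasks) {
        Map<Integer, List<String>> result = new TreeMap<>();
        for (Subtopology s : subtopologies) {
            result.put(s.id, new ArrayList<>());
        }
        for (TaskMetadata t : tasks) {
            List<String> bucket = result.get(t.subtopologyId());
            if (bucket != null) {
                bucket.add(t.taskId);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Subtopology> subtopologies = Arrays.asList(
                new Subtopology(0, Arrays.asList("source", "filter")),
                new Subtopology(1, Arrays.asList("aggregate", "sink")));
        List<TaskMetadata> tasks = Arrays.asList(
                new TaskMetadata("0_0"),
                new TaskMetadata("0_1"),
                new TaskMetadata("1_0"));

        // Prints {0=[0_0, 0_1], 1=[1_0]}
        System.out.println(tasksBySubtopology(subtopologies, tasks));
    }
}
```

The point is only that a single integer ID on the subtopology is enough to join the two views; nothing else needs to be added to either side.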


-Matthias


On 3/27/17 4:27 PM, Matthias J. Sax wrote:
> Hi,
> 
> I would like to trigger this discussion again. It seems that the naming
> question is rather subjective and both main alternatives (w/ or w/o the
> word "Topology" in the name) have pros/cons.
> 
> If you have any further thoughts, please share them. At the moment I still
> propose `StreamsBuilder` in the KIP.
> 
> I also want to point out that the VOTE thread was already started. So
> if you like the current KIP, please cast your vote there.
> 
> 
> Thanks a lot!
> 
> 
> -Matthias
> 
> 
> On 3/23/17 3:38 PM, Matthias J. Sax wrote:
>> Jay,
>>
>> about the naming schema:
>>
>>>>    1. "kstreams" - the DSL
>>>>    2. "processor api" - the lower level callback/topology api
>>>>    3. KStream/KTable - entities in the kstreams dsl
>>>>    4. "Kafka Streams" - General name for stream processing stuff in Kafka,
>>>>    including both kstreams and the processor API plus the underlying
>>>>    implementation.
>>
>> I think this terminology has some issues... To me, `kstreams` was
>> never more than an abbreviation for `Kafka Streams` -- thus (1) and
>> (4) kinda collide here. Following questions on the mailing list etc., I
>> often see people using kstreams or kstream exactly as an abbreviation for
>> "Kafka Streams".
>>
>>> I think referring to the dsl as "kstreams" is cute and mnemonic and not
>>> particularly confusing.
>>
>> I disagree here. It's a very subtle difference between `kstreams` and
>> `KStream` -- just singular/plural, thus (1) and (3) also "collide" --
>> it's just too close to each other.
>>
>> Thus, I really think it's a good idea to get a new name for the DSL to
>> get a better separation of the 4 concepts.
>>
>> Furthermore, we use the term "Streams API". Thus, I think
>> `StreamsBuilder` (or `StreamsTopologyBuilder`) are both very good names.
>>
>>
>> Thus, I prefer to keep the KIP as is (suggesting `StreamsBuilder`).
>>
>> I will start a VOTE thread. Of course, we can still discuss the naming
>> issue. :)
>>
>>
>>
>> -Matthias
>>
>>
>> On 3/22/17 8:53 PM, Jay Kreps wrote:
>>> I don't feel strongly on this, so I'm happy with whatever everyone else
>>> wants.
>>>
>>> Michael, I'm not arguing that people don't need to understand topologies, I
>>> just think it is like RocksDB: you need to understand it when
>>> debugging/operating but not in the initial coding since the metaphor we're
>>> providing at this layer isn't a topology of processors but rather something
>>> like the collections api. Anyhow it won't hurt people to have it there.
>>>
>>> For the original KStreamBuilder thing, I think that came from the naming we
>>> discussed originally:
>>>
>>>    1. "kstreams" - the DSL
>>>    2. "processor api" - the lower level callback/topology api
>>>    3. KStream/KTable - entities in the kstreams dsl
>>>    4. "Kafka Streams" - General name for stream processing stuff in Kafka,
>>>    including both kstreams and the processor API plus the underlying
>>>    implementation.
>>>
>>> I think referring to the dsl as "kstreams" is cute and mnemonic and not
>>> particularly confusing. Just like referring to the "java collections
>>> library" isn't confusing even though it contains the Iterator interface
>>> which is not actually itself a collection.
>>>
>>> So I think KStreamBuilder should technically have been KstreamsBuilder and
>>> is intended not to be a builder of a KStream but rather the builder for the
>>> kstreams DSL. Okay, yes, that *is* slightly confusing. :-)
>>>
>>> -Jay
>>>
>>> On Wed, Mar 22, 2017 at 11:25 AM, Guozhang Wang <wangg...@gmail.com> wrote:
>>>
>>>> Regarding the naming of `StreamsTopologyBuilder` vs. `StreamsBuilder` that
>>>> are going to be used in DSL, I agree both have their arguments:
>>>>
>>>> 1. On one side, people using the DSL layer probably do not need to be aware
>>>> (or rather, "learn about") of the "topology" concept, although this concept
>>>> is a publicly exposed one in Kafka Streams.
>>>>
>>>> 2. On the other side, StreamsBuilder#build() returning a Topology object
>>>> sounds a little weird, at least to me (admittedly subjective matter).
>>>>
>>>>
>>>> Since the second bullet point seems to be more "subjective" and many people
>>>> are not worried about it, I'm OK to go with the other option.
>>>>
>>>>
>>>> Guozhang
>>>>
>>>>
>>>> On Wed, Mar 22, 2017 at 8:58 AM, Michael Noll <mich...@confluent.io>
>>>> wrote:
>>>>
>>>>> Forwarding to kafka-user.
>>>>>
>>>>>
>>>>> ---------- Forwarded message ----------
>>>>> From: Michael Noll <mich...@confluent.io>
>>>>> Date: Wed, Mar 22, 2017 at 8:48 AM
>>>>> Subject: Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API
>>>>> To: d...@kafka.apache.org
>>>>>
>>>>>
>>>>> Matthias,
>>>>>
>>>>>> @Michael:
>>>>>>
>>>>>> You seemed to agree with Jay about not exposing the `Topology` concept
>>>>>> in our main entry class (ie, current KStreamBuilder), thus, I
>>>>>> interpreted that you do not want `Topology` in the name either (I am a
>>>>>> little surprised by your last response, that goes the opposite
>>>>> direction).
>>>>>
>>>>> Oh, sorry for not being clear.
>>>>>
>>>>> What I wanted to say in my earlier email was the following:  Yes, I do
>>>>> agree with most of Jay's reasoning, notably about carefully deciding how
>>>>> much and which parts of the API/concept "surface" we expose to users of
>>>> the
>>>>> DSL.  However, and this is perhaps where I wasn't very clear, I disagree
>>>> on
>>>>> the particular opinion about not exposing the topology concept to DSL
>>>>> users.  Instead, I think the concept of a topology is important to
>>>>> understand even for DSL users -- particularly because of the way the DSL
>>>> is
>>>>> currently wiring your processing logic via the builder pattern.  (As I
>>>>> noted, e.g. Akka uses a different approach where you might be able to get
>>>>> away with not exposing the "topology" concept, but even in Akka there's
>>>> the
>>>>> notion of graphs and flows.)
>>>>>
>>>>>
>>>>>>>     StreamsBuilder builder = new StreamsBuilder();
>>>>>>>
>>>>>>>     // And here you'd define your...well, what actually?
>>>>>>>     // Ah right, you are composing a topology here, though you are
>>>> not
>>>>>>> aware of it.
>>>>>>
>>>>>> Yes. You are not aware of it -- that's the whole point about it --
>>>> don't
>>>>>> put the Topology concept in the focus...
>>>>>
>>>>> Let me turn this around, because that was my point: it's confusing to
>>>> have
>>>>> a name "StreamsBuilder" if that thing isn't building streams, and it is
>>>>> not.
>>>>>
>>>>> As I mentioned before, I do think it is a benefit to make it clear to DSL
>>>>> users that there are two aspects at play: (1) defining the logic/plan of
>>>>> your processing, and (2) the execution of that plan.  I have a less
>>>> strong
>>>>> opinion whether or not having "topology" in the names would help to
>>>>> communicate this separation as well as combination of (1) and (2) to make
>>>>> your app work as expected.
>>>>>
>>>>> If we stick with `KafkaStreams` for (2) *and* don't like having
>>>> "topology"
>>>>> in the name, then perhaps we should rename `KStreamBuilder` to
>>>>> `KafkaStreamsBuilder`.  That at least gives some illusion of a combo of
>>>> (1)
>>>>> and (2).  IMHO, `KafkaStreamsBuilder` highlights better that "it is a
>>>>> builder/helper for the Kafka Streams API", rather than "a builder for
>>>>> streams".
>>>>>
>>>>> Also, I think some of the naming challenges we're discussing here are
>>>>> caused by having this builder pattern in the first place.  If the Streams
>>>>> API was implemented in Scala, for example, we could use implicits for
>>>>> helping us to "stitch streams/tables together to build the full
>>>> topology",
>>>>> thus using a different (better?) approach to composing your topologies
>>>> than
>>>>> through a builder pattern.  So: perhaps there's a better way than the
>>>>> builder, and that way would also be clearer on terminology?  That said,
>>>>> this might take this KIP off-scope.
>>>>>
>>>>> -Michael
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Mar 22, 2017 at 12:33 AM, Matthias J. Sax <matth...@confluent.io
>>>>>
>>>>> wrote:
>>>>>
>>>>>> @Guozhang:
>>>>>>
>>>>>> I recognized that you want to have `Topology` in the name. But it seems
>>>>>> that more people preferred to not have it (Jay, Ram, Michael [?],
>>>>> myself).
>>>>>>
>>>>>> @Michael:
>>>>>>
>>>>>> You seemed to agree with Jay about not exposing the `Topology` concept
>>>>>> in our main entry class (ie, current KStreamBuilder), thus, I
>>>>>> interpreted that you do not want `Topology` in the name either (I am a
>>>>>> little surprised by your last response, that goes the opposite
>>>>> direction).
>>>>>>
>>>>>>>     StreamsBuilder builder = new StreamsBuilder();
>>>>>>>
>>>>>>>     // And here you'd define your...well, what actually?
>>>>>>>     // Ah right, you are composing a topology here, though you are
>>>> not
>>>>>>> aware of it.
>>>>>>
>>>>>> Yes. You are not aware of it -- that's the whole point about it --
>>>> don't
>>>>>> put the Topology concept in the focus...
>>>>>>
>>>>>> Furthermore,
>>>>>>
>>>>>>>>> So what are you building here with StreamsBuilder?  Streams (hint:
>>>>> No)?
>>>>>>>>> And what about tables -- is there a TableBuilder (hint: No)?
>>>>>>
>>>>>> I am not sure if this is much of a concern. In contrast to
>>>>>> `KStreamBuilder` (singular), which contains `KStream` and thus puts the
>>>>>> KStream concept in focus and degrades `KTable`, `StreamsBuilder`
>>>>>> (plural) focuses on the "Streams API". IMHO, it does not put focus on
>>>>>> KStream. It's just a builder from the Streams API -- you don't need to
>>>>>> worry what you are building -- and you don't need to think about the
>>>>>> `Topology` concept (of course, you see that .build() returns a
>>>> Topology).
>>>>>>
>>>>>>
>>>>>> Personally, I see pros and cons for both `StreamsBuilder` and
>>>>>> `StreamsTopologyBuilder` and thus, I am fine either way. Maybe Jay and
>>>>>> Ram can follow up and share their thoughts?
>>>>>>
>>>>>> It would also help a lot if other people cast their vote for a name, too.
>>>>>>
>>>>>>
>>>>>>
>>>>>> -Matthias
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 3/21/17 2:11 PM, Guozhang Wang wrote:
>>>>>>> Just to clarify, I did want to have the term `Topology` as part of
>>>> the
>>>>>>> class name, for the reasons above. I'm not too worried about being
>>>>>>> consistent with the previous names, but I feel the
>>>> `XXTopologyBuilder`
>>>>> is
>>>>>>> better than `XXStreamsBuilder` since its build() function returns a
>>>>>>> Topology object.
>>>>>>>
>>>>>>>
>>>>>>> Guozhang
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Mar 20, 2017 at 12:53 PM, Michael Noll <mich...@confluent.io
>>>>>
>>>>>> wrote:
>>>>>>>
>>>>>>>> Hmm, I must admit I don't like this last update all too much.
>>>>>>>>
>>>>>>>> Basically we would have:
>>>>>>>>
>>>>>>>>     StreamsBuilder builder = new StreamsBuilder();
>>>>>>>>
>>>>>>>>     // And here you'd define your...well, what actually?
>>>>>>>>     // Ah right, you are composing a topology here, though you are
>>>> not
>>>>>>>> aware of it.
>>>>>>>>
>>>>>>>>     KafkaStreams streams = new KafkaStreams(builder.build(),
>>>>>>>> streamsConfiguration);
>>>>>>>>
>>>>>>>> So what are you building here with StreamsBuilder?  Streams (hint:
>>>>> No)?
>>>>>>>> And what about tables -- is there a TableBuilder (hint: No)?
>>>>>>>>
>>>>>>>> I also interpret Guozhang's last response as that he'd prefer to
>>>> have
>>>>>>>> "Topology" in the class/interface names.  I am aware that we
>>>> shouldn't
>>>>>>>> necessarily use the status quo to make decisions about future
>>>> changes,
>>>>>> but
>>>>>>>> the very first concept we explain in the Kafka Streams documentation
>>>>> is
>>>>>>>> "Stream Processing Topology":
>>>>>>>> https://kafka.apache.org/0102/documentation/streams#streams_
>>>> concepts
>>>>>>>>
>>>>>>>> -Michael
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Mar 20, 2017 at 7:55 PM, Matthias J. Sax <
>>>>> matth...@confluent.io
>>>>>>>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> \cc users list
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -------- Forwarded Message --------
>>>>>>>>> Subject: Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API
>>>>>>>>> Date: Mon, 20 Mar 2017 11:51:01 -0700
>>>>>>>>> From: Matthias J. Sax <matth...@confluent.io>
>>>>>>>>> Organization: Confluent Inc
>>>>>>>>> To: d...@kafka.apache.org
>>>>>>>>>
>>>>>>>>> I want to push this discussion further.
>>>>>>>>>
>>>>>>>>> Guozhang's argument about "exposing" the Topology class is valid.
>>>>> It's
>>>>>> a
>>>>>>>>> public class anyway, so it's not an issue. However, I think the
>>>>>> question
>>>>>>>>> is not too much about exposing but about "advertising" (ie, putting
>>>>> it
>>>>>>>>> into the focus) or not at DSL level.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> If I interpret the last replies correctly, it seems that we could
>>>>> agree
>>>>>>>>> on "StreamsBuilder" as name. I did update the KIP accordingly.
>>>> Please
>>>>>>>>> correct me, if I got this wrong.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> If there are no other objections -- this naming discussion was the
>>>> last
>>>>>>>>> open point so far -- I would like to start the VOTE thread.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -Matthias
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 3/14/17 2:37 PM, Guozhang Wang wrote:
>>>>>>>>>> I'd like to keep the term "Topology" inside the builder class
>>>> since,
>>>>>> as
>>>>>>>>>> Matthias mentioned, this builder#build() function returns a
>>>>> "Topology"
>>>>>>>>>> object, whose type is a public class anyways. Although you can
>>>> argue
>>>>>> to
>>>>>>>>> let
>>>>>>>>>> users always call
>>>>>>>>>>
>>>>>>>>>> "new KafkaStreams(builder.build())"
>>>>>>>>>>
>>>>>>>>>> I think it is still more beneficial to expose this concept.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Guozhang
>>>>>>>>>>
>>>>>>>>>> On Tue, Mar 14, 2017 at 10:43 AM, Matthias J. Sax <
>>>>>>>> matth...@confluent.io
>>>>>>>>>>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks for your input Michael.
>>>>>>>>>>>
>>>>>>>>>>>>> - KafkaStreams as the new name for the builder that creates the
>>>>>>>>> logical
>>>>>>>>>>>>> plan, with e.g. `KafkaStreams.stream("input-topic")` and
>>>>>>>>>>>>> `KafkaStreams.table("input-topic")`.
>>>>>>>>>>>
>>>>>>>>>>> I don't think this is a good idea, for multiple reasons:
>>>>>>>>>>>
>>>>>>>>>>> (1) We would reuse a name for a completely different purpose. The
>>>>>> same
>>>>>>>>>>> argument for not renaming KStreamBuilder to TopologyBuilder. The
>>>>>>>>>>> confusion would just be too large.
>>>>>>>>>>>
>>>>>>>>>>> So if we would start from scratch, it might be ok to do so, but
>>>> now
>>>>>> we
>>>>>>>>>>> cannot make this move, IMHO.
>>>>>>>>>>>
>>>>>>>>>>> Also a clarification question: do you suggest to have static
>>>>> methods
>>>>>>>>>>> #stream and #table -- I am not sure if this would work?
>>>>>>>>>>> (or was your code snippet just a simplification?)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> (2) Kafka Streams is basically a "processing client" next to
>>>>> consumer
>>>>>>>>>>> and producer client. Thus, the name KafkaStreams aligns to the
>>>>> naming
>>>>>>>>>>> schema of KafkaConsumer and KafkaProducer. I am not sure if it
>>>>> would
>>>>>>>> be
>>>>>>>>>>> a good choice to "break" this naming scheme.
>>>>>>>>>>>
>>>>>>>>>>> Btw: this is also the reason, why we have KafkaStreams#close() --
>>>>> and
>>>>>>>>>>> not KafkaStreams#stop() -- because #close() aligns with consumer
>>>>> and
>>>>>>>>>>> producer client.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> (3) One more argument against using KafkaStreams as DSL entry
>>>> class
>>>>>>>> would
>>>>>>>>>>> be, that it would need to create a Topology that can be given to
>>>>> the
>>>>>>>>>>> "runner/processing-client". Thus the pattern would be
>>>>>>>>>>>
>>>>>>>>>>>> Topology topology = streams.build();
>>>>>>>>>>>> KafkaStreamsRunner runner = new KafkaStreamsRunner(..., topology);
>>>>>>>>>>>
>>>>>>>>>>> (or of course as a one liner).
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On the other hand, there was the idea (that we intentionally
>>>>> excluded
>>>>>>>>>>> from the KIP), to change the "client instantiation" pattern.
>>>>>>>>>>>
>>>>>>>>>>> Right now, a new client is actively instantiated (ie, by calling
>>>>>>>> "new")
>>>>>>>>>>> and the topology is provided as a constructor argument. However,
>>>>>>>>>>> especially for DSL (not sure if it would make sense for PAPI),
>>>> the
>>>>>> DSL
>>>>>>>>>>> builder could create the client for the user.
>>>>>>>>>>>
>>>>>>>>>>> Something like this:
>>>>>>>>>>>
>>>>>>>>>>>> KStreamBuilder builder = new KStreamBuilder();
>>>>>>>>>>>> builder.whatever() // use the builder
>>>>>>>>>>>>
>>>>>>>>>>>> StreamsConfig config = ....
>>>>>>>>>>>> KafkaStreams streams = builder.getKafkaStreams(config);
>>>>>>>>>>>
>>>>>>>>>>> If we change the pattern like this, the notion of the "DSL builder"
>>>>>>>> would
>>>>>>>>>>> change, as it does not create a topology anymore, but it creates
>>>>> the
>>>>>>>>>>> "processing client". This would address Jay's concern about "not
>>>>>>>>>>> exposing concepts users don't need to understand" and would not
>>>>>>>> require
>>>>>>>>>>> to include the word "Topology" in the DSL builder class name,
>>>>> because
>>>>>>>>>>> the builder does not build a Topology anymore.
>>>>>>>>>>>
>>>>>>>>>>> I just put some names that came to my mind first hand -- did not
>>>>>> think
>>>>>>>>>>> about good names. It's just to discuss the pattern.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> -Matthias
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 3/14/17 3:36 AM, Michael Noll wrote:
>>>>>>>>>>>> I see Jay's point, and I agree with much of it -- notably about
>>>>>> being
>>>>>>>>>>>> careful which concepts we do and do not expose, depending on
>>>> which
>>>>>>>> user
>>>>>>>>>>>> group / user type is affected.  That said, I'm not sure yet
>>>>> whether
>>>>>>>> or
>>>>>>>>>>> not
>>>>>>>>>>>> we should get rid of "Topology" (or a similar term) in the DSL.
>>>>>>>>>>>>
>>>>>>>>>>>> For what it's worth, here's how related technologies define/name
>>>>>>>> their
>>>>>>>>>>>> "topologies" and "builders".  Note that, in all cases, it's
>>>> about
>>>>>>>>>>>> constructing a logical processing plan, which then is being
>>>>>>>>> executed/run.
>>>>>>>>>>>>
>>>>>>>>>>>> - `Pipeline` (Google Dataflow/Apache Beam)
>>>>>>>>>>>>     - To add a source you first instantiate the Source (e.g.
>>>>>>>>>>>> `TextIO.Read.from("gs://some/inputData.txt")`),
>>>>>>>>>>>>       then attach it to your processing plan via
>>>>>>>>>>> `Pipeline#apply(<source>)`.
>>>>>>>>>>>>       This setup is a bit different to our DSL because in our
>>>> DSL
>>>>>> the
>>>>>>>>>>>> builder does both, i.e.
>>>>>>>>>>>>       instantiating + auto-attaching to itself.
>>>>>>>>>>>>     - To execute the processing plan you call
>>>>> `Pipeline#run()`.
>>>>>>>>>>>> - `StreamingContext` (Spark): This setup is similar to our DSL.
>>>>>>>>>>>>     - To add a source you call e.g.
>>>>>>>>>>>> `StreamingContext#socketTextStream("localhost", 9999)`.
>>>>>>>>>>>>     - To execute the processing plan you call
>>>>>>>>>>> `StreamingContext#start()`.
>>>>>>>>>>>> - `StreamExecutionEnvironment` (Flink): This setup is similar to
>>>>> our
>>>>>>>>> DSL.
>>>>>>>>>>>>     - To add a source you call e.g.
>>>>>>>>>>>> `StreamExecutionEnvironment#socketTextStream("localhost",
>>>> 9999)`.
>>>>>>>>>>>>     - To execute the processing plan you call
>>>>>>>>>>>> `StreamExecutionEnvironment#execute()`.
>>>>>>>>>>>> - `Graph`/`Flow` (Akka Streams), as a result of composing
>>>> Sources
>>>>> (~
>>>>>>>>>>>> `KStreamBuilder.stream()`) and Sinks (~ `KStream#to()`)
>>>>>>>>>>>>   into Flows, which are [Runnable]Graphs.
>>>>>>>>>>>>     - You instantiate a Source directly, and then compose the
>>>>> Source
>>>>>>>>> with
>>>>>>>>>>>> Sinks to create a RunnableGraph:
>>>>>>>>>>>>       see signature `Source#to[Mat2](sink: Graph[SinkShape[Out],
>>>>>>>>> Mat2]):
>>>>>>>>>>>> RunnableGraph[Mat]`.
>>>>>>>>>>>>     - To execute the processing plan you call `Flow#run()`.
>>>>>>>>>>>>
>>>>>>>>>>>> In our DSL, in comparison, we do:
>>>>>>>>>>>>
>>>>>>>>>>>> - `KStreamBuilder` (Kafka Streams API)
>>>>>>>>>>>>     - To add a source you call e.g.
>>>> `KStreamBuilder#stream("input-
>>>>>>>>>>> topic")`.
>>>>>>>>>>>>     - To execute the processing plan you create a `KafkaStreams`
>>>>>>>>> instance
>>>>>>>>>>>> from `KStreamBuilder`
>>>>>>>>>>>>       (where the builder will instantiate the topology =
>>>>> processing
>>>>>>>>> plan
>>>>>>>>>>> to
>>>>>>>>>>>> be executed), and then
>>>>>>>>>>>>       call `KafkaStreams#start()`.  Think of `KafkaStreams` as
>>>> our
>>>>>>>>>>> runner.
>>>>>>>>>>>>
>>>>>>>>>>>> First, I agree with the sentiment that the current name of
>>>>>>>>>>> `KStreamBuilder`
>>>>>>>>>>>> isn't great (which is why we're having this discussion).  Also,
>>>>> that
>>>>>>>>>>>> finding a good name is tricky. ;-)
>>>>>>>>>>>>
>>>>>>>>>>>> Second, even though I agree with many of Jay's points I'm not
>>>> sure
>>>>>>>>>>> whether
>>>>>>>>>>>> I like the `StreamsBuilder` suggestion (i.e. any name that does
>>>>> not
>>>>>>>>>>> include
>>>>>>>>>>>> "topology" or a similar term) that much more.  It still doesn't
>>>>>>>>> describe
>>>>>>>>>>>> what that class actually does, and what the difference to
>>>>>>>>> `KafkaStreams`
>>>>>>>>>>>> is.  IMHO, the point of `KStreamBuilder` is that it lets you
>>>>> build a
>>>>>>>>>>>> logical plan (what we call "topology"), and `KafkaStreams` is
>>>> the
>>>>>>>> thing
>>>>>>>>>>>> that executes that plan.  I'm not yet convinced that abstracting
>>>>>>>> these
>>>>>>>>>>> two
>>>>>>>>>>>> points away from the user is a good idea if the argument is that
>>>>>> it's
>>>>>>>>>>>> potentially confusing to beginners (a claim which I am not sure
>>>> is
>>>>>>>>>>> actually
>>>>>>>>>>>> true).
>>>>>>>>>>>>
>>>>>>>>>>>> That said, if we rather favor "good-sounding but perhaps less
>>>>>>>>> technically
>>>>>>>>>>>> correct names", I'd argue we should not even use something like
>>>>>>>>>>> "Builder".
>>>>>>>>>>>> We could, for example, also pick the following names:
>>>>>>>>>>>>
>>>>>>>>>>>> - KafkaStreams as the new name for the builder that creates the
>>>>>>>> logical
>>>>>>>>>>>> plan, with e.g. `KafkaStreams.stream("input-topic")` and
>>>>>>>>>>>> `KafkaStreams.table("input-topic")`.
>>>>>>>>>>>> - KafkaStreamsRunner as the new name for the executioner of the
>>>>>> plan,
>>>>>>>>>>> with
>>>>>>>>>>>> `KafkaStreamsRunner(KafkaStreams).run()`.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Mar 14, 2017 at 5:56 AM, Sriram Subramanian <
>>>>>>>> r...@confluent.io>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> StreamsBuilder would be my vote.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mar 13, 2017, at 9:42 PM, Jay Kreps <j...@confluent.io>
>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hey Matthias,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Makes sense, I'm more advocating for removing the word topology
>>>>>> than
>>>>>>>>> any
>>>>>>>>>>>>>> particular new replacement.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -Jay
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Mar 13, 2017 at 12:30 PM, Matthias J. Sax <
>>>>>>>>>>> matth...@confluent.io
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Jay,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> thanks for your feedback
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> What if instead we called it KStreamsBuilder?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> That's the current name and I personally think it's not the
>>>>> best
>>>>>>>>> one.
>>>>>>>>>>>>>>> The main reason why I don't like KStreamsBuilder is, that we
>>>>> have
>>>>>>>>> the
>>>>>>>>>>>>>>> concepts of KStreams and KTables, and the builder creates
>>>> both.
>>>>>>>>>>> However,
>>>>>>>>>>>>>>> the name puts the focus on KStream and devalues KTable.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I understand your argument, and I am personally open to
>>>> remove
>>>>>>>> the
>>>>>>>>>>>>>>> "Topology" part, and name it "StreamsBuilder". Not sure what
>>>>>>>> others
>>>>>>>>>>>>>>> think about this.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> About Processor API: I like the idea in general, but I think
>>>>>> it's
>>>>>>>>> out
>>>>>>>>>>>>>>> of scope for this KIP. KIP-120 focuses on removing
>>>>> leaking
>>>>>>>>>>>>>>> internal APIs and on cleaning up how our API reflects some
>>>>>>>>> concepts.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> However, I added your idea to the API discussion wiki page and we
>>>>>> take
>>>>>>>>> it
>>>>>>>>>>>>>>> from there:
>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/
>>>>>>>>>>>>>>> Kafka+Streams+Discussions
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 3/13/17 11:52 AM, Jay Kreps wrote:
>>>>>>>>>>>>>>>> Two things:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>   1. This is a minor thing but the proposed new name for
>>>>>>>>>>> KStreamBuilder
>>>>>>>>>>>>>>>>   is StreamsTopologyBuilder. I actually think we should not
>>>>> put
>>>>>>>>>>>>>>> topology in
>>>>>>>>>>>>>>>>   the name as topology is not a concept you need to
>>>> understand
>>>>>> at
>>>>>>>>> the
>>>>>>>>>>>>>>>>   kstreams layer right now. I'd think of three categories of
>>>>>>>>>>> concepts:
>>>>>>>>>>>>>>> (1)
>>>>>>>>>>>>>>>>   concepts you need to understand to get going even for a
>>>>> simple
>>>>>>>>>>>>>>> example, (2)
>>>>>>>>>>>>>>>>   concepts you need to understand to operate and debug a
>>>> real
>>>>>>>>>>>>>>> production app,
>>>>>>>>>>>>>>>>   (3) concepts we truly abstract and you don't need to ever
>>>>>>>>>>> understand.
>>>>>>>>>>>>>>> I
>>>>>>>>>>>>>>>>   think in the kstream layer topologies are currently
>>>> category
>>>>>>>> (2),
>>>>>>>>>>> and
>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>>   is where they belong. By introducing the name in even the
>>>>>>>>> simplest
>>>>>>>>>>>>>>> example
>>>>>>>>>>>>>>>>   it means the user has to go read about topologies to really
>>>>>>>>>>> understand
>>>>>>>>>>>>>>> even
>>>>>>>>>>>>>>>>   this simple snippet. What if instead we called it
>>>>>>>>> KStreamsBuilder?
>>>>>>>>>>>>>>>>   2. For the processor api, I think this api is mostly not
>>>> for
>>>>>>>> end
>>>>>>>>>>>>>>> users.
>>>>>>>>>>>>>>>>   However, there are a couple of cases where it might make sense
>>>> to
>>>>>>>>> expose
>>>>>>>>>>>>>>> it. I
>>>>>>>>>>>>>>>>   think users coming from Samza, or JMS's MessageListener (
>>>>>>>>>>>>>>>>   https://docs.oracle.com/javaee/7/api/javax/jms/
>>>>>>>>>>> MessageListener.html)
>>>>>>>>>>>>>>>>   understand a simple callback interface for message
>>>>> processing.
>>>>>>>> In
>>>>>>>>>>>>>>> fact,
>>>>>>>>>>>>>>>>   people often ask why Kafka's consumer doesn't provide such
>>>>> an
>>>>>>>>>>>>>>> interface.
>>>>>>>>>>>>>>>>   I'd argue we do, it's KafkaStreams. The only issue is that
>>>>> the
>>>>>>>>>>>>>>> processor
>>>>>>>>>>>>>>>>   API documentation is a bit scary for a person implementing
>>>>>> this
>>>>>>>>>>> type
>>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>   api. My observation is that people using this style of API
>>>>>>>> don't
>>>>>>>>>>> do a
>>>>>>>>>>>>>>> lot
>>>>>>>>>>>>>>>>   of cross-message operations, they just do single message
>>>>>>>>> operations
>>>>>>>>>>>>>>> and use
>>>>>>>>>>>>>>>>   a database for anything that spans messages. They also
>>>> don't
>>>>>>>>> factor
>>>>>>>>>>>>>>> their
>>>>>>>>>>>>>>>>   code into many MessageListeners and compose them, they
>>>> just
>>>>>>>> have
>>>>>>>>>>> one
>>>>>>>>>>>>>>>>   listener that has the complete handling logic. Say I am a
>>>>> user
>>>>>>>>> who
>>>>>>>>>>>>>>> wants to
>>>>>>>>>>>>>>>>   implement a single Processor in this style. Do we have an
>>>>> easy
>>>>>>>>> way
>>>>>>>>>>> to
>>>>>>>>>>>>>>> do
>>>>>>>>>>>>>>>>   that today (either with the .transform/.process methods in
>>>>>>>>> kstreams
>>>>>>>>>>>>>>> or with
>>>>>>>>>>>>>>>>   the topology apis)? Is there anything we can do in the way
>>>>> of
>>>>>>>>>>> trivial
>>>>>>>>>>>>>>>>   helper code to make this better? Also, how can we explain
>>>>> that
>>>>>>>>>>>>>>> pattern to
>>>>>>>>>>>>>>>>   people? I think currently we have pretty in-depth docs on
>>>>> our
>>>>>>>>> apis
>>>>>>>>>>>>>>> but I
>>>>>>>>>>>>>>>>   suspect a person trying to figure out how to implement a
>>>>>> simple
>>>>>>>>>>>>>>> callback
>>>>>>>>>>>>>>>>   might get a bit lost trying to figure out how to wire it
>>>>> up. A
>>>>>>>>>>> simple
>>>>>>>>>>>>>>> five
>>>>>>>>>>>>>>>>   line example in the docs would probably help a lot. Not
>>>> sure
>>>>>> if
>>>>>>>>>>> this
>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>   best addressed in this KIP or is a side comment.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -Jay
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, Feb 3, 2017 at 3:33 PM, Matthias J. Sax <
>>>>>>>>>>> matth...@confluent.io
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I did prepare a KIP to clean up some of Kafka's
>>>>>> Streaming
>>>>>>>>>>> API.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Please have a look here:
>>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>>>>>>>>>>>>>>>> 120%3A+Cleanup+Kafka+Streams+builder+API
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Looking forward to your feedback!
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> -- Guozhang
>>>>
>>>
>>
>
