Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

Matthias J. Sax Sat, 11 Mar 2017 11:19:23 -0800

Thanks for your feedback Guozhang.


1) There are multiple ways to do this. Let me know what you think about
all options:

(a) I updated the KIP to this:

>     public final class Source implements Node {
>         public final String name;
>         // topicNames and topicPattern are mutually exclusive, i.e., only one 
> will be not-null
>         public final List<String> topicNames; // null if #addSource(..., 
> Pattern) was used
>         public final Pattern topicPattern; // null if #addSource(..., 
> String...) was used
>     }

(b) We could also go with a single variable (as originally proposed).
This would have the advantage (compared to (a)), that null checks are
not required accessing TopologyDescription#Source class.

> String topics; // can be comma separated list of topic names or pattern (as 
> String)

However, with an encoded list or an encoded pattern it's required to
parse the string again, what we want to avoid in the first place.

(c) Use a single variable as in (b)

> String topics; // always a pattern (as String)

We translate a list of topic names into a pattern
"topic1|topic2|topic3". We loose the information if the source was added
via list or via pattern.



2) Your understanding is correct. Added a comment to the KIP.



3) I would keep KafkaStreams#toString() -- it's conceptually two
different things and runtime information is useful, too. But as its
return value is ambiguous to parse (and must be parsed in the first
place what is cumbersome), we could add KafkaStreams#describe() as

> public synchronized KafkaStreamsDescription describe();

KafkaStreamsDescription class would be similar to TopologyDescription to
allow programmatic access to runtime information. I guess we could even
reuse (parts of) TopologyDescription within KafkaStreamsDescription to
avoid code duplication.

If you think this would be useful, I can extend the KIP accordingly.



-Matthias




On 3/10/17 1:38 PM, Guozhang Wang wrote:
> One more question here:
> 
> 3. with TopologyDescription, do we still want to keep the
> `KafkaStream.toString()` function? I think it may still have some advantage
> such that it contains tasks information after `KafkaStream#start()` has
> been called, but much of it is duplicate with the TopologyDescription, and
> it is only in the form of the string hence hard to programmatically
> leverage. So would like to hear your thoughts.
> 
> Guozhang
> 
> On Thu, Mar 9, 2017 at 11:20 PM, Guozhang Wang <[email protected]> wrote:
> 
>> Thanks Matthias, the updated KIP lgtm overall. A couple of minor comments:
>>
>> 1. With regard to this class:
>>
>>     public final class Source implements Node {
>>         public final String name;
>>         public final String topic; // can be topic name or pattern (as
>> String)
>>     }
>>
>> Note that the source node could contain more than a single topic, i.e. a
>> list of topics besides a pattern.
>>
>> 2. With regard to
>>
>>           public synchronized TopologyDescription describe();
>>
>> My understand is that whenever the topology is modified, one needs to call
>> this function again to get a new description object, as the old one won't
>> be updated automatically. Hence the usage pattern would be:
>>
>> TopologyDescription description = topology.describe();
>>
>> topology.addProcessor(...)
>>
>> description = topology.describe(); // have to call again
>>
>> -----------
>>
>> Is that right? If yes could you clarify this in the wiki?
>>
>>
>>
>> Guozhang
>>
>> On Thu, Mar 9, 2017 at 2:30 AM, Michael Noll <[email protected]> wrote:
>>
>>> Thanks for the update, Matthias.
>>>
>>> +1 to the points 1,2,3,4 you mentioned.
>>>
>>> Naming is always a tricky subject, but renaming KStreamBuilder
>>> to StreamsTopologyBuilder looks ok to me (I would have had a slight
>>> preference towards DslTopologyBuilder, but hey.)  The most important
>>> aspect
>>> is, IMHO, what you also pointed out:  to make it clear that the current
>>> KStreamBuilder actually builds a topology (though currently the latter is
>>> actually called `TopologyBuilder` currently), and not a `KStream`.
>>>
>>>
>>>
>>>
>>>
>>> On Wed, Mar 8, 2017 at 11:43 PM, Matthias J. Sax <[email protected]>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> sorry for not replying earlier and thanks for all your feedback. After
>>>> some more discussions I updated the KIP. The new proposal puts some
>>>> other design considerations into account, that I want to highlight
>>>> shortly. Those considerations, automatically resolve the concerns
>>> raised.
>>>>
>>>> First some answers:
>>>>
>>>>> The PAPI processors I use in my KStreams app are all functioning on
>>>> KTable
>>>>> internals.  I wouldn't be able to convert them to
>>> process()/transform().
>>>>>
>>>>> What's the harm in permitting both APIs to be used in the same
>>>> application?
>>>>
>>>> It's not about "harm" but about design. We want to switch from a
>>>> "inheritance" to a "composition" pattern.
>>>>
>>>> About the interface idea: using a shared interface would not help to get
>>>> a composition pattern
>>>>
>>>>
>>>> Next I want to give the design considerations leading to the updated
>>> KIP:
>>>>
>>>> 1) Using KStreamBuilder in the constructor of KafkaStreams is unnatural.
>>>> KafkaStreams client executes a `Topology` and this execution should be
>>>> independent of the way the topology is "put together", ie, low-level API
>>>> or DSL.
>>>>
>>>> 2) Thus, we don't want to have any changes to KafkaStreams class.
>>>>
>>>> 3) Thus, KStreamBuilder needs to have a method `build()` that returns a
>>>> `Topology` that can be passed into KafakStreams.
>>>>
>>>> 4) Because `KStreamBuilder` should build a `Topology` I suggest to
>>>> rename the new class to `StreamsTopologyBuilder` (the name
>>>> TopologyBuilder would actually be more natural, but would be easily
>>>> confused with old low-level API TopologyBuilder).
>>>>
>>>> Thus, PAPI and DSL can be mixed-and-matched with full power, as
>>>> StreamsTopologyBuilder return the created Topology via #build().
>>>>
>>>> I also removed `final` for both builder classes.
>>>>
>>>>
>>>>
>>>> With regard to the larger scope of the overal API redesign, I also want
>>>> to point to a summary of API issues:
>>>> https://cwiki.apache.org/confluence/display/KAFKA/
>>>> Kafka+Streams+Discussions
>>>>
>>>> Thus, this KIP is only one building block of a larger improvement
>>>> effort, and we hope to get as much as possible done for 0.11. If you
>>>> have any API improvement ideas, please share them so we can come up with
>>>> an holistic sound design (instead of uncoordinated local improvements
>>>> that might diverge)
>>>>
>>>>
>>>>
>>>> Looking forward to your feedback on this KIP and the other API issues.
>>>>
>>>>
>>>>
>>>> -Matthias
>>>>
>>>>
>>>>
>>>>
>>>> On 2/15/17 7:36 PM, Mathieu Fenniak wrote:
>>>>> On Wed, Feb 15, 2017 at 5:04 PM, Matthias J. Sax <
>>> [email protected]>
>>>>> wrote:
>>>>>
>>>>>> - We also removed method #topologyBuilder() from KStreamBuilder
>>> because
>>>>>> we think #transform() should provide all functionality you need to
>>>>>> mix-an-match Processor API and DSL. If there is any further concern
>>>>>> about this, please let us know.
>>>>>>
>>>>>
>>>>> Hi Matthias,
>>>>>
>>>>> Yes, I'm sorry I didn't respond sooner, but I still have a lot of
>>>> concerns
>>>>> about this.  You're correct to point out that transform() can be used
>>> for
>>>>> some of the output situations I pointed out; albeit it seems somewhat
>>>>> awkward to do so in a "transform" method; what do you do with the
>>> retval?
>>>>>
>>>>> The PAPI processors I use in my KStreams app are all functioning on
>>>> KTable
>>>>> internals.  I wouldn't be able to convert them to
>>> process()/transform().
>>>>>
>>>>> What's the harm in permitting both APIs to be used in the same
>>>> application?
>>>>>
>>>>> Mathieu
>>>>>
>>>>
>>>>
>>>
>>
>>
>>
>> --
>> -- Guozhang
>>
> 
> 
>

signature.asc
Description: OpenPGP digital signature

Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

Reply via email to