Some thoughts on mixing the DSL and the PAPI:

There were some suggestions about mixing the DSL and the PAPI
(https://issues.apache.org/jira/browse/KAFKA-3455). After thinking about it
a bit more carefully, I'd rather not recommend that users follow this
pattern, since in the DSL the same thing can always be achieved with
process() / transform(). Hence I think it is okay to prevent such patterns
in the new APIs. For the same reason, I think we can remove
KStreamBuilder#newName() from the public APIs.

About KStreamBuilder#addInternalTopic(): I admit that we can optimize the
built topology for cases like
"table.groupBy(..).aggregate(fn1); table.groupBy(/*same groupBy key*/).aggregate(fn2);"
so that both aggregations reuse the same repartition topic, and we have
plans to apply query optimization to the topology building process. For now
I'd rather suggest using KStream#through() to reuse the topic.

And about printing the topology for debuggability: I agree this is a
potential drawback, and I'd suggest maintaining some functionality to build
a "dry topology" as Mathieu suggested. The difficulty is that internally we
need a separate copy of the topology for each thread so that the threads do
not share any state, so we cannot pass the topology itself into
KafkaStreams instead of the topology builder. So how about adding a
`TopologyBuilder#toString` function which calls `build()` internally and
then prints the built dry topology?
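
To make the idea concrete, here is a rough sketch of the shape I have in
mind. It is a toy stand-in and not the actual Kafka Streams code: the
`DryTopologySketch`, `Builder`, `Topology`, `addNode` and `toDot` names are
all made up for illustration, and the Graphviz output is only one possible
format. The point is that `build()` hands out a fresh copy on every call (so
each thread can have its own), and `toString()` simply builds one such copy
and prints it without starting anything.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a topology builder; NOT the Kafka Streams API.
public class DryTopologySketch {

    /** A built graph: node name -> names of its parent nodes. */
    public static final class Topology {
        private final Map<String, List<String>> parents;

        Topology(Map<String, List<String>> parents) {
            this.parents = parents;
        }

        /** Render as Graphviz DOT: one line per node, one line per edge. */
        public String toDot() {
            StringBuilder sb = new StringBuilder("digraph topology {\n");
            for (Map.Entry<String, List<String>> e : parents.entrySet()) {
                sb.append("  \"").append(e.getKey()).append("\";\n");
                for (String p : e.getValue()) {
                    sb.append("  \"").append(p).append("\" -> \"")
                      .append(e.getKey()).append("\";\n");
                }
            }
            return sb.append("}\n").toString();
        }
    }

    public static final class Builder {
        private final Map<String, List<String>> parents = new LinkedHashMap<>();

        /** Register a node; every parent must have been added already. */
        public Builder addNode(String name, String... parentNames) {
            for (String p : parentNames) {
                if (!parents.containsKey(p)) {
                    throw new IllegalArgumentException("unknown parent: " + p);
                }
            }
            parents.put(name, new ArrayList<>(Arrays.asList(parentNames)));
            return this;
        }

        /** Return a fresh deep copy on each call so callers share no state. */
        public Topology build() {
            Map<String, List<String>> copy = new LinkedHashMap<>();
            for (Map.Entry<String, List<String>> e : parents.entrySet()) {
                copy.put(e.getKey(), new ArrayList<>(e.getValue()));
            }
            return new Topology(copy);
        }

        /** Build a "dry" topology (nothing is started) and describe it. */
        @Override
        public String toString() {
            return build().toDot();
        }
    }

    public static void main(String[] args) {
        Builder builder = new Builder()
                .addNode("source")
                .addNode("filter", "source")
                .addNode("sink", "filter");
        System.out.print(builder);
    }
}
```

With something like this, an application could print the builder under a
command-line flag and exit, without ever starting the stream threads.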


Guozhang


On Tue, Feb 7, 2017 at 6:32 AM, Mathieu Fenniak <
mathieu.fenn...@replicon.com> wrote:

> On Mon, Feb 6, 2017 at 2:35 PM, Matthias J. Sax <matth...@confluent.io>
> wrote:
>
> > - adding KStreamBuilder#topologyBuilder() seems like a good idea to
> > address any concern with limited access to TopologyBuilder and DSL/PAPI
> > mix-and-match approach. However, we should try to cover as much as
> > possible with #process(), #transform() etc.
> >
>
> That sounds like it'll work for me.
>
>
> > - about TopologyBuilder.nodeGroups & TopologyBuilder.build: not sure
> > what analysis you do -- there is also KafkaStreams#toString() that
> > describes the topology/DAG of the job. @Mathieu: Could you use this for
> > your analysis?
> >
>
> Well, I'd like to be able to output a graphviz diagram of my processor
> topology.  I am aware of KafkaStreams#toString(), but it isn't the format
> I want. If I remember correctly, I found it was ambiguous to parse &
> transform, and it also has the limitation of requiring a running and
> processing application, as toString() doesn't return anything useful until
> the consumer stream threads are running.
>
> What I've whipped up with the existing ProcessorTopology API (
> https://gist.github.com/mfenniak/04f9c0bea8a1a2e0a747d678117df9f7) just
> builds a "dry" topology (ie. no data being processed) and outputs a graph.
> It's hooked into my app so that I can run with a specific command-line
> option to output the graph without having to start the processor.
>
> It's not the worst thing in the world to lose, or to have to jump through
> some reflection hoops to do. :-)  Perhaps a better approach would be to
> have an API designed specifically for this kind of introspection,
> independent of the much more commonly used API to build a topology.
>
> Mathieu
>



-- 
-- Guozhang
