Hi Srikanth,

How do you define if the "KTable is read completely" to get to the "current
content"? Since as you said that table is not purely static, but still with
maybe-low-traffic update streams, I guess "catch up to current content" is
still depending on some timestamp?

BTW about 1), we are consider adding "read-only global state" as well into
the DSL in the future, but like Matthias said it won't be available in
0.10.0, so you need to do it through the transform() call where you can
provide any customized processor.


Guozhang


On Mon, May 23, 2016 at 8:59 AM, Srikanth <srikanth...@gmail.com> wrote:

> Matthias,
>
> For (2), how do you achieve this using transform()?
>
> Thanks,
> Srikanth
>
> On Sat, May 21, 2016 at 9:10 AM, Matthias J. Sax <matth...@confluent.io>
> wrote:
>
> > Hi Srikanth,
> >
> > 1) there is no support on DSL level, but if you use Processor API you
> > can do "anything" you like. So yes, a map-like transform() that gets
> > initialized with the "broadcast-side" of the join should work.
> >
> > 2) Right now, there is no way to stall a stream -- a custom
> > TimestampExtractor will not do the tricker either, because Kafka Streams
> > allows for out-of-order/late arriving data -- thus, it processes what is
> > available without "waiting" for late data...
> >
> > Of course, you could build a custom solution via transfrom() again.
> >
> >
> > -Matthias
> >
> > On 05/21/2016 05:24 AM, Srikanth wrote:
> > > Hello,
> > >
> > > I'm writing a workflow using kafka streams where an incoming stream
> needs
> > > to be denormalized and joined with a few dimension table.
> > > It will be written back to another kafka topic. Fairly typical I
> believe.
> > >
> > > 1) Can I do broadcast join if my dimension table is small enough to be
> > held
> > > in each processor's local state?
> > > Not doing this will force KStream to be shuffled with an internal
> topic.
> > > If not should I write a transformer that reads the table in init() and
> > > cache's it. I can then do a map like transformation in transform() with
> > > cache lookup instead of a join.
> > >
> > >
> > > 2) When joining a KStream withe KTable for bigger tables, how do I
> stall
> > > reading KStream until KTable is completely materialized?
> > > I know completely read is a very loose term in stream processing :-)
> The
> > > KTable is going to be fairly static and needs the "current content" to
> be
> > > read completely before it can be used for joins.
> > > The only option for synchronizing streams I see is TimestampExtractor.
> I
> > > can't figure out how to use that because
> > >
> > >   i) The join between KStream and KTable is time agnostic and doesn't
> > > really fit into a window operation.
> > >   ii) TimestampExtractor is set as a stream config at global level.
> > > Timestamp extraction logic on the other hand will be specific to each
> > > stream.
> > >       How does one write a generic extractor?
> > >
> > > Thanks,
> > > Srikanth
> > >
> >
> >
>



-- 
-- Guozhang

Reply via email to