Matthias,

I would like to make a suggestion here; please check whether this could be
turned into a KIP. A GlobalKTable holds the complete data of its topic, so
when the underlying store is an in-memory store, memory usage can quickly
grow large. It would be good if, when using a GlobalKTable with an in-memory
store, a memory limit (or a maximum number of records) could also be
specified. The GlobalKTable would then hold only that much data in memory,
and the rest of the data would be fetched from the topic.

On top of that, the GlobalKTable could act as a most-recently-used cache:
whatever memory is allocated to the table, it would always hold the most
recently used entries, evicting the least recently used ones first.
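The bounded, most-recently-used behaviour proposed above can be sketched in plain Java. This is not a Kafka Streams API and not how GlobalKTable works today. It is a minimal illustration that assumes a `LinkedHashMap` in access order as the eviction structure, and a hypothetical `loader` function standing in for "fetch the rest of the data from the topic". `BoundedMruCache` is likewise a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of the proposal: a size-bounded in-memory lookup cache that
// keeps the most recently used entries and falls back to a slower loader on a miss.
// This is NOT part of Kafka Streams.
class BoundedMruCache<K, V> {
    private final Function<K, V> loader; // hypothetical stand-in for reading from the topic
    private final Map<K, V> entries;
    private int misses = 0;

    BoundedMruCache(int maxEntries, Function<K, V> loader) {
        this.loader = loader;
        // accessOrder = true: iteration runs from least to most recently accessed,
        // so the "eldest" entry is the least recently used one.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries; // evict the LRU entry once over the limit
            }
        };
    }

    V get(K key) {
        V value = entries.get(key); // a hit also marks the key as most recently used
        if (value == null) {
            misses++;
            value = loader.apply(key); // miss: go to the slower source
            if (value != null) {
                entries.put(key, value); // may evict the least recently used entry
            }
        }
        return value;
    }

    int size() { return entries.size(); }

    int misses() { return misses; }

    // containsKey does not count as an access, so it does not change eviction order.
    boolean isCached(K key) { return entries.containsKey(key); }
}
```

With a limit of 2, reading `a`, `b`, `a`, `c` evicts `b`, because `a` was touched more recently. One caveat on the design: Kafka topics are read by offset, not looked up by key, so a real "fetch from topic" loader would need some other keyed source; the sketch simply hides that behind `loader`.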

On Thu, May 14, 2020 at 11:49 PM Matthias J. Sax <mj...@apache.org> wrote:

> Yeah, the current API doesn't make it very clear how to do it. You can
> set an in-memory store like this:
>
> > builder.globalTable("topic",
> >     Materialized.as(Stores.inMemoryKeyValueStore("store-name")));
>
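Below is a self-contained version of the snippet above. It assumes the kafka-streams library is on the classpath and that keys and values are Strings. The topic and store names are the placeholders from the snippet. It only builds the topology and prints its description, so no broker is needed.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.Stores;

class InMemoryGlobalTable {

    // Builds a topology whose GlobalKTable is materialized in an in-memory
    // key-value store named "store-name" instead of the default RocksDB store.
    static Topology build() {
        StreamsBuilder builder = new StreamsBuilder();
        builder.globalTable(
                "topic",
                // In this two-argument overload, the serdes set on Materialized
                // are also used to read the topic.
                Materialized.<String, String>as(Stores.inMemoryKeyValueStore("store-name"))
                        .withKeySerde(Serdes.String())
                        .withValueSerde(Serdes.String()));
        return builder.build();
    }

    public static void main(String[] args) {
        // Prints the topology; the global-store sub-topology lists "store-name".
        System.out.println(build().describe());
    }
}
```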
>
> We are already working on an improved API via KIP-591:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-591%3A+Add+Kafka+Streams+config+to+set+default+store+type
>
>
>
> -Matthias
>
>
> On 5/13/20 3:40 AM, Pushkar Deole wrote:
> > Matthias,
> >
> > For GlobalKTable, I am looking at the APIs provided by StreamsBuilder and I
> > don't see any option to specify an in-memory store there. All of the API
> > documentation states that "The resulting GlobalKTable
> > <https://kafka.apache.org/23/javadoc/org/apache/kafka/streams/kstream/GlobalKTable.html>
> > will be materialized in a local KeyValueStore
> > <https://kafka.apache.org/23/javadoc/org/apache/kafka/streams/state/KeyValueStore.html>
> > with an internal store name." It doesn't give an option to choose between
> > an in-memory store and one backed by a DB:
> >
> > globalTable(String topic)
> > <https://kafka.apache.org/23/javadoc/org/apache/kafka/streams/StreamsBuilder.html#globalTable-java.lang.String->
> >
> > globalTable(String topic, Consumed<K,V> consumed,
> >     Materialized<K,V,KeyValueStore<org.apache.kafka.common.utils.Bytes,byte[]>> materialized)
> > <https://kafka.apache.org/23/javadoc/org/apache/kafka/streams/StreamsBuilder.html#globalTable-java.lang.String-org.apache.kafka.streams.kstream.Consumed-org.apache.kafka.streams.kstream.Materialized->
> >
> > On Tue, May 12, 2020 at 11:07 PM Matthias J. Sax <mj...@apache.org> wrote:
> >
> >> By default, RocksDB is used. You can also change it to use an in-memory
> >> store that is basically a HashMap.
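To make the choice between the two store types explicit, the `Stores` class offers a factory for each: `persistentKeyValueStore` (RocksDB on local disk) and `inMemoryKeyValueStore` (data kept on the JVM heap). The helper below is a sketch that assumes kafka-streams on the classpath. `storeFor` is a hypothetical name, not part of Kafka Streams. Either returned supplier can be passed to `Materialized.as(...)`.

```java
import org.apache.kafka.streams.state.KeyValueBytesStoreSupplier;
import org.apache.kafka.streams.state.Stores;

class StoreChoice {

    // Hypothetical helper that selects the store backing a table:
    //  - inMemoryKeyValueStore:   map held on the JVM heap, rebuilt from the topic on restart
    //  - persistentKeyValueStore: RocksDB store on local disk
    static KeyValueBytesStoreSupplier storeFor(String name, boolean inMemory) {
        return inMemory
                ? Stores.inMemoryKeyValueStore(name)
                : Stores.persistentKeyValueStore(name);
    }
}
```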
> >>
> >>
> >> -Matthias
> >>
> >> On 5/12/20 10:16 AM, Pushkar Deole wrote:
> >>> Thanks Liam!
> >>>
> >>> On Tue, May 12, 2020, 15:12 Liam Clarke-Hutchinson <liam.cla...@adscale.co.nz> wrote:
> >>>
> >>>> Hi Pushkar,
> >>>>
> >>>> GlobalKTables and KTables can hold whatever data structure you like, if
> >>>> you provide the appropriate deserializers. For example, a Kafka Streams
> >>>> app I maintain stores model data (exported to one topic per entity from
> >>>> Postgres via Kafka Connect's JDBC Source connector) as a GlobalKTable of
> >>>> Jackson ObjectNodes keyed by entity id.
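As a sketch of "provide the appropriate deserializers", the block below builds a custom `Serde` and uses it for a GlobalKTable's values. It swaps Jackson `ObjectNode` for a hypothetical `Entity` class with a simple UTF-8 `id,name` encoding so that no JSON library is needed. It assumes kafka-streams on the classpath and a Kafka version in which `Serializer`/`Deserializer` have default `configure()`/`close()` methods (which makes them usable as lambdas). The topic name `entities` is also hypothetical.

```java
import java.nio.charset.StandardCharsets;
import org.apache.kafka.common.serialization.Deserializer;
import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.serialization.Serializer;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.GlobalKTable;

// Hypothetical value type standing in for the model data.
final class Entity {
    final String id;
    final String name;

    Entity(String id, String name) {
        this.id = id;
        this.name = name;
    }
}

class EntitySerde {

    // Encodes an Entity as UTF-8 "id,name". Assumes ids never contain a comma.
    static Serde<Entity> serde() {
        Serializer<Entity> ser = (topic, e) ->
                e == null ? null : (e.id + "," + e.name).getBytes(StandardCharsets.UTF_8);
        Deserializer<Entity> de = (topic, bytes) -> {
            if (bytes == null) {
                return null; // tombstone: the key was deleted
            }
            String s = new String(bytes, StandardCharsets.UTF_8);
            int comma = s.indexOf(',');
            return new Entity(s.substring(0, comma), s.substring(comma + 1));
        };
        return Serdes.serdeFrom(ser, de);
    }

    // GlobalKTable keyed by entity id, with values deserialized into Entity objects.
    static GlobalKTable<String, Entity> entities(StreamsBuilder builder) {
        return builder.globalTable("entities", Consumed.with(Serdes.String(), serde()));
    }
}
```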
> >>>>
> >>>> If you're worried about efficiency, just treat KTables/GlobalKTables as a
> >>>> HashMap<K, V> and you're pretty much there. In terms of efficiency, we're
> >>>> joining model data to about 7-10 TB of transactional data a day, and on
> >>>> average we run about 5-10 instances of our enrichment app, each with
> >>>> about 2 GB max heap.
> >>>>
> >>>> Kind regards,
> >>>>
> >>>> Liam "Not a part of the Confluent team, but happy to help" Clarke-Hutchinson
> >>>>
> >>> On Tue, May 12, 2020 at 9:35 PM Pushkar Deole <pdeole2...@gmail.com> wrote:
> >>>>
> >>>>> Hello Confluent team,
> >>>>>
> >>>>> Could you provide some information on which data structures are used
> >>>>> internally by GlobalKTables and KTables? The application I am working on
> >>>>> needs to read cached data from a GlobalKTable on every incoming event,
> >>>>> so reads from the GlobalKTable need to be efficient.
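A common DSL way to read a GlobalKTable on every incoming event is a KStream-GlobalKTable join. For each stream record it looks up the table's local store. The sketch below assumes kafka-streams on the classpath, hypothetical topic names (`models`, `transactions`, `enriched-transactions`), and a hypothetical `entityId|payload` value format for the events.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.GlobalKTable;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.state.Stores;

class EnrichmentTopology {

    static Topology build() {
        StreamsBuilder builder = new StreamsBuilder();

        // Reference data: entityId -> model, held in an in-memory global store.
        GlobalKTable<String, String> models = builder.globalTable(
                "models",
                Consumed.with(Serdes.String(), Serdes.String()),
                Materialized.<String, String>as(Stores.inMemoryKeyValueStore("model-store")));

        // Events: txnId -> "entityId|payload" (hypothetical format).
        KStream<String, String> txns =
                builder.stream("transactions", Consumed.with(Serdes.String(), Serdes.String()));

        txns.join(models,
                // Key selector: which table key to look up for this event.
                (txnId, txn) -> {
                    if (txn == null) {
                        return null; // records with a null lookup key are skipped
                    }
                    int sep = txn.indexOf('|');
                    return sep < 0 ? txn : txn.substring(0, sep);
                },
                // Joiner: combine the event with the looked-up model value.
                (txn, model) -> txn + "|" + model)
            // Inner join: events with no matching model entry are dropped.
            .to("enriched-transactions", Produced.with(Serdes.String(), Serdes.String()));

        return builder.build();
    }
}
```

Because every instance holds a full copy of a GlobalKTable, the event stream does not need to be co-partitioned with the table's topic, unlike a KStream-KTable join.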
> >>>>>
> >>>>
> >>>
> >>
> >>
> >
>
>
