Actually, it looks like the better way would be to output the counts to a new topic and then ingest that topic into the DB itself. Is that the correct approach?
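Something roughly like this sketch is what I'm picturing (untested; the topic names, the attribute accessor on the Avro "Impression" record, and the serde wiring are all placeholders, not a definitive implementation):

    import java.util.Properties;
    import java.util.concurrent.TimeUnit;

    import org.apache.kafka.common.serialization.Serde;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.Consumed;
    import org.apache.kafka.streams.kstream.Produced;
    import org.apache.kafka.streams.kstream.Serialized;
    import org.apache.kafka.streams.kstream.TimeWindows;

    public class ImpressionCounts {

        // impressionSerde would be an Avro serde for the generated Impression class;
        // "impressions" and "daily-counts" are placeholder topic names.
        public static KafkaStreams buildStreams(Serde<Impression> impressionSerde) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "impression-counts");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");

            StreamsBuilder builder = new StreamsBuilder();

            builder.stream("impressions", Consumed.with(Serdes.String(), impressionSerde))
                   // re-key each record by the attribute we want daily counts for
                   // (getCountry() is just an example accessor on the Avro record)
                   .groupBy((key, imp) -> imp.getCountry(),
                            Serialized.with(Serdes.String(), impressionSerde))
                   // one-day tumbling windows
                   .windowedBy(TimeWindows.of(TimeUnit.DAYS.toMillis(1)))
                   .count()
                   // flatten the windowed key and stream the counts out to a topic
                   // that a connector or plain consumer can load into the DB
                   .toStream((windowedKey, count) ->
                           windowedKey.key() + "@" + windowedKey.window().start())
                   .to("daily-counts", Produced.with(Serdes.String(), Serdes.Long()));

            return new KafkaStreams(builder.build(), props);
        }
    }

The nice part of this shape is that the "daily-counts" topic becomes the hand-off point: the DB load is just another consumer (or a sink connector) and the Streams app never has to talk to the database directly.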
On Fri, Mar 2, 2018 at 9:24 AM, Matt Daum <m...@setfive.com> wrote:
> I am new to Kafka, but I think I have a good use case for it. I am trying
> to build daily counts of requests based on a number of different attributes
> in a high-throughput system (~1 million requests/sec. across all 8 servers).
> The different attributes are unbounded in terms of values, and some will
> spread across 100s of millions of values. This is my current thought
> process; let me know where I could be more efficient or if there is a
> better way to do it.
>
> I'll create an Avro object "Impression" which has all the attributes of
> the inbound request. On each request my application servers will then
> create one and send it to a single Kafka topic.
>
> I'll then have a consumer which creates a stream from the topic. From
> there I'll use the windowed timeframes and groupBy to group by the
> attributes on each given day. At the end of the day I'd need to read the
> data store out to an external system for storage. Since I won't know all
> the values, I'd need something similar to KVStore.all() but for windowed
> KV stores. It appears this will be possible in 1.1 with this commit:
> https://github.com/apache/kafka/commit/1d1c8575961bf6bce7decb049be7f10ca76bd0c5
>
> Is this the best approach? Or would I be better off using the stream to
> listen and then an external DB like Aerospike to store the counts, reading
> out of it directly at the end of the day?
>
> Thanks for the help!
> Daum
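For the read-it-out-at-end-of-day alternative in the quoted message, the 1.1 API added by that commit would be used roughly like this (sketch only; it assumes the count() was materialized under a placeholder store name, e.g. Materialized.as("daily-counts-store")):

    import java.util.concurrent.TimeUnit;

    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.KeyValue;
    import org.apache.kafka.streams.kstream.Windowed;
    import org.apache.kafka.streams.state.KeyValueIterator;
    import org.apache.kafka.streams.state.QueryableStoreTypes;
    import org.apache.kafka.streams.state.ReadOnlyWindowStore;

    public class DailyExport {

        // "daily-counts-store" is a placeholder store name.
        public static void exportLastDay(KafkaStreams streams) {
            ReadOnlyWindowStore<String, Long> store =
                    streams.store("daily-counts-store", QueryableStoreTypes.windowStore());

            long now = System.currentTimeMillis();
            long dayAgo = now - TimeUnit.DAYS.toMillis(1);

            // fetchAll()/all() on ReadOnlyWindowStore are available starting with
            // Kafka 1.1 (the commit referenced above).
            try (KeyValueIterator<Windowed<String>, Long> it = store.fetchAll(dayAgo, now)) {
                while (it.hasNext()) {
                    KeyValue<Windowed<String>, Long> entry = it.next();
                    // hand the attribute value, window start, and count to the external store
                    System.out.printf("%s %d %d%n",
                            entry.key.key(), entry.key.window().start(), entry.value);
                }
            }
        }
    }

Either shape works; the difference is mainly whether the end-of-day export pulls from the Streams state stores directly or just consumes the counts topic.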