Re: Partitioning at the edges

2016-09-07 Thread Andy Chambers
Looks like re-partitioning is probably the way to go. I've seen references to this pattern a couple of times but wanted to make sure I wasn't missing something obvious. Looks like Kafka Streams makes this kind of thing a bit easier than Samza. Thanks for sharing your wisdom, folks :-) On Wed, Sep

Re: Partitioning at the edges

2016-09-07 Thread David Garcia
Obviously for the keys you don’t have, you would have to look them up…sorry, I kinda missed that part. That is indeed a pain. The job that looks those keys up would probably have to batch queries to the external system. Maybe you could use kafka-connect-jdbc to stream in updates to that syste
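
[Editor's sketch] The batching idea in this message could look roughly like the following. This is a minimal illustration only: `resolve_keys`, `fake_lookup_many`, and the `FAKE_DB` contents are hypothetical names and data that do not appear in the thread, and the in-memory dictionary merely stands in for the external system.

```python
from typing import Callable, Dict, Iterable, Iterator, List


def batched(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` items."""
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield batch


def resolve_keys(
    identifiers: Iterable[str],
    lookup_many: Callable[[List[str]], Dict[str, str]],
    batch_size: int = 100,
) -> Dict[str, str]:
    """Resolve identifiers to join keys with one external call per batch."""
    resolved: Dict[str, str] = {}
    for batch in batched(identifiers, batch_size):
        resolved.update(lookup_many(batch))
    return resolved


# Hypothetical stand-in for the external system (assumption, not from the thread).
FAKE_DB = {"acct-1": "cust-A", "acct-2": "cust-B", "acct-3": "cust-A"}
calls: List[List[str]] = []


def fake_lookup_many(batch: List[str]) -> Dict[str, str]:
    calls.append(list(batch))  # record each round trip
    return {ident: FAKE_DB[ident] for ident in batch if ident in FAKE_DB}


result = resolve_keys(
    ["acct-1", "acct-2", "acct-3", "acct-9"], fake_lookup_many, batch_size=2
)
```

Identifiers the external system does not know (here `acct-9`) are simply absent from the result, so the caller still has to decide what to do with unmatched records.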

Re: Partitioning at the edges

2016-09-07 Thread David Garcia
The “simplest” way to solve this is to “repartition” your data (i.e. the streams you wish to join) with the partition key you wish to join on. This obviously introduces redundancy, but it will solve your problem. For example, suppose you want to join topic T1 and topic T2…but they aren’t part
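
[Editor's sketch] The repartitioning pattern described here can be simulated in a few lines of plain Python. This is not Kafka code: `partition_for` uses CRC32 as a stand-in for a real partitioner, and the record fields (`id`, `k`, `name`) and the partition count are assumptions made up for the illustration.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # assumed; both repartitioned streams must use the same count


def partition_for(key, num_partitions):
    # Stand-in partitioner: stable hash of the key modulo the partition count.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def repartition(records, key_fn, num_partitions):
    """Re-key every record with key_fn and place it in the partition of the new key."""
    partitions = defaultdict(list)
    for record in records:
        new_key = key_fn(record)
        partitions[partition_for(new_key, num_partitions)].append((new_key, record))
    return partitions


def join_partition(left, right):
    """Join two co-located partitions on their (already identical) key."""
    right_by_key = defaultdict(list)
    for key, value in right:
        right_by_key[key].append(value)
    return [(key, lv, rv) for key, lv in left for rv in right_by_key.get(key, [])]


# Hypothetical contents of T1 and T2; "k" is the field we want to join on.
t1 = [{"id": "a1", "k": "x"}, {"id": "a2", "k": "y"}, {"id": "a3", "k": "x"}]
t2 = [{"k": "x", "name": "X"}, {"k": "y", "name": "Y"}]

t1_parts = repartition(t1, lambda r: r["k"], NUM_PARTITIONS)
t2_parts = repartition(t2, lambda r: r["k"], NUM_PARTITIONS)

# Each partition can now be joined independently of all the others.
joined = []
for p in range(NUM_PARTITIONS):
    joined.extend(join_partition(t1_parts.get(p, []), t2_parts.get(p, [])))
```

The redundancy mentioned above shows up as the second copy of each stream held in `t1_parts` and `t2_parts`, which in a real deployment would be additional topics.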

Re: Partitioning at the edges

2016-09-06 Thread Eno Thereska
given that the transactions do not have the customer ID). It's worth mentioning that in Kafka trunk the repartitioning happens automatically (while in 0.10.0.0 the user needs to manually repartition topics). Eno Begin forwarded message: From: Andy Chambers Subject: Re: Partitioning at

Re: Partitioning at the edges

2016-09-03 Thread Andy Chambers
Hi Eno, I'll try. We have a feed of transaction data from the bank. We must try to associate each transaction with a customer in our system. Unfortunately the transaction data doesn't include the customer-id itself but rather a variety of other identifiers that we can use to look up the customer-id in

Re: Partitioning at the edges

2016-09-03 Thread Eno Thereska
Hi Andy, Could you share a bit more info or pseudocode so that we can understand the scenario a bit better? Especially around the streams at the edges. How are they created and what is the join meant to do? Thanks Eno > On 3 Sep 2016, at 02:43, Andy Chambers wrote: > > Hey Folks, > > We are

Partitioning at the edges

2016-09-02 Thread Andy Chambers
Hey Folks, We are having quite a bit of trouble modelling the flow of data through a very Kafka-centric system. As I understand it, every stream you might want to join with another must be partitioned the same way. But often streams at the edges of a system *cannot* be partitioned the same way becaus
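
[Editor's sketch] To make the co-partitioning requirement concrete, the snippet below checks where the same key lands in two streams. It is only an illustration: CRC32 stands in for a real partitioner, and the key names and partition counts are invented for the example.

```python
import zlib


def partition_for(key, num_partitions):
    # Stand-in partitioner: stable hash of the key modulo the partition count.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def misaligned(keys, left_partitions, right_partitions):
    """Return the keys that land in different partition numbers in the two streams."""
    return [
        k
        for k in keys
        if partition_for(k, left_partitions) != partition_for(k, right_partitions)
    ]


keys = [f"customer-{i}" for i in range(100)]  # hypothetical join keys

co_partitioned = misaligned(keys, 4, 4)      # same key, same partitioner, same count
not_co_partitioned = misaligned(keys, 4, 6)  # same key, different partition counts
```

With equal counts no key is misaligned, so a task reading partition N of both streams sees every record for its keys. With differing counts some keys sit in different partition numbers, and a per-partition join would miss those matches.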