Re: sending data to different partitions of the output stream

2015-04-07 Thread Vladimir Lebedev
Great, passing partition number as a partition key is enough for my needs. Many thanks, Tommy! On 04/07/2015 02:46 PM, Tommy Becker wrote: If you want to send to a specific partition number, you can just pass that number as the partition key. This works because the default partitioner is via
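The preview above is cut off before it names how the default partitioner works, so the following is only a sketch under an assumption: that the partitioner takes the non-negative hash code of the partition key modulo the number of partitions. Under that assumption, an Integer key maps to the partition with the same number, because an Integer's hash code is its own value. The class and method names (`PartitionKeyDemo`, `partitionFor`) are hypothetical and are not part of Samza or Kafka.

```java
// Hypothetical illustration; not Samza or Kafka source code.
class PartitionKeyDemo {

    // ASSUMPTION: a hash-modulo partitioner. The sign bit is masked off so
    // the result is never negative, then reduced modulo the partition count.
    static int partitionFor(Object partitionKey, int numPartitions) {
        return (partitionKey.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 8;
        // Integer.hashCode() returns the int value itself, so a key of 3
        // lands on partition 3 as long as 3 < numPartitions.
        System.out.println(partitionFor(3, numPartitions));
        // A key at or beyond the partition count wraps around.
        System.out.println(partitionFor(11, numPartitions));
    }
}
```

If the assumption holds, the trick only works as intended while the key is a non-negative integer smaller than the partition count of the output topic; larger keys wrap around rather than fail.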

Dealing with partitioning mismatches between bootstrap and input streams

2015-04-07 Thread Tommy Becker
We have a Kafka topic containing data needed by several Samza jobs. These jobs will essentially read the data and build up state that will be used for processing their inputs. Ideally, we would use the topic as a bootstrap stream to build up this state. The problem with that is the topic

Producer performance in 0.9.0

2015-04-07 Thread Gian Merlino
Has anyone else seen issues with producer performance in 0.9.0? I updated a few of our jobs recently and ended up rolling one back to 0.8 since it was being really sluggish. I profiled it for a bit and a lot of time was being spent in BufferPool.allocate and the busy-loop in KafkaSystemProducer's

Re: Producer performance in 0.9.0

2015-04-07 Thread Chris Riccomini
Hey Gian, Hmm, this is strange. We ran some tests, and found the new producer to be faster than the old producer default (sync), and almost as fast as the old producer's async producer. Could you paste all of your configs? Cheers, Chris On Tue, Apr 7, 2015 at 10:40 AM, Gian Merlino

Re: consistency between input, output and changelog streams

2015-04-07 Thread Yan Fang
Hi Bart, In terms of your assumption, * Ts = To , this is correct. The code that backs up this assumption is here: in RunLoop https://github.com/apache/samza/blob/master/samza-core/src/main/scala/org/apache/samza/container/RunLoop.scala , the commit is called after each call to the process and window methods.

Re: Dealing with partitioning mismatches between bootstrap and input streams

2015-04-07 Thread Chris Riccomini
Hey Tommy, Your summary sounds pretty accurate. One other way, which requires no change to Samza, would be to repartition the input topic properly for each task. This is kind of hacky, though. (2) is the ideal solution. It is a bit of work, but it might not be so bad. I think most of the changes

Re: Producer performance in 0.9.0

2015-04-07 Thread Gian Merlino
Hey Chris, I tried setting producer.batch.size to 256KB (the Kafka docs say the default is 16KB) and the throughput is much better. That job is running a bit faster than 0.8 now. Gian On Tue, Apr 7, 2015 at 2:02 PM, Chris Riccomini criccom...@apache.org wrote: Hey Gian, Hmm, this is strange.
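For reference, the change described above might look like the following line in a Samza job's properties file. This is a sketch, not Gian's actual config: it assumes the Kafka system is registered under the name `kafka`, and it writes 256KB as 262144 bytes, since the Kafka producer's `batch.size` is expressed in bytes.

```properties
# ASSUMPTION: the Kafka system in this job is named "kafka".
# 256 * 1024 = 262144 bytes; the Kafka docs give the default as 16384 (16KB).
systems.kafka.producer.batch.size=262144
```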