Re: Questions about partitioning

2015-04-24 Thread Yi Pan
Hi Susan, welcome to Samza! First, I will try to answer your question about partition assignment in Samza. The assignment of stream partitions to Samza tasks is determined by the SystemStreamPartitionGrouper. The default implementation includes two assignment methods: 1 task per system stream

Re: Questions about partitioning

2015-04-24 Thread Jakob Homan
Hey Susan- That volume of topics (or partitions) would be a significant burden on both the Kafka cluster and underlying YARN cluster (for the Samza job). A 'large number of partitions' even at places with huge Kafka clusters is on the order of 512 or so. It sounds like you're trying to use

Re: Questions about partitioning

2015-04-24 Thread Naveen S
Hey Susan, as far as I know, there are very minimal differences between the partition and topic strategies in terms of performance. In terms of how they are allocated in memory, they should be very similar, but I'll get some Kafka experts to comment on that. From Samza's perspective,

Questions about partitioning

2015-04-24 Thread Susan Luong
Hi there, I'm new to Samza/Kafka and we're evaluating Samza to see whether it would be a good fit for our application. I just had a few questions about how partitioning works. I understand there is a limitation on the number of topics we can create [1], and I was wondering, if we need more than,