Kafka and Spark 1.3.0

2015-03-13 Thread Niek Sanders
The newest version of Spark came out today. https://spark.apache.org/releases/spark-release-1-3-0.html Apparently they made improvements to the Kafka connector for Spark Streaming (see Approach 2): http://spark.apache.org/docs/1.3.0/streaming-kafka-integration.html Best, Niek

Re: Announcing Confluent

2014-11-06 Thread Niek Sanders
Congrats! On Thu, Nov 6, 2014 at 10:28 AM, Jay Kreps jay.kr...@gmail.com wrote: Hey all, I’m happy to announce that Jun Rao, Neha Narkhede and I are creating a company around Kafka called Confluent. We are planning on productizing the kind of Kafka-based real-time data platform we built out

Re: starting off at a small scale, single ec2 instance with 7.5 GB RAM with kafka

2014-05-20 Thread Niek Sanders
If you really only care about small scale (no HA, no horizontal scaling), you could also consider using Redis instead of Kafka for queueing. - Niek On Tue, May 20, 2014 at 2:23 PM, S Ahmed sahmed1...@gmail.com wrote: Yes agreed, but I have done some load testing before and kafka was doing
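The Redis alternative mentioned above is usually the list-based queue pattern (LPUSH on the producer side, pop from the opposite end on the consumer side). A minimal sketch of that pattern, using an in-memory stand-in class so it runs without a Redis server; with a real server you would issue the same commands through a client library such as redis-py:

```python
from collections import deque

class FakeRedisQueue:
    """In-memory stand-in mimicking Redis list commands LPUSH/RPOP.
    Illustrative only -- a real deployment would talk to a Redis server."""
    def __init__(self):
        self.lists = {}

    def lpush(self, key, value):
        # Push onto the left end of the list, creating it if needed.
        self.lists.setdefault(key, deque()).appendleft(value)

    def rpop(self, key):
        # Pop from the right end; together with lpush this gives FIFO order.
        q = self.lists.get(key)
        return q.pop() if q else None

r = FakeRedisQueue()
# Producer pushes events onto the left of the list...
for event in ["evt-1", "evt-2", "evt-3"]:
    r.lpush("events", event)
# ...consumer pops from the right, receiving them in FIFO order.
print(r.rpop("events"))  # -> evt-1
```

Unlike Kafka, this gives at-most-once delivery with no replay, which is the trade-off behind the "no HA, no horizontal scaling" caveat above.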

Re: Memory consumption in Kafka

2014-03-24 Thread Niek Sanders
How are you measuring memory usage? I would expect the OS page cache to take 100% of unused memory, but that's not the same as being OOM. - Niek On Mon, Mar 24, 2014 at 12:38 PM, Cassa L lcas...@gmail.com wrote: Hi, We have been doing some evaluation testing against Kafka. We have 48GB RAM
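The distinction Niek is drawing can be checked on Linux via /proc/meminfo: MemFree excludes page cache, so it is normally tiny on a busy Kafka broker, while MemAvailable counts reclaimable cache and is the better health metric. A small sketch that parses a /proc/meminfo-style snippet (the values below are illustrative, not from the original thread):

```python
# Sample /proc/meminfo-style text; on a live host you would read
# the real file with open("/proc/meminfo").read().
SAMPLE = """\
MemTotal:       49152000 kB
MemFree:          512000 kB
MemAvailable:   40960000 kB
Cached:         39936000 kB
"""

def parse_meminfo(text):
    """Return a {field: kB} dict from /proc/meminfo-style text."""
    out = {}
    for line in text.splitlines():
        key, rest = line.split(":", 1)
        out[key] = int(rest.strip().split()[0])
    return out

m = parse_meminfo(SAMPLE)
free_pct = 100 * m["MemFree"] / m["MemTotal"]
avail_pct = 100 * m["MemAvailable"] / m["MemTotal"]
# MemFree looks alarmingly low, but most of the "used" memory is
# reclaimable page cache, so MemAvailable stays high.
print(f"MemFree: {free_pct:.0f}%  MemAvailable: {avail_pct:.0f}%")
```

In this illustrative snapshot MemFree is about 1% while MemAvailable is over 80%, which is exactly the "looks OOM but isn't" situation described above.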

Re: 0.8.1 stability

2014-03-18 Thread Niek Sanders
A point release focusing on stability would definitely be nice. And maybe a table in a wiki marking the stability of various features (core, replication, synchronous messaging, compaction, rebalancing, topic create, etc), so that people don't end up in the danger zone on prod deployments. -

Re: Why would one choose a partition when producing?

2013-11-05 Thread Niek Sanders
Using a custom partitioner lets you do a gather step and exploit data locality. Example use case: a consumer splits messages by customer id, and each customer id has its own database table. With a custom partitioner, you can send all data for a given customer id to the same partition and
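The gather step above boils down to a deterministic key-to-partition mapping. A minimal sketch of that idea (the function name is illustrative, not Kafka's actual partitioner API):

```python
import zlib

def partition_for(customer_id: str, num_partitions: int) -> int:
    """Deterministically map a customer id to a partition.
    crc32 keeps the mapping stable across processes and restarts,
    unlike Python's built-in hash(), which is salted per process."""
    return zlib.crc32(customer_id.encode("utf-8")) % num_partitions

# Every message for "cust-42" lands on the same partition, so the
# consumer owning that partition sees all of that customer's data.
p1 = partition_for("cust-42", 8)
p2 = partition_for("cust-42", 8)
assert p1 == p2
```

Because the mapping is stable, the consumer assigned to that partition can batch writes into the per-customer table without coordinating with other consumers.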