The newest version of Spark came out today.
https://spark.apache.org/releases/spark-release-1-3-0.html
Apparently they made improvements to the Kafka connector for Spark
Streaming (see Approach 2):
http://spark.apache.org/docs/1.3.0/streaming-kafka-integration.html
Best,
Niek
Congrats!
On Thu, Nov 6, 2014 at 10:28 AM, Jay Kreps jay.kr...@gmail.com wrote:
Hey all,
I’m happy to announce that Jun Rao, Neha Narkhede and I are creating a
company around Kafka called Confluent. We are planning on productizing the
kind of Kafka-based real-time data platform we built out
If you really only care about small scale (no HA, no horizontal
scaling), you could also consider using Redis instead of Kafka for
queueing.
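For small-scale queueing, the Redis pattern is just LPUSH on the producer side and (B)RPOP on the consumer side. The sketch below uses an in-memory `collections.deque` as a stand-in for a Redis list, purely to illustrate the semantics; with a real Redis server you would issue the same operations through a client library (e.g. `r.lpush("jobs", msg)` / `r.brpop("jobs")`). The names here are illustrative, not part of any library.

```python
from collections import deque

# In-memory stand-in for a Redis list, to illustrate the queueing
# pattern: LPUSH on one end, RPOP on the other gives FIFO ordering.
queue = deque()

def lpush(msg):
    # Producer side: push onto the left of the list.
    queue.appendleft(msg)

def rpop():
    # Consumer side: pop from the right, so the oldest message
    # comes out first (FIFO).
    return queue.pop() if queue else None

lpush("job-1")
lpush("job-2")
print(rpop())  # prints "job-1" (FIFO: first in, first out)
```

Note what you give up relative to Kafka: no replication, no consumer offset tracking, and the queue lives in one process's (or one Redis instance's) memory.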
- Niek
On Tue, May 20, 2014 at 2:23 PM, S Ahmed sahmed1...@gmail.com wrote:
Yes agreed, but I have done some load testing before and kafka was doing
How are you measuring memory usage? I would expect the OS page cache
to take 100% of unused memory, but that's not the same as being OOM.
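The arithmetic behind that point can be shown with made-up sample figures shaped like the fields in `/proc/meminfo` (the 48GB total comes from the thread; the other numbers are hypothetical):

```python
# Why "free" memory looks low on a healthy Linux box: the kernel uses
# otherwise-idle RAM for the page cache, and that memory is reclaimable.
# Sample values in MB, shaped like /proc/meminfo fields (hypothetical
# except for the 48GB total mentioned in the thread).
meminfo_mb = {
    "MemTotal": 48 * 1024,   # 49152 MB total
    "MemFree":  1200,        # looks alarmingly low...
    "Buffers":  300,
    "Cached":   41000,       # ...because the page cache holds the rest
}

# What naive monitoring reports as "used":
naive_used = meminfo_mb["MemTotal"] - meminfo_mb["MemFree"]

# What is actually unavailable to applications, since buffers and
# cache can be dropped under memory pressure:
truly_used = naive_used - meminfo_mb["Buffers"] - meminfo_mb["Cached"]

print(f"naive used: {naive_used} MB, truly used: {truly_used} MB")
# prints "naive used: 47952 MB, truly used: 6652 MB"
```

So a box can report ~98% memory "used" while most of that is reclaimable cache, which is exactly what you want for Kafka's sequential-read workload.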
- Niek
On Mon, Mar 24, 2014 at 12:38 PM, Cassa L lcas...@gmail.com wrote:

Hi,
We have been doing some evaluation testing against Kafka. We have 48GB RAM
A point release focusing on stability would definitely be nice.
And maybe a table in a wiki marking the stability of various features
(core, replication, synchronous messaging, compaction, rebalancing,
topic create, etc), so that people don't end up in the danger zone on
prod deployments.
-
Using a custom partitioner lets you do a gather step and exploit data
locality.
Example use case: a consumer splits incoming topic messages by customer id.
Each customer id has their own database table. With a custom partitioner,
you can send all data for a given customer id to same partition and
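The core of such a custom partitioner is deterministic key-based assignment: hash the customer id, mod by the partition count. The sketch below shows just that idea in plain Python; the function name is illustrative and not Kafka's actual producer API, though Kafka's partitioner classes do the same kind of key-hash-mod-N assignment.

```python
import zlib

def partition_for(customer_id: str, num_partitions: int) -> int:
    # Deterministic hash of the key: the same customer id always maps
    # to the same partition, which is what gives the data locality
    # described above. crc32 is used (rather than Python's built-in
    # hash) because it is stable across processes and runs.
    return zlib.crc32(customer_id.encode("utf-8")) % num_partitions

# All records for a customer land on one partition, so the single
# consumer owning that partition sees the customer's full stream and
# can batch writes to that customer's database table.
p1 = partition_for("customer-42", 8)
p2 = partition_for("customer-42", 8)
assert p1 == p2  # stable assignment across calls
```

One caveat: keying everything by customer id skews partition load if a few customers dominate traffic, so this trades even load balancing for locality.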