24/7 Spark Streaming on YARN in Production

2017-01-01 Thread Bernhard Schäfer
Two weeks ago I published a blog post about our experiences running 24/7 Spark Streaming applications on YARN in production: https://www.inovex.de/blog/247-spark-streaming-on-yarn-in-production/ Among other things it contains …

Re: [Spark Streaming] Joining Kafka and Cassandra DataFrames

2016-02-09 Thread bernhard
The statement "To push down partition keys, all of them must be included, but not more than one predicate per partition key, otherwise nothing is pushed down." Does not apply IMO? Bernhard Quoting Mohammed Guller : Hi Bernhard, Take a look at the examples shown under the &
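The disputed rule can be checked empirically. A minimal sketch using the spark-cassandra-connector DataFrame source, assuming a sqlContext in scope; the keyspace, table, and single partition-key column id are assumptions, not the poster's schema:

    import org.apache.spark.sql.functions.col

    val df = sqlContext.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("table" -> "cf", "keyspace" -> "ks"))
      .load()

    // One equality predicate per partition-key column; the PushedFilters
    // entry in the physical plan shows whether it actually reached Cassandra.
    df.filter(col("id") === "42").explain()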

Re: [Spark Streaming] Joining Kafka and Cassandra DataFrames

2016-02-09 Thread bernhard
if (instance == null) {
  // Lazily load the DataFrame from the Cassandra data source
  instance = sqlContext.read
    .format("org.apache.spark.sql.cassandra")
    .options(Map("table" -> "cf", "keyspace" -> "ks"))
    .load()
}
instance
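In context, this reads as the body of a lazily-initialized singleton, the usual way (mirroring the SQLContextSingleton pattern in Spark's own streaming examples) to reuse one Cassandra-backed DataFrame across micro-batches. A minimal sketch of that pattern; the object and method names are assumptions, only the body of getInstance comes from the snippet above:

    import org.apache.spark.sql.{DataFrame, SQLContext}

    object CassandraDataFrameSingleton {
      @transient private var instance: DataFrame = _

      // Load the DataFrame once; later calls return the cached instance.
      def getInstance(sqlContext: SQLContext): DataFrame = synchronized {
        if (instance == null) {
          instance = sqlContext.read
            .format("org.apache.spark.sql.cassandra")
            .options(Map("table" -> "cf", "keyspace" -> "ks"))
            .load()
        }
        instance
      }
    }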

[Spark Streaming] Joining Kafka and Cassandra DataFrames

2016-02-09 Thread bernhard
All, I'm new to Spark and I'm having a hard time doing a simple join of two DFs. Intent: I'm receiving data from Kafka via direct stream and would like to enrich the messages with data from Cassandra. The Kafka messages (Protobufs) are decoded into DataFrames and then joined with a (supp…
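A minimal sketch of the setup described, assuming Spark 1.6 with the Kafka 0.8 direct stream and the DataStax spark-cassandra-connector; the broker, topic, keyspace/table, join key, and the CSV stand-in for the Protobuf decoding are illustrative assumptions, not the poster's code:

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SQLContext
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    case class Message(id: String, payload: String)

    object KafkaCassandraJoin {
      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(
          new SparkConf().setAppName("kafka-cassandra-join"), Seconds(10))

        // Direct stream over an assumed "events" topic.
        val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
          ssc, Map("metadata.broker.list" -> "broker:9092"), Set("events"))

        stream.foreachRDD { rdd =>
          val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)
          import sqlContext.implicits._

          // Stand-in for the Protobuf decoding step: parse "id,payload" strings.
          val messages = rdd.map(_._2.split(",", 2))
            .map(a => Message(a(0), a(1)))
            .toDF()

          // Enrichment data from Cassandra via the DataFrame source.
          val reference = sqlContext.read
            .format("org.apache.spark.sql.cassandra")
            .options(Map("table" -> "cf", "keyspace" -> "ks"))
            .load()

          messages.join(reference, "id").show()
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }

Loading the Cassandra DataFrame inside foreachRDD repeats the read on every batch; the singleton snippet in the reply above is the usual way to avoid that.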