On Mon, 21 May 2018 at 04:34, Saulo Sobreiro
<saulo.sobre...@outlook.pt> wrote:
Hi Javier,
Thank you very much for the feedback.
Indeed the CPU is a huge limitation. I had a lot of trouble trying to run this
use case in yarn-client mode. I managed to run it in standalone mode.
If the CPU is close to 100%, then you are hitting the limit. I don't think that
moving to Scala will make a difference. Both Spark and Cassandra are
CPU-hungry, and your setup is small in terms of CPUs. Try running Spark on
another (physical) machine so that the 2 cores are dedicated to Cassandra.
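As a rough illustration of that advice, a spark-submit invocation that pins the job's resources explicitly might look like the sketch below. The master host name, core count, memory size, and script name are all made-up placeholders, not values from this thread:

```shell
# Hypothetical sketch: submit the streaming job to a standalone master on a
# separate physical machine, so Cassandra keeps its 2 cores to itself.
spark-submit \
  --master spark://spark-host:7077 \
  --total-executor-cores 2 \
  --executor-memory 2g \
  streaming_job.py
```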
table schema can make
a big difference.
On Sun, 29 Apr 2018, 19:02 Saulo Sobreiro,
<saulo.sobre...@outlook.pt> wrote:
Hi Javier,
I removed the extra step and used "map" directly instead of using "transform",
but the kafkaStream is created with KafkaUtils.createDirectStream.
I have not used Python, but in Scala the cassandra-spark connector can save
directly to Cassandra without a foreachRDD.
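PySpark has no DStream-level saveToCassandra like the Scala connector, but a hedged sketch of the closest Python equivalent is below: writing each micro-batch through the connector's DataFrame source. The keyspace, table, and column names are invented for illustration, and running this needs an actual Spark + Cassandra cluster with the spark-cassandra-connector package on the classpath:

```python
# Hypothetical sketch, not code from this thread. Assumes a SparkSession
# `spark` and a DStream `kafkaStream` of (sensor_id, event_time, value)
# tuples already exist.
def write_batch(rdd):
    # Skip empty micro-batches to avoid building empty DataFrames.
    if not rdd.isEmpty():
        df = spark.createDataFrame(rdd, ["sensor_id", "event_time", "value"])
        (df.write
            .format("org.apache.spark.sql.cassandra")  # connector's DataFrame source
            .options(keyspace="sensors", table="readings")
            .mode("append")
            .save())

kafkaStream.foreachRDD(write_batch)
```

This still goes through foreachRDD on the Python side; the point of the Scala connector is that it removes even that boilerplate.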
Finally, I would use the Spark UI to find which stage is the bottleneck here.
On Sun, 29 Apr 2018, 01:17 Saulo Sobreiro,
<saulo.sobre...@outlook.pt> wrote:
Hi all,
I am implementing a use case where I read some sensor data from Kafka with
the Spark Streaming interface (KafkaUtils.createDirectStream) and, after some
transformations, write the output (RDD) to Cassandra.
Everything is working properly but I am having some trouble with the
performance.