Re: [Spark2.1] SparkStreaming to Cassandra performance problem

2018-05-28 Thread Saulo Sobreiro
On Mon, 21 May 2018 at 4:34, Saulo Sobreiro (saulo.sobre...@outlook.pt) wrote: Hi Javier, Thank you very much for the feedback. Indeed, the CPU is a huge limitation. I had a lot of trouble trying to run this use case in yarn-client mode. I managed to run this in standalon

Re: [Spark2.1] SparkStreaming to Cassandra performance problem

2018-05-20 Thread Saulo Sobreiro
If the CPU is close to 100%, then you are hitting the limit. I don't think that moving to Scala will make a difference. Both Spark and Cassandra are CPU-hungry, and your setup is small in terms of CPUs. Try running Spark on another (physical) machine so that the 2 cores are dedicated to Cassandra.

Re: [Spark2.1] SparkStreaming to Cassandra performance problem

2018-04-30 Thread Saulo Sobreiro
table schema can make a big difference. On Sun, 29 Apr 2018, 19:02 Saulo Sobreiro, <saulo.sobre...@outlook.pt> wrote: Hi Javier, I removed the map and used "map" directly instead of using transform, but the kafkaStream is created with Ka

Re: [Spark2.1] SparkStreaming to Cassandra performance problem

2018-04-29 Thread Saulo Sobreiro
not used Python, but in Scala the Spark Cassandra connector can save directly to Cassandra without a foreachRDD. Finally, I would use the Spark UI to find which stage is the bottleneck here. On Sun, 29 Apr 2018, 01:17 Saulo Sobreiro, <saulo.sobre...@outlook.pt>

[Spark2.1] SparkStreaming to Cassandra performance problem

2018-04-28 Thread Saulo Sobreiro
Hi all, I am implementing a use case where I read some sensor data from Kafka with the Spark Streaming interface (KafkaUtils.createDirectStream) and, after some transformations, write the output (RDD) to Cassandra. Everything is working properly, but I am having some trouble with the performance.

[Spark2.X] SparkStreaming to Cassandra performance problem

2018-04-28 Thread Saulo Sobreiro
Hi all, I am implementing a use case where I read some sensor data from Kafka with the Spark Streaming interface (KafkaUtils.createDirectStream) and, after some transformations, write the output (RDD) to Cassandra. Everything is working properly, but I am having some trouble with the performance.