How fast is Cassandra without Spark on the count operation?

cqlsh> SELECT COUNT(*) FROM hello;
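Along the same lines, the Spark Cassandra connector can push the count down to Cassandra itself instead of pulling every 1KB `test` value into the executors. A minimal sketch from spark-shell, assuming the table lives in a keyspace named `ks` (the keyspace name is my guess; substitute yours):

```scala
import com.datastax.spark.connector._

// cassandraCount() issues per-token-range COUNT queries on the server
// side, so no row data is shipped to Spark. Note that
// spark.cassandra.input.split.size_in_mb is measured in megabytes
// (default 64), so a value like 67108864 asks for ~64 TB splits and
// collapses the job into very few tasks.
val n = sc.cassandraTable("ks", "hello").cassandraCount()
println(n)
```

This requires a live Cassandra cluster reachable from the driver, so it is a sketch of the approach rather than something runnable standalone.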
(This is not equivalent to what you are doing, but it might help you find the root cause.)

On Thu, Nov 24, 2016 at 9:03 AM, kant kodali <kanth...@gmail.com> wrote:
> I have the following code.
>
> I invoke spark-shell as follows:
>
> ./spark-shell --conf spark.cassandra.connection.host=170.99.99.134 \
>   --executor-memory 15G --executor-cores 12 \
>   --conf spark.cassandra.input.split.size_in_mb=67108864
>
> code:
>
> scala> val df = spark.sql("SELECT test from hello") // a billion rows in hello, and the test column is 1KB
> df: org.apache.spark.sql.DataFrame = [test: binary]
>
> scala> df.count
> [Stage 0:> (0 + 2) / 13] // I don't know what these numbers mean precisely.
>
> If I invoke spark-shell as follows:
>
> ./spark-shell --conf spark.cassandra.connection.host=170.99.99.134
>
> code:
>
> scala> val df = spark.sql("SELECT test from hello") // This has about a billion rows
>
> scala> df.count
> [Stage 0:=> (686 + 2) / 24686] // What are these numbers precisely?
>
> Neither version worked: Spark keeps running forever, and I have been waiting for more than 15 minutes with no response. Any ideas on what could be wrong and how to fix this?
>
> I am using Spark 2.0.2 and spark-cassandra-connector_2.11-2.0.0-M3.jar.

--
Anastasios Zouzias <a...@zurich.ibm.com>