I have the following code. I invoke spark-shell as follows:

    ./spark-shell --conf spark.cassandra.connection.host=170.99.99.134 --executor-memory 15G --executor-cores 12 --conf spark.cassandra.input.split.size_in_mb=67108864

Code:

    scala> val df = spark.sql("SELECT test from hello") // A billion rows in hello; the test column is 1 KB
    df: org.apache.spark.sql.DataFrame = [test: binary]

    scala> df.count
    [Stage 0:> (0 + 2) / 13] // I don't know what these numbers mean precisely.

If I instead invoke spark-shell as follows:

    ./spark-shell --conf spark.cassandra.connection.host=170.99.99.134

Code:

    val df = spark.sql("SELECT test from hello") // This has about a billion rows

    scala> df.count
    [Stage 0:=> (686 + 2) / 24686] // What are these numbers precisely?

Neither version worked. Spark keeps running indefinitely: I have waited more than 15 minutes and gotten no response. Any ideas on what could be wrong and how to fix this?

I am using Spark 2.0.2 and spark-cassandra-connector_2.11-2.0.0-M3.jar.
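
For anyone reading the progress lines above: Spark's console progress bar prints `(completed + running) / total` tasks for the stage, so `(686 + 2) / 24686` would mean 686 tasks finished, 2 running, and 24686 in total. For a scan stage like this one, the total normally matches the number of input partitions. The snippet below is only a small sketch that labels those fields; `StageProgress` and `ProgressLine` are made-up names for illustration, not part of Spark.

```scala
// Labels the fields of a Spark console progress line such as
// "[Stage 0:=> (686 + 2) / 24686]".
// StageProgress and ProgressLine are hypothetical helpers, not Spark APIs.
final case class StageProgress(stageId: Int, completed: Int, running: Int, total: Int)

object ProgressLine {
  // [Stage <id>:<bar> (<completed> + <running>) / <total>]
  private val Pattern = """\[Stage (\d+):[=>\s]*\((\d+) \+ (\d+)\) / (\d+)\]""".r

  // Returns None when the line does not look like a progress line.
  def parse(line: String): Option[StageProgress] = line.trim match {
    case Pattern(id, done, active, total) =>
      Some(StageProgress(id.toInt, done.toInt, active.toInt, total.toInt))
    case _ => None
  }
}
```

Pasted into the shell, `ProgressLine.parse("[Stage 0:> (0 + 2) / 13]")` would give a `StageProgress` with 0 completed, 2 running, and 13 total tasks.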