I am using the Spark Cassandra connector to work with a table containing 3 million
records, and I am using the .where() API to restrict the job to a subset of those
rows. The where clause filters the data down to about 10,000 rows.

JavaRDD<MyClass> rows = CassandraJavaUtil.javaFunctions(sparkContext)
        .cassandraTable(KEY_SPACE, MY_TABLE, CassandraJavaUtil.mapRowTo(MyClass.class))
        .where(cqlDataFilter, cqlFilterParams);


I am also setting the parameter spark.cassandra.input.split.size=1000.
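For context, here is a minimal sketch of how I set that parameter; the app name
and connection host are placeholders, not my real values:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

SparkConf conf = new SparkConf()
        .setAppName("cassandra-read")                          // placeholder app name
        .set("spark.cassandra.connection.host", "127.0.0.1")   // placeholder host
        // target number of Cassandra rows per Spark partition
        .set("spark.cassandra.input.split.size", "1000");
JavaSparkContext sparkContext = new JavaSparkContext(conf);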

When the Spark cluster processes this job, it creates 3000 partitions instead of
the 10 I expected, and 3000 tasks are executed on the cluster. The split count
apparently comes from the full table size (3,000,000 rows / 1000 = 3000) rather
than from the filtered result (10,000 rows / 1000 = 10). As the data in our table
grows to 30 million rows, this will create 30,000 tasks instead of 10.
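The only workaround I can see is to coalesce the RDD after the read; a minimal
sketch, reusing the rows RDD from the snippet above:

// coalesce() without a shuffle merges the ~3000 narrow Cassandra splits,
// so the scan stage runs as 10 tasks, each reading ~300 of the splits.
JavaRDD<MyClass> tenPartitions = rows.coalesce(10);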

Is there a better way to process these 10,000 records with 10 tasks?

Thanks!


