Hi, I have a Spark batch job that reads time-series data from a Cassandra table containing 50,000 rows.

    JavaRDD<String> cassandraRowsRDD = javaFunctions.cassandraTable("iotdata", "coordinate")
            .map(new Function<CassandraRow, String>() {
                @Override
                public String call(CassandraRow cassandraRow) throws Exception {
                    // Convert each Cassandra row to its string representation
                    return cassandraRow.toString();
                }
            });
    // Pull all rows back to the driver
    List<String> lm = cassandraRowsRDD.collect();

I am testing in local mode, where I observe that Spark creates 770870 tasks (one job, one stage), and the job takes many hours to complete. Can anyone please suggest what the possible issues could be?

Stage details from the Spark UI:

    Stage Id:                 0
    Description:              collect at CassandraSpark.java:94 <http://localhost:4040/stages/stage?id=0&attempt=0>
    Submitted:                2016/03/10 21:01:15
    Duration:                 9 s
    Tasks (Succeeded/Total):  137/770870
    Input, Output, Shuffle Read, Shuffle Write: (empty)

Thank you,
Prateek