Hi,

I have a Spark batch job that reads time-series data from a Cassandra table 
with 50,000 rows.


JavaRDD<String> cassandraRowsRDD = javaFunctions.cassandraTable("iotdata", "coordinate")
                .map(new Function<CassandraRow, String>() {
                    @Override
                    public String call(CassandraRow cassandraRow) throws Exception {
                        return cassandraRow.toString();
                    }
                });

List<String> lm = cassandraRowsRDD.collect();


I am testing in local mode, where I observe that Spark creates 770870 tasks 
(one job, one stage), and the job takes many hours to complete. Can anyone 
please suggest what the possible issues could be?


Stage Id:                0
Description:             collect at CassandraSpark.java:94
                         <http://localhost:4040/stages/stage?id=0&attempt=0>
Submitted:               2016/03/10 21:01:15
Duration:                9 s
Tasks: Succeeded/Total:  137/770870
Input / Output / Shuffle Read / Shuffle Write:  (no values shown)
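
For scale, here is a rough extrapolation from the stage snapshot above. It is only a sketch: it assumes tasks keep finishing at the rate observed so far (137 tasks in 9 s), which may not hold, and the class and method names are made up for illustration. Under that assumption the full stage would need roughly 14 hours, and 50,000 rows spread over 770870 tasks is well under one row per task, so most tasks would have nothing to read.

```java
public class TaskEstimate {

    // Projected seconds for the whole stage, ASSUMING a constant
    // task completion rate (an assumption, not a measurement).
    public static double estimatedTotalSeconds(long tasksDone, long tasksTotal,
                                               double elapsedSeconds) {
        return elapsedSeconds * tasksTotal / tasksDone;
    }

    // Average number of rows each task would handle if rows were
    // spread evenly across tasks.
    public static double rowsPerTask(long rows, long tasks) {
        return (double) rows / tasks;
    }

    public static void main(String[] args) {
        // Figures taken from the stage snapshot: 137/770870 tasks after 9 s.
        double seconds = estimatedTotalSeconds(137, 770870, 9.0);
        System.out.printf("estimated hours: %.1f%n", seconds / 3600.0);

        // 50,000 rows in the table, 770870 tasks in the stage.
        System.out.printf("rows per task: %.3f%n", rowsPerTask(50000, 770870));
    }
}
```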



Thank You

Prateek