Thanks, Andrew.  As it turns out, the tasks were being processed in
parallel in separate threads on the same node.  Calling .par on the
collection of hadoop files was enough to trigger that, but I expected the
tasks to be spread across nodes rather than across cores on a single
node, so I didn't spot it right away.

val matches = hadoopFiles.par.map((hadoopFile) ...
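
For anyone hitting the same thing: a minimal sketch of why this happens. A Scala parallel collection's .map runs on a fork-join thread pool inside the current JVM, so every element is handled by a local thread on the driver node rather than distributed across the cluster. The `paths` list below is a hypothetical stand-in for `hadoopFiles`:

```scala
// Sketch: .par fans work out to threads in THIS JVM only.
object ParDemo {
  def main(args: Array[String]): Unit = {
    // Stand-in for the hadoopFiles collection in the post.
    val paths = (1 to 100).map(i => s"s3n://bucket/file-$i")

    // Record which thread handles each element; .par maps the
    // elements concurrently on a local fork-join pool.
    val threads = paths.par.map(_ => Thread.currentThread.getName).seq.toSet

    // Typically more than one thread on a multi-core machine,
    // but all of them belong to this single JVM / node.
    println(s"processed on ${threads.size} local thread(s)")
  }
}
```

To actually spread the work across Spark worker nodes, the collection would need to become an RDD first, e.g. `sc.parallelize(hadoopFiles).map(...)`, so the scheduler can ship tasks to executors. (Note: on Scala 2.13+ parallel collections live in the separate scala-parallel-collections module; on the 2.10/2.11 versions current when this was posted, .par was built in.)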

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/processing-s3n-files-in-parallel-tp4989p5116.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.