Thanks, Andrew. As it turns out, the tasks were being processed in parallel, but in separate threads on the same node. Calling `.par` on the collection of Hadoop files was enough to trigger this; I had expected the tasks to be spread across nodes rather than across cores on a single node, so I didn't spot it right away.
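The behavior above can be sketched as follows. This is a minimal illustration, not the original job: the file list and names are hypothetical, and on Scala 2.13+ `.par` requires the separate scala-parallel-collections module (it is built in on 2.12 and earlier). Recording the thread name per element shows that the work stays on threads of one JVM rather than being distributed across cluster nodes.

```scala
import scala.collection.parallel.CollectionConverters._ // Scala 2.13+; not needed on 2.12 and earlier

// Hypothetical stand-ins for the real s3n:// paths.
val hadoopFiles = (1 to 64).map(i => s"s3n://bucket/part-$i")

// .par maps over elements concurrently, but only on threads of this one JVM:
// every element is handled locally, none is shipped to another node.
val threadNames = hadoopFiles.par.map(_ => Thread.currentThread.getName).seq.toSet

println(s"threads used: ${threadNames.size}")
```

With enough elements, more than one worker thread is typically used, which is why the tasks appeared to run "in parallel" while still occupying only the cores of a single machine.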
val matches = hadoopFiles.par.map((hadoopFile) ...

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/processing-s3n-files-in-parallel-tp4989p5116.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.