I've seen a couple of issues posted about this, but I never saw a resolution.
When I'm using Spark 1.0.2 (and the spark-submit script to submit my jobs) with AWS SDK 1.8.7, I get the stack trace below. However, if I drop back to AWS SDK 1.3.26 (or anything from the AWS SDK 1.4.* family), everything works fine. It appears that after AWS SDK 1.4, the SDK picked up a dependency on HttpClient 4.2 (instead of 4.1), which conflicts with the HttpClient version Spark already has on its classpath. I'd like to use a recent version of the AWS SDK (rather than something nearly two years old), so I'm curious whether anyone has figured out a workaround to this problem. Thanks. Darin.

```
java.lang.NoSuchMethodError: org.apache.http.impl.conn.DefaultClientConnectionOperator.<init>(Lorg/apache/http/conn/scheme/SchemeRegistry;Lorg/apache/http/conn/DnsResolver;)V
	at org.apache.http.impl.conn.PoolingClientConnectionManager.createConnectionOperator(PoolingClientConnectionManager.java:140)
	at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:114)
	at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:99)
	at com.amazonaws.http.ConnectionManagerFactory.createPoolingClientConnManager(ConnectionManagerFactory.java:29)
	at com.amazonaws.http.HttpClientFactory.createHttpClient(HttpClientFactory.java:97)
	at com.amazonaws.http.AmazonHttpClient.<init>(AmazonHttpClient.java:181)
	at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:119)
	at com.amazonaws.services.s3.AmazonS3Client.<init>(AmazonS3Client.java:408)
	at com.amazonaws.services.s3.AmazonS3Client.<init>(AmazonS3Client.java:390)
	at com.amazonaws.services.s3.AmazonS3Client.<init>(AmazonS3Client.java:374)
	at com.amazonaws.services.s3.AmazonS3Client.<init>(AmazonS3Client.java:313)
	at com.elsevier.s3.SimpleStorageService.<clinit>(SimpleStorageService.java:27)
	at com.elsevier.spark.XMLKeyPair.call(SDKeyMapKeyPairRDD.java:75)
	at com.elsevier.spark.XMLKeyPair.call(SDKeyMapKeyPairRDD.java:65)
	at org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:750)
	at org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:750)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:779)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:769)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
	at org.apache.spark.scheduler.Task.run(Task.scala:51)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:680)
```
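For context on the error itself: the constructor `DefaultClientConnectionOperator(SchemeRegistry, DnsResolver)` only exists in HttpClient 4.2+, so the `NoSuchMethodError` means the older HttpClient on Spark's classpath is winning at runtime. One workaround that's commonly suggested for this kind of classpath conflict (I haven't verified it against this exact setup, so treat it as a sketch) is to shade and relocate the `org.apache.http` packages in the application's uber-jar with the maven-shade-plugin, so the AWS SDK resolves against its own bundled HttpClient 4.2+ instead of Spark's copy. The `shaded.org.apache.http` prefix below is an arbitrary choice:

```xml
<!-- pom.xml fragment (sketch): relocate HttpClient inside the uber-jar so the
     AWS SDK's references are rewritten to the bundled 4.2+ classes rather than
     the HttpClient 4.1 that Spark supplies on the executor classpath. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.3</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <relocation>
            <pattern>org.apache.http</pattern>
            <shadedPattern>shaded.org.apache.http</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

Another option some people try is Spark's experimental `spark.files.userClassPathFirst` setting, which asks executors to prefer the user jar's classes over Spark's own, though results with it seem to be mixed.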