Hi,

Thanks for your help. Problem solved, but I thought I should add something in
case others run into the same problem.
Both responses are correct; BasicAWSCredentialsProvider is gone, but simply
making the substitution leads to the traceback just below:

java.lang.NoSuchMethodError: 'void com.google.common.base.Preconditions.checkArgument(boolean, java.lang.String, java.lang.Object, java.lang.Object)'

I found that changing all of the Hadoop packages from 3.3.0 to 3.2.0 *and*
changing the options away from BasicAWSCredentialsProvider solved the
problem. (The NoSuchMethodError looks like a Guava version clash: that
four-argument checkArgument overload only exists in Guava releases newer
than the one on Spark 3.0.0's classpath, and Spark's bundled copy wins over
the newer Guava that Hadoop 3.3.0 needs.)

Thanks.

Regards,

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/daniel/packages/spark-3.0.0-bin-hadoop3.2/python/pyspark/sql/readwriter.py", line 535, in csv
    return self._df(self._jreader.csv(self._spark._sc._jvm.PythonUtils.toSeq(path)))
  File "/home/daniel/packages/spark-3.0.0-bin-hadoop3.2/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
  File "/home/daniel/packages/spark-3.0.0-bin-hadoop3.2/python/pyspark/sql/utils.py", line 131, in deco
    return f(*a, **kw)
  File "/home/daniel/packages/spark-3.0.0-bin-hadoop3.2/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o39.csv.
: java.lang.NoSuchMethodError: 'void com.google.common.base.Preconditions.checkArgument(boolean, java.lang.String, java.lang.Object, java.lang.Object)'
    at org.apache.hadoop.fs.s3a.S3AUtils.lookupPassword(S3AUtils.java:893)
    at org.apache.hadoop.fs.s3a.S3AUtils.lookupPassword(S3AUtils.java:869)
    at org.apache.hadoop.fs.s3a.S3AUtils.getEncryptionAlgorithm(S3AUtils.java:1580)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:341)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
    at org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:46)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:361)
    at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:279)
    at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:268)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:268)
    at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:705)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.base/java.lang.Thread.run(Thread.java:834)
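For anyone who lands here later, this is roughly what the working setup looks
like. Treat it as a sketch, not a definitive recipe: the aws-java-sdk-bundle
version shown is the one hadoop-aws 3.2.0 lists on Maven (verify the exact
dependency yourself), the bucket/key values are placeholders, and I am
assuming no explicit guava package is needed at 3.2.0 (add it back if Guava
errors reappear).

Launch with the Hadoop 3.2.0 artifacts:

./pyspark --packages org.apache.hadoop:hadoop-common:3.2.0,org.apache.hadoop:hadoop-client:3.2.0,org.apache.hadoop:hadoop-aws:3.2.0,com.amazonaws:aws-java-sdk-bundle:1.11.375

Then configure S3A with the provider Steve pointed at, which reads the access
and secret key from the Hadoop configuration:

# Placeholders; set these four values for your environment.
aws_bucket = ""
aws_filename = ""
aws_access_key = ""
aws_secret_key = ""

conf = spark._jsc.hadoopConfiguration()
conf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
# SimpleAWSCredentialsProvider picks up fs.s3a.access.key / fs.s3a.secret.key.
conf.set("fs.s3a.aws.credentials.provider",
         "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider")
conf.set("fs.s3a.endpoint", "s3.amazonaws.com")
conf.set("fs.s3a.access.key", aws_access_key)
conf.set("fs.s3a.secret.key", aws_secret_key)

df = spark.read.option("header", "true").csv(f"s3a://{aws_bucket}/{aws_filename}")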
On Thu, 6 Aug 2020 at 17:19, Stephen Coy <s...@infomedia.com.au> wrote:

> Hi Daniel,
>
> It looks like …BasicAWSCredentialsProvider has become
> org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider.
>
> However, the way that the username and password are provided appears to
> have changed, so you will probably need to look into that.
>
> Cheers,
>
> Steve C
>
> On 6 Aug 2020, at 11:15 am, Daniel Stojanov <m...@danielstojanov.com> wrote:
>
> Hi,
>
> I am trying to read/write files to S3 from PySpark. The procedure that I
> have used is to download Spark and start PySpark with the hadoop-aws,
> guava, and aws-java-sdk-bundle packages. The versions are explicitly
> specified by looking up the exact dependency versions on Maven; allowing
> dependencies to be auto-determined does not work. This procedure works for
> Spark 3.0.0 with Hadoop 2.7, but does not work for Hadoop 3.2. I get this
> exception:
>
> java.lang.ClassNotFoundException: Class
> org.apache.hadoop.fs.s3a.BasicAWSCredentialsProvider
>
> Can somebody point to a procedure for doing this using Spark bundled with
> Hadoop 3.2?
>
> Regards,
>
>
> To replicate:
>
> Download Spark 3.0.0 with support for Hadoop 3.2.
>
> Launch spark with:
>
> ./pyspark --packages org.apache.hadoop:hadoop-common:3.3.0,org.apache.hadoop:hadoop-client:3.3.0,org.apache.hadoop:hadoop-aws:3.3.0,com.amazonaws:aws-java-sdk-bundle:1.11.563,com.google.guava:guava:27.1-jre
>
> Then run the following Python.
>
> # Set these 4 parameters as appropriate.
> aws_bucket = ""
> aws_filename = ""
> aws_access_key = ""
> aws_secret_key = ""
>
> spark._jsc.hadoopConfiguration().set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
> spark._jsc.hadoopConfiguration().set("com.amazonaws.services.s3.enableV4", "true")
> spark._jsc.hadoopConfiguration().set("fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.BasicAWSCredentialsProvider")
> spark._jsc.hadoopConfiguration().set("fs.s3a.endpoint", "s3.amazonaws.com")
> spark._jsc.hadoopConfiguration().set("fs.s3a.access.key", aws_access_key)
> spark._jsc.hadoopConfiguration().set("fs.s3a.secret.key", aws_secret_key)
> df = spark.read.option("header", "true").csv(f"s3a://{aws_bucket}/{aws_filename}")
>
> Leads to this error message:
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/home/daniel/.local/share/Trash/files/spark-3.3.0.0-bin-hadoop3.2/python/pyspark/sql/readwriter.py", line 535, in csv
>     return self._df(self._jreader.csv(self._spark._sc._jvm.PythonUtils.toSeq(path)))
>   File "/home/daniel/.local/share/Trash/files/spark-3.3.0.0-bin-hadoop3.2/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
>   File "/home/daniel/.local/share/Trash/files/spark-3.3.0.0-bin-hadoop3.2/python/pyspark/sql/utils.py", line 131, in deco
>     return f(*a, **kw)
>   File "/home/daniel/.local/share/Trash/files/spark-3.3.0.0-bin-hadoop3.2/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 328, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling o39.csv.
> : java.io.IOException: From option fs.s3a.aws.credentials.provider
> java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.BasicAWSCredentialsProvider not found
>     at org.apache.hadoop.fs.s3a.S3AUtils.loadAWSProviderClasses(S3AUtils.java:645)
>     at org.apache.hadoop.fs.s3a.S3AUtils.buildAWSProviderList(S3AUtils.java:668)
>     at org.apache.hadoop.fs.s3a.S3AUtils.createAWSCredentialProviderSet(S3AUtils.java:619)
>     at org.apache.hadoop.fs.s3a.S3AFileSystem.bindAWSClient(S3AFileSystem.java:636)
>     at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:390)
>     at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303)
>     at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
>     at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
>     at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
>     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
>     at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
>     at org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:46)
>     at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:361)
>     at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:279)
>     at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:268)
>     at scala.Option.getOrElse(Option.scala:189)
>     at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:268)
>     at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:705)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>     at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
>     at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
>     at py4j.Gateway.invoke(Gateway.java:282)
>     at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>     at py4j.commands.CallCommand.execute(CallCommand.java:79)
>     at py4j.GatewayConnection.run(GatewayConnection.java:238)
>     at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.BasicAWSCredentialsProvider not found
>     at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2499)
>     at org.apache.hadoop.conf.Configuration.getClasses(Configuration.java:2570)
>     at org.apache.hadoop.fs.s3a.S3AUtils.loadAWSProviderClasses(S3AUtils.java:642)
>     ... 28 more

--
Daniel Stojanov
Web: http://danielstojanov.com
Linkedin: https://tiny.cc/DSLinkedin