Hi, I am not sure about this, but is there any requirement to use S3A at all?
Regards,
Gourav

On Tue, Jul 21, 2020 at 12:07 PM Steve Loughran <ste...@cloudera.com.invalid> wrote:

> On Tue, 7 Jul 2020 at 03:42, Stephen Coy <s...@infomedia.com.au.invalid> wrote:
>
>> Hi Steve,
>>
>> While I understand your point regarding the mixing of Hadoop jars, this
>> does not address the java.lang.ClassNotFoundException.
>>
>> Prebuilt Apache Spark 3.0 builds are only available for Hadoop 2.7 or
>> Hadoop 3.2. Not Hadoop 3.1.
>
> Sorry, I should have been clearer: Hadoop 3.2.x has everything you need.
>
>> The only place that I have found that missing class is in the Spark
>> "hadoop-cloud" source module, and currently the only way to get the jar
>> containing it is to build it yourself. If any of the devs are listening,
>> it would be nice if this was included in the standard distribution. It has
>> a sizeable chunk of a repackaged Jetty embedded in it, which I find a bit odd.
>>
>> But I am relatively new to this stuff, so I could be wrong.
>>
>> I am currently running Spark 3.0 clusters with no HDFS. Spark is set up like:
>>
>>     hadoopConfiguration.set("spark.hadoop.fs.s3a.committer.name", "directory");
>>     hadoopConfiguration.set("spark.sql.sources.commitProtocolClass",
>>         "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol");
>>     hadoopConfiguration.set("spark.sql.parquet.output.committer.class",
>>         "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter");
>>     hadoopConfiguration.set("fs.s3a.connection.maximum",
>>         Integer.toString(coreCount * 2));
>>
>> Querying and updating s3a data sources seems to be working ok.
>>
>> Thanks,
>>
>> Steve C
>>
>> On 29 Jun 2020, at 10:34 pm, Steve Loughran <ste...@cloudera.com.INVALID> wrote:
>>
>> You are going to need hadoop-3.1 on your classpath, with hadoop-aws and
>> the same aws-sdk it was built with (1.11.something). Mixing Hadoop JARs is
>> doomed.
>> Using a different aws-sdk jar is a bit risky, though more recent
>> upgrades have all been fairly low stress.
>>
>> On Fri, 19 Jun 2020 at 05:39, murat migdisoglu <murat.migdiso...@gmail.com> wrote:
>>
>>> Hi all,
>>> I've upgraded my test cluster to Spark 3 and changed my committer to
>>> directory, and I still get this error. The documentation is somewhat
>>> obscure on this. Do I need to add a third-party jar to support the new
>>> committers?
>>>
>>> java.lang.ClassNotFoundException:
>>> org.apache.spark.internal.io.cloud.PathOutputCommitProtocol
>>>
>>> On Thu, Jun 18, 2020 at 1:35 AM murat migdisoglu <murat.migdiso...@gmail.com> wrote:
>>>
>>>> Hello all,
>>>> We have a Hadoop cluster (using YARN) that uses S3 as its filesystem,
>>>> with S3Guard enabled. We are using Hadoop 3.2.1 with Spark 2.4.5.
>>>>
>>>> When I try to save a dataframe in Parquet format, I get the following
>>>> exception:
>>>>
>>>> java.lang.ClassNotFoundException:
>>>> com.hortonworks.spark.cloud.commit.PathOutputCommitProtocol
>>>>
>>>> My relevant Spark configurations are as follows:
>>>>
>>>>     "hadoop.mapreduce.outputcommitter.factory.scheme.s3a":
>>>>         "org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory",
>>>>     "fs.s3a.committer.name": "magic",
>>>>     "fs.s3a.committer.magic.enabled": true,
>>>>     "fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
>>>>
>>>> While Spark streaming fails with the exception above, Apache Beam
>>>> succeeds in writing Parquet files. What might be the problem?
>>>>
>>>> Thanks in advance
>>>>
>>>> --
>>>> "Talkers aren’t good doers. Rest assured that we’re going there to use
>>>> our hands, not our tongues."
>>>> W. Shakespeare
>>>
>>> --
>>> "Talkers aren’t good doers. Rest assured that we’re going there to use
>>> our hands, not our tongues."
>>> W. Shakespeare
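The two ClassNotFoundExceptions in this thread point at the same underlying issue: the Spark 2.4.5 configuration references the Hortonworks-packaged class name (com.hortonworks.spark.cloud.commit.PathOutputCommitProtocol), while Spark 3 uses org.apache.spark.internal.io.cloud.PathOutputCommitProtocol, and, as noted above, that class lives in the separately-built "hadoop-cloud" module rather than the standard distribution. Before touching any committer settings, a quick way to confirm whether either class is actually on the driver's classpath is a plain Class.forName probe. This is a hypothetical diagnostic sketch, not part of Spark; the class name ClasspathCheck is my own invention:

```java
// Hypothetical diagnostic (not part of Spark): probe whether the committer
// protocol classes are on the JVM classpath before blaming the configuration.
public class ClasspathCheck {

    // Returns true if the named class can be loaded by this JVM.
    static boolean present(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Spark 3 name, shipped in the hadoop-cloud module:
        System.out.println("spark 3 committer protocol: "
                + present("org.apache.spark.internal.io.cloud.PathOutputCommitProtocol"));
        // Older Hortonworks-packaged name referenced by the Spark 2.4.5 config:
        System.out.println("hortonworks committer protocol: "
                + present("com.hortonworks.spark.cloud.commit.PathOutputCommitProtocol"));
    }
}
```

If both print false, the missing piece is the hadoop-cloud module jar on the classpath, not the committer configuration itself; if only the Spark 3 name resolves, the Spark 2.4-era class name in the configuration needs updating.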