Thanks. That was the issue. The latest release of Beam indicates it supports Spark 3, and I find references in the code to Hadoop 3.2.1. Is it possible to configure Beam to run on Hadoop 3.2.1?
Trevor

On Mon, Jun 21, 2021 at 6:19 PM Kyle Weaver <[email protected]> wrote:

> Looks like a version mismatch between Hadoop dependencies [1]. Beam's
> Python wrapper for Spark is currently pinned to Spark 2.4.8 [2]. Which
> Spark and Hadoop versions is your cluster using?
>
> [1] https://stackoverflow.com/questions/62880009/error-through-remote-spark-job-java-lang-illegalaccesserror-class-org-apache-h
> [2] https://issues.apache.org/jira/browse/BEAM-12094
>
> On Mon, Jun 21, 2021 at 9:52 AM Trevor Kramer <[email protected]> wrote:
>
>> Does anyone have an example of using spark-submit to run a Beam job using
>> the portable runner? The documentation indicates this is possible but
>> doesn't give an example of how to do it.
>>
>> I am using the following pipeline options to generate the jar:
>>
>> options = PipelineOptions(['--runner=SparkRunner',
>>                            '--environment_type=DOCKER',
>>                            '--environment_config=path-to-image:latest',
>>                            '--output_executable_path=output.jar'
>>                            ])
>>
>> and am trying to run it with:
>>
>> spark-submit --master yarn --deploy-mode cluster \
>>   --class org.apache.beam.runners.spark.SparkPipelineRunner output.jar
>>
>> but I get the following error:
>>
>> Exception in thread "main" java.lang.IllegalAccessError: class
>> org.apache.hadoop.hdfs.web.HftpFileSystem cannot access its superinterface
>> org.apache.hadoop.hdfs.web.TokenAspect$TokenManagementDelegator
>>   at java.lang.ClassLoader.defineClass1(Native Method)
>>   at java.lang.ClassLoader.defineClass(ClassLoader.java:757)
>>   at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>>   at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
>>   at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
>>   at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
>>   at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
>>   at java.security.AccessController.doPrivileged(Native Method)
>>   at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
>>   at java.lang.Class.forName0(Native Method)
>>   at java.lang.Class.forName(Class.java:348)
>>   at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:370)
>>   at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
>>   at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
>>   at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:3268)
>>   at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3313)
>>   at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3352)
>>   at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:123)
>>   at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3403)
>>   at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3371)
>>   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:482)
>>   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:230)
>>   at org.apache.spark.deploy.yarn.Client.$anonfun$appStagingBaseDir$2(Client.scala:138)
>>   at scala.Option.getOrElse(Option.scala:189)
>>   at org.apache.spark.deploy.yarn.Client.<init>(Client.scala:138)
>>   at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1526)
>>   at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:853)
>>   at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
>>   at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
>>   at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
>>   at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:928)
>>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:937)
>>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
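Since the diagnosis above is a Hadoop dependency mismatch, one quick local check is to list which Hadoop class files the generated output.jar actually bundles and compare them against the cluster's Hadoop version. A jar is just a zip archive, so this needs only the Python standard library. This is a minimal sketch; the helper name `hadoop_classes` is hypothetical, and the demo builds a tiny stand-in jar rather than a real Beam artifact.

```python
import zipfile

def hadoop_classes(jar_path, prefix="org/apache/hadoop/"):
    """Return the Hadoop .class entries bundled inside a jar (jars are zip files)."""
    with zipfile.ZipFile(jar_path) as jar:
        return sorted(name for name in jar.namelist()
                      if name.startswith(prefix) and name.endswith(".class"))

if __name__ == "__main__":
    # Build a tiny stand-in jar to demonstrate; in practice you would
    # point hadoop_classes() at the output.jar produced by Beam.
    with zipfile.ZipFile("demo.jar", "w") as jar:
        jar.writestr("org/apache/hadoop/hdfs/web/HftpFileSystem.class", b"")
        jar.writestr("org/apache/beam/sdk/Pipeline.class", b"")
    for entry in hadoop_classes("demo.jar"):
        print(entry)
```

If the jar bundles Hadoop classes from a different major version than the one YARN provides at runtime (e.g. Hadoop 2.x classes such as `HftpFileSystem`, which was removed in Hadoop 3, on a 3.2.1 cluster), class loading can fail exactly as in the `IllegalAccessError` above.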
