After upgrading to Spark 1.3, these statements on HiveContext are working fine. Thanks
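
For the archive, a rough sketch of the statements in question as run from spark-shell. The function name, class, and HDFS path are the ones from the thread below; the final query and its employees table are made up for illustration only:

    scala> // Hive-enabled SQL context on top of the shell's SparkContext (sc).
    scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
    scala> // Register the UDF class from the jar on HDFS as a temporary function.
    scala> hiveContext.sql("CREATE TEMPORARY FUNCTION sample_to_upper AS 'com.abc.api.udf.MyUpper' USING JAR 'hdfs:///users/ravindra/customUDF2.jar'")
    scala> // Illustrative only - any table known to the HiveContext would do.
    scala> hiveContext.sql("SELECT sample_to_upper(name) FROM employees LIMIT 10").collect().foreach(println)
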
On Mon, May 11, 2015, 12:15 Ravindra <ravindra.baj...@gmail.com> wrote:

> Hi All,
>
> Thanks for the suggestions. What I tried is - hiveContext.sql("add jar ....") and that lets the "create temporary function" complete, but when I use the function I get ClassNotFound for the class handling it. The same class is present in the jar that was added.
>
> Please note that the same works fine from the Hive shell.
>
> Is there an issue with Spark while distributing jars across workers? Maybe that is causing the problem. Also, can you please suggest the manual way of copying the jars to the workers? I just want to ascertain my assumption.
>
> Thanks,
> Ravi
>
> On Sun, May 10, 2015 at 1:40 AM Michael Armbrust <mich...@databricks.com> wrote:
>
>> That code path is entirely delegated to Hive. Does Hive support this? You might try instead using sparkContext.addJar.
>>
>> On Sat, May 9, 2015 at 12:32 PM, Ravindra <ravindra.baj...@gmail.com> wrote:
>>
>>> Hi All,
>>>
>>> I am trying to create custom UDFs with hiveContext as given below -
>>> scala> hiveContext.sql("CREATE TEMPORARY FUNCTION sample_to_upper AS 'com.abc.api.udf.MyUpper' USING JAR 'hdfs:///users/ravindra/customUDF2.jar'")
>>>
>>> I have put the UDF jar in HDFS at the path given above. The same command works well in the Hive shell but fails here in the Spark shell, as given below -
>>> 15/05/10 00:41:51 ERROR Task: FAILED: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to load JAR hdfs:///users/ravindra/customUDF2.jar
>>> 15/05/10 00:41:51 INFO FunctionTask: create function: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to load JAR hdfs:///users/ravindra/customUDF2.jar
>>>   at org.apache.hadoop.hive.ql.exec.FunctionTask.addFunctionResources(FunctionTask.java:305)
>>>   at org.apache.hadoop.hive.ql.exec.FunctionTask.createTemporaryFunction(FunctionTask.java:179)
>>>   at org.apache.hadoop.hive.ql.exec.FunctionTask.execute(FunctionTask.java:81)
>>>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
>>>   at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
>>>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1503)
>>>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1270)
>>>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1088)
>>>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
>>>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
>>>   at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:305)
>>>   at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:276)
>>>   at org.apache.spark.sql.hive.execution.NativeCommand.sideEffectResult$lzycompute(NativeCommand.scala:35)
>>>   at org.apache.spark.sql.hive.execution.NativeCommand.sideEffectResult(NativeCommand.scala:35)
>>>   at org.apache.spark.sql.execution.Command$class.execute(commands.scala:46)
>>>   at org.apache.spark.sql.hive.execution.NativeCommand.execute(NativeCommand.scala:30)
>>>   at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:425)
>>>   at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:425)
>>>   at org.apache.spark.sql.SchemaRDDLike$class.$init$(SchemaRDDLike.scala:58)
>>>   at org.apache.spark.sql.SchemaRDD.<init>(SchemaRDD.scala:108)
>>>   at org.apache.spark.sql.hive.HiveContext.sql(HiveContext.scala:94)
>>>   at $line17.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:18)
>>>   at $line17.$read$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:23)
>>>   at $line17.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:25)
>>>   at $line17.$read$$iwC$$iwC$$iwC.<init>(<console>:27)
>>>   at $line17.$read$$iwC$$iwC.<init>(<console>:29)
>>>   at $line17.$read$$iwC.<init>(<console>:31)
>>>   at $line17.$read.<init>(<console>:33)
>>>   at $line17.$read$.<init>(<console>:37)
>>>   at $line17.$read$.<clinit>(<console>)
>>>   at $line17.$eval$.<init>(<console>:7)
>>>   at $line17.$eval$.<clinit>(<console>)
>>>   at $line17.$eval.$print(<console>)
>>>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>   at java.lang.reflect.Method.invoke(Method.java:606)
>>>   at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:852)
>>>   at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1125)
>>>   at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:674)
>>>   at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:705)
>>>   at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:669)
>>>   at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:828)
>>>   at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:873)
>>>   at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:785)
>>>   at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:628)
>>>   at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:636)
>>>   at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:641)
>>>   at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:968)
>>>   at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:916)
>>>   at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:916)
>>>   at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
>>>   at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:916)
>>>   at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1011)
>>>   at org.apache.spark.repl.Main$.main(Main.scala:31)
>>>   at org.apache.spark.repl.Main.main(Main.scala)
>>>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>   at java.lang.reflect.Method.invoke(Method.java:606)
>>>   at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
>>>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
>>>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>>
>>> 15/05/10 00:41:51 ERROR Driver: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask
>>> 15/05/10 00:41:51 INFO PerfLogger: </PERFLOG method=Driver.execute start=1431198710959 end=1431198711073 duration=114 from=org.apache.hadoop.hive.ql.Driver>
>>> 15/05/10 00:41:51 INFO PerfLogger: <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
>>> 15/05/10 00:41:51 INFO PerfLogger: </PERFLOG method=releaseLocks start=1431198711074 end=1431198711074 duration=0 from=org.apache.hadoop.hive.ql.Driver>
>>> 15/05/10 00:41:51 ERROR HiveContext:
>>> ======================
>>> HIVE FAILURE OUTPUT
>>> ======================
>>> converting to local hdfs:///users/ravindra/customUDF2.jar
>>> Failed to read external resource hdfs:///users/ravindra/customUDF2.jar
>>> FAILED: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to load JAR hdfs:///users/ravindra/customUDF2.jar
>>> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask
>>>
>>> ======================
>>> END HIVE FAILURE OUTPUT
>>> ======================
>>>
>>> org.apache.spark.sql.execution.QueryExecutionException: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask
>>>   at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:309)
>>>   at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:276)
>>>
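
Side note for anyone who hits the worker-side ClassNotFound described above before they can upgrade: a hedged, untested sketch of the sparkContext.addJar route Michael suggested, assuming the hiveContext from the sketch near the top of this thread. Starting the shell with spark-shell --jars <path-to-jar> is another way to get the jar onto both the driver and the executors.

    scala> // Ship the UDF jar to the executors so tasks can load the class at run time.
    scala> sc.addJar("hdfs:///users/ravindra/customUDF2.jar")
    scala> // Make the jar visible to the Hive session on the driver, then register the function.
    scala> hiveContext.sql("add jar hdfs:///users/ravindra/customUDF2.jar")
    scala> hiveContext.sql("CREATE TEMPORARY FUNCTION sample_to_upper AS 'com.abc.api.udf.MyUpper'")
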