Hi All,

Thanks for the suggestions. I tried hiveContext.sql("add jar ....") and with that the "create temporary function" statement completes, but when I use the function I get a ClassNotFound error for the class that implements it. That class is present in the jar I added.
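To rule out a packaging problem, a check along these lines can confirm that the class really is inside the jar. This is only a sketch under assumptions: JarCheck is a hypothetical helper name, it uses only the standard java.util.jar API, and it needs a local copy of the jar (it does not read from HDFS). It says nothing about whether the workers can see the jar.

```scala
import java.io.File
import java.util.jar.JarFile

// Hypothetical helper, not part of Spark or Hive.
object JarCheck {
  // Convert a fully qualified class name to its entry path inside a jar,
  // e.g. "a.b.C" becomes "a/b/C.class".
  def entryName(className: String): String =
    className.replace('.', '/') + ".class"

  // True if the jar at jarPath (a local file) has an entry for className.
  def jarContains(jarPath: String, className: String): Boolean = {
    val jar = new JarFile(new File(jarPath))
    try jar.getEntry(entryName(className)) != null
    finally jar.close()
  }
}
```

For example, with a local copy at an assumed path such as /tmp/customUDF2.jar, JarCheck.jarContains("/tmp/customUDF2.jar", "com.abc.api.udf.MyUpper") should return true if the UDF class from my original message is packaged correctly.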
Please note that the same works fine from the Hive shell. Is there an issue with how Spark distributes jars across the workers? Maybe that is causing the problem. Also, can you please suggest a manual way of copying the jars to the workers? I just want to verify my assumption.

Thanks,
Ravi

On Sun, May 10, 2015 at 1:40 AM Michael Armbrust <mich...@databricks.com> wrote:

> That code path is entirely delegated to hive. Does hive support this?
> You might try instead using sparkContext.addJar.
>
> On Sat, May 9, 2015 at 12:32 PM, Ravindra <ravindra.baj...@gmail.com> wrote:
>
>> Hi All,
>>
>> I am trying to create custom udfs with hiveContext as given below -
>> scala> hiveContext.sql ("CREATE TEMPORARY FUNCTION sample_to_upper AS
>> 'com.abc.api.udf.MyUpper' USING JAR
>> 'hdfs:///users/ravindra/customUDF2.jar'")
>>
>> I have put the udf jar in the hdfs at the path given above. The same
>> command works well in the hive shell but failing here in the spark shell.
>> And it fails as given below.
>>
>> 15/05/10 00:41:51 ERROR Task: FAILED: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to load JAR hdfs:///users/ravindra/customUDF2.jar
>> 15/05/10 00:41:51 INFO FunctionTask: create function: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to load JAR hdfs:///users/ravindra/customUDF2.jar
>> at org.apache.hadoop.hive.ql.exec.FunctionTask.addFunctionResources(FunctionTask.java:305)
>> at org.apache.hadoop.hive.ql.exec.FunctionTask.createTemporaryFunction(FunctionTask.java:179)
>> at org.apache.hadoop.hive.ql.exec.FunctionTask.execute(FunctionTask.java:81)
>> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
>> at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
>> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1503)
>> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1270)
>> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1088)
>> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
>> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
>> at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:305)
>> at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:276)
>> at org.apache.spark.sql.hive.execution.NativeCommand.sideEffectResult$lzycompute(NativeCommand.scala:35)
>> at org.apache.spark.sql.hive.execution.NativeCommand.sideEffectResult(NativeCommand.scala:35)
>> at org.apache.spark.sql.execution.Command$class.execute(commands.scala:46)
>> at org.apache.spark.sql.hive.execution.NativeCommand.execute(NativeCommand.scala:30)
>> at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:425)
>> at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:425)
>> at org.apache.spark.sql.SchemaRDDLike$class.$init$(SchemaRDDLike.scala:58)
>> at org.apache.spark.sql.SchemaRDD.<init>(SchemaRDD.scala:108)
>> at org.apache.spark.sql.hive.HiveContext.sql(HiveContext.scala:94)
>> at $line17.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:18)
>> at $line17.$read$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:23)
>> at $line17.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:25)
>> at $line17.$read$$iwC$$iwC$$iwC.<init>(<console>:27)
>> at $line17.$read$$iwC$$iwC.<init>(<console>:29)
>> at $line17.$read$$iwC.<init>(<console>:31)
>> at $line17.$read.<init>(<console>:33)
>> at $line17.$read$.<init>(<console>:37)
>> at $line17.$read$.<clinit>(<console>)
>> at $line17.$eval$.<init>(<console>:7)
>> at $line17.$eval$.<clinit>(<console>)
>> at $line17.$eval.$print(<console>)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:606)
>> at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:852)
>> at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1125)
>> at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:674)
>> at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:705)
>> at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:669)
>> at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:828)
>> at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:873)
>> at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:785)
>> at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:628)
>> at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:636)
>> at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:641)
>> at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:968)
>> at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:916)
>> at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:916)
>> at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
>> at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:916)
>> at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1011)
>> at org.apache.spark.repl.Main$.main(Main.scala:31)
>> at org.apache.spark.repl.Main.main(Main.scala)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:606)
>> at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
>> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
>> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>
>> 15/05/10 00:41:51 ERROR Driver: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask
>> 15/05/10 00:41:51 INFO PerfLogger: </PERFLOG method=Driver.execute start=1431198710959 end=1431198711073 duration=114 from=org.apache.hadoop.hive.ql.Driver>
>> 15/05/10 00:41:51 INFO PerfLogger: <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
>> 15/05/10 00:41:51 INFO PerfLogger: </PERFLOG method=releaseLocks start=1431198711074 end=1431198711074 duration=0 from=org.apache.hadoop.hive.ql.Driver>
>> 15/05/10 00:41:51 ERROR HiveContext:
>> ======================
>> HIVE FAILURE OUTPUT
>> ======================
>> converting to local hdfs:///users/ravindra/customUDF2.jar
>> Failed to read external resource hdfs:///users/ravindra/customUDF2.jar
>> FAILED: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to load JAR hdfs:///users/ravindra/customUDF2.jar
>> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask
>>
>> ======================
>> END HIVE FAILURE OUTPUT
>> ======================
>>
>> org.apache.spark.sql.execution.QueryExecutionException: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask
>> at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:309)
>> at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:276)