After upgrading to Spark 1.3, these statements on HiveContext are working fine. Thanks
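
For the archive, a rough sketch of the statements in question as run from spark-shell. The function name, class, and HDFS path are the ones from the thread below; the final query and its employees table are made up for illustration only:

    scala> // Hive-enabled SQL context on top of the shell's SparkContext (sc).
    scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
    scala> // Register the UDF class from the jar on HDFS as a temporary function.
    scala> hiveContext.sql("CREATE TEMPORARY FUNCTION sample_to_upper AS 'com.abc.api.udf.MyUpper' USING JAR 'hdfs:///users/ravindra/customUDF2.jar'")
    scala> // Illustrative only - any table known to the HiveContext would do.
    scala> hiveContext.sql("SELECT sample_to_upper(name) FROM employees LIMIT 10").collect().foreach(println)
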
On Mon, May 11, 2015, 12:15 Ravindra <ravindra.baj...@gmail.com> wrote:

> Hi All,
>
> Thanks for the suggestions. What I tried is - hiveContext.sql("add jar ....") and that lets the "create temporary function" complete, but when I use the function I get ClassNotFound for the class handling it. The same class is present in the jar that was added.
>
> Please note that the same works fine from the Hive shell.
>
> Is there an issue with Spark while distributing jars across workers? Maybe that is causing the problem. Also, can you please suggest the manual way of copying the jars to the workers? I just want to ascertain my assumption.
>
> Thanks,
> Ravi
>
> On Sun, May 10, 2015 at 1:40 AM Michael Armbrust <mich...@databricks.com> wrote:
>
>> That code path is entirely delegated to Hive. Does Hive support this? You might try instead using sparkContext.addJar.
>>
>> On Sat, May 9, 2015 at 12:32 PM, Ravindra <ravindra.baj...@gmail.com> wrote:
>>
>>> Hi All,
>>>
>>> I am trying to create custom UDFs with hiveContext as given below -
>>> scala> hiveContext.sql("CREATE TEMPORARY FUNCTION sample_to_upper AS 'com.abc.api.udf.MyUpper' USING JAR 'hdfs:///users/ravindra/customUDF2.jar'")
>>>
>>> I have put the UDF jar in HDFS at the path given above. The same command works well in the Hive shell but fails here in the Spark shell, as given below -
>>> 15/05/10 00:41:51 ERROR Task: FAILED: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to load JAR hdfs:///users/ravindra/customUDF2.jar
>>> 15/05/10 00:41:51 INFO FunctionTask: create function: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to load JAR hdfs:///users/ravindra/customUDF2.jar
>>>   at org.apache.hadoop.hive.ql.exec.FunctionTask.addFunctionResources(FunctionTask.java:305)
>>>   at org.apache.hadoop.hive.ql.exec.FunctionTask.createTemporaryFunction(FunctionTask.java:179)
>>>   at org.apache.hadoop.hive.ql.exec.FunctionTask.execute(FunctionTask.java:81)
>>>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
>>>   at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
>>>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1503)
>>>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1270)
>>>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1088)
>>>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
>>>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
>>>   at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:305)
>>>   at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:276)
>>>   at org.apache.spark.sql.hive.execution.NativeCommand.sideEffectResult$lzycompute(NativeCommand.scala:35)
>>>   at org.apache.spark.sql.hive.execution.NativeCommand.sideEffectResult(NativeCommand.scala:35)
>>>   at org.apache.spark.sql.execution.Command$class.execute(commands.scala:46)
>>>   at org.apache.spark.sql.hive.execution.NativeCommand.execute(NativeCommand.scala:30)
>>>   at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:425)
>>>   at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:425)
>>>   at org.apache.spark.sql.SchemaRDDLike$class.$init$(SchemaRDDLike.scala:58)
>>>   at org.apache.spark.sql.SchemaRDD.<init>(SchemaRDD.scala:108)
>>>   at org.apache.spark.sql.hive.HiveContext.sql(HiveContext.scala:94)
>>>   at $line17.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:18)
>>>   at $line17.$read$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:23)
>>>   at $line17.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:25)
>>>   at $line17.$read$$iwC$$iwC$$iwC.<init>(<console>:27)
>>>   at $line17.$read$$iwC$$iwC.<init>(<console>:29)
>>>   at $line17.$read$$iwC.<init>(<console>:31)
>>>   at $line17.$read.<init>(<console>:33)
>>>   at $line17.$read$.<init>(<console>:37)
>>>   at $line17.$read$.<clinit>(<console>)
>>>   at $line17.$eval$.<init>(<console>:7)
>>>   at $line17.$eval$.<clinit>(<console>)
>>>   at $line17.$eval.$print(<console>)
>>>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>   at java.lang.reflect.Method.invoke(Method.java:606)
>>>   at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:852)
>>>   at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1125)
>>>   at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:674)
>>>   at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:705)
>>>   at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:669)
>>>   at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:828)
>>>   at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:873)
>>>   at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:785)
>>>   at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:628)
>>>   at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:636)
>>>   at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:641)
>>>   at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:968)
>>>   at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:916)
>>>   at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:916)
>>>   at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
>>>   at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:916)
>>>   at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1011)
>>>   at org.apache.spark.repl.Main$.main(Main.scala:31)
>>>   at org.apache.spark.repl.Main.main(Main.scala)
>>>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>   at java.lang.reflect.Method.invoke(Method.java:606)
>>>   at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
>>>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
>>>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>>
>>> 15/05/10 00:41:51 ERROR Driver: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask
>>> 15/05/10 00:41:51 INFO PerfLogger: </PERFLOG method=Driver.execute start=1431198710959 end=1431198711073 duration=114 from=org.apache.hadoop.hive.ql.Driver>
>>> 15/05/10 00:41:51 INFO PerfLogger: <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
>>> 15/05/10 00:41:51 INFO PerfLogger: </PERFLOG method=releaseLocks start=1431198711074 end=1431198711074 duration=0 from=org.apache.hadoop.hive.ql.Driver>
>>> 15/05/10 00:41:51 ERROR HiveContext:
>>> ======================
>>> HIVE FAILURE OUTPUT
>>> ======================
>>> converting to local hdfs:///users/ravindra/customUDF2.jar
>>> Failed to read external resource hdfs:///users/ravindra/customUDF2.jar
>>> FAILED: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to load JAR hdfs:///users/ravindra/customUDF2.jar
>>> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask
>>>
>>> ======================
>>> END HIVE FAILURE OUTPUT
>>> ======================
>>>
>>> org.apache.spark.sql.execution.QueryExecutionException: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask
>>>   at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:309)
>>>   at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:276)
>>>
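
Side note for anyone who hits the worker-side ClassNotFound described above before they can upgrade: a hedged, untested sketch of the sparkContext.addJar route Michael suggested, assuming the hiveContext from the sketch near the top of this thread. Starting the shell with spark-shell --jars <path-to-jar> is another way to get the jar onto both the driver and the executors.

    scala> // Ship the UDF jar to the executors so tasks can load the class at run time.
    scala> sc.addJar("hdfs:///users/ravindra/customUDF2.jar")
    scala> // Make the jar visible to the Hive session on the driver, then register the function.
    scala> hiveContext.sql("add jar hdfs:///users/ravindra/customUDF2.jar")
    scala> hiveContext.sql("CREATE TEMPORARY FUNCTION sample_to_upper AS 'com.abc.api.udf.MyUpper'")
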