Yes.
We are trying to run a custom script written in C# using TRANSFORM, but cannot
get it to work.
The query and error are below. Any suggestions? Thank you!
Spark version: 1.3
Here is how we add and invoke the script:
scala> hiveContext.sql("""ADD FILE wasb://…/NSSGraphHelper.exe""")
…
scala> hiveContext.sql("""SELECT TRANSFORM (dc, attribute, key, time, value)
USING 'NSSGraphHelper.exe' FROM SourceTable""").collect()
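For context, TRANSFORM streams each input row to the script's stdin as tab-separated text and reads tab-separated rows back from its stdout. A script can therefore be smoke-tested outside Spark by piping sample rows to it; the helper below is a hypothetical sketch of such a test (the function name and row format are illustrative, not part of the Spark API):

```python
# Hypothetical local smoke test for a transform executable: feed
# tab-separated rows to its stdin and capture its stdout, roughly
# simulating what TRANSFORM does on each executor.
import subprocess

def run_transform(exe, rows):
    """Pipe rows (lists of column strings) into `exe`; return output lines."""
    stdin_data = "\n".join("\t".join(r) for r in rows) + "\n"
    result = subprocess.run([exe], input=stdin_data, capture_output=True,
                            text=True, check=True)
    return result.stdout.splitlines()
```

For example, `run_transform("cat", [["a", "b"]])` should echo the input row back unchanged, confirming the plumbing before pointing it at the real executable.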
The query throws an exception saying it cannot find the file specified:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 16.0 failed 4 times, most recent failure: Lost task 0.3 in stage 16.0 (TID 1273, workernode1.nsssparkcluster.g10.internal.cloudapp.net): java.io.IOException: Cannot run program "/bin/bash": CreateProcess error=2, The system cannot find the file specified
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1041)
    at org.apache.spark.sql.hive.execution.ScriptTransformation$$anonfun$1.apply(ScriptTransformation.scala:61)
    at org.apache.spark.sql.hive.execution.ScriptTransformation$$anonfun$1.apply(ScriptTransformation.scala:58)
    at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
    at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:64)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: CreateProcess error=2, The system cannot find the file specified
    at java.lang.ProcessImpl.create(Native Method)
    at java.lang.ProcessImpl.<init>(ProcessImpl.java:385)
    at java.lang.ProcessImpl.start(ProcessImpl.java:136)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1022)
    ... 16 more
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1204)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1193)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1192)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
From: Michael Armbrust <[email protected]>
Sent: Tuesday, October 20, 2015 10:21 AM
To: Yang Wu (Tata Consultancy Services) <[email protected]>
Cc: user <[email protected]>
Subject: Re: Hive custom transform scripts in Spark?
We support TRANSFORM. Are you having a problem using it?
On Tue, Oct 20, 2015 at 8:21 AM, wuyangjack <[email protected]> wrote:
How can we reuse Hive custom transform scripts written in Python or C++?
These scripts read data from stdin and print to stdout in Spark.
They use the TRANSFORM syntax in Hive:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Transform
Example in Hive:
SELECT TRANSFORM(stuff)
USING 'script.exe'
AS thing1, thing2
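A script plugged in via USING, like `script.exe` above, just reads tab-separated rows from stdin and writes tab-separated rows to stdout. A minimal sketch in Python (the column logic here is purely illustrative, not the actual script from this thread):

```python
# Minimal sketch of a Hive/Spark TRANSFORM script. Each input row
# arrives on stdin as tab-separated columns ending in a newline;
# each output row is written to stdout the same way.
import sys

def process_line(line):
    """Split a tab-separated input row and emit two output columns:
    the first input column and the number of input columns."""
    cols = line.rstrip("\n").split("\t")
    return "\t".join([cols[0], str(len(cols))])

if __name__ == "__main__":
    for line in sys.stdin:
        print(process_line(line))
```

With a script shaped like this, the AS clause in the query names the output columns (here two: `thing1, thing2`).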
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Hive-custom-transform-scripts-in-Spark-tp25142.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]