Hi,

I am using HDP 2.3.2 with Spark 1.4.1 and trying to insert data into a Hive
table using a HiveContext.

Below is the sample code:


spark-shell --master yarn-client --driver-memory 512m --executor-memory 512m

// Sample code
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructType, StructField, StringType}

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._

// Load the input file and build the schema
val people = sc.textFile("/user/spark/people.txt")
val schemaString = "name age"
val schema =
  StructType(
    schemaString.split(" ").map(fieldName => StructField(fieldName, StringType, true)))
val rowRDD = people.map(_.split(",")).map(p => Row(p(0), p(1).trim))

// Create the Hive context and apply the schema to the RDD of Rows
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
val df = hiveContext.createDataFrame(rowRDD, schema)

// Save the DataFrame as an ORC-backed Hive table
val options = Map("path" -> "hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/personhivetable")
df.write.format("org.apache.spark.sql.hive.orc.DefaultSource").options(options).saveAsTable("personhivetable")

I am getting the error below:


org.apache.spark.SparkException: Task failed while writing rows.
  at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org$apache$spark$sql$sources$InsertIntoHadoopFsRelation$writeRows$1(commands.scala:191)
  at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$anonfun$insert$1.apply(commands.scala:160)
  at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$anonfun$insert$1.apply(commands.scala:160)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
  at org.apache.spark.scheduler.Task.run(Task.scala:70)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
  at $line30.$read$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$anonfun$2.apply(<console>:29)
  at $line30.$read$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$anonfun$2.apply(<console>:29)
  at scala.collection.Iterator$anon$11.next(Iterator.scala:328)
  at scala.collection.Iterator$anon$11.next(Iterator.scala:328)
  at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org$apache$spark$sql$sources$InsertIntoHadoopFsRelation$writeRows$1(commands.scala:182)
  ... 8 more

Is this a configuration issue?
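Looking at the trace, the ArrayIndexOutOfBoundsException: 1 appears to come
from the Row(p(0), p(1).trim) mapping at <console>:29, so I am also wondering
whether a line in people.txt that does not split into two comma-separated
fields could be the real cause (this is only my guess). The sketch below,
using the same people.txt path as above, is what I have in mind to check for
and skip such lines:

// Sketch only: my assumption is that people.txt should contain "name,age" per line
val lines = sc.textFile("/user/spark/people.txt")

// Inspect lines that do not split into at least two comma-separated fields
val badLines = lines.map(_.split(",")).filter(_.length < 2)
println(s"Malformed lines: ${badLines.count()}")
badLines.take(5).foreach(fields => println(fields.mkString("|")))

// Defensive variant of the mapping that skips malformed lines instead of failing
val safeRowRDD = lines
  .map(_.split(","))
  .filter(_.length >= 2)
  .map(p => Row(p(0), p(1).trim))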

When I googled the error, I found suggestions that an environment variable
named HIVE_CONF_DIR should be set in spark-env.sh.

I then checked spark-env.sh in HDP 2.3.2 and could not find any HIVE_CONF_DIR
environment variable.

Do I need to add the above-mentioned variable to insert Spark output data into
Hive tables?
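
If that variable does need to be set, I assume the entry in spark-env.sh would
look roughly like the following (/etc/hive/conf is my assumption about where
the Hive client configs live on the sandbox, so please correct me if that path
is wrong):

# Assumed Hive client configuration directory on the HDP sandbox
export HIVE_CONF_DIR=/etc/hive/conf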

I would really appreciate any pointers.

Thanks,

Divya