Hi, I am using HDP 2.3.2 with Spark 1.4.1 and trying to insert data into a Hive table using HiveContext.
Below is the sample code:

    spark-shell --master yarn-client --driver-memory 512m --executor-memory 512m

    //Sample code
    import org.apache.spark.sql.SQLContext
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._
    val people = sc.textFile("/user/spark/people.txt")
    val schemaString = "name age"
    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types.{StructType, StructField, StringType}
    val schema =
      StructType(
        schemaString.split(" ").map(fieldName => StructField(fieldName, StringType, true)))
    val rowRDD = people.map(_.split(",")).map(p => Row(p(0), p(1).trim))
    //Create hive context
    val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
    //Apply the schema to the RDD of Rows
    val df = hiveContext.createDataFrame(rowRDD, schema)
    val options = Map("path" -> "hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/personhivetable")
    df.write.format("org.apache.spark.sql.hive.orc.DefaultSource").options(options).saveAsTable("personhivetable")

I am getting the below error:

    org.apache.spark.SparkException: Task failed while writing rows.
        at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org$apache$spark$sql$sources$InsertIntoHadoopFsRelation$$writeRows$1(commands.scala:191)
        at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:160)
        at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:160)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
        at org.apache.spark.scheduler.Task.run(Task.scala:70)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
    Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
        at $line30.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$2.apply(<console>:29)
        at $line30.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$2.apply(<console>:29)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org$apache$spark$sql$sources$InsertIntoHadoopFsRelation$$writeRows$1(commands.scala:182)
        ... 8 more

Is this a configuration issue? When I googled it, I found that an environment variable named HIVE_CONF_DIR should be set in spark-env.sh, but when I checked spark-env.sh on HDP 2.3.2 I couldn't find that variable. Do I need to add it in order to insert Spark output data into Hive tables?
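Reading the stack trace again, the ArrayIndexOutOfBoundsException: 1 is raised at <console>:29, which corresponds to the p(1) access in my map over people.txt, so any line that does not split into two fields (e.g. a blank line or one without a comma) would fail exactly there. Purely as a sketch on my side (assuming people.txt is supposed to contain comma-separated name,age pairs; the filter below is my own guess, not something I have verified fixes it), the parsing in the same spark-shell session could be guarded like this:

    //Hypothetical guard (my assumption): keep only lines that split into at least two fields,
    //so Row(p(0), p(1).trim) can never index past the end of the array.
    val rowRDD = people
      .map(_.split(","))
      .filter(fields => fields.length >= 2)   //drop malformed lines instead of failing the task
      .map(p => Row(p(0), p(1).trim))

If malformed lines should not be silently dropped, counting them first (e.g. people.map(_.split(",")).filter(_.length < 2).count()) would at least confirm whether bad input rather than configuration is the cause.

Would really appreciate pointers.

Thanks,
Divya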