Hi,

Does df.write.partitionBy("partitions").format("format").mode("overwrite").saveAsTable("tbl") support the ORC file format?

I tried df.write.partitionBy("zone", "z", "year", "month").format("orc").mode("overwrite").saveAsTable("tbl"), but after the insert the schema of my table "tbl" had been changed to something I did not expect:

-- FROM --
CREATE EXTERNAL TABLE `4dim`(`u` float,   `v` float)
PARTITIONED BY (`zone` int, `z` int, `year` int, `month` int)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
TBLPROPERTIES (
  'orc.compress'='ZLIB',
  'transient_lastDdlTime'='1433016475')

-- TO --
CREATE TABLE `4dim`(`col` array<string> COMMENT 'from deserializer')
PARTITIONED BY (`zone` int COMMENT '', `z` int COMMENT '', `year` int COMMENT '', `month` int COMMENT '')
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.MetadataTypedColumnsetSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.SequenceFileInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
TBLPROPERTIES (
  'EXTERNAL'='FALSE',
  'spark.sql.sources.provider'='orc',
  'spark.sql.sources.schema.numParts'='1',
'spark.sql.sources.schema.part.0'='{\"type\":\"struct\",\"fields\":[{\"name\":\"u\",\"type\":\"float\",\"nullable\":true,\"metadata\":{}},{\"name\":\"v\",\"type\":\"float\",\"nullable\":true,\"metadata\":{}},{\"name\":\"zone\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{}},{\"name\":\"z\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{}},{\"name\":\"year\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{}},{\"name\":\"month\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{}}]}',
  'transient_lastDdlTime'='1434055247')


I noticed that there are *.orc files stored in HDFS, but when I try to query the table from Hive I get nothing back. How can I fix this? Any suggestions, please?
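
For reference, a minimal sketch of reading the data back (assuming Spark 1.4 and an existing HiveContext named hiveContext; the path in the second snippet is a placeholder). The table created by saveAsTable above is a Spark SQL data source table whose real schema is kept in the spark.sql.sources.* TBLPROPERTIES, which plain Hive does not understand, but Spark SQL can still read it:

// Read the data source table back through Spark SQL (not through Hive directly).
val tblDF = hiveContext.table("tbl")
tblDF.printSchema()
tblDF.show()

// Or load the *.orc files under the table directory by path (placeholder path).
val orcDF = hiveContext.read.format("orc").load("/user/hive/warehouse/tbl")
orcDF.show()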

BR,
Patcharee


On 07. juni 2015 16:40, Cheng Lian wrote:
Spark SQL supports Hive dynamic partitioning, so one possible workaround is to create a Hive table partitioned by zone, z, year, and month, and then insert the whole dataset into it directly using dynamic partitioning.
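
A minimal sketch of that workaround (Scala; it assumes an existing HiveContext named hiveContext and a DataFrame df containing the partition columns, and it uses a managed table named after the 4dim example above for brevity):

// Enable Hive dynamic partitioning for the session.
hiveContext.sql("SET hive.exec.dynamic.partition=true")
hiveContext.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

// Create the partitioned Hive ORC table up front so Hive owns the table layout.
hiveContext.sql("""CREATE TABLE IF NOT EXISTS `4dim` (`u` FLOAT, `v` FLOAT)
                   PARTITIONED BY (`zone` INT, `z` INT, `year` INT, `month` INT)
                   STORED AS ORC""")

// Register the whole dataset and let Hive route each row to its partition dynamically.
// The partition columns must come last in the SELECT list.
df.registerTempTable("staging_4dim")
hiveContext.sql("""INSERT OVERWRITE TABLE `4dim` PARTITION (zone, z, year, month)
                   SELECT u, v, zone, z, year, month FROM staging_4dim""")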

In 1.4, we also provide dynamic partitioning support for non-Hive environments, and you can do something like this:

df.write.partitionBy("zone", "z", "year", "month").format("parquet").mode("overwrite").saveAsTable("tbl")
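
A path-based variant of the same call can also serve as a sanity check (a sketch only, assuming the ORC data source in 1.4, which goes through a HiveContext; the output path is a placeholder):

// Write partitioned ORC data to an explicit location instead of the metastore.
df.write.partitionBy("zone", "z", "year", "month")
  .format("orc")
  .mode("overwrite")
  .save("/tmp/4dim_orc")

// Read it back; the zone/z/year/month directories are discovered as partition columns.
val readBack = hiveContext.read.format("orc").load("/tmp/4dim_orc")
readBack.printSchema()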

Cheng

On 6/7/15 9:48 PM, patcharee wrote:
Hi,

How can I use HiveContext on the executors? If only the driver can see the HiveContext, does that mean I have to collect all the datasets (which are very large) to the driver and use the HiveContext there? That would overload the driver's memory and fail.

BR,
Patcharee

On 07. juni 2015 11:51, Cheng Lian wrote:
Hi,

This is expected behavior. HiveContext.sql (and also DataFrame.registerTempTable) is only expected to be invoked on the driver side. However, the closure passed to RDD.foreach is executed on the executor side, where no viable HiveContext instance exists.
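
A minimal illustration of the distinction (hiveContext and rdd are placeholders for an existing HiveContext and any RDD):

// Driver side: fine, the SQL statement is issued from the driver program.
hiveContext.sql("SHOW TABLES").show()

// Executor side: the closure below runs on the executors, where the driver-side
// HiveContext is not available, so a call like this fails (e.g. with an NPE).
rdd.foreach { _ =>
  hiveContext.sql("SHOW TABLES")   // don't do this
}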

Cheng

On 6/7/15 10:06 AM, patcharee wrote:
Hi,

I am trying to insert data into a partitioned Hive table. The groupByKey is used to combine the dataset belonging to each partition of the Hive table. After the groupByKey, I convert the Iterable[X] to a DataFrame with x.toList.toDF(). But hiveContext.sql throws a NullPointerException, see below. Any suggestions? What could be wrong? Thanks!

val varWHeightFlatRDD = varWHeightRDD.flatMap(FlatMapUtilClass().flatKeyFromWrf).groupByKey()
  .foreach { x =>
    val zone = x._1._1
    val z = x._1._2
    val year = x._1._3
    val month = x._1._4
    val df_table_4dim = x._2.toList.toDF()
    df_table_4dim.registerTempTable("table_4Dim")
    hiveContext.sql("INSERT OVERWRITE table 4dim partition (zone=" + zone + ",z=" + z + ",year=" + year + ",month=" + month + ") " +
      "select date, hh, x, y, height, u, v, w, ph, phb, t, p, pb, qvapor, qgraup, qnice, qnrain, tke_pbl, el_pbl from table_4Dim")
  }


java.lang.NullPointerException
    at org.apache.spark.sql.hive.HiveContext.sql(HiveContext.scala:100)
    at no.uni.computing.etl.LoadWrfIntoHiveOptReduce1$$anonfun$7.apply(LoadWrfIntoHiveOptReduce1.scala:113)
    at no.uni.computing.etl.LoadWrfIntoHiveOptReduce1$$anonfun$7.apply(LoadWrfIntoHiveOptReduce1.scala:103)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
    at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:798)
    at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:798)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1511)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1511)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:64)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org