Try setting the following param:

  conf.set("spark.sql.hive.convertMetastoreParquet", "false")
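For reference, one way to apply that setting in Spark 1.6 is on the HiveContext itself or at submit time. A minimal sketch (the app name is just a placeholder):

  // Minimal sketch (Spark 1.6, Scala).
  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.sql.hive.HiveContext

  val sc = new SparkContext(new SparkConf().setAppName("parquet-etl"))
  val hiveContext = new HiveContext(sc)

  // Disable Spark's built-in conversion of Hive metastore parquet tables,
  // so the Hive SerDe handles the files instead:
  hiveContext.setConf("spark.sql.hive.convertMetastoreParquet", "false")

  // ...or equivalently at submit time:
  //   spark-submit --conf spark.sql.hive.convertMetastoreParquet=false ...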
On Tue, Jun 13, 2017 at 3:34 PM, Angel Francisco Orta <angel.francisco.o...@gmail.com> wrote:

> Hello,
>
> Do you use df.write, or do you insert with hiveContext.sql("insert into ...")?
>
> Angel.
>
> On Jun 12, 2017 11:07 PM, "Yong Zhang" <java8...@hotmail.com> wrote:
>
>> We are using Spark *1.6.2* as ETL to generate parquet files for one
>> dataset, partitioned by "brand" (a string representing the brand in this
>> dataset).
>>
>> After the partition folders such as "brand=a" are generated in HDFS, we
>> add the partitions in Hive.
>>
>> The Hive version is *1.2.1* (in fact, we are using HDP 2.5.0).
>>
>> Now the problem is that for 2 brand partitions we cannot query the data
>> generated by Spark, while the rest of the partitions work fine.
>>
>> Below is the error in the Hive CLI and hive.log when I query a bad
>> partition, e.g. "select * from tablename where brand='*BrandA*' limit 3;"
>>
>> Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException:
>> java.lang.UnsupportedOperationException: Cannot inspect org.apache.hadoop.io.LongWritable
>>
>> Caused by: java.lang.UnsupportedOperationException: Cannot inspect org.apache.hadoop.io.LongWritable
>>   at org.apache.hadoop.hive.ql.io.parquet.serde.primitive.ParquetStringInspector.getPrimitiveWritableObject(ParquetStringInspector.java:52)
>>   at org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitiveUTF8(LazyUtils.java:222)
>>   at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:307)
>>   at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:262)
>>   at org.apache.hadoop.hive.serde2.DelimitedJSONSerDe.serializeField(DelimitedJSONSerDe.java:72)
>>   at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.doSerialize(LazySimpleSerDe.java:246)
>>   at org.apache.hadoop.hive.serde2.AbstractEncodingAwareSerDe.serialize(AbstractEncodingAwareSerDe.java:50)
>>   at org.apache.hadoop.hive.ql.exec.DefaultFetchFormatter.convert(DefaultFetchFormatter.java:71)
>>   at org.apache.hadoop.hive.ql.exec.DefaultFetchFormatter.convert(DefaultFetchFormatter.java:40)
>>   at org.apache.hadoop.hive.ql.exec.ListSinkOperator.process(ListSinkOperator.java:90)
>>   ... 22 more
>>
>> There is not much I can find by googling this error message, but it
>> suggests that the schema in Hive differs from the schema in the parquet
>> file. That is a very strange case, though: the same schema works fine for
>> the other brands, which share the whole Hive schema above, and "brand" is
>> defined as a partition column.
>>
>> If I query "select * from tablename where brand='*BrandB*' limit 3;",
>> everything works fine.
>>
>> So is this really caused by the Hive schema mismatching the parquet files
>> generated by Spark, by the data within the different partition keys, or by
>> a compatibility issue between Spark and Hive?
>>
>> Thanks
>>
>> Yong
>>

--
Best Regards,
Ayan Guha
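To pin down whether it really is a schema mismatch, one quick check is to compare the schema the Hive metastore declares against the schema Spark actually wrote into the suspect partition's files. A diagnostic sketch against Spark 1.6; the warehouse path below is an assumption, so substitute the table's real HDFS location:

  // Diagnostic sketch (Spark 1.6): compare Hive's declared schema with the
  // schema embedded in the bad partition's parquet files.
  import org.apache.spark.sql.hive.HiveContext

  val hiveContext = new HiveContext(sc)  // sc is predefined in spark-shell

  // Schema as the Hive metastore sees the table:
  hiveContext.sql("DESCRIBE tablename").show(100, false)

  // Schema as Spark wrote it for the bad partition (path is hypothetical):
  hiveContext.read
    .parquet("/apps/hive/warehouse/tablename/brand=BrandA")
    .printSchema()

  // A column that prints as LongType here but is declared string in Hive
  // would explain "Cannot inspect org.apache.hadoop.io.LongWritable".

If one of the bad partition's columns shows up as a long in the parquet footer but as a string in the Hive DDL, rewriting that partition with the column cast to the declared type (or correcting the DDL) should make the Hive CLI query work again.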