Hello, do you use df.write, or do you load the table with hivecontext.sql("insert into ...")?
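For reference, the two write paths I mean look roughly like this in Spark 1.6 (a sketch only; the app name, paths, table, and column names below are placeholders, not from this thread):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    val sc = new SparkContext(new SparkConf().setAppName("brand-etl"))
    val hiveContext = new HiveContext(sc)

    // (a) DataFrame writer: Spark writes the parquet files itself, and the
    //     partitions are registered in Hive afterwards.
    val df = hiveContext.read.parquet("hdfs:///staging/dataset")   // placeholder input
    df.write
      .partitionBy("brand")
      .mode("append")
      .parquet("hdfs:///warehouse/tablename")                      // placeholder output

    // (b) HiveContext SQL: Hive's own parquet writer produces the files.
    hiveContext.sql(
      "INSERT INTO TABLE tablename PARTITION (brand='BrandA') " +
      "SELECT col1, col2 FROM staging_table")                      // placeholder names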
Angel.

On Jun 12, 2017, at 11:07 PM, "Yong Zhang" <java8...@hotmail.com> wrote:

> We are using Spark *1.6.2* as ETL to generate parquet files for one
> dataset, partitioned by "brand" (which is a string representing the brand
> in this dataset).
>
> After the partition files are generated in HDFS (folders like "brand=a"),
> we add the partitions in Hive.
>
> The Hive version is *1.2.1* (in fact, we are using HDP 2.5.0).
>
> Now the problem is that for 2 brand partitions we cannot query the data
> generated in Spark, but it works fine for the rest of the partitions.
>
> Below is the error I get in the Hive CLI and hive.log if I query the bad
> partitions, e.g. "select * from tablename where brand='*BrandA*' limit 3;"
>
> Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException:
> java.lang.UnsupportedOperationException: Cannot inspect
> org.apache.hadoop.io.LongWritable
>
> Caused by: java.lang.UnsupportedOperationException: Cannot inspect
> org.apache.hadoop.io.LongWritable
>     at org.apache.hadoop.hive.ql.io.parquet.serde.primitive.ParquetStringInspector.getPrimitiveWritableObject(ParquetStringInspector.java:52)
>     at org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitiveUTF8(LazyUtils.java:222)
>     at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:307)
>     at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:262)
>     at org.apache.hadoop.hive.serde2.DelimitedJSONSerDe.serializeField(DelimitedJSONSerDe.java:72)
>     at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.doSerialize(LazySimpleSerDe.java:246)
>     at org.apache.hadoop.hive.serde2.AbstractEncodingAwareSerDe.serialize(AbstractEncodingAwareSerDe.java:50)
>     at org.apache.hadoop.hive.ql.exec.DefaultFetchFormatter.convert(DefaultFetchFormatter.java:71)
>     at org.apache.hadoop.hive.ql.exec.DefaultFetchFormatter.convert(DefaultFetchFormatter.java:40)
>     at org.apache.hadoop.hive.ql.exec.ListSinkOperator.process(ListSinkOperator.java:90)
>     ... 22 more
>
> There is not much I can find by googling this error message, but it
> points to the schema in Hive being different from the schema in the
> parquet files. But this is a very strange case, as the same schema works
> fine for the other brands; "brand" is defined as a partition column, and
> all partitions share the same Hive schema as above.
>
> If I query, for example, "select * from tablename where brand='*BrandB*'
> limit 3;", everything works fine.
>
> So is this really caused by a mismatch between the Hive schema and the
> parquet files generated by Spark, by the data within the different
> partition keys, or by a compatibility issue between Spark and Hive?
>
> Thanks
>
> Yong
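Not a definitive diagnosis, but since ParquetStringInspector is the class failing on a LongWritable, it looks like Hive expects a string column where the parquet files under the bad partitions carry an int64/bigint. A quick way to compare the two schemas (a sketch; the warehouse path and table name are placeholders):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    val sc = new SparkContext(new SparkConf().setAppName("schema-check"))
    val hiveContext = new HiveContext(sc)

    // Schema Spark actually wrote under one of the bad partitions
    // (placeholder path -- point it at the real partition folder).
    hiveContext.read
      .parquet("hdfs:///warehouse/tablename/brand=BrandA")
      .printSchema()

    // Schema Hive believes the table has.
    hiveContext.sql("DESCRIBE tablename").show(100, false)

If a column comes back as long on the Spark side but string in the Hive DDL for just those two partitions, that would line up with the error above.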