I wanted to clarify something. It works if the Hive-Parquet table is a plain vanilla table. But if the table is a partitioned table, then the error occurs after adding new fields to the table. Any ideas on how to handle this?

hive> create table parquet_part(col1 string, col2 string, col3 int) partitioned by (partcol string)
    > ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
    > STORED AS INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
    > OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat';

hive> insert into table parquet_part partition (partcol) select code, description, salary, '1' from sample_08 limit 4;

hive> select col1 from parquet_part where partcol='1';
00-0000
11-0000
11-1011
11-1021

hive> alter table parquet_part add columns (NewField1 string, Newfield2 string, newfield3 string);
OK
Time taken: 0.104 seconds

hive> desc parquet_part;
OK
col1        string   from deserializer
col2        string   from deserializer
col3        int      from deserializer
newfield1   string   from deserializer
newfield2   string   from deserializer
newfield3   string   from deserializer
partcol     string
Time taken: 0.123 seconds

hive> select col1 from parquet_part where partcol='1';
Task with the most failures(4):
-----
Task ID:
  task_201411191237_9181_m_000000

URL:
  http://hadoop3-mgt.hdp.us.grid.nuance.com:50030/taskdetails.jsp?jobid=job_201411191237_9181&tipid=task_201411191237_9181_m_000000
-----
Diagnostic Messages for this Task:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row [Error getting row data with exception java.lang.UnsupportedOperationException: Cannot inspect java.util.ArrayList
    at parquet.hive.serde.ArrayWritableObjectInspector.getStructFieldData(ArrayWritableObjectInspector.java:133)
    at org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector.getStructFieldData(UnionStructObjectInspector.java:128)
    at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:354)
    at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:220)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:669)
    at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:141)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
    at org.apache.hadoop.mapred.Child$4.run(Child.ja
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 1   HDFS Read: 0   HDFS Write: 0   FAIL
Total MapReduce CPU Time Spent: 0 msec
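A note on what is probably going on here: in the Hive metastore, every partition keeps its own copy of the column list, so ALTER TABLE ... ADD COLUMNS only changes the table-level schema; the existing partition still describes the original three columns, and the mismatch between the partition's SerDe schema and the table's is a plausible source of the inspector error above. A possible, untested workaround is to bring the partition-level schema in line as well. Whether the partition-level form of ADD COLUMNS and the CASCADE keyword are available depends on the Hive release, so check the DDL manual for your version:

    -- Untested sketch: align the existing partition's schema with the table's.
    alter table parquet_part partition (partcol='1')
      add columns (NewField1 string, Newfield2 string, newfield3 string);

    -- Newer Hive releases can push the change to all partitions at once:
    -- alter table parquet_part
    --   add columns (NewField1 string, Newfield2 string, newfield3 string) cascade;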
On Wednesday, January 14, 2015 4:20 PM, Kumar V <kumarbuyonl...@yahoo.com> wrote:

Hi,
Thanks for your response. I can't do another insert, as the data is already in the table. Also, since there is a lot of data in the table already, I am trying to find a way to avoid reprocessing/reloading it.
Thanks.

On Wednesday, January 14, 2015 2:47 PM, Daniel Haviv <daniel.ha...@veracity-group.com> wrote:

Hi Kumar,
Altering the table just updates Hive's metadata without updating Parquet's schema. I believe that if you insert into your table (after adding the column), you'll then be able to select all 3 columns.

Daniel

On 14 Jan 2015, at 21:34, Kumar V <kumarbuyonl...@yahoo.com> wrote:

Hi,
Any ideas on how to go about this? Any insights you have would be helpful. I am kinda stuck here.

Here are the steps I followed on Hive 0.13:

1) create table t (f1 string, f2 string) stored as Parquet;
2) upload Parquet files with 2 fields
3) select * from t;    <---- works fine
4) alter table t add columns (f3 string);
5) select * from t;    <----- ERROR

Caused by: java.lang.IllegalStateException: Column f3 at index 2 does not exist
    at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.init(DataWritableReadSupport.java:116)
    at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.getSplit(ParquetRecordReaderWrapper.java:204)
    at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:79)
    at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:66)
    at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:51)
    at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:65)
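For reference, Daniel's suggestion above, applied to this test table, would look roughly like the following. This is an untested sketch: it writes new Parquet files whose schema actually contains f3, so it can only help rows written after the ALTER; the pre-existing files still lack the column (and staging is just a placeholder for wherever the new rows come from):

    alter table t add columns (f3 string);

    -- Write at least one new file whose Parquet schema includes f3.
    insert into table t select f1, f2, 'n/a' from staging;

    select f1, f2, f3 from t;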
On Wednesday, January 7, 2015 2:55 PM, Kumar V <kumarbuyonl...@yahoo.com> wrote:

Hi,
I have a Parquet-format Hive table with a few columns. I have loaded a lot of data into this table already, and it seems to work. I now have to add a few new columns to this table. If I add the new columns, queries no longer work, since I have not reloaded the old data.
Is there a way to add new fields to the table without reloading the old Parquet files, and still have queries work? I tried this on Hive 0.10 and also on Hive 0.13, and I get an error in both versions. Please let me know how to handle this.

Regards,
Kumar.
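For anyone who lands on this thread later: the behavior being asked for, old Parquet files returning NULL for columns that were added to the table afterwards, is, as far as I can tell, how later Hive releases behave, because the Parquet reader there resolves columns by name rather than by position. A minimal sketch to verify this on a given release (INSERT ... VALUES requires Hive 0.14 or later):

    create table t2 (f1 string, f2 string) stored as parquet;
    insert into table t2 values ('a', 'b');   -- file written with two columns
    alter table t2 add columns (f3 string);
    select f1, f2, f3 from t2;                -- expect f3 = NULL for the old file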