I wanted to clarify something. It works if the Hive-Parquet table is a plain vanilla (non-partitioned) table. But if the table is partitioned, the error occurs after adding new fields to it. Any ideas on how to handle this?
hive> create table nvctest_part(col1 string, col2 string, col3 int)
    > partitioned by (partcol string)
    > ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
    > STORED AS INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
    > OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat';

hive> insert into table parquet_part partition (partcol)
    > select code, description, salary, '1' from sample_08 limit 4;

hive> select col1 from parquet_part where partcol='1';
00-0000
11-0000
11-1011
11-1021

hive> alter table parquet_part add columns (NewField1 string, Newfield2 string, newfield3 string);
OK
Time taken: 0.104 seconds
hive> desc parquet_part;
OK
col1            string          from deserializer
col2            string          from deserializer
col3            int             from deserializer
newfield1       string          from deserializer
newfield2       string          from deserializer
newfield3       string          from deserializer
partcol         string
Time taken: 0.123 seconds
hive> select col1 from parquet_part where partcol='1';

Task with the most failures(4):
-----
Task ID:
  task_201411191237_9181_m_000000
URL:
  http://hadoop3-mgt.hdp.us.grid.nuance.com:50030/taskdetails.jsp?jobid=job_201411191237_9181&tipid=task_201411191237_9181_m_000000
-----
Diagnostic Messages for this Task:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row [Error getting row data with exception java.lang.UnsupportedOperationException: Cannot inspect java.util.ArrayList
        at parquet.hive.serde.ArrayWritableObjectInspector.getStructFieldData(ArrayWritableObjectInspector.java:133)
        at org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector.getStructFieldData(UnionStructObjectInspector.java:128)
        at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:354)
        at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:220)
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:669)
        at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:141)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
        at org.apache.hadoop.mapred.Child$4.run(Child.ja
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 1   HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
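One thing I'm going to check (just a guess, based on the metastore keeping its own column list for each existing partition) is whether the partition that was loaded before the alter still shows the old three-column schema. Roughly:

hive> desc parquet_part;
hive> desc parquet_part partition (partcol='1');
-- if the partition output still lists only col1/col2/col3, the alter only
-- touched the table-level metadata, not the partition written earlier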

 

     On Wednesday, January 14, 2015 4:20 PM, Kumar V <kumarbuyonl...@yahoo.com> 
wrote:
   

Hi,
Thanks for your response. I can't do another insert as the data is already in the table. Also, since there is a lot of data in the table already, I am trying to find a way to avoid reprocessing/reloading it.
Thanks.

     On Wednesday, January 14, 2015 2:47 PM, Daniel Haviv 
<daniel.ha...@veracity-group.com> wrote:
   

Hi Kumar,
Altering the table just updates Hive's metadata without updating Parquet's schema. I believe that if you insert into your table (after adding the column) you'll later be able to select all 3 columns.
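Something like this is what I have in mind (just a sketch reusing the t/f1/f2/f3 names from your steps; staging_t and some_f3_source are stand-ins for wherever the new data comes from, and whether the files written before the alter then read back cleanly will depend on your Hive/Parquet version):

hive> alter table t add columns (f3 string);
-- anything written from this point on carries f3 in the Parquet file schema
hive> insert into table t select f1, f2, some_f3_source from staging_t;
hive> select f1, f2, f3 from t;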
Daniel
On 14 Jan 2015, at 21:34, Kumar V <kumarbuyonl...@yahoo.com> wrote:


Hi,
    Any ideas on how to go about this? Any insights you have would be helpful. I am kinda stuck here.
Here are the steps I followed on Hive 0.13:
1) create table t (f1 String, f2 string) stored as Parquet;
2) upload parquet files with 2 fields
3) select * from t;    <---- Works fine.
4) alter table t add columns (f3 string);
5) Select * from t;    <----- ERROR

Caused by: java.lang.IllegalStateException: Column f3 at index 2 does not exist
  at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.init(DataWritableReadSupport.java:116)
  at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.getSplit(ParquetRecordReaderWrapper.java:204)
  at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:79)
  at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:66)
  at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:51)
  at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:65)
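(For reference, the footer of the files loaded in step 2 can be dumped with parquet-tools to confirm they still carry only f1 and f2; the jar version and file path below are just placeholders for wherever the table's files live:)

hadoop jar parquet-tools-<version>.jar schema /user/hive/warehouse/t/000000_0
# expected output lists only f1 and f2 -- the files themselves were never
# rewritten; only the Hive metastore knows about f3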


 

     On Wednesday, January 7, 2015 2:55 PM, Kumar V <kumarbuyonl...@yahoo.com> 
wrote:
   

Hi,
    I have a Parquet-format Hive table with a few columns. I have already loaded a lot of data into this table and it seems to work. I now have to add a few new columns to it. If I add new columns, queries no longer work, since I have not reloaded the old data. Is there a way to add new fields to the table, keep the old Parquet files as they are, and still have queries work?
I tried this on Hive 0.10 and also on Hive 0.13 and get an error in both versions.
Please let me know how to handle this.
Regards,
Kumar.

    


    

   
