Hi everyone,

Is there an option to ignore malformed records while loading data into a Hive table? Or an option to ignore bad rows while querying data?
For instance:

1. Specify a row format explicitly for a new table.

    hive> create table tb (id int, pref string, zip string)
          row format delimited fields terminated by ',' lines terminated by '\n';

2. Load data into the table from a CSV file that contains bad records.

    hive> load data local inpath 'data.csv' overwrite into table tb;

   The data.csv might look like:

    32,aaa,4200002
                        <--Blank line
    33:bbb:4200003      <--Invalid field delimiter ":"
    aa,ccc,4200004      <--Non-int number "aa"

3. Select data.

    hive> select * from tb;
    OK
    32      aaa     4200002
    NULL    NULL    NULL
    NULL    NULL    NULL
    NULL    ccc     4200004
    Time taken: 0.196 seconds

I have tried setting mapred.skip.map.max.skip.records, but it does not seem to work.

Thanks in advance.

Regards,
Xie Xianshan
Nanjing Fujitsu Nanda Software Tech. Co., Ltd. (FNST)
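For reference, the three kinds of bad record in the example (blank line, wrong delimiter, non-integer id) could be screened out before the load step. The following is only a minimal Python sketch of such a pre-filter, not a Hive option. The names `is_well_formed` and `split_rows` are hypothetical, and the rule "exactly three comma-separated fields with an integer first field" is an assumption taken from the table definition above. It does not check that the id fits in Hive's int range.

```python
import csv
import io
import re

EXPECTED_FIELDS = 3  # id, pref, zip (assumed from the table definition)
INT_PATTERN = re.compile(r"[+-]?[0-9]+")  # plain ASCII integer, no range check


def is_well_formed(fields):
    """Return True if the row has three fields and an integer-looking id."""
    if len(fields) != EXPECTED_FIELDS:
        return False
    return INT_PATTERN.fullmatch(fields[0]) is not None


def split_rows(lines):
    """Parse comma-separated lines and split them into (good, bad) rows."""
    good, bad = [], []
    for fields in csv.reader(lines):
        # A blank line is parsed as an empty list and lands in `bad`.
        (good if is_well_formed(fields) else bad).append(fields)
    return good, bad


# Same content as the data.csv shown in the example.
sample = io.StringIO("32,aaa,4200002\n\n33:bbb:4200003\naa,ccc,4200004\n")
good_rows, bad_rows = split_rows(sample)
print(good_rows)
print(bad_rows)
```

Writing only `good_rows` back to a file before running `load data` would keep the all-NULL and partially-NULL rows out of the table. The rejected rows stay available in `bad_rows` for inspection.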