Thanks for your reply, Ashutosh.
I found that MySQL's LOAD DATA syntax supports several options for
ignoring rows, such as TERMINATED BY, LINES STARTING BY, and so on.
Unfortunately, Hive does not seem to support them so far.
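
Since there is no load-time option, one workaround is to load the raw file
into an all-string staging table and copy only the well-formed rows into the
real table. The following is only a sketch: tb_staging is a hypothetical name,
the target table tb is assumed to be defined as in the original question, and
I have not checked the behaviour on every Hive version.

```sql
-- Hypothetical staging table (tb_staging is an assumed name). Every column is
-- a string, so a bad id is kept as text instead of being read as NULL.
CREATE TABLE tb_staging (id STRING, pref STRING, zip STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';

LOAD DATA LOCAL INPATH 'data.csv' OVERWRITE INTO TABLE tb_staging;

-- Copy only well-formed rows into the real table.
-- CAST(... AS INT) yields NULL for non-numeric text such as 'aa' or
-- '33:bbb:4200003'; blank lines and rows with the wrong delimiter
-- leave pref and zip as NULL.
INSERT OVERWRITE TABLE tb
SELECT CAST(id AS INT), pref, zip
FROM tb_staging
WHERE CAST(id AS INT) IS NOT NULL
  AND pref IS NOT NULL
  AND zip IS NOT NULL;
```

With the sample data.csv from the question, only the row 32,aaa,4200002
should pass the filter. The cost is an extra pass over the data, so this is
a cleanup step rather than a true "ignore bad rows" option.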



On 2011-08-25 06:09, Ashutosh Chauhan wrote:
> One possibility is to filter out NULLs, something like the following:
>
> hive> select * from tb where id is not null or pref is not null or zip is not null;
>
> This is not the most efficient approach, but it will work.
>
> 2011/8/18 XieXianshan <xi...@cn.fujitsu.com>
>
>     Hi, everyone,
>
>     Is there an option to ignore malformed records while loading data
>     into a Hive table?
>     Or an option to ignore bad rows while querying data?
>
>     For instance:
>     1. Specify a row format explicitly for a new table.
>     hive>create table tb (id int, pref string, zip string) row format
>     delimited fields terminated by ',' lines terminated by '\n';
>
>     2. Load data into the table from a CSV file that contains bad records.
>     hive>load data local inpath 'data.csv' overwrite into table tb;
>
>     The data.csv might look like:
>     32,aaa,4200002
>     <--Blank line
>     33:bbb:4200003 <--Invalid field delimiter ":"
>     aa,ccc,4200004 <--Non-int number "aa"
>
>     3. Select data
>     hive> select * from tb;
>     OK
>     32 aaa 4200002
>     NULL NULL NULL
>     NULL NULL NULL
>     NULL ccc 4200004
>     Time taken: 0.196 seconds
>
>     I have tried setting mapred.skip.map.max.skip.records, but it does
>     not seem to work.
>
>     Thanks in advance.
>
>     Regards,
>     Xie
>
>     --
>     Best Regards
>     Xie Xianshan
>     --------------------------------------------------
>     Xie Xianshan
>     Dept.IV of Technology and Development
>     Nanjing Fujitsu Nanda Software Tech. Co., Ltd.(FNST)
>     No. 6 Wenzhu Road, Nanjing, China
>     PostCode: 210012
>     PHONE: +86+25-86630566-8522
>     FUJITSU INTERNAL: 7998-8522
>     MAIL: xi...@cn.fujitsu.com
>     --------------------------------------------------
>     This communication is for use by the intended recipient(s) only
>     and may
>     contain information that is privileged, confidential and exempt from
>     disclosure under applicable law. If you are not an intended
>     recipient of
>     this communication, you are hereby notified that any dissemination,
>     distribution or copying hereof is strictly prohibited. If you have
>     received this communication in error, please notify me by reply
>     e-mail,
>     permanently delete this communication from your system, and
>     destroy any
>     hard copies you may have printed
>
>


-- 
Best Regards
Xie Xianshan
