You may have to remove non-printable characters first, save an intermediate
file, and then load that file into Hive:

tr -cd '[:print:]\r\n\t'
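A minimal sketch of the whole sequence, assuming hypothetical file names (raw_weblog.txt, clean_weblog.txt) and a hypothetical Hive table called weblogs. The sample line is made up for illustration, with two control bytes standing in for the special characters. Note that with LC_ALL=C the [:print:] class covers only ASCII, so any non-ASCII bytes are dropped as well.

```shell
# Hypothetical working directory and file names; adjust to your environment.
workdir=$(mktemp -d)
raw="$workdir/raw_weblog.txt"
clean="$workdir/clean_weblog.txt"

# Build a small sample line containing two non-printable bytes (0x01, 0x02).
printf 'Windows Live\001\002 Photo Gallery\tMcAfee\n' > "$raw"

# Keep only printable characters plus CR, LF and tab; delete everything else.
# LC_ALL=C makes [:print:] mean printable ASCII (0x20-0x7E) on any system.
LC_ALL=C tr -cd '[:print:]\r\n\t' < "$raw" > "$clean"

# The intermediate file can then be loaded from the Hive CLI, for example
# (the table name "weblogs" is an assumption, not from the original thread):
#   LOAD DATA LOCAL INPATH '/path/to/clean_weblog.txt' INTO TABLE weblogs;
```

Comparing the line counts of the raw and cleaned files (wc -l) before loading is a cheap check that no record boundaries were changed.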

Or, if you have the strings utility, it outputs only printable characters.


From: Raj Hadoop <hadoop...@yahoo.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>, Raj Hadoop
<hadoop...@yahoo.com>
Date: Monday, July 8, 2013 1:52 PM
To: Hive <user@hive.apache.org>
Subject: Special characters in web log file causing issues


Hi,

The log file that I am trying to load through Hive has some special characters.

The field is shown below, and the special characters ¿¿ are also shown.

    Shockwave Flash;Chrome Remote Desktop Viewer;Native Client;Chrome PDF 
Viewer;Adobe Acrobat;Microsoft Office 2010;Motive Plug-
    in;Motive Management Plug-in;Google Update;Java(TM) Platform SE 7 
U21;McAfee SiteAdvisor;McAfee Virtual Technician;Windows     Live¿¿ Photo 
Gallery;McAfee SecurityCenter;Silverlig


The above causes the record to be terminated early, and the rest of it is
loaded as another line. How can I avoid this type of issue and load the data
properly? Any suggestions, please.

Thanks,
Raj
