Hi all,

I'm trying to ingest data that contains what I think are invalid characters,
and Flume is behaving a bit strangely. There is a single agent with a Spooling
Directory source and an HDFS sink, ingesting CSV files to be queried with
Drill. Whenever Flume attempts to ingest the bad row, it doesn't log any error,
but instead writes a truncated row to HDFS. Drill then fails on any query that
includes this row, because the truncated row leaves a newline inside a quoted
CSV string. Is there any way to handle this? I wrote a custom interceptor that
replaces characters matching the regex '\p{C}', but that didn't help.
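
For reference, the replacement step described above amounts to roughly the
following. This is only a minimal standalone sketch, not the actual interceptor
code: the class and method names are made up for illustration, it assumes the
event body is UTF-8, and it leaves out the Flume Interceptor plumbing entirely.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical helper; the name is illustrative and not part of Flume.
class ControlCharStripper {
    // \p{C} is the Unicode "Other" general category, which covers
    // control and format characters, among others.
    private static final Pattern OTHER = Pattern.compile("\\p{C}");

    // Assumption: the body is UTF-8 text. Decodes it, drops every
    // character matched by \p{C}, and re-encodes the result as UTF-8.
    static byte[] sanitize(byte[] body) {
        String text = new String(body, StandardCharsets.UTF_8);
        String cleaned = OTHER.matcher(text).replaceAll("");
        return cleaned.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The sample row contains a BEL control character (U+0007).
        byte[] in = "\"Field\u0007 4\",\"Field 5\""
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(sanitize(in), StandardCharsets.UTF_8));
    }
}
```

One thing worth noting is that U+FFFD (the replacement character, which shows
up as M-oM-?M-= in the 'cat -v' output) is in general category So rather than
C, so a '\p{C}' pattern leaves it in place.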

Data in Spooling directory CSV:
"13 Jan 2013 11:23:11 GMT ","Field 1","Field 2","Field 
3","Field���򔪌i�@�V%20�C%20�","Field 5"

Data written to HDFS:
"13 Jan 2013 11:23:11 GMT ","Field 1","Field 2","Field 3","Field���

Output of 'cat -v', which makes non-printing characters visible:
Data in spooling directory:
"13 Jan 2013 11:23:11 GMT ","Field 1","Field 2","Field 
3","FieldM-oM-?M-=M-oM-?M-=M-oM-?M-=M-rM-^TM-*M-^LiM-oM-?M-=@M-oM-?M-=V%20M-oM-?M-=C%20M-oM-?M-=","Field
 5"

Data written to HDFS:
"13 Jan 2013 11:23:11 GMT ","Field 1","Field 2","Field 
3","FieldM-oM-?M-=M-oM-?M-=M-oM-?M-=

Regards,
Chris

