Thanks guys!!!

It was as simple as running a single Unix command to clean the source data.

sed -i 's/\r//g' <filename>


After running this command on the dataset to remove the carriage returns, I
was able to load the Hive table with the expected record count.
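
For anyone who wants to check the effect before reloading, here is a minimal sketch that counts carriage returns before and after the cleanup. The sample file and its contents are made up for illustration, and it assumes GNU sed (the \r escape and the bare -i flag may not behave the same on BSD or macOS sed).

```shell
# Hypothetical sample data: one CRLF line ending and one stray CR inside a record.
sample=$(mktemp)
printf 'id1,foo\r\nid2,ba\rr\nid3,baz\n' > "$sample"

# Count carriage-return bytes before cleaning.
cr_before=$(tr -cd '\r' < "$sample" | wc -c | tr -d ' ')

# Strip every carriage return in place (GNU sed assumed).
sed -i 's/\r//g' "$sample"

# Count carriage returns and lines after cleaning.
cr_after=$(tr -cd '\r' < "$sample" | wc -c | tr -d ' ')
lines_after=$(wc -l < "$sample" | tr -d ' ')

echo "CR before: $cr_before, CR after: $cr_after, lines: $lines_after"
rm -f "$sample"
```

If the line count after cleaning matches the expected record count, the load should match as well.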

Thanks,
Balajee Venkatesh
On 30-May-2017 10:29, "JP gupta" <[email protected]> wrote:

Assuming the special characters were added by the Windows platform, as
mentioned by Shakti Singh, one easy way to clean up the file is the
command “*dos2unix filename*”.
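
In case dos2unix is not installed on the machine, a rough sketch of a fallback is below. The sample file is hypothetical and contains only CRLF line endings; the fallback uses tr, which deletes every carriage return and is therefore a different technique from dos2unix, not an equivalent of it.

```shell
# Hypothetical sample file with Windows (CRLF) line endings.
sample=$(mktemp)
printf 'id1,foo\r\nid2,bar\r\n' > "$sample"

if command -v dos2unix >/dev/null 2>&1; then
    # Convert CRLF line endings to LF in place; -q suppresses messages.
    dos2unix -q "$sample"
else
    # Fallback: delete every carriage return with tr, then replace the file.
    tmp=$(mktemp)
    tr -d '\r' < "$sample" > "$tmp" && mv "$tmp" "$sample"
fi

# Count the carriage returns that remain.
remaining=$(tr -cd '\r' < "$sample" | wc -c | tr -d ' ')
echo "carriage returns remaining: $remaining"
rm -f "$sample"
```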



*From:* shakti singh Shekhawat [mailto:[email protected]]
*Sent:* 30 May 2017 10:02
*To:* [email protected]
*Subject:* Re: Table count is more than File count after loading in hive



Hi Balajee,



The best way is to clean the file in Unix (or Perl or Python) before
loading it into Hive. The root cause is most probably carriage returns,
since files generated on Microsoft platforms mostly contain ^M
characters. To identify whether carriage returns are the problem, try
the steps below:

1. The `file` command reports which line terminators (CRLF, CR, LF) your
file contains, by name rather than as escape sequences such as \n.

Ex: file yourfilename

yourfilename: UTF-8 Unicode text, with CRLF, CR, LF line terminators

2. To see what CR (\r), LF (\n) and CRLF (\r\n) mean, try:

man ascii

At this point you will know whether your file contains carriage returns
(\r), which break records in Hive.

3. To find where the carriage returns are, open the file in the vi
editor

Press Esc

Type   :set list

This should display the ^M characters highlighted. Find a record with a
^M in the middle of it. Then run a select on that record in the Hive
table; you will see that the Hive record is broken exactly where the ^M
appears in the file.
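
As a non-interactive alternative to the vi steps, something like the following sketch can list the affected lines with grep. The sample file and its contents are invented for illustration; the pattern "CR followed by any character" is one assumed way to tell a stray CR inside a record from a CR at the end of a line.

```shell
# Hypothetical sample: line 1 ends in CRLF, line 2 has a CR inside the record.
sample=$(mktemp)
printf 'id1,foo\r\nid2,ba\rr\nid3,baz\n' > "$sample"

# Hold a literal carriage return in a variable (portable across POSIX shells).
cr=$(printf '\r')

# Number of lines that contain a carriage return anywhere.
with_cr=$(grep -c "$cr" "$sample")

# Line numbers where a CR is followed by another character,
# i.e. the CR sits in the middle of the record.
embedded=$(grep -n "$cr." "$sample" | cut -d: -f1)

echo "lines with CR: $with_cr"
echo "lines with embedded CR: $embedded"
rm -f "$sample"
```

The line numbers reported for embedded carriage returns are the records to compare against the Hive table.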



Please let us know whether this helps identify the issue. If carriage
returns are the problem, the next step is to remove them from your file
(you can easily find commands on Stack Overflow; let me know if nothing
works).



Thanks,

Shakti
