Have you confirmed there are no extraneous tabs, newlines, or carriage
returns in your data?
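
One rough way to check, outside Pig, is to compare the number of physical
line breaks in the file with the number of records a CSV parser sees. The
sketch below is only an illustration: the function name diagnose_csv and the
sample data are made up, and Python's csv module does not necessarily follow
the same quoting rules as CSVExcelStorage, so treat the numbers as a hint
about where to look rather than an exact prediction of Pig's count.

```python
import csv
import io


def diagnose_csv(raw: bytes, encoding: str = "utf-8") -> dict:
    """Count line-break bytes and parsed CSV records in raw file content.

    Hypothetical helper for illustration; not part of Pig or piggybank.
    """
    text = raw.decode(encoding)
    # newline="" keeps \r and \n untranslated so the csv module can see
    # line breaks that sit inside quoted fields.
    reader = csv.reader(io.StringIO(text, newline=""))

    records = 0
    records_with_embedded_breaks = 0
    for row in reader:
        if not row:
            continue  # skip completely blank lines
        records += 1
        if any("\n" in field or "\r" in field for field in row):
            records_with_embedded_breaks += 1

    return {
        "lf": raw.count(b"\n"),    # line feed bytes in the file
        "cr": raw.count(b"\r"),    # carriage return bytes in the file
        "tabs": raw.count(b"\t"),  # tab bytes in the file
        "records": records,        # parsed rows, header included
        "records_with_embedded_breaks": records_with_embedded_breaks,
    }


# Made-up sample: a header, one record with a quoted multi-line field,
# and one record terminated by \r\n instead of \n.
SAMPLE = b'col1,col2\n"a","line one\nline two"\nb,c\r\n'

if __name__ == "__main__":
    print(diagnose_csv(SAMPLE))
```

To run this against the real data, read a local copy of the file in binary
mode and pass the bytes to diagnose_csv. A nonzero "cr" count in a file loaded
with the 'UNIX' option, or an "lf" count that differs from "records", would
point at the kind of stray characters asked about above.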

On Mon, Nov 23, 2015 at 5:37 AM, Vijaya Narayana Reddy Bhoomi Reddy <
[email protected]> wrote:

> Hi,
>
> I am loading a CSV file that has 177692 records. However, a row count
> taken after loading the file into Pig returns 177700, which is 8 records
> more than the original file contains. I am not doing any processing, just
> loading the file and displaying the record count.
>
> src_data = LOAD '/user/src_data.csv'
>     USING org.apache.pig.piggybank.storage.CSVExcelStorage(',', 'YES_MULTILINE', 'UNIX', 'SKIP_INPUT_HEADER')
>     AS (col1:chararray, col2:chararray, col3:chararray, col4:chararray);
>
> alias_for_count = GROUP src_data ALL;
> alias_for_join_count = FOREACH alias_for_count GENERATE
>     COUNT_STAR(src_data) AS num_rows;
>
> DUMP alias_for_join_count;
>
> May I know what could be the reason for this behavior?
>
>
> Thanks & Regards
> Vijay
