Have you confirmed there are no extraneous tabs, newlines, or carriage returns in your data?
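
One way to check is to compare the number of physical lines in the file with the number of logical CSV records, and to count fields that contain embedded line breaks or tabs. The sketch below is only an illustration: `profile_csv` is a hypothetical helper name, it uses Python's standard `csv` module (not Pig's CSVExcelStorage parser, which may treat edge cases differently), and it assumes you have pulled a copy of the file out of HDFS to a local disk.

```python
import csv
import io


def profile_csv(text, has_header=True):
    """Summarise line-break and tab oddities in CSV text.

    Hypothetical helper for illustration; it approximates what a
    quote-aware CSV parser sees, not Pig's exact behaviour.
    """
    # Physical lines: what a naive line counter such as `wc -l` reports
    # (plus one if the last line has no trailing newline).
    physical_lines = text.count("\n")
    if text and not text.endswith("\n"):
        physical_lines += 1

    # newline="" keeps \r and \n untouched so the csv module can see them.
    records = list(csv.reader(io.StringIO(text, newline="")))
    if has_header and records:
        records = records[1:]

    return {
        "physical_lines": physical_lines,
        "logical_records": len(records),
        # csv.reader yields an empty list for a completely blank line.
        "blank_records": sum(1 for r in records if not r),
        "records_with_line_breaks": sum(
            1 for r in records if any("\n" in f or "\r" in f for f in r)
        ),
        "records_with_tabs": sum(
            1 for r in records if any("\t" in f for f in r)
        ),
    }


# Small made-up sample: one quoted field spans two lines, one line ends
# with \r\n, one line is blank, and one field contains a tab.
sample = 'col1,col2\n1,"a\nb"\n2,c\r\n\n3,"d\te"\n'
report = profile_csv(sample)
print(report)

# To run it on a real file (path is a placeholder for a local copy):
# with open("src_data.csv", newline="") as fh:
#     print(profile_csv(fh.read()))
```

If the blank-record or line-break counts are non-zero, those rows are the first place I would look when the row count in Pig does not match what you expect.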
On Mon, Nov 23, 2015 at 5:37 AM, Vijaya Narayana Reddy Bhoomi Reddy <[email protected]> wrote:

> Hi,
>
> I am loading a CSV file, which has 177692 records. However, if I perform a
> row count after I load the CSV file into Pig, it gives an output of 177700,
> which is 8 records more than the data present in the original file. I am
> not doing any processing, but just loading and displaying the record count.
>
> src_data = LOAD '/user/src_data.csv' USING
> org.apache.pig.piggybank.storage.CSVExcelStorage
> (',','YES_MULTILINE','UNIX','SKIP_INPUT_HEADER') AS
> (col1:chararray, col2:chararray,col3:chararray, col4:chararray);
>
> alias_for_count = GROUP src_data ALL;
> alias_for_join_count = FOREACH alias_for_count GENERATE
> COUNT_STAR (src_data ) AS num_rows;
>
> DUMP alias_for_join_count;
>
> May I know what could be the reason for this behavior?
>
>
> Thanks & Regards
> Vijay
>
> --
> The contents of this e-mail are confidential and for the exclusive use of
> the intended recipient. If you receive this e-mail in error please delete
> it from your system immediately and notify us either by e-mail or
> telephone. You should not copy, forward or otherwise disclose the content
> of the e-mail. The views expressed in this communication may not
> necessarily be the view held by WHISHWORKS.
