Thanks for your inputs.
I wrote a UDF and used it together with the YES_MULTILINE argument I was
already passing to piggybank's CSVExcelStorage(), and it worked perfectly.
The affected column now has an extra pair of parentheses around it, but
that’s OK and I can deal with it.
Here is what I did:
A = LOAD '/path/to/file/location/filename.csv' USING
    org.apache.pig.piggybank.storage.CSVExcelStorage(',', 'YES_MULTILINE', 'UNIX',
    'SKIP_INPUT_HEADER');
-- clean is the UDF that replaces the \n character in the string
B = FOREACH A GENERATE $0, $1, $2, clean($3);
STORE B INTO '/desired/file/location' USING PigStorage('\t');
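
The clean UDF itself is not shown in this message. For anyone following along, here is a minimal sketch of what such a function might look like as a Python (Jython) UDF. Everything in it is an assumption rather than the code actually used: the replacement character (a space), the handling of \r, the output field name "cleaned", and the fallback decorator, which is there only so the file also runs outside Pig.

```python
# Hypothetical sketch of a newline-stripping UDF; not the author's actual code.
# When a script is registered with "USING jython", Pig makes the outputSchema
# decorator available. The fallback below only lets the file run on its own.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def wrap(func):
            return func
        return wrap


@outputSchema("cleaned:chararray")
def clean(value):
    """Replace embedded line breaks in a field with single spaces."""
    if value is None:
        # Pass nulls through unchanged so empty fields stay empty.
        return None
    # Handle Windows (\r\n) first so it does not turn into two spaces.
    return value.replace("\r\n", " ").replace("\n", " ").replace("\r", " ")
```

If this were saved as clean_udf.py (a made-up file name), it could be registered with REGISTER 'clean_udf.py' USING jython AS myudfs; and called as myudfs.clean($3). Returning a plain chararray rather than a tuple may also matter for the extra parentheses mentioned above, since Pig prints tuples wrapped in parentheses, but nothing in this thread confirms that this was the cause.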
Sunilmanohar Kancharlapalli
Engineer - IT
[email protected]
Phone:
Cisco Systems Limited
US
Cisco.com
Think before you print.
This email may contain confidential and privileged material for the sole use of
the intended recipient. Any review, use, distribution or disclosure by others
is strictly prohibited. If you are not the intended recipient (or authorized to
receive for the recipient), please contact the sender by reply email and delete
all copies of this message.
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html
-----Original Message-----
From: Divya Gehlot [mailto:[email protected]]
Sent: Wednesday, July 22, 2015 11:49 PM
To: [email protected]
Subject: Re: Loading data from a CSV file which has '\n' character in a field
you can try this
http://pig.apache.org/docs/r0.7.0/udf.html#Load%2FStore+Functions
On 23 July 2015 at 09:24, Sunilmanohar Kancharlapalli -X (sunkanch - ZENSAR
TECHNOLOGIES INC at Cisco) <[email protected]> wrote:
> I am trying to load a CSV file that has a '\n' character in one field,
> and Pig treats it as the start of a new record. I am losing the data in
> that column and getting additional records in the output table.
>
> I am using
>
> d = LOAD '/location/of/the/file/name_of_the_fiel.csv' USING
> org.apache.pig.piggybank.storage.CSVExcelStorage(',', 'YES_MULTILINE',
> 'UNIX', 'SKIP_INPUT_HEADER');
>
> to allow multi-line values in a field, but I still face the same issue:
> the data shifts into the next row.
>
> Appreciate any help.
>
> Thanks
>
> Sunil Kancharlapalli