The problem is that the parameter is passed to TextInputFormat without
escape sequences being interpreted, which makes it hard to pass a \n
character. One alternative is to write a simple LoadFunc and set the
parameter from a Java string literal, where escape sequences are
interpreted, for example:
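import java.io.IOException;
import org.apache.hadoop.mapreduce.Job;
import org.apache.pig.builtin.PigStorage;

public class PigStorageNewLine extends PigStorage {
    @Override
    public void setLocation(String location, Job job) throws IOException {
        // "\n" in Java source is a single line-feed character.
        job.getConfiguration().set("textinputformat.record.delimiter", "\n");
        super.setLocation(location, job);
    }
}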
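
To make the difference concrete, here is a minimal sketch in plain Java (no
Pig or Hadoop needed). It compares a delimiter that arrives as the two
characters backslash and n with the one-character string the Java compiler
produces for "\n". DelimiterDemo and unescape are hypothetical names made up
for this illustration; they are not part of Pig or Hadoop, and the set of
escapes handled is only an assumption about what you might need.

```java
// Hypothetical helper for illustration only; not a Pig or Hadoop class.
final class DelimiterDemo {

    // Turn the two-character sequences \n, \r and \t into the single
    // control characters they stand for; copy everything else unchanged.
    static String unescape(String raw) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c == '\\' && i + 1 < raw.length()) {
                char next = raw.charAt(i + 1);
                if (next == 'n') { out.append('\n'); i++; continue; }
                if (next == 'r') { out.append('\r'); i++; continue; }
                if (next == 't') { out.append('\t'); i++; continue; }
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Models a value that arrives uninterpreted: backslash followed by 'n'.
        String uninterpreted = "\\n";
        // What the compiler produces for "\n": a single line-feed character.
        String lineFeed = "\n";

        System.out.println(uninterpreted.length());                   // 2
        System.out.println(lineFeed.length());                        // 1
        System.out.println(unescape(uninterpreted).equals(lineFeed)); // true
    }
}
```

If you would rather pass the delimiter as an argument from the Pig script than
hard-code it, a helper along these lines could be applied to the argument
before calling set(), but that is a design choice, not something Pig requires.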
Thanks,
Daniel
On 11/11/15, 11:59 PM, "Bhagwan S. Soni" <[email protected]> wrote:
>Hi,
>
>I have a file that arrives in *HDFS* from any of the source systems and
>contains more than one kind of *newline character*, both *\n* and *\r*,
>which creates extra lines when a MapReduce/Pig job runs.
>I'm OK with having *\n* as the newline and just want to avoid *\r*.
>I'm setting the newline character when running my Pig job with the
>property below:
>
>*-D textinputformat.record.delimiter*
>I tried many values for the newline character, but none of them makes any
>difference: the whole file is still read as a single row.
>Below are some of the values I have already tried in order to set \n as
>the newline character:
>
>-D textinputformat.record.delimiter=\\n
>-D textinputformat.record.delimiter=\\u000a
>-D textinputformat.record.delimiter=\u000a
>-D textinputformat.record.delimiter=0x0a
>-D textinputformat.record.delimiter=0x0A
>-D textinputformat.record.delimiter=00001010
>-D textinputformat.record.delimiter=\
\;
>
>Is there any possible value which I'm missing?
>
>I was also looking into creating a custom loader for this by extending
>the PigStorage class, but I'm not sure whether I would have to write my
>own RecordReader as well.
>
>
>*Thanks,*