On Sep 11, 2014, at 0:47, Felix Chern <idry...@gmail.com> wrote:

> If you don’t want anything to get inserted, just set your output to key only
> or value only.
> TextOutputFormat$LineRecordWriter won’t insert anything unless both values
> are set:

If I output the value only, for instance, and my line contains a TAB, will everything before the TAB be lost?
And if I output the key only and my line contains a TAB, will everything after the TAB be lost?

>
>   public synchronized void write(K key, V value)
>       throws IOException {
>
>     boolean nullKey = key == null || key instanceof NullWritable;
>     boolean nullValue = value == null || value instanceof NullWritable;
>     if (nullKey && nullValue) {
>       return;
>     }
>     if (!nullKey) {
>       writeObject(key);
>     }
>     if (!(nullKey || nullValue)) {
>       out.write(keyValueSeparator);
>     }
>     if (!nullValue) {
>       writeObject(value);
>     }
>     out.write(newline);
>   }
>
> On Sep 10, 2014, at 1:37 PM, Dmitry Sivachenko <trtrmi...@gmail.com> wrote:
>
>>
>> On Sep 10, 2014, at 22:33, Felix Chern <idry...@gmail.com> wrote:
>>
>>> Use ‘tr -s’ to strip out tabs?
>>>
>>> $ echo -e "a\t\t\tb"
>>> a                       b
>>>
>>> $ echo -e "a\t\t\tb" | tr -s "\t"
>>> a       b
>>>
>>
>> There can be tabs in the input; I want to keep input lines without any
>> modification.
>>
>> Actually, it is a rather standard task: process lines one by one without
>> inserting extra characters. There should be a standard solution for it, IMO.
>>
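
For what it's worth, the branching in the write() method quoted above can be tried out without a cluster. The following is only a minimal standalone sketch, not the Hadoop class itself: the class name LineWriterSketch and the method format() are made up for illustration, a null String stands in for a null or NullWritable key/value, and the separator is assumed to be a single TAB. It covers only what the writer does with the key and value it is handed; it says nothing about how a line is split into key and value before it reaches the writer.

```java
// Standalone sketch of the quoted write() logic (hypothetical names, not Hadoop code).
// A null String plays the role of a null or NullWritable key/value.
final class LineWriterSketch {

    // Assumed separator; the quoted code writes a configurable keyValueSeparator.
    private static final char SEPARATOR = '\t';

    // Returns the text that would be written for one record.
    static String format(String key, String value) {
        boolean nullKey = key == null;
        boolean nullValue = value == null;
        if (nullKey && nullValue) {
            return "";                 // nothing is written at all
        }
        StringBuilder out = new StringBuilder();
        if (!nullKey) {
            out.append(key);           // key is written as-is
        }
        if (!(nullKey || nullValue)) {
            out.append(SEPARATOR);     // separator only when both are present
        }
        if (!nullValue) {
            out.append(value);         // value is written as-is
        }
        out.append('\n');
        return out.toString();
    }

    public static void main(String[] args) {
        // Value-only record containing tabs: no separator is added by this logic.
        System.out.print(format(null, "a\t\t\tb"));
        // Key and value both set: exactly one separator is placed between them.
        System.out.print(format("k", "v"));
    }
}
```

Note that in this sketch an empty but non-null value still counts as "set", so format("k", "") yields the key followed by a trailing TAB. Under the same assumption, that would be one way an extra TAB could show up in the output.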