Re: bad data output

2018-04-03 Thread Darshan Singh
Thanks. As of now, I have decided to write it to HDFS from within the function. Thanks On Tue, Apr 3, 2018 at 10:58 AM, Kostas Kloudas wrote: > Hi Darshan, > > You can use side outputs [1] and a process function to split the data in > as many streams as you want, > e.g. correct, fixable and wro

Re: bad data output

2018-04-03 Thread Kostas Kloudas
Hi Darshan, You can use side outputs [1] and a process function to split the data into as many streams as you want, e.g. correct, fixable and wrong. Each side output will be a separate stream that you can process individually. You can always send the “bad data” directly from your process functio
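
To make the routing idea in this reply concrete, here is a minimal sketch in plain Java. It does not use the Flink API: in Flink, each of the three lists would instead be a side output identified by an OutputTag and emitted from a ProcessFunction via ctx.output(tag, value). The class name BadDataRouter, the Quality enum, and the validation rule in classify() are all hypothetical stand-ins for the poster's own (expensive) check, which the thread does not describe.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: models "one pass, three outputs" without any Flink dependency.
public class BadDataRouter {

    // The three categories named in the reply: correct, fixable and wrong.
    enum Quality { CORRECT, FIXABLE, WRONG }

    // Holds one list per category; stands in for the main stream plus two side outputs.
    static final class Routed {
        final List<String> correct = new ArrayList<>();
        final List<String> fixable = new ArrayList<>();
        final List<String> wrong = new ArrayList<>();
    }

    // Hypothetical rule (an assumption, not from the thread): a record is
    // CORRECT if it is an integer, FIXABLE if it becomes one after trimming
    // whitespace, and WRONG otherwise.
    static Quality classify(String record) {
        if (record == null) {
            return Quality.WRONG;
        }
        String trimmed = record.trim();
        try {
            Integer.parseInt(trimmed);
        } catch (NumberFormatException e) {
            return Quality.WRONG;
        }
        return trimmed.equals(record) ? Quality.CORRECT : Quality.FIXABLE;
    }

    // Runs the check exactly once per record and routes the record by its result.
    static Routed route(List<String> records) {
        Routed out = new Routed();
        for (String record : records) {
            switch (classify(record)) {
                case CORRECT:
                    out.correct.add(record);
                    break;
                case FIXABLE:
                    out.fixable.add(record);
                    break;
                default:
                    out.wrong.add(record);
                    break;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Routed r = route(Arrays.asList("42", " 7 ", "abc"));
        System.out.println("correct=" + r.correct + " fixable=" + r.fixable + " wrong=" + r.wrong);
    }
}
```

The point of the design is that the expensive check runs once per record, and the bad records are kept in their own output rather than being dropped, so they can be stored or repaired separately.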

Re: bad data output

2018-03-29 Thread 杨力
You can use a split operator, generating 2 streams. On Fri, Mar 30, 2018 at 2:53 AM, Darshan Singh wrote: > Hi > > I have a dataset which has almost 99% of correct data. As of now if say > some data is bad I just ignore it and log it and return only correct data. > I do this inside a map function. > > The part

bad data output

2018-03-29 Thread Darshan Singh
Hi, I have a dataset in which almost 99% of the data is correct. As of now, if some data is bad, I just log it, ignore it, and return only the correct data. I do this inside a map function. The part which decides whether data is correct or not is an expensive one. Now I want to store the bad data somewher