Thanks,
For now, I have decided to write it to HDFS from within the function.
Thanks
On Tue, Apr 3, 2018 at 10:58 AM, Kostas Kloudas wrote:
Hi Darshan,
You can use side outputs [1] and a process function to split the data into as
many streams as you want,
e.g. correct, fixable and wrong. Each side output will be a separate stream
that you can process individually.
You can always send the “bad data” directly from your process functio
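
To make the routing idea above concrete, here is a minimal plain-Java sketch. It deliberately uses no Flink classes, so it is only an illustration of the pattern, not Flink code: in Flink the same role would be played by an OutputTag per category and ctx.output(...) inside a ProcessFunction. The class name SplitSketch, the Verdict enum and the validation rule are all hypothetical, chosen for illustration. The point it shows is that the expensive check runs once per record and its result decides which of the three outputs the record goes to.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only: no Flink classes are used here.
final class SplitSketch {

    // Hypothetical three-way verdict, named after the categories in the thread.
    enum Verdict { CORRECT, FIXABLE, WRONG }

    // Three collections standing in for the main output and two side outputs.
    static final class Result {
        final List<Integer> correct = new ArrayList<>();
        final List<String> fixable = new ArrayList<>();
        final List<String> wrong = new ArrayList<>();
    }

    // Stand-in for the expensive validation. Assumed rule: a clean integer is
    // correct, an integer with surrounding whitespace is fixable, and
    // everything else is wrong.
    static Verdict validate(String raw) {
        if (raw.matches("-?\\d{1,9}")) return Verdict.CORRECT;
        if (raw.trim().matches("-?\\d{1,9}")) return Verdict.FIXABLE;
        return Verdict.WRONG;
    }

    // Routes every record to exactly one output; validate() is called once per record.
    static Result split(List<String> input) {
        Result r = new Result();
        for (String raw : input) {
            switch (validate(raw)) {
                case CORRECT: r.correct.add(Integer.parseInt(raw)); break;
                case FIXABLE: r.fixable.add(raw); break;
                default:      r.wrong.add(raw); break;
            }
        }
        return r;
    }

    public static void main(String[] args) {
        Result r = split(List.of("12", " 7 ", "abc"));
        System.out.println(r.correct + " " + r.fixable + " " + r.wrong);
    }
}
```

Each of the three lists can then be handled on its own, for example by writing only the wrong records to storage, without repeating the validation.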
You can use a split operator, generating 2 streams.
On Fri, Mar 30, 2018 at 2:53 AM, Darshan Singh wrote:
Hi
I have a dataset in which almost 99% of the data is correct. Currently, if
some data is bad, I just log it, ignore it, and return only the correct data.
I do this inside a map function.
The part which decides whether data is correct or not is an expensive one.
Now I want to store the bad data somewher