Hi all,
I'm working on an ETL task with Spark. As part of this work, I'd like to
mark records with some info such as:
1. Whether the record is good or bad (e.g, Either)
2. Originating file and lines
Part of my motivation is to prevent errors with individual records from
stopping the entire
I'm considering a few approaches -- one of which is to provide new
functions like mapLeft, mapRight, filterLeft, etc.
But this all falls shorts with DataFrames. RDDs can easily be extended
from RDD[T] to RDD[Record[T]]. I guess with DataFrames, I could add
special columns?
On Wed, Jul 15, 2015
Yea - I'd just add a bunch of columns. Doesn't seem like that big of a deal.
On Wed, Jul 15, 2015 at 10:53 AM, RJ Nowling rnowl...@gmail.com wrote:
I'm considering a few approaches -- one of which is to provide new
functions like mapLeft, mapRight, filterLeft, etc.
But this all falls shorts