Record metadata with RDDs and DataFrames

2015-07-15 Thread RJ Nowling
Hi all,

I'm working on an ETL task with Spark. As part of this work, I'd like to mark records with some info such as:

1. Whether the record is good or bad (e.g., Either)
2. Originating file and lines

Part of my motivation is to prevent errors with individual records from stopping the entire
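For illustration only, here is a rough sketch of what such a marked record might look like. It is plain Python rather than Spark code, and the names `Record` and `safe_map` are hypothetical, not part of any Spark API. The idea is that a failure while processing one record becomes data attached to that record instead of an exception that stops the job.

```python
from dataclasses import dataclass
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")
U = TypeVar("U")


@dataclass(frozen=True)
class Record(Generic[T]):
    """Hypothetical wrapper: a value plus provenance; `error` marks a bad record."""
    source_file: str
    line_number: int
    value: Optional[T] = None
    error: Optional[str] = None

    @property
    def is_good(self) -> bool:
        return self.error is None


def safe_map(record: Record, fn: Callable) -> Record:
    """Apply fn to a good record; convert any exception into a bad record."""
    if not record.is_good:
        return record  # bad records pass through untouched
    try:
        return Record(record.source_file, record.line_number, value=fn(record.value))
    except Exception as exc:
        return Record(record.source_file, record.line_number,
                      error=f"{type(exc).__name__}: {exc}")


# Made-up input: line 3 cannot be parsed as an integer.
lines = ["1", "2", "oops", "4"]
records = [Record("input.txt", i, value=s) for i, s in enumerate(lines, start=1)]
parsed = [safe_map(r, int) for r in records]

good = [r.value for r in parsed if r.is_good]
bad = [(r.source_file, r.line_number, r.error) for r in parsed if not r.is_good]
```

A function shaped like `safe_map` could presumably be passed to an RDD's `map`, so that each record carries its own status and origin through the pipeline.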

Re: Record metadata with RDDs and DataFrames

2015-07-15 Thread RJ Nowling
I'm considering a few approaches -- one of which is to provide new functions like mapLeft, mapRight, filterLeft, etc. But this all falls short with DataFrames. RDDs can easily be extended from RDD[T] to RDD[Record[T]]. I guess with DataFrames, I could add special columns?

On Wed, Jul 15, 2015
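A minimal sketch of what helpers like mapLeft, mapRight, and filterLeft could look like, written in plain Python over iterables rather than against RDDs. The `Left`/`Right` classes and the exact semantics are assumptions: here `Right` holds a good value and `Left` holds an error (following the usual Either convention), and `filter_left` is assumed to keep every `Right` and only those `Left` records that satisfy the predicate.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Iterator, Union


@dataclass(frozen=True)
class Left:
    """Bad record: carries an error description (assumed convention)."""
    value: Any


@dataclass(frozen=True)
class Right:
    """Good record: carries the actual value (assumed convention)."""
    value: Any


Either = Union[Left, Right]


def map_right(records: Iterable[Either], fn: Callable) -> Iterator[Either]:
    """Transform good values; leave Left records unchanged."""
    for r in records:
        yield Right(fn(r.value)) if isinstance(r, Right) else r


def map_left(records: Iterable[Either], fn: Callable) -> Iterator[Either]:
    """Transform error values; leave Right records unchanged."""
    for r in records:
        yield Left(fn(r.value)) if isinstance(r, Left) else r


def filter_left(records: Iterable[Either], pred: Callable) -> Iterator[Either]:
    """Keep every Right, and only the Left records for which pred holds."""
    for r in records:
        if isinstance(r, Right) or pred(r.value):
            yield r


records = [Right(3), Left("bad row at line 2"), Right(5)]
doubled = list(map_right(records, lambda x: x * 2))
tagged = list(map_left(doubled, str.upper))
kept = list(filter_left(tagged, lambda msg: "LINE 9" in msg))
```

The same shape does not carry over directly to DataFrames, since a DataFrame row has a fixed set of typed columns rather than a wrapper type around each element.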

Re: Record metadata with RDDs and DataFrames

2015-07-15 Thread Reynold Xin
Yea - I'd just add a bunch of columns. Doesn't seem like that big of a deal.

On Wed, Jul 15, 2015 at 10:53 AM, RJ Nowling rnowl...@gmail.com wrote:
> I'm considering a few approaches -- one of which is to provide new functions like mapLeft, mapRight, filterLeft, etc. But this all falls short
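To make the extra-columns idea concrete, here is a small sketch that models rows as plain Python dicts instead of a real DataFrame. The column names (`_source_file`, `_source_line`, `_is_good`, `_error`), the two-field input format, and the helper functions are all made up for illustration. With an actual DataFrame, columns like these would more likely be produced while parsing or attached with `withColumn`.

```python
from typing import Any, Dict, Optional

Row = Dict[str, Any]


def with_metadata(row: Row, source_file: str, source_line: int,
                  error: Optional[str] = None) -> Row:
    """Return a copy of the row with hypothetical metadata columns added."""
    return {
        **row,
        "_source_file": source_file,
        "_source_line": source_line,
        "_is_good": error is None,
        "_error": error,
    }


def parse_line(line: str, source_file: str, source_line: int) -> Row:
    """Parse 'name,age'; on failure keep a row of nulls and record the error."""
    try:
        name, age = line.split(",")
        return with_metadata({"name": name, "age": int(age)},
                             source_file, source_line)
    except ValueError as exc:
        return with_metadata({"name": None, "age": None},
                             source_file, source_line, error=str(exc))


# Made-up input: the second line has a bad age, the third is missing a field.
lines = ["ann,31", "bob,n/a", "carol"]
table = [parse_line(text, "people.csv", n) for n, text in enumerate(lines, start=1)]

good_rows = [r for r in table if r["_is_good"]]
bad_rows = [r for r in table if not r["_is_good"]]
```

Every row keeps the same schema whether it is good or bad, so filtering on `_is_good` separates the two sets without losing where each bad row came from.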