Re: CSVSplitter - Splittable DoFn

2018-06-18 Thread Robert Bradshaw
Anecdotal evidence is that most people read CSV files line by line with TextIO and then parse each line into columns in a subsequent DoFn, either ignoring the possibility of quoted newlines in their data or asserting that none occur. On Mon, Jun 18, 2018 at 11:27 AM Austin Bennett wrote: > Hi Beam Users/Dev, > > Ho
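
For illustration, here is a minimal sketch of the line-by-line approach described above, assuming the Beam Python SDK and standard double-quote CSV quoting. The names `parse_csv_line` and `build_pipeline` are hypothetical and not from this thread. The parser treats an odd number of quote characters on a line as a sign that a quoted field was split across lines, and raises instead of silently producing a wrong row.

```python
import csv


def parse_csv_line(line):
    """Parse one physical line of CSV text into a list of column values.

    Hypothetical helper: assumes standard double-quote quoting, where a
    complete record always contains an even number of quote characters.
    """
    # An odd quote count means the line starts or ends inside a quoted
    # field, i.e. a quoted newline split one record across several lines.
    if line.count('"') % 2 == 1:
        raise ValueError("unbalanced quotes, possible quoted newline: %r" % line)
    # csv.reader accepts any iterable of strings; pass exactly one line.
    return next(csv.reader([line]))


def build_pipeline(input_pattern):
    """Sketch of the read-then-parse pipeline (hypothetical helper)."""
    # Imported lazily so parse_csv_line can be used without Beam installed.
    import apache_beam as beam

    pipeline = beam.Pipeline()
    rows = (
        pipeline
        # ReadFromText splits on newlines, so a quoted newline breaks a record.
        | "ReadLines" >> beam.io.ReadFromText(input_pattern, skip_header_lines=1)
        # Parsing happens in a separate step after the line-based read.
        | "ParseColumns" >> beam.Map(parse_csv_line)
    )
    return pipeline, rows
```

The odd-quote check is only a heuristic. It fails loudly when a record was split inside a quoted field, but it does not reassemble the record, which is the gap a splittable CSV reader would have to close.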

Re: CSVSplitter - Splittable DoFn

2018-06-18 Thread Austin Bennett
Hi Beam Users/Dev, How are people currently handling CSVs as input to Beam (or not really doing so)? I see the things listed at the start of this thread -- any others? I have many batch workflows that involve getting multi-GB CSV files from third-party data aggregators (ex: hourly) and inges