The general approach in your example looks reasonable to me. I don't think
there's anything built into Beam to help with parsing the tar file format,
and I don't know how robust the method of replacing "^@" and then splitting
on newlines will be. I'd likely use Apache's commons-compress library for
Hi,
I am a newbie to Apache Beam.
I am trying to write a simple pipeline using the Apache Beam Java SDK.
The pipeline will read a bunch of tgz files.
Each tgz file contains multiple CSV files with data.
public static final void main(String args[]) throws Exception {
PipelineOptions options = PipelineO