rouped(100).toList.par.map(groupedParts =>
> spark.read.parquet(groupedParts: _*))
>
> val finalDF = dfs.seq.grouped(100).toList.par.map(dfgroup =>
> dfgroup.reduce(_ union _)).reduce(_ union _).coalesce(2000)
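The batching pattern in the snippet above (read parquet paths in groups of 100, then union the resulting DataFrames pairwise) can be sketched without Spark. In the Python sketch below, lists stand in for DataFrames, concatenation stands in for `DataFrame.union`, and `grouped` mirrors Scala's `grouped(n)`; the batch size of 100 comes from the snippet, everything else is illustrative:

```python
from functools import reduce

def grouped(xs, n):
    """Mirror Scala's grouped(n): consecutive batches of at most n items."""
    return [xs[i:i + n] for i in range(0, len(xs), n)]

def read(paths):
    """Stand-in for spark.read.parquet(paths: _*)."""
    return list(paths)

def union(a, b):
    """Stand-in for DataFrame.union."""
    return a + b

paths = [f"part-{i:05d}" for i in range(250)]

# Read the paths in batches of 100 (3 batched reads here: 100, 100, 50).
dfs = [read(batch) for batch in grouped(paths, 100)]

# Union within each group of batches, then union the partial results.
final_df = reduce(union, [reduce(union, g) for g in grouped(dfs, 100)])

assert len(final_df) == len(paths)  # every path survives the two-level union
```

In Spark the point of the grouping is to cut down the number of `read` and `union` calls tracked by the driver; the final `coalesce(2000)` in the original then caps the partition count of the combined DataFrame.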
>
>
>
> *From:* Ben Kaylor

> *Date: *Tuesday, March 16, 2021 at 3:2
> P.S.: 3. If fast updates are required, one way would be capturing S3
> events and putting the paths, modification dates, etc. into
> DynamoDB/your DB of choice.
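A minimal sketch of that approach, assuming S3 event notifications delivered to a handler (e.g. via Lambda or SQS). The function and item field names below are illustrative, and the actual DynamoDB write (e.g. boto3's `put_item`) is omitted so the sketch stays self-contained; only the standard S3 event notification record layout is relied on:

```python
def s3_event_to_items(event):
    """Turn S3 event notification records into items that could be
    written to DynamoDB (the put_item call itself is omitted)."""
    items = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        items.append({
            "pk": s3["bucket"]["name"],       # partition key: bucket
            "sk": s3["object"]["key"],        # sort key: object path
            "modified": record["eventTime"],  # ISO-8601 timestamp
            "event": record["eventName"],     # e.g. ObjectCreated:Put
            "size": s3["object"].get("size", 0),
        })
    return items
```

With the paths and modification dates in a table keyed by bucket/key, a job can query for recently changed paths instead of listing S3.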
>
>
>
> *From:* Boris Litvak
> *Sent:* Tuesday, 16 March 2021 9:03
> *To:* Ben Kaylor ;
Not sure of the answer on this, but I am solving similar issues, so I'm
looking for additional feedback on how to do this.
My thought: if this can't be done via Spark and S3 boto commands, then have
the apps self-report those changes. Where instead of having just mappers
discovering the keys, you have services self