Re: [DISCUSS][SQL] Control the number of output files

2018-07-25 Thread Forest Fang
won't fix. On Wed, Jul 25, 2018 at 2:26 PM Forest Fang <forest.f...@outlook.com> wrote: Has there been any discussion of simply supporting Hive's merge small files configuration? It adds one additional stage that inspects the size of each output file, recomputes the desired par

Re: [DISCUSS][SQL] Control the number of output files

2018-07-25 Thread Forest Fang
Has there been any discussion of simply supporting Hive's merge small files configuration? It adds one additional stage that inspects the size of each output file, recomputes the desired parallelism to reach a target size, and runs a map-only coalesce before committing the final files. Since AFAIK S
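
The "recompute the desired parallelism" step described in the message above can be illustrated with a small sketch. This is not Hive's or Spark's actual implementation; the function name `desired_parallelism` and its parameters are hypothetical, and it assumes the per-file output sizes are already known from the inspection stage.

```python
def desired_parallelism(file_sizes_bytes, target_size_bytes):
    """Return how many output files are needed so that each one is
    roughly target_size_bytes (hypothetical helper, for illustration)."""
    if target_size_bytes <= 0:
        raise ValueError("target_size_bytes must be positive")
    total = sum(file_sizes_bytes)
    # Integer ceiling division; always keep at least one output file.
    return max(1, -(-total // target_size_bytes))


# Example: 1000 small files of 1 MiB each, targeting 128 MiB per file.
MIB = 1024 * 1024
num_files = desired_parallelism([1 * MIB] * 1000, 128 * MIB)
```

In Spark, a number computed this way could then be handed to something like `DataFrame.coalesce(num_files)` before the write, which reduces the partition count without a full shuffle; whether that matches the map-only coalesce the message has in mind is an assumption.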
