[GitHub] [hudi] abhijeetkushe commented on issue #1737: [SUPPORT]spark streaming create small parquet files

2020-09-22 Thread GitBox
abhijeetkushe commented on issue #1737: URL: https://github.com/apache/hudi/issues/1737#issuecomment-696926015

[GitHub] [hudi] abhijeetkushe commented on issue #1737: [SUPPORT]spark streaming create small parquet files

2020-09-22 Thread GitBox
abhijeetkushe commented on issue #1737: URL: https://github.com/apache/hudi/issues/1737#issuecomment-696934366 S3 parquet files ![S3_ParquetFiles](https://user-images.githubusercontent.com/2093096/93928593-8c9e0580-fce8-11ea-9af0-16c5a179a647.jpg) .hoodie files

[GitHub] [hudi] abhijeetkushe commented on issue #1737: [SUPPORT]spark streaming create small parquet files

2020-09-22 Thread GitBox
abhijeetkushe commented on issue #1737: URL: https://github.com/apache/hudi/issues/1737#issuecomment-696926015 @n3nash Apologies for the delayed response. I tried a bunch of heuristics from the available config options for both COW and MOR, and I think I got an idea of how the file creation

[GitHub] [hudi] abhijeetkushe commented on issue #1737: [SUPPORT]spark streaming create small parquet files

2020-09-12 Thread GitBox
abhijeetkushe commented on issue #1737: URL: https://github.com/apache/hudi/issues/1737#issuecomment-691268965 @bvaradar The Hudi version we are using is 0.5.2-incubating, deployed on EMR. Good point on the terminology. I will rephrase my question. COW with 'hoodie.cleaner.commits.retained':

[GitHub] [hudi] abhijeetkushe commented on issue #1737: [SUPPORT]spark streaming create small parquet files

2020-09-11 Thread GitBox
abhijeetkushe commented on issue #1737: URL: https://github.com/apache/hudi/issues/1737#issuecomment-691268965 @bvaradar The Hudi version we are using is 0.5.2-incubating, deployed on EMR. Good point on the terminology. I will rephrase my question. COW with 'hoodie.cleaner.commits.retained':

[GitHub] [hudi] abhijeetkushe commented on issue #1737: [SUPPORT]spark streaming create small parquet files

2020-09-10 Thread GitBox
abhijeetkushe commented on issue #1737: URL: https://github.com/apache/hudi/issues/1737#issuecomment-690685990 I am facing a similar problem. I am doing a POC for Hudi and am using the same data for both COW and MOR. I see the compaction happening for both table types as new versions