[GitHub] [hudi] umehrot2 commented on issue #1981: [SUPPORT] Huge performance Difference Between Hudi and Regular Parquet in Athena

2020-09-15 Thread GitBox
umehrot2 commented on issue #1981: URL: https://github.com/apache/hudi/issues/1981#issuecomment-693142002 @rubenssoto No, this is not solved in 0.6.0. RFC 15 is still under development. As @bvaradar mentioned, it is being targeted for a 1-2 month timeframe.

[GitHub] [hudi] umehrot2 commented on issue #1981: [SUPPORT] Huge performance Difference Between Hudi and Regular Parquet in Athena

2020-08-24 Thread GitBox
umehrot2 commented on issue #1981: URL: https://github.com/apache/hudi/issues/1981#issuecomment-679408763 @rubenssoto yes, currently EMR Presto is on 0.232, but upcoming releases will ship later versions of Presto where you will be able to use this patch. If you want to

[GitHub] [hudi] umehrot2 commented on issue #1981: [SUPPORT] Huge performance Difference Between Hudi and Regular Parquet in Athena

2020-08-20 Thread GitBox
umehrot2 commented on issue #1981: URL: https://github.com/apache/hudi/issues/1981#issuecomment-678000764 @rubenssoto until this is fixed, would you be okay querying through `spark-sql` instead? Since you are using COW, you can make your spark-sql queries use Spark's listing

[GitHub] [hudi] umehrot2 commented on issue #1981: [SUPPORT] Huge performance Difference Between Hudi and Regular Parquet in Athena

2020-08-20 Thread GitBox
umehrot2 commented on issue #1981: URL: https://github.com/apache/hudi/issues/1981#issuecomment-677977073 I understand that recently we made changes in Presto to use `Path Filter` instead. Athena is on an older version and does not have the `Path Filter` patch in Presto. So I am not sure

[GitHub] [hudi] umehrot2 commented on issue #1981: [SUPPORT] Huge performance Difference Between Hudi and Regular Parquet in Athena

2020-08-20 Thread GitBox
umehrot2 commented on issue #1981: URL: https://github.com/apache/hudi/issues/1981#issuecomment-677974115 @vinothchandar `native parquet readers` are used only in the `COW` use case, but even then splits are fetched through `InputFormat`, which also performs `listing` in the process. For `MOR`

[GitHub] [hudi] umehrot2 commented on issue #1981: [SUPPORT] Huge performance Difference Between Hudi and Regular Parquet in Athena

2020-08-19 Thread GitBox
umehrot2 commented on issue #1981: URL: https://github.com/apache/hudi/issues/1981#issuecomment-676855516 @vinothchandar @rubenssoto I think this could just be the difference between Presto's performance on regular Parquet, where it uses its native Parquet readers exclusively, vs