[GitHub] [hudi] bvaradar commented on issue #1847: [SUPPORT] querying MoR tables on S3 becomes slow with number of files growing

2020-09-02 Thread GitBox
bvaradar commented on issue #1847: URL: https://github.com/apache/hudi/issues/1847#issuecomment-685841451 @zuyanton : I am not sure if this is still an issue. Since this seems specific to EMR, can you open a ticket with the EMR folks directly?

[GitHub] [hudi] bvaradar commented on issue #1847: [SUPPORT] querying MoR tables on S3 becomes slow with number of files growing

2020-07-24 Thread GitBox
bvaradar commented on issue #1847: URL: https://github.com/apache/hudi/issues/1847#issuecomment-663450319 @bschell : Thanks for the information. As getLen() is used extensively on both the read and write sides, can you elaborate on which cases actually result in RPC calls? I

[GitHub] [hudi] bvaradar commented on issue #1847: [SUPPORT] querying MoR tables on S3 becomes slow with number of files growing

2020-07-20 Thread GitBox
bvaradar commented on issue #1847: URL: https://github.com/apache/hudi/issues/1847#issuecomment-661461345 @zuyanton : I am not sure if I can find the source code of this class. @umehrot2 : Can you let me know if the current implementation of FileStatus returned by S3NativeFileSystem overrides

[GitHub] [hudi] bvaradar commented on issue #1847: [SUPPORT] querying MoR tables on S3 becomes slow with number of files growing

2020-07-19 Thread GitBox
bvaradar commented on issue #1847: URL: https://github.com/apache/hudi/issues/1847#issuecomment-660836081 @zuyanton : Thanks for the detailed write-up. This is very interesting. If you look at the base implementation of the FileStatus getLen() method, it returns a cached copy of the length.
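
The concern in this thread can be illustrated with a small, language-neutral model. The sketch below is not Hadoop or EMR code: `CachedStatus`, `RemoteStatus`, and `CountingStore` are hypothetical names, and the assumption that a subclass re-fetches the length on every call is only the scenario being asked about, not a confirmed behavior. It shows why a length accessor that returns a stored value costs nothing per file, while one that contacts the store on each call scales with the number of files.

```python
class CachedStatus:
    """Toy stand-in for a status object that stores the length at listing time."""

    def __init__(self, path, length):
        self.path = path
        self._length = length

    def get_len(self):
        # Returns the stored value; no remote call is made.
        return self._length


class CountingStore:
    """Hypothetical object store that counts how many length lookups it serves."""

    def __init__(self, sizes):
        self.sizes = sizes
        self.calls = 0

    def fetch_length(self, path):
        self.calls += 1
        return self.sizes[path]


class RemoteStatus(CachedStatus):
    """Hypothetical subclass that overrides get_len() to ask the store each time."""

    def __init__(self, path, store):
        super().__init__(path, 0)
        self._store = store

    def get_len(self):
        # Assumed behavior for illustration: one remote lookup per call.
        return self._store.fetch_length(self.path)


def total_size(statuses):
    """Sum file lengths, as a reader or writer might do over many files."""
    return sum(s.get_len() for s in statuses)


store = CountingStore({f"file_{i}": 100 for i in range(1000)})

cached = [CachedStatus(path, size) for path, size in store.sizes.items()]
cached_total = total_size(cached)
calls_after_cached = store.calls  # no lookups were needed

remote = [RemoteStatus(path, store) for path in store.sizes]
remote_total = total_size(remote)
calls_after_remote = store.calls  # one lookup per file

print(cached_total, calls_after_cached)
print(remote_total, calls_after_remote)
```

Both variants compute the same total; only the number of lookups differs, which is why an overriding implementation would matter more as the file count grows.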