luffyd commented on issue #1913:
URL: https://github.com/apache/hudi/issues/1913#issuecomment-677888677
We started running the EMR job in cluster mode and are not seeing this issue
now.
Is your job running in client mode or cluster mode?
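For reference, a minimal sketch of what submitting the job in cluster mode looks like on EMR. The jar path, class name, and configuration here are placeholders, not taken from this issue:

```shell
# Hypothetical example: submitting a Hudi job on EMR with --deploy-mode
# cluster, so the driver runs inside YARN instead of on the submitting host.
# Jar location and main class are placeholders.
spark-submit \
  --deploy-mode cluster \
  --class com.example.HudiIngestJob \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  s3://my-bucket/jars/hudi-ingest-job.jar
```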
--
luffyd commented on issue #1913:
URL: https://github.com/apache/hudi/issues/1913#issuecomment-670385009
It is this; it seems to be the latest. It is whatever ships with AWS EMR:
/mnt2/yarn/usercache/hadoop/appcache/application_1596743154329_0001/container_1596743154329_0001_01_01/__spark_libs__/pa
luffyd commented on issue #1913:
URL: https://github.com/apache/hudi/issues/1913#issuecomment-670211630
From the Spark Environment tab, the Parquet version seems to be this:
/mnt2/yarn/usercache/hadoop/appcache/application_1596743154329_0001/container_1596743154329_0001_01_01/__spark_libs__/parquet-f
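One way to confirm the exact Parquet version bundled with Spark on the cluster is to list the Parquet jars, since the version is embedded in the jar file names. This is a generic sketch; the paths are the usual EMR locations and the application id will differ:

```shell
# List the Parquet jars in the Spark installation on an EMR node; the version
# appears in the file name (e.g. parquet-hadoop-1.10.1.jar).
ls /usr/lib/spark/jars/ | grep parquet

# Alternatively, inspect a running application's extracted __spark_libs__
# directory under the YARN usercache (application/container ids will differ).
ls /mnt*/yarn/usercache/hadoop/appcache/application_*/container_*/__spark_libs__/ | grep parquet
```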
luffyd commented on issue #1913:
URL: https://github.com/apache/hudi/issues/1913#issuecomment-670039732
Thanks for the input, @bvaradar.
The "Too many open files on IOException" issue also seems to be correlated
with having 2G as the max file limit.
Will confirm the Parquet version.
Reg
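For anyone chasing the "Too many open files" symptom, a quick generic way to inspect the open-file-descriptor limits and current usage on a host (not specific to this issue):

```shell
# Show the soft and hard limits on open file descriptors for the current shell.
ulimit -Sn
ulimit -Hn

# Count the file descriptors currently held by a process; $$ is the current
# shell's pid, substitute the executor's pid when debugging a Spark job.
ls /proc/$$/fd | wc -l
```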
luffyd commented on issue #1913:
URL: https://github.com/apache/hudi/issues/1913#issuecomment-669674913
Have built jars from the master branch; the latest commit at the time of the
build was:
```
commit 8c4ff185f1752b5041c4e66ac595bd90c2693137 (HEAD -> master, 0r)
Author: mabin001
Date: Tu
```
luffyd commented on issue #1913:
URL: https://github.com/apache/hudi/issues/1913#issuecomment-668679565
Spark-side configuration question: I have been using client mode; does it
make a difference to use cluster mode?
Th