Please try the maxBytesPerTrigger option; the files are probably large enough
to crash the JVM.
Please also share some info on the executors and the files (size, etc.).
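For reference, a minimal sketch of where such a rate-limit option would go
(hedged: maxBytesPerTrigger is an option of some sources such as the Delta
streaming source, while the built-in file source only supports
maxFilesPerTrigger; the format, value, and path below are placeholders):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession.builder().getOrCreate();

// Soft cap on how much data each micro-batch reads. Honored by the
// Delta streaming source; the built-in file source uses
// maxFilesPerTrigger instead.
Dataset<Row> input = spark.readStream()
    .format("delta")
    .option("maxBytesPerTrigger", "512m")  // placeholder value
    .load("s3a://bucket/path");            // placeholder path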
Regards,
..Piyush
On Sun, Jul 19, 2020 at 3:29 PM Rachana Srivastava
wrote:
> *Issue:* I am trying to process 5000+ gzipped JSON files
Can you reduce maxFilesPerTrigger further and see if the OOM still persists?
If it does, the problem may be somewhere else.
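For example (a sketch; the path and value are placeholders), the built-in
file source's maxFilesPerTrigger option caps how many files each micro-batch
picks up:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

SparkSession sparkSession = SparkSession.builder().getOrCreate();

// Limit each micro-batch to N input files; lowering N reduces the
// memory needed per batch and helps isolate whether batch size is
// what drives the OOM.
Dataset<Row> inputDS = sparkSession.readStream()
    .format("text")
    .option("maxFilesPerTrigger", 10)   // try progressively smaller values
    .load("s3a://bucket/prefix/");      // placeholder path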
> On Jul 19, 2020, at 5:37 AM, Jungtaek Lim
> wrote:
>
> Please provide logs and a heap dump for the OOM case - otherwise no one
> can say what the cause is.
Please provide logs and a heap dump for the OOM case - otherwise no one can
say what the cause is.
Add these JVM options to the driver/executor: -XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath="...dir..."
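For reference, a minimal sketch of wiring those options in (the dump
directory is a placeholder; note that in client mode the driver's JVM is
already running, so spark.driver.extraJavaOptions has to be set at submit
time, e.g. in spark-defaults.conf or via --driver-java-options):

import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession.builder()
    .appName("oom-debug")
    // Executor JVMs are launched after this, so the setting takes effect:
    .config("spark.executor.extraJavaOptions",
            "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/dumps")
    // Driver equivalent, usually passed at submit time instead:
    //   --conf spark.driver.extraJavaOptions="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/dumps"
    .getOrCreate();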
On Sun, Jul 19, 2020 at 6:56 PM Rachana Srivastava
wrote:
> *Issue:* I am trying to process 5000+
Issue: I am trying to process 5000+ gzipped JSON files periodically from S3
using Structured Streaming code.
Here are the key steps:
- Read the JSON schema and broadcast it to the executors
- Read the stream (a hedged sketch of both steps follows below):
Dataset<Row> inputDS = sparkSession.readStream().format("text")
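A sketch of those two steps under stated assumptions (the schema JSON string,
S3 path, trigger limit, and variable names are placeholders, not the original
code; the gzipped JSON is read as "text" and parsed with from_json, which
appears to be the intent):

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.from_json;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataType;
import org.apache.spark.sql.types.StructType;

SparkSession sparkSession = SparkSession.builder().getOrCreate();

// Step 1: parse the JSON schema once on the driver. A StructType is
// serializable and ships with each task closure, so an explicit
// broadcast is optional.
String schemaJson = "...";  // placeholder for the schema file contents
StructType schema = (StructType) DataType.fromJson(schemaJson);

// Step 2: stream the gzipped files as text. Gzip is not splittable,
// so each file becomes a single partition; rate-limiting the number
// of files per trigger bounds how much is read per micro-batch.
Dataset<Row> inputDS = sparkSession.readStream()
    .format("text")
    .option("maxFilesPerTrigger", 500)   // placeholder value
    .load("s3a://bucket/input/");        // placeholder path

// Parse each text line as JSON using the schema.
Dataset<Row> parsed = inputDS
    .select(from_json(col("value"), schema).alias("data"))
    .select("data.*");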