Hi, Zain:
The taskmanager.out file only contains content written to stdout. Sometimes
fatal errors, such as JVM exit messages, are written to the .out file as
well. If you don't specify a file path for the GC log, the GC log output
will also be saved into the .out file.
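If you'd prefer the GC output in its own file, one option (a rough sketch,
assuming a Java 8 JVM; the path is a placeholder you would replace) is to
pass a GC log location through env.java.opts.taskmanager in flink-conf.yaml:

env.java.opts.taskmanager: "-Xloggc:/path/to/taskmanager-gc.log -XX:+PrintGCDetails"

On Java 9+ the unified logging flag (-Xlog:gc*:file=...) is the equivalent.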
Hi, Puneet:
As Terry said, if your job fails unexpectedly, you can check the
restart-strategy configuration in your flink-conf.yaml. If the restart
strategy is set to disabled or none, the job will transition to FAILED as
soon as it encounters a failover. The job will also fail itself once the
configured number of restart attempts is exhausted.
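For illustration (a minimal sketch using the standard restart-strategy keys;
double-check the exact key names against the docs for your Flink version),
disabling restarts entirely looks like this in flink-conf.yaml:

restart-strategy: none

while a bounded fixed-delay strategy would be:

restart-strategy: fixed-delay
restart-strategy.fixed-delay.attempts: 3
restart-strategy.fixed-delay.delay: 10 s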
Hi, Aryan:
You could refer to the official docs [1] for how to submit PyFlink jobs.
$ ./bin/flink run \
--target yarn-per-job \
--python examples/python/table/word_count.py
With this command you can submit a per-job application to YARN. The docs
[2] and [3] describe how to submit jobs.
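If you want application mode instead of per-job mode, the command looks
roughly like this (a sketch based on the YARN deployment docs; in a real
setup you may also need extra options to ship a Python environment):

$ ./bin/flink run-application \
--target yarn-application \
--python examples/python/table/word_count.py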
Hi, Jiaqiao:
Since your job enables checkpointing, you can simply remove the restart
strategy config. The default will then be fixed-delay with
Integer.MAX_VALUE restart attempts and a '1 s' delay, as mentioned in [1].
This way, when a failover occurs, your job will wait for 1 second before it
restarts.
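Spelled out explicitly, that default is equivalent to the following in
flink-conf.yaml (just an illustration; you don't need to set it when
checkpointing is on and no restart strategy is configured):

restart-strategy: fixed-delay
restart-strategy.fixed-delay.attempts: 2147483647
restart-strategy.fixed-delay.delay: 1 s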
> exceptions during the
> restart, and the task manager was restarted a few times until it was
> stabilized.
>
> You can find the log here:
>
> jobmanager-log.txt.gz
> <https://nokia-my.sharepoint.com/:u:/p/ifat_afek/EUsu4rb_-BpNrkpvSwzI-vgBtBO9OQlIm0CHtW0gsZ7Gqg?email=zh
Hi, Afek!
When a TaskManager is killed, the JobManager will not notice it until a
heartbeat timeout occurs. Currently, the default value of
heartbeat.timeout is 50 seconds [1]. That's why it takes more than 30
seconds for Flink to trigger a failover. If you'd like to shorten that
time, you can decrease heartbeat.timeout in your flink-conf.yaml.
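For example (the values here are only illustrative; setting them too low
can cause false failovers under heavy GC or network pressure):

heartbeat.interval: 5000   # ms between heartbeats
heartbeat.timeout: 20000   # ms of silence before the peer is considered dead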
Thank you for writing this blog post, Daisy and Kevin! It helped me
understand what sort-based shuffle is and how to use it. Looking forward to
your future improvements!
On Wed, Nov 3, 2021 at 6:32 PM Yuxin Tan wrote:
> Thanks Daisy and Kevin! The IO scheduling idea of the sequential reading