Hi community,
We are currently using *Externalized Checkpoints* to recover from abrupt YARN
application failures, since they retain a "_metadata" file in the checkpoint
folder, which is essential for the job's cold recovery.
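A retained checkpoint is resumed by pointing Flink at its metadata (e.g. `flink run -s <checkpointMetaDataPath>`), so cold recovery usually starts by finding the newest `_metadata` file. Below is a minimal plain-Java sketch of that step, assuming the layout `<checkpoint-dir>/<job-id>/chk-<n>/_metadata`. The class name `LatestCheckpoint` and the example listing are hypothetical. Checkpoint numbers are compared numerically, because a plain string sort would rank `chk-9` above `chk-12`.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LatestCheckpoint {
    // Assumed layout: <checkpoint-dir>/<job-id>/chk-<n>/_metadata
    private static final Pattern CHK_METADATA = Pattern.compile(".*/chk-(\\d+)/_metadata");

    /** Returns the _metadata path with the highest checkpoint number, if any. */
    static Optional<String> latest(List<String> paths) {
        String best = null;
        long bestId = -1;
        for (String path : paths) {
            Matcher m = CHK_METADATA.matcher(path);
            if (!m.matches()) {
                continue; // skip anything that is not a chk-<n>/_metadata file
            }
            // Compare numerically: as strings, "chk-9" would sort after "chk-12"
            long id = Long.parseLong(m.group(1));
            if (id > bestId) {
                bestId = id;
                best = path;
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        // Hypothetical listing of a retained checkpoint directory
        List<String> listing = List.of(
                "hdfs:///example/ckpts/job-a/chk-9/_metadata",
                "hdfs:///example/ckpts/job-a/chk-12/_metadata",
                "hdfs:///example/ckpts/job-a/shared/state-1");
        // prints hdfs:///example/ckpts/job-a/chk-12/_metadata
        System.out.println(latest(listing).orElse("no retained checkpoint"));
    }
}
```

In practice the listing would come from the file system client (HDFS in this setup); the sketch keeps that out so the selection logic stays easy to test.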
By design, Flink's completed checkpoint paths look like
*hdfs:///fli
wrote:
> Great, thanks a lot Weike. I think the first step would be to open a JIRA
> issue, get assigned and then start on fixing it and opening a PR.
>
> Cheers,
> Till
>
> On Fri, Oct 16, 2020 at 10:02 AM DONG, Weike
> wrote:
>
>> Hi all,
>>
>> Than
plit
> assignments and for the LocationPreferenceSlotSelectionStrategy to
> calculate how many TMs run on the same machine).
>
> Do you want to fix this issue?
>
> Cheers,
> Till
>
> On Thu, Oct 15, 2020 at 11:38 AM DONG, Weike
> wrote:
>
>> Hi Till and community,
>>
>>
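Assuming that calculation needs a hostname for every TM, and that each hostname may come from a DNS lookup, one mitigation is to resolve each TM address only once. The plain-Java sketch below shows a memoizing resolver; `CachingResolver` and the fake lookup are hypothetical and are not Flink's actual fix. A real cache would also need an eviction or TTL policy, since DNS records can change.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical helper that resolves each TM address at most once. */
final class CachingResolver {
    private final Function<String, String> lookup; // e.g. a possibly slow DNS lookup
    private final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();

    CachingResolver(Function<String, String> lookup) {
        this.lookup = lookup;
    }

    String hostOf(String address) {
        // ConcurrentHashMap.computeIfAbsent applies the function at most once per key
        return cache.computeIfAbsent(address, lookup);
    }

    public static void main(String[] args) {
        int[] lookups = {0};
        CachingResolver resolver = new CachingResolver(address -> {
            lookups[0]++; // count how often the "DNS server" is hit
            return "host-for-" + address;
        });
        // 50 TaskManagers spread over two machines
        for (int i = 0; i < 50; i++) {
            resolver.hostOf(i % 2 == 0 ? "10.0.0.1" : "10.0.0.2");
        }
        System.out.println(lookups[0]); // 2
    }
}
```

Note that `computeIfAbsent` runs the lookup while holding an internal lock for that key's bin, so a slow first lookup still blocks the calling thread once; the saving is that it happens once per machine instead of once per TM.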
high variance, i.e., it normally completes quickly, but occasionally a slow
lookup blocks the thread. So an unstable DNS server can significantly slow
down Flink job startup.
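The effect described above, where one slow call stalls everything queued behind it on a single thread, can be reproduced with a single-threaded executor standing in for the JobMaster's main thread. In the plain-JDK sketch below, `MainThreadDemo`, the "register-tm" step and the simulated lookups are all hypothetical. In `blocking()` the lookup runs on the main thread and delays the TM registration; in `offloaded()` the lookup runs on a separate pool and only its result is handed back to the main thread, so the registration goes first.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class MainThreadDemo {
    /** The slow lookup runs on the single "main thread", so the registration waits for it. */
    static List<String> blocking() {
        List<String> order = Collections.synchronizedList(new ArrayList<>());
        ExecutorService mainThread = Executors.newSingleThreadExecutor();
        CompletableFuture.runAsync(() -> {
            sleep(200); // stands in for a slow DNS lookup
            order.add("lookup-done");
        }, mainThread);
        CompletableFuture.runAsync(() -> order.add("register-tm"), mainThread).join();
        mainThread.shutdown();
        return order;
    }

    /** The lookup runs on a separate pool; only its result is handed back to the main thread. */
    static List<String> offloaded() {
        List<String> order = Collections.synchronizedList(new ArrayList<>());
        ExecutorService mainThread = Executors.newSingleThreadExecutor();
        ExecutorService ioPool = Executors.newCachedThreadPool();
        // The simulated lookup cannot finish before the registration has run, which is
        // only possible because the main thread is not occupied by the lookup
        CountDownLatch registered = new CountDownLatch(1);
        CompletableFuture<Void> lookup = CompletableFuture
                .supplyAsync(() -> {
                    await(registered);
                    return "host-a";
                }, ioPool)
                .thenAcceptAsync(host -> order.add("lookup-done:" + host), mainThread);
        CompletableFuture.runAsync(() -> {
            order.add("register-tm");
            registered.countDown();
        }, mainThread).join();
        lookup.join();
        mainThread.shutdown();
        ioPool.shutdown();
        return order;
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private static void await(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println(blocking());  // [lookup-done, register-tm]
        System.out.println(offloaded()); // [register-tm, lookup-done:host-a]
    }
}
```

The latch only exists to make the ordering deterministic for the demo; a real lookup would simply take as long as the DNS server needs.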
Best,
Weike
On Thu, Oct 15, 2020 at 5:19 PM DONG, Weike wrote:
> Hi Till and commun
k at them. My suspicion
>> would be that there is some operation blocking the JobMaster's main thread
>> which causes the registrations from the TMs to time out. Maybe the logs
>> allow me to validate/falsify this suspicion.
>>
>> Cheers,
>> Till
>>
>> O
://gist.github.com/kylemeow/740c470d9b5a1ab3552376193920adce
TaskManager-1-1:
https://gist.github.com/kylemeow/41b9a8fe91975875c40afaf58276c2fe
Thanks : )
Best regards,
Weike
On Mon, Oct 12, 2020 at 4:14 PM DONG, Weike wrote:
> Hi community,
>
> Recently we have noticed a strange behavior
Hi community,
Recently we have noticed a strange behavior for Flink jobs in Kubernetes
per-job mode: when the parallelism increases, the time it takes for the
TaskManagers to register with the *JobManager* becomes abnormally long (for a
job with a parallelism of 50, it could take 60 ~ 120 seconds or ev
>> remember whether a request is currently ongoing or not.
>>
>> Cheers,
>> Till
>>
>> On Tue, Mar 17, 2020 at 9:01 AM DONG, Weike
>> wrote:
>>
>>> Hi Tison & Till and all,
>>>
>>> I have uploaded the client, taskmanager an
>>> RestServer, which then is not able to serve the response to the client. I'm
>>> pulling in Aljoscha and Tison who introduced this change. They might be
>>> able to verify my theory and propose a solution for it.
>>>
>>> [1] https://issues.apa
hy the task executor
> is killed? If it is killed by Yarn, you might get such info in Yarn
> NM/RM logs.
>
> Best,
> Yangze Guo
>
>
> On Fri, Mar 13, 2020 at 12:31 PM DONG, Weike
> wrote:
> >
> > Hi,
> >
> > Recently
Hi,
Recently I have encountered a strange behavior of Flink on YARN: when I try
to cancel a Flink job running in per-job mode on YARN with a command like
"cancel -m yarn-cluster -yid application_1559388106022_9412 ed7e2e0ab0a7316c1b65df6047bc6aae",
the client happily found and conne
Hi,
From the official Flink 1.10 documentation (
https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/sql/queries.html),
we can see that GROUPING SETS is only supported in batch mode.
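For reference, `GROUP BY GROUPING SETS ((country), (city), ())` behaves like the `UNION ALL` of three ordinary `GROUP BY` queries, with the columns outside each grouping set filled with `NULL`. The plain-Java sketch below only illustrates those semantics on in-memory rows; the column names and data are made up, and it says nothing about how Flink implements the clause in batch or streaming mode.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class GroupingSets {
    static final class Row {
        final int amount;
        final String[] dims; // grouping columns, here {country, city}

        Row(int amount, String... dims) {
            this.amount = amount;
            this.dims = dims;
        }
    }

    /** Emulates SELECT ..., SUM(amount) ... GROUP BY GROUPING SETS (...) on in-memory rows. */
    static List<String> aggregate(List<Row> rows, List<Set<Integer>> groupingSets) {
        List<String> out = new ArrayList<>();
        for (Set<Integer> set : groupingSets) {
            // Each grouping set is an independent GROUP BY; the outputs are concatenated
            Map<List<String>, Integer> sums = new LinkedHashMap<>();
            for (Row row : rows) {
                List<String> key = new ArrayList<>();
                for (int i = 0; i < row.dims.length; i++) {
                    // Columns outside the current grouping set are reported as NULL
                    key.add(set.contains(i) ? row.dims[i] : null);
                }
                sums.merge(key, row.amount, Integer::sum);
            }
            // String.join renders null elements as "null"
            sums.forEach((key, sum) -> out.add(String.join(" | ", key) + " | " + sum));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> sales = List.of(
                new Row(10, "CN", "Beijing"),
                new Row(20, "CN", "Shanghai"),
                new Row(5, "DE", "Berlin"));
        // GROUPING SETS ((country), (city), ()) with country = 0, city = 1
        List<Set<Integer>> sets = List.of(Set.of(0), Set.of(1), Set.of());
        aggregate(sales, sets).forEach(System.out::println);
    }
}
```

This prints `CN | null | 30`, `DE | null | 5`, `null | Beijing | 10`, `null | Shanghai | 20`, `null | Berlin | 5` and `null | null | 35`. This simple version cannot tell a grouping-set NULL from a NULL in the data; SQL's `GROUPING()` function exists for that.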
However, we also found that in
https://issues.apache.org/jira/browse/FLINK-1