Hi Regina,
I've taken another look at the problem. I think we could improve the
situation by reordering the calls we make in
YarnResourceManager#onContainersAllocated. I've created a PR [1] for the
re-opened issue [2]. Would it be possible for you to verify the fix? What
you need to do is to check
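To sketch what I mean (a simplified, hypothetical illustration, not the
actual PR — resourceManagerClient stands for YARN's AMRMClientAsync, and
numPendingContainerRequests, pendingRequests and startTaskExecutorInContainer
are made-up names): the point is to update the pending-request bookkeeping
before deciding whether an allocated container is surplus, so a burst of
allocation callbacks never works with stale counts.

import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.async.AMRMClientAsync;

// Hypothetical simplification of YarnResourceManager#onContainersAllocated;
// only meant to illustrate the reordering, not the actual Flink source.
class ContainerCallbackSketch {
    private final AMRMClientAsync<AMRMClient.ContainerRequest> resourceManagerClient;
    private final Queue<AMRMClient.ContainerRequest> pendingRequests = new ArrayDeque<>();
    private int numPendingContainerRequests;

    ContainerCallbackSketch(AMRMClientAsync<AMRMClient.ContainerRequest> client) {
        this.resourceManagerClient = client;
    }

    public void onContainersAllocated(List<Container> containers) {
        for (Container container : containers) {
            if (numPendingContainerRequests > 0) {
                // Account for the satisfied request first ...
                numPendingContainerRequests--;
                resourceManagerClient.removeContainerRequest(pendingRequests.poll());
                // ... and only then start a TaskExecutor in the container.
                startTaskExecutorInContainer(container);
            } else {
                // Containers beyond the pending requests are surplus: return them.
                resourceManagerClient.releaseAssignedContainer(container.getId());
            }
        }
    }

    private void startTaskExecutorInContainer(Container container) {
        // Placeholder for launching the TaskExecutor process in the container.
    }
}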
Hi Regina,
at the moment the community is working towards the 1.10 release, with a lot
of features still being completed. The intended feature freeze is at the end
of November. Because of this it is quite hard to tell when exactly this
problem will be properly fixed, but we'll try our best.
Cheers,
Till
On Thu,
Hi Regina, sorry for not getting back to you earlier. I've gone through the
logs and I couldn't find anything suspicious. What I can see, though, is the
following:
When you start the cluster, you submit a couple of jobs. This starts at
9:20. In total, 120 slots are required to run these jobs.
Hi Chan,
If it is a bug, I think it is critical. Could you share the job manager
logs with me too? I have some time to
analyze them and hope to find the root cause.
Best,
Yang
Chan, Regina wrote on Wed, Oct 30, 2019 at 10:55 AM:
> Till, were you able to find anything? Do you need more logs?
Forget my last email. I received the one-time code and could access the logs.
Cheers,
Till
On Sat, Oct 26, 2019 at 6:49 PM Till Rohrmann wrote:
> Hi Regina,
>
> I couldn't access the log files because LockBox asked me to create a new
> password and now it asks me for the one-time code to confirm
Hi Regina,
I couldn't access the log files because LockBox asked me to create a new
password, and now it asks me for the one-time code to confirm this change.
It says that it will send the one-time code to my registered email address,
which I don't have.
Cheers,
Till
On Fri, Oct 25, 2019 at 10:14 PM Till
Great, thanks a lot, Regina. I'll check the logs tomorrow. If INFO level is
not enough, then I'll let you know.
Cheers,
Till
On Fri, Oct 25, 2019, 21:20 Chan, Regina wrote:
> Till, I added you to this lockbox area where you should be able to
> download the logs. You should have also received an
Could you provide me with the full logs of the cluster
entrypoint/JobManager? I'd like to see what's going on there.
Cheers,
Till
On Fri, Oct 25, 2019, 19:10 Chan, Regina wrote:
> Till,
>
>
>
> We’re still seeing a large number of returned containers even with this
> heartbeat set to
Till,
We’re still seeing a large number of returned containers even with this
heartbeat set to something higher. Do you have hints as to what’s going on?
It seems to be bursty in nature. The bursty requests cause the job to fail
with the cluster not having enough resources because it’s in the
Yeah, thanks for the responses. We’re in the process of testing 1.9.1 after we
found https://issues.apache.org/jira/browse/FLINK-12342 as the cause of the
original issue. FLINK-9455 makes sense as to why it didn’t work in legacy mode.
From: Till Rohrmann
Sent: Wednesday, October 23, 2019 5:32
Hi Regina,
When using the FLIP-6 mode, you can control how long it takes for an idle
TaskManager to be released via resourcemanager.taskmanager-timeout. By
default it is set to 30s.
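For example, to let idle TaskManagers linger for 60 seconds instead of 30,
you would put something like the following into flink-conf.yaml (in this
Flink version the raw value is interpreted as milliseconds — please verify
against the docs for your exact release):

# flink-conf.yaml
resourcemanager.taskmanager-timeout: 60000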
In the Flink version you are using, 1.6.4, we do not support TaskManagers
with multiple slots properly [1]. The
Hi Chan,
After FLIP-6, the Flink ResourceManager dynamically allocates resources from
Yarn on demand.
What's your Flink version? On the current code base, if the number of pending
containers in the resource manager is zero, then it will release all the
excess containers. Could you please check the
"Remaining