Hi Cliff,

the TaskManagers fail to start with exit code 31, which indicates an
initialization error on startup. If you check the TaskManager logs via
`yarn logs -applicationId <APP_ID>`, you should see why the TMs don't
start up.
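
The aggregated output of `yarn logs` can be long, so a small filter may help
narrow it down. What follows is only a minimal sketch, not part of Flink or
YARN: the function name `suspicious_lines` and the pattern are assumptions,
chosen on the guess that the relevant lines mention ERROR, FATAL, or a Java
exception.

```python
import re

# Assumption: interesting lines carry an ERROR/FATAL level or name a Java
# exception class. Adjust the pattern to whatever the logs actually contain.
PATTERN = re.compile(r"\b(ERROR|FATAL)\b|Exception")


def suspicious_lines(lines):
    """Return the lines matching PATTERN, without trailing newlines."""
    return [line.rstrip("\n") for line in lines if PATTERN.search(line)]
```

One way to use it would be to pipe the output of
`yarn logs -applicationId <APP_ID>` into a script that calls
`suspicious_lines(sys.stdin)` and prints each returned line.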

Cheers,
Till

On Fri, Nov 9, 2018 at 8:32 PM Cliff Resnick <cre...@gmail.com> wrote:

> Hi Till,
>
> Here are the Job Manager logs for the same job in both 1.6.0 and 1.6.2 at
> DEBUG level. I saw several errors in 1.6.2; I hope they're informative!
>
> Cliff
>
> On Fri, Nov 9, 2018 at 8:34 AM Till Rohrmann <trohrm...@apache.org> wrote:
>
>> Hi Cliff,
>>
>> this does not sound right. Could you share the logs of the Yarn cluster
>> entrypoint with the community for further debugging? Ideally on DEBUG
>> level. The Yarn logs would also be helpful to fully understand the problem.
>> Thanks a lot!
>>
>> Cheers,
>> Till
>>
>> On Thu, Nov 8, 2018 at 9:59 PM Cliff Resnick <cre...@gmail.com> wrote:
>>
>>> I'm running a YARN cluster of 8 * 4 core instances = 32 cores, with a
>>> configuration of 3 slots per TM. The cluster is dedicated to a single job
>>> that runs at full capacity in "FLIP6" mode. So in this cluster, the
>>> parallelism is 21 (7 TMs * 3, one container dedicated to the Job Manager).
>>>
>>> When I run the job in 1.6.0, seven Task Managers are spun up as
>>> expected. But if I run with 1.6.2 only four Task Managers spin up and the
>>> job hangs waiting for more resources.
>>>
>>> Our Flink distribution is set up by script after building from source.
>>> So aside from the Flink jars, both 1.6.0 and 1.6.2 directories are identical.
>>> The job is the same, restarting from savepoint. The problem is repeatable.
>>>
>>> Has something changed in 1.6.2, and if so can it be remedied with a
>>> config change?
>>>
