Ananth,

A heartbeat timeout means that the operator is not sending its window
heartbeat information back to the app master. It usually happens for
one of two reasons.

1. System failure: the container died, the network failed, etc.
2. Windows are not moving forward in the operator: some business logic in
the operator is blocking them. While the operator is running, you can watch
its window IDs in the UI to quickly find out whether this is the issue.
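
To make the second check concrete, here is a rough sketch of it in Python.
It is only an illustration: find_stall, the sample format and the timeout
value are all hypothetical and not part of Apex. It assumes you have noted
down (time in seconds, window ID) pairs for the operator, for example by
watching the UI.

```python
from typing import List, Tuple


def find_stall(samples: List[Tuple[float, int]], timeout_s: float) -> bool:
    """Return True if the window ID stayed the same for longer than timeout_s.

    samples: (timestamp_seconds, window_id) pairs in chronological order.
    Hypothetical helper for illustration only; not an Apex API.
    """
    if not samples:
        return False
    # Remember when the window ID last changed and what it changed to.
    last_change_time, last_window_id = samples[0]
    for timestamp, window_id in samples[1:]:
        if window_id != last_window_id:
            # The window moved forward, so restart the stall clock.
            last_change_time, last_window_id = timestamp, window_id
        elif timestamp - last_change_time > timeout_s:
            # Same window ID for longer than the allowed time.
            return True
    return False
```

If this returns True for the operator in question, blocked windows are the
more likely cause; if the IDs keep advancing right up to the kill, a system
failure is worth looking at first.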

Regards,
Ashwin.
On May 17, 2016 11:05 PM, "Ananth Gundabattula" <agundabatt...@gmail.com>
wrote:

Hello Sandeep,

Thanks for the response. Please find attached the app master log.

It looks like it got killed due to a heartbeat timeout. I will have to see
why I am getting a heartbeat timeout. I also see a JSON parser exception in
the attached log. Is it a harmless exception?


Regards,
Ananth

On Wed, May 18, 2016 at 2:45 PM, Sandeep Deshmukh <sand...@datatorrent.com>
wrote:

> Dear Ananth,
>
> Could you please check the STRAM logs for any details about these containers?
> The first guess would be a container running out of memory.
>
> Regards,
> Sandeep
>
> On Wed, May 18, 2016 at 10:05 AM, Ananth Gundabattula <
> agundabatt...@gmail.com> wrote:
>
>> Hello All,
>>
>> I was wondering what would cause a container to be killed by
>> the application master?
>>
>> I see the following in the UI when I click on details:
>>
>> "
>>
>> Container killed by the ApplicationMaster.
>> Container killed on request. Exit code is 143
>> Container exited with a non-zero exit code 143
>>
>> "
>>
>> I see some exceptions in the dtgateway.log and am not sure if they are
>> related.
>>
>> I am running Apex 3.3.0 on CDH 5.7 with HA enabled (for both YARN and
>> HDFS).
>>
>
