Thanks for the response, Chinmay.

Yes, this issue occurs during the restart of a failed application.

From what I observed, my DAG/application failed during a Resource Manager 
failover. During the failover, something went wrong with one of the files being 
written to HDFS. The application tried many times to restore the file by 
launching the operator that writes to HDFS in many containers, and failed.

When I restarted the application, it again tried many times to restore the 
HDFS file and again launched many containers to recover. The app took a very 
long time, about 4 to 5 hours, to successfully launch those HDFS operators and 
resume.


Regards,
Raja.

From: Chinmay Kolhatkar <chin...@datatorrent.com>
Reply-To: users@apex.apache.org
Date: Saturday, July 23, 2016 at 12:11 AM
To: users@apex.apache.org
Subject: Re: hdfs output file operator

Hi Raja,

I can see such a log message in AbstractFileOutputOperator at line 455.

As this code is called from the setup of the operator, the operator is getting 
deployed and then failing while restoring the existing file, because of a 
mismatch between the length of the file and the offset the operator stored 
previously.

From the code it looks like it takes care of such cases and restores the file.
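
To illustrate the kind of length-versus-offset check described above, here is a 
minimal sketch. It is not the AbstractFileOutputOperator code: the class name 
RestoreCheckSketch and the method restore are hypothetical, and it uses 
java.nio on a local file instead of the HDFS API, only to show the idea.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; not part of Apex Malhar.
public class RestoreCheckSketch {

    /**
     * Compares the file's current length with the offset saved at the
     * last checkpoint and returns the file length after the check.
     */
    static long restore(Path file, long savedOffset) throws IOException {
        long actualLength = Files.size(file);

        if (actualLength == savedOffset) {
            // File and checkpoint agree; nothing to restore.
            return actualLength;
        }

        if (actualLength > savedOffset) {
            // Bytes were written after the checkpoint; drop them so the
            // operator can resume writing from the saved offset.
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
                ch.truncate(savedOffset);
            }
            return savedOffset;
        }

        // The file is shorter than the checkpoint claims, so data is
        // missing and this sketch cannot recover it.
        throw new IOException("File " + file + " has " + actualLength
                + " bytes, fewer than the saved offset " + savedOffset);
    }
}
```

In this sketch, a file that is longer than the saved offset is cut back to the 
offset, while a file that is shorter is reported as an error. A repeated 
failure of the second kind during setup would match an operator that keeps 
getting redeployed without ever starting.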

From what I understand, either the file was changed by some other means or the 
offset management has a problem.

Are you restarting the application from the previous application ID?

To narrow down the problem, can you please try to change the destination path 
and see if that works?

Thanks,
Chinmay.



On Sat, Jul 23, 2016 at 5:00 AM, Sandesh Hegde <sand...@datatorrent.com> wrote:
Please check:
         1. AppMaster logs
         2. Cluster resources

On Fri, Jul 22, 2016 at 1:14 PM Raja.Aravapalli <raja.aravapa...@target.com> 
wrote:

Hi,

I have a file output operator which writes to HDFS files.

The application has been trying for a long time to deploy the operator that 
writes to HDFS files in many different containers, but is not succeeding. The 
status shows as PENDING_DEPLOY.

In the logs of the container where the application is trying to deploy the 
HDFS write operator, I can only see: path corrupted.


Can someone please guide me or suggest something on this?



Regards,
Raja.
