Re: [slider-agent] Different "base-dir" for STOP and STATUS commands

2016-07-08 Thread Billie Rinaldi
Ah, I think this is happening because you're using OS X. Normally YARN kills the entire process group (including child processes) when the container is killed, but it cannot do this on OS X. So the agent process is living longer than it should. A question we may want to look into would be why the A
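For readers unfamiliar with process groups, here is a minimal Python sketch of the general POSIX idea the message refers to: a parent starts a child in its own process group and later signals the whole group, so any grandchildren are terminated too. This is only an illustration, not YARN's or the Slider agent's actual code; the helper names `start_in_new_group` and `kill_group` are hypothetical.

```python
import os
import signal
import subprocess


def start_in_new_group(cmd):
    """Start cmd as the leader of a new session and process group.

    start_new_session=True makes the child call setsid(), so its
    process group ID equals its own PID and its descendants inherit it.
    """
    return subprocess.Popen(cmd, start_new_session=True)


def kill_group(proc, sig=signal.SIGTERM):
    """Send sig to every process in proc's process group."""
    os.killpg(os.getpgid(proc.pid), sig)
```

If only the leader were signalled with `proc.terminate()`, any child it had spawned could outlive it, which matches the symptom described above where the agent process lives longer than its container.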

Re: [slider-agent] Different "base-dir" for STOP and STATUS commands

2016-07-07 Thread Gour Saha
Is this a multi-node cluster? If yes, can you check the time across all the nodes and make sure they are in sync? If not, did you override any timeout properties via the app config or resources? If you could share the JSON files you used to start the app, it would help to debug further. -

Re: [slider-agent] Different "base-dir" for STOP and STATUS commands

2016-07-07 Thread Sarthak Kukreti
Thanks! That was helpful. (Strangely) As it turns out, the container is released (and cleaned up) even before the STOP command is queued. Some more logs: Node Manager: - 2016-07-07 15:50:14,148 [AmExecutor-006] INFO state.AppState - Role ConnectD flexed from 2 to 1 2016-07-07

Re: [slider-agent] Different "base-dir" for STOP and STATUS commands

2016-07-07 Thread Billie Rinaldi
If you look for the container ID in the nodemanager log on the host where the container was running, you should be able to see when the container stopped and was cleaned up. Looks like it even logs when it deletes the container directories. On Thu, Jul 7, 2016 at 2:04 PM, Sarthak Kukreti wrote:
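The lookup suggested here can be sketched as a small Python filter that keeps only the log lines mentioning a given container ID. It assumes nothing about the NodeManager log format beyond the ID appearing as plain text on the relevant lines; the function name `lines_for_container` is hypothetical, and the log file location depends on your installation.

```python
def lines_for_container(log_lines, container_id):
    """Return the log lines that mention container_id, in original order."""
    return [line.rstrip("\n") for line in log_lines if container_id in line]


def print_container_history(log_path, container_id):
    """Print every line in the log file at log_path that mentions the container."""
    with open(log_path, encoding="utf-8", errors="replace") as handle:
        for line in lines_for_container(handle, container_id):
            print(line)
```

Calling `print_container_history` with the NodeManager log path and the container ID from this thread would list that container's entries in time order, which makes it easier to compare when the container was stopped and cleaned up against when the STOP command was queued.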

Re: [slider-agent] Different "base-dir" for STOP and STATUS commands

2016-07-07 Thread Sarthak Kukreti
kafka.py is still present in the filecache directory: it's just the "container_1467829690678_0022_01_03" directory that seems to be deleted before the runCommand() call - Sarthak On Thu, Jul 7, 2016 at 12:35 PM, Billie Rinaldi wrote: > I think that > /private/tmp/hdfs/nm-local-dir/usercache/s

Re: [slider-agent] Different "base-dir" for STOP and STATUS commands

2016-07-07 Thread Billie Rinaldi
I think that /private/tmp/hdfs/nm-local-dir/usercache/sarthakk/appcache/application_1467829690678_0022/container_1467829690678_0022_01_03/app/definition is linked to /private/tmp/hdfs/nm-local-dir/usercache/sarthakk/appcache/application_1467829690678_0022/filecache/113/slider-kafka-package-1.0.

[slider-agent] Different "base-dir" for STOP and STATUS commands

2016-07-07 Thread Sarthak Kukreti
Hello! I am trying to use Slider to distribute an application over a YARN cluster. While attempting to use "slider flex" to decrease the number of containers allocated for the application (using the kafka app-package as reference), I came across the following error: ERROR 2016-07-07 10:57:36,461