There could be many different reasons: memory problems, problems in the
algorithm, etc.

I recommend that you focus on getting to the logs instead of guessing why the
workers are dying. Maybe you are looking in the wrong place, or maybe you can
access them through the web UI instead of the command line.

From a terminal, running yarn logs -applicationId "id" will be enough to
see them. If you want to access the physical files on your nodes, you
have to go to every node, check each one of them, and search for the
different containers of your application in the directory where they are.
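
As a rough sketch of that workflow, the Python snippet below fetches the
aggregated logs with the same yarn logs -applicationId command and groups
suspicious lines by container. The function names, the container ID pattern,
and the list of marker strings are illustrative assumptions, not something
Giraph or YARN provides; adjust them to what your own logs contain.

```python
import re
import subprocess
from collections import defaultdict

# Assumed shape of a YARN container ID, e.g.
# container_1487000000000_0001_01_000002 (an optional epoch part such as
# "e17_" may follow "container_").
CONTAINER_ID = re.compile(r"container_(?:e\d+_)?\d+_\d+_\d+_\d+")

# Illustrative markers that often show up when a worker dies;
# this list is an assumption, extend it for your own job.
SUSPICIOUS = ("OutOfMemoryError", "Exception", "Killing container")


def fetch_logs(application_id):
    """Return the aggregated logs as text (needs the yarn CLI on PATH)."""
    result = subprocess.run(
        ["yarn", "logs", "-applicationId", application_id],
        capture_output=True, text=True, check=True)
    return result.stdout


def suspicious_lines(log_text):
    """Group suspicious lines under the most recently seen container ID."""
    hits = defaultdict(list)
    current = "unknown"  # used until the first container ID appears
    for line in log_text.splitlines():
        match = CONTAINER_ID.search(line)
        if match:
            current = match.group(0)
        if any(marker in line for marker in SUSPICIOUS):
            hits[current].append(line.strip())
    return dict(hits)
```

Calling suspicious_lines(fetch_logs("id")) with your own application id
should give a quick overview of which containers reported errors. It is only
a heuristic: a line is attributed to whichever container ID was seen last,
so check the full log of any container it flags.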

Another link that may help:
http://stackoverflow.com/questions/32713587/how-to-keep-yarns-log-files.

Maybe you could test the algorithm locally instead of running it on the
cluster, to get a better understanding of the relation between YARN and Giraph.

Bye

-- 
*José Luis Larroque*
Analista Programador Universitario - Facultad de Informática - UNLP
Desarrollador Java y .NET  en LIFIA

2017-02-27 12:27 GMT-03:00 Sai Ganesh Muthuraman <saiganesh...@gmail.com>:

> Hi,
>
> The first container in the application logs usually contains the gam logs.
> But the first container logs are not available. Hence no gam logs.
> What could be the possible reasons for some workers dying?
>
>
> Sai Ganesh
>
>
>
> On Feb 25, 2017, at 9:30 PM, José Luis Larroque <user@giraph.apache.org>
> wrote:
>
> You are probably looking at your Giraph application manager (gam) logs.
> You should look for your workers' logs; each worker has its own log (the
> container's logs). If you can't find them, check your YARN configuration
> to find out where they are stored, see this:
> http://stackoverflow.com/questions/21621755/where-does-hadoop-store-the-logs-of-yarn-applications.
>
> I don't recommend enabling checkpointing until you know the specific
> error that you are facing. If you are facing out-of-memory errors, for
> example, checkpointing won't help in my experience: the same error
> will happen over and over.
>
> --
> *José Luis Larroque*
> Analista Programador Universitario - Facultad de Informática - UNLP
> Desarrollador Java y .NET  en LIFIA
>
> 2017-02-25 12:38 GMT-03:00 Sai Ganesh Muthuraman <saiganesh...@gmail.com>:
>
> Hi Jose,
>
> Which logs exactly do I have to look into? In the application
> logs, I found the error message that I mentioned, and it was also mentioned
> that there was *No good last checkpoint.*
> I am not able to figure out why a worker fails for
> bigger files. What do I have to look for in the logs?
> Also, how do I enable checkpointing?
>
>
> - Sai Ganesh Muthuraman
>
