Hello Paul,

From the logs, it looks like the Mesos slave attempts slave recovery on
startup (http://mesos.apache.org/documentation/latest/slave-recovery/), but
because the checkpointed resources.info file is missing, it cannot complete
the recovery and hence ends up killing itself.

If you are fine with losing any currently running Mesos tasks/executors,
then you can clean up the Mesos default working directory where it keeps
the checkpoint information ($ rm -rf /tmp/mesos) and then try to restart
the Mesos slave.
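
As a rough sketch, the cleanup could be wrapped in a small shell function
with a guard on the path. The function name clean_slave_state is made up for
illustration, /tmp/mesos is the --work_dir shown in your startup flags, and
"service mesos-slave stop" is my assumption based on the "service mesos-slave
start" command you mentioned.

```shell
#!/bin/sh
# Sketch only: remove a Mesos slave's checkpointed state so it starts clean.
# Any tasks/executors still running under this slave will be lost.

clean_slave_state() {
    work_dir="$1"
    # Guard against an empty or root path before running rm -rf.
    case "$work_dir" in
        ""|"/")
            echo "refusing to remove '$work_dir'" >&2
            return 1
            ;;
    esac
    rm -rf "$work_dir"
}

# Example usage (assumes the work_dir from the quoted flags and a
# 'mesos-slave' init service; adjust both to your setup):
#   service mesos-slave stop
#   clean_slave_state /tmp/mesos
#   service mesos-slave start
```

Stopping the slave before removing the directory avoids deleting state
while the process is still writing to it.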

On Tue, Mar 29, 2016 at 10:08 PM, Paul Bell <arach...@gmail.com> wrote:

> Hi,
>
> I am hoping someone can shed some light on this.
>
> An agent node failed to start, that is, when I did "service mesos-slave
> start" the service came up briefly & then stopped. Before stopping it
> produced the log shown below. The last thing it wrote is "Trying to create
> path '/mesos' in Zookeeper".
>
> This mention of the mesos znode prompted me to go for a clean slate by
> removing the mesos znode from Zookeeper.
>
> After doing this, the mesos-slave service started perfectly.
>
> What might be happening here, and also what's the right way to
> trouble-shoot such a problem? Mesos is version 0.23.0.
>
> Thanks for your help.
>
> -Paul
>
>
> Log file created at: 2016/03/29 14:19:39
> Running on machine: 71.100.202.193
> Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
> I0329 14:19:39.512249  5870 logging.cpp:172] INFO level logging started!
> I0329 14:19:39.512564  5870 main.cpp:162] Build: 2015-07-24 10:05:39 by
> root
> I0329 14:19:39.512588  5870 main.cpp:164] Version: 0.23.0
> I0329 14:19:39.512600  5870 main.cpp:167] Git tag: 0.23.0
> I0329 14:19:39.512612  5870 main.cpp:171] Git SHA:
> 4ce5475346a0abb7ef4b7ffc9836c5836d7c7a66
> I0329 14:19:39.615172  5870 containerizer.cpp:111] Using isolation:
> posix/cpu,posix/mem
> I0329 14:19:39.615697  5870 main.cpp:249] Starting Mesos slave
> I0329 14:19:39.616267  5870 slave.cpp:190] Slave started on 1)@
> 71.100.202.193:5051
> I0329 14:19:39.616286  5870 slave.cpp:191] Flags at startup:
> --attributes="hostType:shard1" --authenticatee="crammd5"
> --cgroups_cpu_enable_pids_and_tids_count="false"
> --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup"
> --cgroups_limit_swap="false" --cgroups_root="mesos"
> --container_disk_watch_interval="15secs" --containerizers="docker,mesos"
> --default_role="*" --disk_watch_interval="1mins"
> --docker="/usr/local/ecxmcc/weaveShim" --docker_kill_orphans="true"
> --docker_remove_delay="6hrs"
> --docker_sandbox_directory="/mnt/mesos/sandbox"
> --docker_socket="/var/run/docker.sock" --docker_stop_timeout="15secs"
> --enforce_container_disk_quota="false"
> --executor_registration_timeout="5mins"
> --executor_shutdown_grace_period="5secs"
> --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB"
> --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1"
> --hadoop_home="" --help="false" --hostname="71.100.202.193"
> --initialize_driver_logging="true" --ip="71.100.202.193"
> --isolation="posix/cpu,posix/mem" --launcher_dir="/usr/libexec/mesos"
> --log_dir="/var/log/mesos" --logbufsecs="0" --logging_level="INFO"
> --master="zk://71.100.202.191:2181/mesos"
> --oversubscribed_resources_interval="15secs" --perf_duration="10secs"
> --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns"
> --quiet="false" --recover="reconnect" --recovery_timeout="15mins"
> --registration_backoff_factor="1secs"
> --resource_monitoring_interval="1secs" --revocable_cpu_low_priority="true"
> --strict="true" --switch_user="true" --version="false"
> --work_dir="/tmp/mesos"
> I0329 14:19:39.616835  5870 slave.cpp:354] Slave resources: cpus(*):4;
> mem(*):23089; disk(*):122517; ports(*):[31000-32000]
> I0329 14:19:39.617032  5870 slave.cpp:384] Slave hostname: 71.100.202.193
> I0329 14:19:39.617046  5870 slave.cpp:389] Slave checkpoint: true
> I0329 14:19:39.618841  5894 state.cpp:36] Recovering state from
> '/tmp/mesos/meta'
> I0329 14:19:39.618872  5894 state.cpp:672] Failed to find resources file
> '/tmp/mesos/meta/resources/resources.info'
> I0329 14:19:39.619730  5898 group.cpp:313] Group process (group(1)@
> 71.100.202.193:5051) connected to ZooKeeper
> I0329 14:19:39.619760  5898 group.cpp:787] Syncing group operations: queue
> size (joins, cancels, datas) = (0, 0, 0)
> I0329 14:19:39.619773  5898 group.cpp:385] Trying to create path '/mesos'
> in ZooKeeper
>
>


-- 
Regards,
Pradeep Chhetri
