I cannot check it before next week, but I don't think it is a configuration error, since one agent on one node is running. I also had the issue several times with the Hortonworks sandbox (running locally in VMware) but was not able to reproduce it. I'll let you know the output...
BR Marco

2015-07-02 2:27 GMT+02:00 Sumit Mohanty <[email protected]>:
> What does execution of
>
> pgrep -o -u flume -f ^/usr/jdk64/jdk1.7.0_67.*agent1.*
>
> generate as output? *If you have a single agent, you can also use ps aux
> and grep for flume.*
>
> In my case, for example, I see
> [root@smb201-1 ~]# pgrep -o -u flume -f ^/usr/jdk64/jdk1.7.0_67.*a1.*
> 16794
>
> Is it possible that the configuration of the Flume agent "agent1" has
> some issue?
>
> ------------------------------
> *From:* Marco <[email protected]>
> *Sent:* Wednesday, July 01, 2015 7:27 AM
> *To:* [email protected]
> *Subject:* Re: Restart of flume-agents bug
>
> error:
> <<<<
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 214, in execute
>     method(env)
>   File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
>     return fn(*args, **kwargs)
>   File "/var/lib/ambari-agent/cache/common-services/FLUME/1.4.0.2.0/package/scripts/flume_handler.py", line 56, in start
>     flume(action='start')
>   File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
>     return fn(*args, **kwargs)
>   File "/var/lib/ambari-agent/cache/common-services/FLUME/1.4.0.2.0/package/scripts/flume.py", line 161, in flume
>     try_sleep=10)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 148, in __init__
>     self.env.run()
>   File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 152, in run
>     self.run_action(resource, action)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 118, in run_action
>     provider_action()
>   File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 274, in action_run
>     raise ex
> Fail: Execution of 'pgrep -o -u flume -f ^/usr/jdk64/jdk1.7.0_67.*agent1.* > /var/run/flume/agent1.pid' returned 1.
> >>>>
>
> output:
> <<<
> 2015-07-01 14:08:03,131 - u'Execute[\'ambari-sudo.sh su flume -l -s /bin/bash -c \'export PATH=\'"\'"\'/usr/sbin:/sbin:/usr/lib/ambari-server/*:/sbin:/usr/sbin:/bin:/usr/bin:/var/lib/ambari-agent\'"\'"\' JAVA_HOME=/usr/jdk64/jdk1.7.0_67 ; /usr/hdp/current/flume-server/bin/flume-ng agent --name agent1 --conf /etc/flume/conf/agent1 --conf-file /etc/flume/conf/agent1/flume.conf -Dflume.monitoring.type=org.apache.hadoop.metrics2.sink.flume.FlumeTimelineMetricsSink -Dflume.monitoring.node=hostname:6188 > /var/log/flume/agent1.out 2>&1\' &\']' {'environment': {'JAVA_HOME': u'/usr/jdk64/jdk1.7.0_67'}, 'wait_for_finish': False}
> 2015-07-01 14:08:03,136 - u"Execute['pgrep -o -u flume -f ^/usr/jdk64/jdk1.7.0_67.*agent1.* > /var/run/flume/agent1.pid']" {'logoutput': True, 'tries': 20, 'try_sleep': 10}
> 2015-07-01 14:08:03,179 - Retrying after 10 seconds. Reason: Execution of 'pgrep -o -u flume -f ^/usr/jdk64/jdk1.7.0_67.*agent1.* > /var/run/flume/agent1.pid' returned 1.
> 2015-07-01 14:08:13,233 - Retrying after 10 seconds. Reason: Execution of 'pgrep -o -u flume -f ^/usr/jdk64/jdk1.7.0_67.*agent1.* > /var/run/flume/agent1.pid' returned 1.
> 2015-07-01 14:08:23,280 - Retrying after 10 seconds. Reason: Execution of 'pgrep -o -u flume -f ^/usr/jdk64/jdk1.7.0_67.*agent1.* > /var/run/flume/agent1.pid' returned 1.
> 2015-07-01 14:08:33,334 - Retrying after 10 seconds. Reason: Execution of 'pgrep -o -u flume -f ^/usr/jdk64/jdk1.7.0_67.*agent1.* > /var/run/flume/agent1.pid' returned 1.
> 2015-07-01 14:08:43,389 - Retrying after 10 seconds. Reason: Execution of 'pgrep -o -u flume -f ^/usr/jdk64/jdk1.7.0_67.*agent1.* > /var/run/flume/agent1.pid' returned 1.
> 2015-07-01 14:08:53,440 - Retrying after 10 seconds. Reason: Execution of 'pgrep -o -u flume -f ^/usr/jdk64/jdk1.7.0_67.*agent1.* > /var/run/flume/agent1.pid' returned 1.
> 2015-07-01 14:09:03,511 - Retrying after 10 seconds. Reason: Execution of 'pgrep -o -u flume -f ^/usr/jdk64/jdk1.7.0_67.*agent1.* > /var/run/flume/agent1.pid' returned 1.
> 2015-07-01 14:09:13,565 - Retrying after 10 seconds. Reason: Execution of 'pgrep -o -u flume -f ^/usr/jdk64/jdk1.7.0_67.*agent1.* > /var/run/flume/agent1.pid' returned 1.
> 2015-07-01 14:09:23,619 - Retrying after 10 seconds. Reason: Execution of 'pgrep -o -u flume -f ^/usr/jdk64/jdk1.7.0_67.*agent1.* > /var/run/flume/agent1.pid' returned 1.
> 2015-07-01 14:09:33,673 - Retrying after 10 seconds. Reason: Execution of 'pgrep -o -u flume -f ^/usr/jdk64/jdk1.7.0_67.*agent1.* > /var/run/flume/agent1.pid' returned 1.
> 2015-07-01 14:09:43,722 - Retrying after 10 seconds. Reason: Execution of 'pgrep -o -u flume -f ^/usr/jdk64/jdk1.7.0_67.*agent1.* > /var/run/flume/agent1.pid' returned 1.
> 2015-07-01 14:09:53,772 - Retrying after 10 seconds. Reason: Execution of 'pgrep -o -u flume -f ^/usr/jdk64/jdk1.7.0_67.*agent1.* > /var/run/flume/agent1.pid' returned 1.
> 2015-07-01 14:10:03,826 - Retrying after 10 seconds. Reason: Execution of 'pgrep -o -u flume -f ^/usr/jdk64/jdk1.7.0_67.*agent1.* > /var/run/flume/agent1.pid' returned 1.
> 2015-07-01 14:10:13,880 - Retrying after 10 seconds. Reason: Execution of 'pgrep -o -u flume -f ^/usr/jdk64/jdk1.7.0_67.*agent1.* > /var/run/flume/agent1.pid' returned 1.
> 2015-07-01 14:10:23,928 - Retrying after 10 seconds. Reason: Execution of 'pgrep -o -u flume -f ^/usr/jdk64/jdk1.7.0_67.*agent1.* > /var/run/flume/agent1.pid' returned 1.
> 2015-07-01 14:10:33,982 - Retrying after 10 seconds. Reason: Execution of 'pgrep -o -u flume -f ^/usr/jdk64/jdk1.7.0_67.*agent1.* > /var/run/flume/agent1.pid' returned 1.
> 2015-07-01 14:10:44,037 - Retrying after 10 seconds. Reason: Execution of 'pgrep -o -u flume -f ^/usr/jdk64/jdk1.7.0_67.*agent1.* > /var/run/flume/agent1.pid' returned 1.
> 2015-07-01 14:10:54,083 - Retrying after 10 seconds. Reason: Execution of 'pgrep -o -u flume -f ^/usr/jdk64/jdk1.7.0_67.*agent1.* > /var/run/flume/agent1.pid' returned 1.
> 2015-07-01 14:11:04,137 - Retrying after 10 seconds. Reason: Execution of 'pgrep -o -u flume -f ^/usr/jdk64/jdk1.7.0_67.*agent1.* > /var/run/flume/agent1.pid' returned 1.
> 2015-07-01 14:11:14,190 - Error while executing command 'start':
> Traceback (most recent call last):
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 214, in execute
>     method(env)
> >>>
>
> Thanks,
> Marco
>
> 2015-07-01 16:18 GMT+02:00 Sumit Mohanty <[email protected]>:
>
>> When you start Flume using Ambari, the /var/lib/ambari-agent/data folder
>> on the host will contain the corresponding command outputs, errors, etc.
>> Can you share those?
>>
>> Feel free to send them by direct email, as I think the Apache mailing
>> list does not allow attachments.
>> ------------------------------
>> *From:* Marco <[email protected]>
>> *Sent:* Wednesday, July 01, 2015 7:14 AM
>> *To:* [email protected]
>> *Subject:* Re: Restart of flume-agents bug
>>
>> I've tried this but cannot find any related processes.
>>
>> I've searched via
>> pgrep -fl flume
>> pgrep -fl agent1
>>
>> Also, I've restarted the corresponding server.
>>
>> If I try to restart the Flume agent, I get the same issue :(
>>
>> I've also tried deleting /var/run/flume and creating it again, also with
>> no effect.
>>
>> BR Marco
>>
>>
>>
>> 2015-07-01 15:57 GMT+02:00 Sumit Mohanty <[email protected]>:
>>
>>> If Flume agents are running, you need to kill those processes as well
>>> as delete the pid files.
>>> ------------------------------
>>> *From:* Marco <[email protected]>
>>> *Sent:* Wednesday, July 01, 2015 6:42 AM
>>> *To:* [email protected]
>>> *Subject:* Restart of flume-agents bug
>>>
>>> Hi,
>>>
>>> I'm having trouble restarting Flume agents with Ambari.
>>>
>>> I've found this JIRA entry,
>>> https://issues.apache.org/jira/browse/AMBARI-10657, which describes my
>>> problem (var/run/flume/a2.pid' returned 1).
>>>
>>> Since I am using the Hortonworks distribution (Ambari 2.0.0), I cannot
>>> simply upgrade or patch. Is there any workaround for this issue? I've
>>> tried deleting the pid file, but with no effect.
>>>
>>> Thanks,
>>> Marco
>>>
>>
>>
>>
>> --
>> Best regards,
>> Marco
>>
>
>
>
> --
> Best regards,
> Marco
>
--
Best regards,
Marco
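Sumit's question turns on what `pgrep -o -u flume -f ^/usr/jdk64/jdk1.7.0_67.*agent1.*` can match. The Python sketch below is a simplified model, not pgrep's implementation and not Ambari code. Per the pgrep(1) man page, -f matches the pattern against the full command line, -u keeps only processes whose effective user is flume, -o keeps only the oldest match, and exit status 1 means nothing matched, which Ambari then reports as "returned 1". Because the pattern is anchored with ^, the agent's command line must begin with the JDK path; an agent started from a different java binary, or one that exits right after launch, would plausibly produce the same failure. The `Proc` type, `pgrep_oldest`, and all process-table rows (PIDs, command lines) are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Proc:
    """One row of a hypothetical process-table snapshot."""
    pid: int
    user: str          # effective user name
    start_time: float  # smaller value = started earlier
    cmdline: str       # full command line, arguments joined by spaces


def pgrep_oldest(procs: Iterable[Proc], user: str, pattern: str) -> Optional[int]:
    """Simplified model of `pgrep -o -u USER -f PATTERN`.

    -f: search PATTERN in the full command line (not just the process name)
    -u: keep only processes whose effective user is USER
    -o: of the matches, return only the oldest
    None stands for "no match", i.e. pgrep printing nothing and exiting 1.
    """
    regex = re.compile(pattern)
    matches = [p for p in procs if p.user == user and regex.search(p.cmdline)]
    if not matches:
        return None
    return min(matches, key=lambda p: p.start_time).pid


PATTERN = r"^/usr/jdk64/jdk1.7.0_67.*agent1.*"

# Hypothetical rows: two agent1 JVMs owned by flume plus one owned by root.
snapshot: List[Proc] = [
    Proc(2001, "flume", 100.0,
         "/usr/jdk64/jdk1.7.0_67/bin/java -Xmx20m org.apache.flume.node.Application --name agent1"),
    Proc(2002, "flume", 250.0,
         "/usr/jdk64/jdk1.7.0_67/bin/java -Xmx20m org.apache.flume.node.Application --name agent1"),
    Proc(2003, "root", 50.0,
         "/usr/jdk64/jdk1.7.0_67/bin/java -Xmx20m org.apache.flume.node.Application --name agent1"),
]

# Hypothetical row: the agent runs, but from a different JDK path, so the
# ^-anchored pattern no longer matches.
other_jdk: List[Proc] = [
    Proc(3001, "flume", 100.0,
         "/usr/lib/jvm/jre/bin/java -Xmx20m org.apache.flume.node.Application --name agent1"),
]

print(pgrep_oldest(snapshot, "flume", PATTERN))   # 2001
print(pgrep_oldest(other_jdk, "flume", PATTERN))  # None
```

The root-owned row is older than both flume rows but is skipped because of -u, which is why the oldest flume match wins.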

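Marco's output log also explains why the start command takes about three minutes to fail: the pgrep check is logged with `{'tries': 20, 'try_sleep': 10}`, and it is retried 19 times at roughly 10-second intervals (14:08:03 to 14:11:14) before the final `Fail`. The sketch below models those retry semantics under the assumption that the sleep happens only between attempts. It is a generic illustration, not Ambari's resource_management code; `run_with_retries` is a hypothetical name, and `Fail` here only mirrors the exception name seen in the traceback.

```python
import time
from typing import Callable


class Fail(Exception):
    """Raised when every attempt failed (mirrors the name seen in the traceback)."""


def run_with_retries(check: Callable[[], int], tries: int = 20,
                     try_sleep: float = 10.0,
                     sleep: Callable[[float], None] = time.sleep) -> int:
    """Call `check` until it returns exit status 0, at most `tries` times.

    Sleeps `try_sleep` seconds between attempts (never after the last one).
    Returns the number of attempts used, or raises Fail with the last status.
    """
    if tries < 1:
        raise ValueError("tries must be at least 1")
    status = check()
    attempts = 1
    while status != 0 and attempts < tries:
        sleep(try_sleep)      # one wait per "Retrying after 10 seconds" line
        status = check()
        attempts += 1
    if status != 0:
        raise Fail(f"returned {status} after {attempts} attempts")
    return attempts


# A check that always fails, like the pgrep call in the log; a fake sleep
# records the waits instead of actually sleeping.
waits = []
try:
    run_with_retries(lambda: 1, sleep=waits.append)
except Fail as exc:
    print(exc)                  # returned 1 after 20 attempts
print(len(waits), sum(waits))   # 19 190.0
```

The 190 seconds of modelled sleep, plus the few milliseconds each pgrep call takes, lines up with the roughly 191 seconds between the first check and the logged error.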