Hi Again,

The following appears to work around the issue, but I'm not sure of the
long-term effect of running these commands, so do not run them unless you
are willing to trash your cluster:

delete from topology_host_info;
delete from topology_logical_task;
delete from topology_host_task;
delete from topology_host_request;
delete from topology_hostgroup;
delete from topology_logical_request;
delete from topology_request;
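
If you do try this, it is probably worth keeping a copy of the rows first.
A minimal sketch, assuming a PostgreSQL backend (the psql-style prompt
quoted further down suggests that, but check your own install) and using
made-up "_backup" table names: a small Python script that only prints a
transaction-wrapped SQL script, so it can be reviewed before anything is
run against the database.

```python
# Tables in the same order as the delete statements above.
TOPOLOGY_TABLES = [
    "topology_host_info",
    "topology_logical_task",
    "topology_host_task",
    "topology_host_request",
    "topology_hostgroup",
    "topology_logical_request",
    "topology_request",
]


def build_cleanup_sql(tables=TOPOLOGY_TABLES, backup_suffix="_backup"):
    """Return one SQL script: copy every table, then delete its rows."""
    lines = ["BEGIN;"]
    # Keep a copy of each table first (the suffix is a hypothetical choice).
    for table in tables:
        lines.append(
            f"CREATE TABLE {table}{backup_suffix} AS SELECT * FROM {table};"
        )
    # Delete in the same order as the statements above.
    for table in tables:
        lines.append(f"DELETE FROM {table};")
    lines.append("COMMIT;")
    return "\n".join(lines)


if __name__ == "__main__":
    # Prints the script only; nothing is executed against a database.
    print(build_cleanup_sql())
```

The script does not connect to anything. The idea is to read the output,
stop ambari-server, and only then feed it to your database client. The
same warning as above still applies.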

I need to test whether I can add new hosts once these tables have had their
entries cleared. I have a feeling I won't be able to scale automatically,
as some of the information needed to do this is held within these tables.

This happens for me every time I install a cluster using a blueprint and
then scale using the API and host groups.
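
For anyone trying to reproduce this, the scale step is roughly the
following. This is a sketch from memory rather than my exact command: it
assumes the usual Ambari REST call for adding a host to a blueprint-built
cluster (a POST to /api/v1/clusters/<cluster>/hosts/<host> with the
blueprint and host group in the body). The server URL, host name,
blueprint name and host group name are all placeholders, and
authentication is left out. It only builds the request and does not send
it.

```python
import json
import urllib.request


def build_scale_request(base_url, cluster, host, blueprint, host_group):
    """Build (but do not send) the POST that adds one host to a host group."""
    url = f"{base_url}/api/v1/clusters/{cluster}/hosts/{host}"
    body = json.dumps({"blueprint": blueprint, "host_group": host_group})
    request = urllib.request.Request(
        url, data=body.encode("utf-8"), method="POST"
    )
    # Ambari normally expects this header on calls that modify state.
    request.add_header("X-Requested-By", "ambari")
    return request


if __name__ == "__main__":
    # All values below are placeholders, not taken from a real cluster.
    req = build_scale_request(
        "http://ambari.example:8080",
        "testcluster",
        "newhost.node.example",
        "my-blueprint",
        "host-group-5",
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Sending it would need credentials added and a call to
urllib.request.urlopen(req). Each such call shows up as one SCALE row in
topology_request.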

Cheers



On Wed, Sep 28, 2016 at 1:19 PM, cs user <[email protected]> wrote:

> Hi All,
>
> I've just had the exact same issue upgrading from 2.2.1.0 to 2.4.0.1. I'm
> using blueprints and then adding the nodes via a curl command, telling
> ambari which host group they should be in.
>
> This time however, my topology_request table looks fine. It just loops
> over and over again with these error messages:
>
> 28 Sep 2016 13:14:39,006  INFO [ambari-hearbeat-monitor] HostRequest:127 -
> HostRequest: Successfully recovered host request for host: Host Assignment
> Pending
> 28 Sep 2016 13:14:39,007  INFO [ambari-hearbeat-monitor]
> LogicalRequest:449 - LogicalRequest.createHostRequests: created new
> outstanding host request ID = 6
> 28 Sep 2016 13:14:39,012  INFO [ambari-hearbeat-monitor] HostRequest:127 -
> HostRequest: Successfully recovered host request for host: Host Assignment
> Pending
> 28 Sep 2016 13:14:39,012  INFO [ambari-hearbeat-monitor]
> LogicalRequest:449 - LogicalRequest.createHostRequests: created new
> outstanding host request ID = 8
> 28 Sep 2016 13:14:39,022  INFO [ambari-hearbeat-monitor] HostRequest:127 -
> HostRequest: Successfully recovered host request for host: Host Assignment
> Pending
> 28 Sep 2016 13:14:39,022  INFO [ambari-hearbeat-monitor]
> LogicalRequest:449 - LogicalRequest.createHostRequests: created new
> outstanding host request ID = 2
> 28 Sep 2016 13:14:39,028  INFO [ambari-hearbeat-monitor] HostRequest:127 -
> HostRequest: Successfully recovered host request for host: Host Assignment
> Pending
> 28 Sep 2016 13:14:39,028  INFO [ambari-hearbeat-monitor]
> LogicalRequest:449 - LogicalRequest.createHostRequests: created new
> outstanding host request ID = 5
> 28 Sep 2016 13:14:39,033  INFO [ambari-hearbeat-monitor] HostRequest:127 -
> HostRequest: Successfully recovered host request for host: Host Assignment
> Pending
> 28 Sep 2016 13:14:39,033  INFO [ambari-hearbeat-monitor]
> LogicalRequest:449 - LogicalRequest.createHostRequests: created new
> outstanding host request ID = 7
> 28 Sep 2016 13:14:39,042  INFO [ambari-hearbeat-monitor] HostRequest:127 -
> HostRequest: Successfully recovered host request for host: Host Assignment
> Pending
> 28 Sep 2016 13:14:39,042  INFO [ambari-hearbeat-monitor]
> LogicalRequest:449 - LogicalRequest.createHostRequests: created new
> outstanding host request ID = 4
> 28 Sep 2016 13:14:39,055  INFO [ambari-hearbeat-monitor] HostRequest:127 -
> HostRequest: Successfully recovered host request for host: Host Assignment
> Pending
> 28 Sep 2016 13:14:39,056  INFO [ambari-hearbeat-monitor]
> LogicalRequest:449 - LogicalRequest.createHostRequests: created new
> outstanding host request ID = 1
> 28 Sep 2016 13:14:39,062  INFO [ambari-hearbeat-monitor] HostRequest:127 -
> HostRequest: Successfully recovered host request for host: Host Assignment
> Pending
> 28 Sep 2016 13:14:39,062  INFO [ambari-hearbeat-monitor]
> LogicalRequest:449 - LogicalRequest.createHostRequests: created new
> outstanding host request ID = 3
> 28 Sep 2016 13:15:13,119  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:129 - Exception received
> java.lang.NullPointerException
>         at java.lang.String.replace(String.java:2240)
>         at org.apache.ambari.server.topology.HostRequest.getLogicalTasks(HostRequest.java:303)
>         at org.apache.ambari.server.topology.LogicalRequest.getCommands(LogicalRequest.java:158)
>         at org.apache.ambari.server.topology.LogicalRequest.getRequestStatus(LogicalRequest.java:231)
>         at org.apache.ambari.server.topology.TopologyManager.isLogicalRequestFinished(TopologyManager.java:812)
>         at org.apache.ambari.server.topology.TopologyManager.replayRequests(TopologyManager.java:766)
>         at org.apache.ambari.server.topology.TopologyManager.ensureInitialized(TopologyManager.java:150)
>         at org.apache.ambari.server.topology.TopologyManager.onHostHeartBeatLost(TopologyManager.java:485)
>         at org.apache.ambari.server.state.host.HostImpl$HostHeartbeatLostTransition.transition(HostImpl.java:408)
>         at org.apache.ambari.server.state.host.HostImpl$HostHeartbeatLostTransition.transition(HostImpl.java:396)
>         at org.apache.ambari.server.state.fsm.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:354)
>         at org.apache.ambari.server.state.fsm.StateMachineFactory.doTransition(StateMachineFactory.java:294)
>         at org.apache.ambari.server.state.fsm.StateMachineFactory.access$300(StateMachineFactory.java:39)
>         at org.apache.ambari.server.state.fsm.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:440)
>         at org.apache.ambari.server.state.host.HostImpl.handleEvent(HostImpl.java:584)
>         at org.apache.ambari.server.agent.HeartbeatMonitor.doWork(HeartbeatMonitor.java:160)
>         at org.apache.ambari.server.agent.HeartbeatMonitor.run(HeartbeatMonitor.java:121)
>         at java.lang.Thread.run(Thread.java:745)
>
>
> I've repeated this upgrade over and over in a development environment; the
> initial install and upgrade are automated. Each time I get the same issue:
> once the server starts up, none of the agents can register; they just get
> error 500 returned.
>
> Am I the only one who is hitting this issue?
>
> Cheers!
>
>
> On Wed, Mar 9, 2016 at 7:09 PM, cs user <[email protected]> wrote:
>
>> So I was able to get past this error by removing rows 9 and 10 from the
>> table below. It appears that when two hosts I had deleted came back (in
>> effect totally new hosts, but with the same hostnames), a number of
>> duplicate rows were created in the various topology tables. I deleted the
>> duplicates from a number of these tables, but deleting the final two rows
>> below is what fixed it for me. I don't have a copy of how these looked, but
>> some of them contained duplicate rows, with the node names I had deleted
>> and restored listed twice. Perhaps someone can shed some light on what may
>> have caused this?
>>
>>
>> Just to clarify, I have 7 hosts, so this table should contain 8 rows: 1
>> for the cluster and 7 for the hosts. When things were failing it
>> contained 10 rows.
>>
>>
>> ambari=> select * from topology_request;
>>
>>  id |  action   | cluster_id |   bp_name   | cluster_properties | cluster_attributes |              description
>> ----+-----------+------------+-------------+--------------------+--------------------+----------------------------------------
>>   1 | PROVISION |          2 | testcluster | {}                 | {}                 | Provision Cluster 'testcluster'
>>   2 | SCALE     |          2 | testcluster | {}                 | {}                 | Scale Cluster 'testcluster' (+1 hosts)
>>   3 | SCALE     |          2 | testcluster | {}                 | {}                 | Scale Cluster 'testcluster' (+1 hosts)
>>   4 | SCALE     |          2 | testcluster | {}                 | {}                 | Scale Cluster 'testcluster' (+1 hosts)
>>   5 | SCALE     |          2 | testcluster | {}                 | {}                 | Scale Cluster 'testcluster' (+1 hosts)
>>   6 | SCALE     |          2 | testcluster | {}                 | {}                 | Scale Cluster 'testcluster' (+1 hosts)
>>   7 | SCALE     |          2 | testcluster | {}                 | {}                 | Scale Cluster 'testcluster' (+1 hosts)
>>   8 | SCALE     |          2 | testcluster | {}                 | {}                 | Scale Cluster 'testcluster' (+1 hosts)
>>   9 | SCALE     |          2 | testcluster | {}                 | {}                 | Scale Cluster 'testcluster' (+1 hosts)
>>  10 | SCALE     |          2 | testcluster | {}                 | {}                 | Scale Cluster 'testcluster' (+1 hosts)
>> (10 rows)
>>
>>
>>
>>
>> On Tue, Mar 8, 2016 at 3:00 PM, Jonathan Hurley <[email protected]>
>> wrote:
>>
>>> That's very odd, especially since the upgrade doesn't touch the topology
>>> tables. Are you using MySQL by any chance? If so, can you check to make
>>> sure that your database engine is InnoDB and not MyISAM? You have an
>>> integrity violation here which doesn't seem possible unless you're using a
>>> database which doesn't support foreign key constraints.
>>>
>>> There's probably some SQL which you can run to insert an entry into the
>>> topology_logical_request table, but it's probably best to understand why
>>> this happened first.
>>>
>>> On Mar 8, 2016, at 5:55 AM, cs user <[email protected]> wrote:
>>>
>>> Hi All,
>>>
>>> I've upgraded Ambari from version 2.1.2-377 to version 2.2.1.0-161.
>>>
>>> After performing the upgrade on the server, agents, upgrading the
>>> database and starting everything up, I keep seeing the following error in
>>> the logs on the server:
>>>
>>> 08 Mar 2016 10:07:05,087  INFO [qtp-ambari-agent-55] HostRequest:125 -
>>> HostRequest: Successfully recovered host request for host: Host Assignment
>>> Pending
>>> 08 Mar 2016 10:07:05,088  INFO [qtp-ambari-agent-55] LogicalRequest:420
>>> - LogicalRequest.createHostRequests: created new outstanding host
>>> request ID = 3
>>> 08 Mar 2016 10:07:05,120  INFO [qtp-ambari-agent-55] HostRequest:125 -
>>> HostRequest: Successfully recovered host request for host: Host Assignment
>>> Pending
>>> 08 Mar 2016 10:07:05,120  INFO [qtp-ambari-agent-55] LogicalRequest:420
>>> - LogicalRequest.createHostRequests: created new outstanding host
>>> request ID = 5
>>> 08 Mar 2016 10:07:05,134  INFO [qtp-ambari-agent-55] HostRequest:125 -
>>> HostRequest: Successfully recovered host request for host: Host Assignment
>>> Pending
>>> 08 Mar 2016 10:07:05,134  INFO [qtp-ambari-agent-55] LogicalRequest:420
>>> - LogicalRequest.createHostRequests: created new outstanding host
>>> request ID = 8
>>> 08 Mar 2016 10:07:05,147  INFO [qtp-ambari-agent-55] HostRequest:125 -
>>> HostRequest: Successfully recovered host request for host: Host Assignment
>>> Pending
>>> 08 Mar 2016 10:07:05,148  INFO [qtp-ambari-agent-55] LogicalRequest:420
>>> - LogicalRequest.createHostRequests: created new outstanding host
>>> request ID = 7
>>> 08 Mar 2016 10:07:05,158  INFO [qtp-ambari-agent-55] HostRequest:125 -
>>> HostRequest: Successfully recovered host request for host: Host Assignment
>>> Pending
>>> 08 Mar 2016 10:07:05,158  INFO [qtp-ambari-agent-55] LogicalRequest:420
>>> - LogicalRequest.createHostRequests: created new outstanding host
>>> request ID = 6
>>> 08 Mar 2016 10:07:05,170  INFO [qtp-ambari-agent-55] HostRequest:125 -
>>> HostRequest: Successfully recovered host request for host: Host Assignment
>>> Pending
>>> 08 Mar 2016 10:07:05,170  INFO [qtp-ambari-agent-55] LogicalRequest:420
>>> - LogicalRequest.createHostRequests: created new outstanding host
>>> request ID = 2
>>> 08 Mar 2016 10:07:05,184  INFO [qtp-ambari-agent-55] HostRequest:125 -
>>> HostRequest: Successfully recovered host request for host: Host Assignment
>>> Pending
>>> 08 Mar 2016 10:07:05,185  INFO [qtp-ambari-agent-55] LogicalRequest:420
>>> - LogicalRequest.createHostRequests: created new outstanding host
>>> request ID = 1
>>> 08 Mar 2016 10:07:05,194  INFO [qtp-ambari-agent-55] HostRequest:125 -
>>> HostRequest: Successfully recovered host request for host: Host Assignment
>>> Pending
>>> 08 Mar 2016 10:07:05,194  INFO [qtp-ambari-agent-55] LogicalRequest:420
>>> - LogicalRequest.createHostRequests: created new outstanding host
>>> request ID = 4
>>> 08 Mar 2016 10:07:05,290  INFO [qtp-ambari-agent-55] HostRequest:125 -
>>> HostRequest: Successfully recovered host request for host:
>>> ambdevtestdc2host-group-21.node.example
>>> 08 Mar 2016 10:07:05,328  INFO [qtp-ambari-agent-55] HostRequest:125 -
>>> HostRequest: Successfully recovered host request for host:
>>> ambdevtestdc2host-group-51.node.example
>>> 08 Mar 2016 10:07:05,384  INFO [qtp-ambari-agent-55] HostRequest:125 -
>>> HostRequest: Successfully recovered host request for host:
>>> ambdevtestdc2host-group-11.node.example
>>> 08 Mar 2016 10:07:05,428  INFO [qtp-ambari-agent-55] HostRequest:125 -
>>> HostRequest: Successfully recovered host request for host:
>>> ambdevtestdc2host-group-41.node.example
>>> 08 Mar 2016 10:07:05,507  INFO [qtp-ambari-agent-55] HostRequest:125 -
>>> HostRequest: Successfully recovered host request for host:
>>> ambdevtestdc2host-group-31.node.example
>>> 08 Mar 2016 10:07:05,575  INFO [qtp-ambari-agent-55] HostRequest:125 -
>>> HostRequest: Successfully recovered host request for host:
>>> ambdevtestdc2host-group-53.node.example
>>> 08 Mar 2016 10:07:05,627  INFO [qtp-ambari-agent-55] HostRequest:125 -
>>> HostRequest: Successfully recovered host request for host:
>>> ambdevtestdc2host-group-52.node.example
>>> 08 Mar 2016 10:07:05,644  WARN [qtp-ambari-agent-55] ServletHandler:563 - /agent/v1/register/ambdevtestdc2host-group-51.node.example
>>> java.lang.NullPointerException
>>>         at org.apache.ambari.server.topology.PersistedStateImpl.getAllRequests(PersistedStateImpl.java:157)
>>>         at org.apache.ambari.server.topology.TopologyManager.ensureInitialized(TopologyManager.java:131)
>>>         at org.apache.ambari.server.topology.TopologyManager.onHostRegistered(TopologyManager.java:315)
>>>         at org.apache.ambari.server.state.host.HostImpl$HostRegistrationReceived.transition(HostImpl.java:301)
>>>         at org.apache.ambari.server.state.host.HostImpl$HostRegistrationReceived.transition(HostImpl.java:266)
>>>         at org.apache.ambari.server.state.fsm.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:354)
>>>         at org.apache.ambari.server.state.fsm.StateMachineFactory.doTransition(StateMachineFactory.java:294)
>>>         at org.apache.ambari.server.state.fsm.StateMachineFactory.access$300(StateMachineFactory.java:39)
>>>         at org.apache.ambari.server.state.fsm.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:440)
>>>         at org.apache.ambari.server.state.host.HostImpl.handleEvent(HostImpl.java:570)
>>>         at org.apache.ambari.server.agent.HeartBeatHandler.handleRegistration(HeartBeatHandler.java:966)
>>>         at org.apache.ambari.server.agent.rest.AgentResource.register(AgentResource.java:95)
>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>         at java.lang.reflect.Method.invoke(Method.java:497)
>>>         at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
>>>         at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
>>>         at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
>>>         at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302)
>>>         at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
>>>         at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
>>>         at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
>>>         at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
>>>         at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1542)
>>>         at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1473)
>>>         at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1419)
>>>         at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1409)
>>>         at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:409)
>>>         at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:540)
>>>         at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:715)
>>>         at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
>>>         at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)
>>>         at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1496)
>>>         at org.apache.ambari.server.security.SecurityFilter.doFilter(SecurityFilter.java:67)
>>>         at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1467)
>>>         at org.apache.ambari.server.api.AmbariPersistFilter.doFilter(AmbariPersistFilter.java:47)
>>>         at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1467)
>>>         at org.eclipse.jetty.servlets.UserAgentFilter.doFilter(UserAgentFilter.java:82)
>>>         at org.eclipse.jetty.servlets.GzipFilter.doFilter(GzipFilter.java:294)
>>>         at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1467)
>>>         at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:501)
>>>         at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)
>>>         at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:429)
>>>         at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)
>>>         at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
>>>         at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
>>>         at org.eclipse.jetty.server.Server.handle(Server.java:370)
>>>         at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
>>>         at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:982)
>>>         at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1043)
>>>         at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:865)
>>>         at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
>>>         at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
>>>         at org.eclipse.jetty.io.nio.SslConnection.handle(SslConnection.java:196)
>>>         at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:696)
>>>         at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:53)
>>>         at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
>>>         at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
>>>         at java.lang.Thread.run(Thread.java:745)
>>>
>>>
>>> This is not specific to host group ambdevtestdc2host-group-51.node.example,
>>> it is happening for all host groups.
>>>
>>> On the agents I see the following:
>>>
>>> <head>
>>> <meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
>>> <title>Error 500 Server Error</title>
>>> </head>
>>> <body>
>>> <h2>HTTP ERROR: 500</h2>
>>> <p>Problem accessing 
>>> /agent/v1/register/ambdevtestdc2host-group-51.node.example
>>> Reason:
>>> <pre>    Server Error</pre></p>
>>> <hr /><i><small>Powered by Jetty://</small></i>
>>>
>>> Is there a workaround for this? It's just a test cluster, but it would
>>> be good to know how to get past this, as I've seen it a number of times
>>> now. Is there anything that can be modified in the database to resolve it?
>>>
>>> Thanks!
>>>
>>>
>>>
>>>
>>>
>>
>
