I've created AMBARI-14031 & AMBARI-14032 for these issues.

And thanks for the pointer to the #experimental /
opsDuringRollingUpgrade workaround.

One more question - is there a process for uninstalling old / unused
versions of HDP?  For example, now that I've upgraded from 2.2.8 ->
2.2.9, is there a way to remove 2.2.6?

On Mon, Nov 23, 2015 at 11:35 AM, Alejandro Fernandez
<[email protected]> wrote:
> Hi,
>
> I wish your experience with Rolling Upgrades had been better.
> I'll do my best to explain the solution to each one of those items. As a
> developer, I like to hear this feedback so we can make the product better.
>
> * Cluster is locked down while in the middle of upgrade:
> Operations like changing configs, adding hosts, adding services, etc. are
> disallowed by default.
> This is meant to prevent the user from drastically changing the stack
> configs and ending up in a worse state.
> Cluster operators can still change configs by navigating to
> http://server:8080/#/experimental and enabling "opsDuringRollingUpgrade".
> I completely agree that we need to be more flexible in this area since
> configs are likely to break, and the savvy users should still be allowed to
> change them.
>
> * Configs are only changed in major stack versions:
> In HDP 2.2.*->2.2.*, we don't expect any config changes, so the Upgrade Pack
> doesn't orchestrate any, whereas a 2.2.*->2.3.* has many config changes.
> At times, this will break, and we typically find out about it during testing
> and reports from users with custom configs.
> Tools like SmartSense can also help to point out incorrect configs. In the
> future, we may relax this so that even minor versions are allowed to change
> configs.
>
> * Unable to finalize since hosts are not on the new version:
> We've talked about a way to "force finalize" the versions. Today, Ambari is
> very strict about requiring all hosts to be updated.
> As a workaround, we have a Python script called "RU Magician" that will
> allow you to fix things and force any version to CURRENT; check out
> https://github.com/apache/ambari/tree/branch-2.1.2/contrib/ru_magician
> You ran the correct SQL statements, so kudos to you for that.
>
> * Components that don't advertise a version:
> Some components, like ZKFC, AMS, MySQL, and Kerberos Client, don’t need to
> advertise a version.
> In the case of ZKFC, it is because it uses the same binary as that of
> NameNode. So perhaps an earlier version of Ambari caused it to stay stuck on
> 2.2.6 in the DB.
> If you feel more comfortable, you can change ZKFC's version to 'UNKNOWN'.
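
A minimal sketch of that last suggestion, run against an in-memory SQLite
stand-in rather than the real Ambari database. The single-table schema, the
demo rows, and the helper names (demo_components, reset_zkfc_version) are
assumptions for illustration only; the real hostcomponentstate table is
keyed by host_id and joined to hosts. Against a live database this would
presumably be done only after taking a backup.

```python
import sqlite3

# Same filter as the query quoted later in this thread, simplified to one table.
STALE_QUERY = (
    "SELECT host_name, component_name, version FROM hostcomponentstate "
    "WHERE version NOT IN (?, 'UNKNOWN')"
)

def demo_components():
    """Build a hypothetical, simplified copy of hostcomponentstate."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE hostcomponentstate (
            host_name TEXT, service_name TEXT,
            component_name TEXT, version TEXT);
        INSERT INTO hostcomponentstate VALUES
            ('node1', 'HDFS', 'ZKFC', '2.2.6.0-2800'),
            ('node2', 'HDFS', 'ZKFC', '2.2.6.0-2800'),
            ('node1', 'HDFS', 'NAMENODE', '2.2.9.0-3393');
    """)
    return conn

def reset_zkfc_version(conn, current_version):
    """Mark stale ZKFC rows as UNKNOWN and return whatever is still stale."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "UPDATE hostcomponentstate SET version = 'UNKNOWN' "
            "WHERE component_name = 'ZKFC' AND version <> ?",
            (current_version,),
        )
    return conn.execute(STALE_QUERY, (current_version,)).fetchall()

if __name__ == "__main__":
    print(reset_zkfc_version(demo_components(), "2.2.9.0-3393"))
```

Restricting the UPDATE to component_name = 'ZKFC' keeps the change from
touching components that do advertise a version.
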
>
> My suggestion is to create Jiras on Apache for the following:
>
> * Allow force finalizing a version during Stack Upgrade
> * Allow changing configs during the middle of a Stack Upgrade; will need to
>   prompt user with a disclaimer/warning
>
> Thanks,
> Alejandro
>
> On 11/22/15, 11:32 PM, "Andrew Robertson" <[email protected]>
> wrote:
>
> I performed a rolling upgrade of HDP from 2.2.8 to 2.2.9 today using
> Ambari 2.1.2.1 & ran into several issues.
>
> My YARN resource manager failed to start due to a "Service
> ResourceManager failed in state INITED; cause:
> java.lang.IllegalArgumentException: Illegal capacity of -1.0 for
> node-label=default in queue=root, valid capacity should in range of
> [0, 100].". (It was working fine with 2.2.8; this may be something new
> in 2.2.9).
>
> As Ambari usage feedback - this was impossible to fix in Ambari while
> the upgrade was going on, and it added a ton of (down)time to the
> upgrade. This error caused a number of service checks to time out
> after a long wait (many checks took 5-15 min to fail). I didn't see
> any way to fix the error. The only options I had during the upgrade
> were "Downgrade", which I didn't want to do (it was a test cluster
> after all, and I wanted to get through it so I could fix it), and
> "Ignore", which allowed it to continue but caused each step to take 300+
> seconds. Ambari seemed to lock the configs so I couldn't make changes
> to fix the issue while the upgrade was going on.  Likewise, I couldn't
> manually restart the service myself or abort the service checks. Even
> at the "Verify operation" and the "finalize" checkpoints, where I
> could "pause" the upgrade - the configs were still locked and I had no
> ability to start/stop services.
>
> At the end, Ambari started giving other errors about being unable to
> finalize the upgrade. I ended up rebooting the cluster & ambari - this
> got it back to a state where I could edit the configs again to fix the
> YARN RM config.  The fix to the RM not starting ended up being the
> same as AMBARI-11358, which appears to only have been fixed in the
> HDP2.3 upgrade.
>
> Separately, Ambari had the 2.2.9 version waiting to be finalized but I
> couldn't find any way to do this in the UI after the restart. So I
> went into the database and ran the following:
> UPDATE host_version SET state = 'INSTALLED' WHERE state = 'CURRENT';
> UPDATE host_version SET state = 'CURRENT' WHERE repo_version_id = <id for 2.2.9.0 version> AND state = 'UPGRADED';
> UPDATE cluster_version SET state = 'INSTALLED' WHERE state = 'CURRENT';
> UPDATE cluster_version SET state = 'CURRENT' WHERE repo_version_id = <id for 2.2.9.0 version> AND state = 'UPGRADED';
> UPDATE hostcomponentstate SET upgrade_state = 'NONE';
> This seems to have fixed that.
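
The statements above could be wrapped in a single transaction so that a
wrong repo_version_id leaves nothing half-updated. This is only a sketch
under stated assumptions: it runs against an in-memory SQLite stand-in with
a cut-down, hypothetical schema (demo_db), and force_finalize is an
illustrative helper, not part of Ambari or of the RU Magician script.

```python
import sqlite3

def demo_db():
    """Hypothetical, simplified versions of the three tables touched above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE host_version (host TEXT, repo_version_id INTEGER, state TEXT);
        CREATE TABLE cluster_version (repo_version_id INTEGER, state TEXT);
        CREATE TABLE hostcomponentstate (component_name TEXT, upgrade_state TEXT);
        INSERT INTO host_version VALUES
            ('node1', 1, 'CURRENT'), ('node1', 2, 'UPGRADED');
        INSERT INTO cluster_version VALUES (1, 'CURRENT'), (2, 'UPGRADED');
        INSERT INTO hostcomponentstate VALUES ('ZKFC', 'COMPLETE');
    """)
    return conn

def force_finalize(conn, repo_version_id):
    """Promote the UPGRADED repo version to CURRENT, all or nothing."""
    with conn:  # commits on success, rolls back if an exception escapes
        for table in ("host_version", "cluster_version"):
            # Demote the old CURRENT rows, then promote the upgraded ones.
            conn.execute(
                f"UPDATE {table} SET state = 'INSTALLED' WHERE state = 'CURRENT'"
            )
            promoted = conn.execute(
                f"UPDATE {table} SET state = 'CURRENT' "
                "WHERE repo_version_id = ? AND state = 'UPGRADED'",
                (repo_version_id,),
            ).rowcount
            if promoted == 0:
                # Wrong id or nothing to finalize: undo the demotion too.
                raise RuntimeError(f"no UPGRADED rows in {table}")
        conn.execute("UPDATE hostcomponentstate SET upgrade_state = 'NONE'")

if __name__ == "__main__":
    conn = demo_db()
    force_finalize(conn, 2)
    print(conn.execute("SELECT * FROM host_version").fetchall())
```

The row-count check is the design choice worth noting: without it, a
mistyped id would demote the CURRENT rows and promote nothing.
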
>
> Possibly unrelated - I did find there are 2 services that show up with
> an even older old version when checking the ambari database:
>
> ambari=> SELECT h.host_name, hcs.service_name, hcs.component_name, hcs.version
>          FROM hostcomponentstate hcs JOIN hosts h ON hcs.host_id = h.host_id
>          WHERE hcs.version NOT IN ('2.2.9.0-3393', 'UNKNOWN');
>  host_name | service_name | component_name |   version
> -----------+--------------+----------------+--------------
>  node2     | HDFS         | ZKFC           | 2.2.6.0-2800
>  node1     | HDFS         | ZKFC           | 2.2.6.0-2800
>
> (But I had upgraded from 2.2.8; 2.2.6 was the version before that).
>
> Any suggestions on how to fix this? I think Ambari may just be
> confused, but I'm not sure how to verify this and/or fix Ambari (other
> than overwrite this field in the database?). I've verified the yum
> versions are right for the package and the right processes are
> actually running on the machine.
>
> Thank you!
>
>
