Hi Jason,

Thanks a lot for sharing your story too. I definitely feel much better about the upgrade plan we used, knowing that the exact same issue happened to other people. I checked Hadoop's Jira to see whether this upgrade memory requirement was mentioned, but didn't find anything.

Do you have any more info to share about how best to size the DNs' JVM heaps before the upgrade starts? In my case it was a restart/fail/double-the-heap procedure until we found that 16G was a good value for our DNs, but I see that your case was probably worse (4GB -> 64GB). I wouldn't really know what to suggest to somebody doing a similar upgrade and asking for advice. Since you encountered the issue upgrading to Hadoop 3.x, this will also be relevant for people upgrading from Bigtop 1.4/1.5 to the future 3.x release.

In my opinion, the more info we can collect, the better for the community!
Luca

On Fri, Feb 12, 2021 at 7:49 PM Jason Wen <[email protected]> wrote:
>
> HI Luca,
>
> Thanks for sharing your upgrade experience.
> We hit the exact same issue of HDFS inconsistent status issue when we
> upgraded one cluster from CDH5.16.2 to CDH6.3.2. At that time some DNs
> crashed due to OOM and some other DNs were still running but failed to
> upgrade its volumes. We finally resolved the issue by increasing the max heap
> size from 4GB to 64GB (our DNs has either 256GB or 512GB memory) and then
> restarting all the DNs.
>
> -Jason
>
> On 2/12/21, 12:52 AM, "Luca Toscano" <[email protected]> wrote:
>
> Hi everybody,
>
> We have finally migrated our CDH cluster to Bigtop 1.5, so I can say
> that we are now happy Bigtop users :)
>
> The upgrade of the production cluster (60 worker nodes, ~50M files on
> HDFS) was harder than I expected, since we bumped into a strange
> performance issue that slowed down the HDFS upgrade. I wrote a summary
> in
> https://urldefense.proofpoint.com/v2/url?u=https-3A__phabricator.wikimedia.org_T273711-236818136&d=DwIBaQ&c=DS6PUFBBr_KiLo7Sjt3ljp5jaW5k2i9ijVXllEdOozc&r=UflFQf1BWcrVtfjfN1LUqWWh-UBP5XtRGMdcDC-0P7o&m=n8sbnJKTVI75MPipuVM4uUi1n49089On4CdWygRwp20&s=Lluhh7rsGsKk9zQbVVXvbAMLIlMUPdary3ZUuI3dA8I&e=
> for whoever is
> interested, it is surely something to highlight in the CDH->Bigtop
> guide. Speaking of which, the last thing that we did was starting
>
> https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.google.com_document_d_1fI1mvbR1mFLV6ohU5cIEnU5hFvEE7EWnKYWOkF55jtE_edit&d=DwIBaQ&c=DS6PUFBBr_KiLo7Sjt3ljp5jaW5k2i9ijVXllEdOozc&r=UflFQf1BWcrVtfjfN1LUqWWh-UBP5XtRGMdcDC-0P7o&m=n8sbnJKTVI75MPipuVM4uUi1n49089On4CdWygRwp20&s=GxA46Ok2-8JaiU3V2_uF9QaI49w31jRHn4sRh_YCcGc&e=
> some time ago, so I am wondering if we could find a more permanent
> location. Would it make sense to start a wiki page somewhere? Or even
> a .md file in the github repo, as you prefer (the latter would be more
> convenient for reviewers etc..).
>
> Anyway, thanks a lot to all for the support! It was a looong project
> but we eventually did it!
>
> Luca
>
