Hi Luca,

Thanks for sharing your upgrade experience. We hit the exact same HDFS inconsistent status issue when we upgraded one cluster from CDH5.16.2 to CDH6.3.2. At that time some DNs crashed due to OOM, and some other DNs were still running but failed to upgrade their volumes. We finally resolved the issue by increasing the max heap size from 4GB to 64GB (our DNs have either 256GB or 512GB of memory) and then restarting all the DNs.

-Jason

On 2/12/21, 12:52 AM, "Luca Toscano" <[email protected]> wrote:

Hi everybody,

We have finally migrated our CDH cluster to Bigtop 1.5, so I can say that we are now happy Bigtop users :)

The upgrade of the production cluster (60 worker nodes, ~50M files on HDFS) was harder than I expected, since we bumped into a strange performance issue that slowed down the HDFS upgrade. I wrote a summary in https://phabricator.wikimedia.org/T273711#6818136 for whoever is interested; it is surely something to highlight in the CDH->Bigtop guide.

Speaking of which, the last thing that we did was starting https://docs.google.com/document/d/1fI1mvbR1mFLV6ohU5cIEnU5hFvEE7EWnKYWOkF55jtE/edit some time ago, so I am wondering if we could find a more permanent location. Would it make sense to start a wiki page somewhere? Or even a .md file in the GitHub repo, as you prefer (the latter would be more convenient for reviewers etc.).

Anyway, thanks a lot to all for the support! It was a looong project but we eventually did it!

Luca
