On Fri, Jun 29, 2018 at 11:20 AM Gábor Mészáros <gabor.mesza...@canonical.com> wrote:
>
> That I can do later, not now.
> What I see now, though, is that the mds are not clean and a resync has started:
Yes; the updated curtin does a number of things to prevent this from happening:

1) we wipe the first and last 1M of the raid device itself
2) we fail each member of the array, forcing them out
3) we wipe the first and last 1M of each disk in the array
4) we stop and remove the md device

I've seen this exact scenario; it's the case we updated curtin to handle, so I'm keenly interested in seeing the install log to find out exactly what happened. I've taken the storage config you provided and attempted to reproduce, but I'm not able to at this time.

In the curtin install log, I'm specifically looking for the device tree and shutdown plan:

[  297.990094] cloud-init[1503]: Current device storage tree:
[  297.990508] cloud-init[1503]: vda
[  297.991362] cloud-init[1503]: `-- vda1
[  297.991922] cloud-init[1503]:     `-- md2
[  297.992729] cloud-init[1503]:         `-- bcache0
[  297.993098] cloud-init[1503]: vdb
[  297.993678] cloud-init[1503]: `-- md2
[  297.996101] cloud-init[1503]:     `-- bcache0
[  297.996425] cloud-init[1503]: vdc
[  297.997004] cloud-init[1503]: `-- md2
[  297.997293] cloud-init[1503]:     `-- bcache0
[  297.997694] cloud-init[1503]: vdd
[  297.998112] cloud-init[1503]: `-- md2
[  297.998511] cloud-init[1503]:     `-- bcache0
[  297.998797] cloud-init[1503]: vde
[  297.999382] cloud-init[1503]: |-- vde1
[  297.999805] cloud-init[1503]: |-- vde2
[  298.000298] cloud-init[1503]: |   `-- md0
[  298.002438] cloud-init[1503]: `-- vde3
[  298.002718] cloud-init[1503]:     `-- md1
[  298.003113] cloud-init[1503]:         `-- bcache0
[  298.003760] cloud-init[1503]: vdf
[  298.004168] cloud-init[1503]: |-- vdf1
[  298.004576] cloud-init[1503]: |-- vdf2
[  298.007226] cloud-init[1503]: |   `-- md0
[  298.007661] cloud-init[1503]: `-- vdf3
[  298.008095] cloud-init[1503]:     `-- md1
[  298.008493] cloud-init[1503]:         `-- bcache0
[  298.009085] cloud-init[1503]: vdg
[  298.009516] cloud-init[1503]: Shutdown Plan:
[  298.011795] cloud-init[1503]: {'device': '/sys/class/block/bcache0', 'level': 3, 'dev_type': 'bcache'}
[  298.012072] cloud-init[1503]: {'device': '/sys/class/block/md0', 'level': 2, 'dev_type': 'raid'}
[  298.012503] cloud-init[1503]: {'device': '/sys/class/block/md2', 'level': 2, 'dev_type': 'raid'}
[  298.012881] cloud-init[1503]: {'device': '/sys/class/block/md1', 'level': 2, 'dev_type': 'raid'}
[  298.016125] cloud-init[1503]: {'device': '/sys/class/block/vdf/vdf1', 'level': 1, 'dev_type': 'partition'}
[  298.018322] cloud-init[1503]: {'device': '/sys/class/block/vdf/vdf3', 'level': 1, 'dev_type': 'partition'}
[  298.020177] cloud-init[1503]: {'device': '/sys/class/block/vde/vde1', 'level': 1, 'dev_type': 'partition'}
[  298.022210] cloud-init[1503]: {'device': '/sys/class/block/vda/vda1', 'level': 1, 'dev_type': 'partition'}
[  298.024264] cloud-init[1503]: {'device': '/sys/class/block/vde/vde3', 'level': 1, 'dev_type': 'partition'}
[  298.026705] cloud-init[1503]: {'device': '/sys/class/block/vdf/vdf2', 'level': 1, 'dev_type': 'partition'}
[  298.028756] cloud-init[1503]: {'device': '/sys/class/block/vde/vde2', 'level': 1, 'dev_type': 'partition'}
[  298.033613] cloud-init[1503]: {'device': '/sys/class/block/vdb', 'level': 0, 'dev_type': 'disk'}
[  298.033821] cloud-init[1503]: {'device': '/sys/class/block/vdg', 'level': 0, 'dev_type': 'disk'}
[  298.034262] cloud-init[1503]: {'device': '/sys/class/block/vda', 'level': 0, 'dev_type': 'disk'}
[  298.034838] cloud-init[1503]: {'device': '/sys/class/block/vdd', 'level': 0, 'dev_type': 'disk'}
[  298.035435] cloud-init[1503]: {'device': '/sys/class/block/vdf', 'level': 0, 'dev_type': 'disk'}
[  298.039619] cloud-init[1503]: {'device': '/sys/class/block/vde', 'level': 0, 'dev_type': 'disk'}
[  298.042112] cloud-init[1503]: {'device': '/sys/class/block/vdc', 'level': 0, 'dev_type': 'disk'}

We first stop the bcache device, then proceed to the raid devices, then the members of the raids, and finally the underlying disks.
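That ordering, and the head-and-tail wipe from steps 1 and 3 above, can be sketched roughly as follows. This is an illustrative Python sketch of the idea only, not curtin's actual implementation; the function names `shutdown_order` and `wipe_head_and_tail` are my own:

```python
import os

MiB = 1 << 20


def shutdown_order(plan):
    """Sort teardown entries so the most dependent devices stop first:
    bcache (level 3) -> raid (level 2) -> partitions (1) -> disks (0).
    Entries use the same 'level' key shown in the shutdown plan above."""
    return sorted(plan, key=lambda entry: entry["level"], reverse=True)


def wipe_head_and_tail(path, size=MiB):
    """Zero the first and last `size` bytes of a device (or file).

    md and bcache superblocks live near the start or end of a member
    device, so clearing both ends keeps stale metadata from being
    re-assembled on the next boot.
    """
    with open(path, "r+b") as f:
        f.seek(0, os.SEEK_END)
        length = f.tell()  # fine for a file; a real block device would
                           # need its size from the BLKGETSIZE64 ioctl
        chunk = min(size, length)
        zeros = b"\x00" * chunk
        f.seek(0)
        f.write(zeros)                 # wipe the head
        if length > chunk:
            f.seek(length - chunk)
            f.write(zeros)             # wipe the tail
        f.flush()
        os.fsync(f.fileno())
```

Sorting by level descending reproduces the order in the plan above (bcache0 first, the vdX disks last); 1M at each end is comfortably more than the offsets at which the md and bcache superblocks are stored.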
>
> Jun 29 14:42:36 ic-skbrat2-s40pxtg mdadm[2746]: RebuildStarted event detected on md device /dev/md2
> Jun 29 14:42:36 ic-skbrat2-s40pxtg kernel: [  279.213933] md/raid1:md0: not clean -- starting background reconstruction
> Jun 29 14:42:36 ic-skbrat2-s40pxtg kernel: [  279.213936] md/raid1:md0: active with 2 out of 2 mirrors
> Jun 29 14:42:36 ic-skbrat2-s40pxtg kernel: [  279.213968] md0: detected capacity change from 0 to 1995440128
> Jun 29 14:42:36 ic-skbrat2-s40pxtg kernel: [  279.214033] md: resync of RAID array md0
> Jun 29 14:42:36 ic-skbrat2-s40pxtg kernel: [  279.214039] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
> Jun 29 14:42:36 ic-skbrat2-s40pxtg kernel: [  279.214041] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync.
> Jun 29 14:42:36 ic-skbrat2-s40pxtg kernel: [  279.214051] md: using 128k window, over a total of 1948672k.
> Jun 29 14:42:36 ic-skbrat2-s40pxtg mdadm[2746]: NewArray event detected on md device /dev/md0
> Jun 29 14:42:36 ic-skbrat2-s40pxtg mdadm[2746]: RebuildStarted event detected on md device /dev/md0
> Jun 29 14:42:42 ic-skbrat2-s40pxtg mdadm[2746]: Rebuild51 event detected on md device /dev/md0
> Jun 29 14:42:42 ic-skbrat2-s40pxtg mdadm[2746]: NewArray event detected on md device /dev/md1
> Jun 29 14:42:42 ic-skbrat2-s40pxtg kernel: [  284.312105] md: bind<sdd3>
> Jun 29 14:42:42 ic-skbrat2-s40pxtg kernel: [  284.312238] md: bind<sde3>
> Jun 29 14:42:42 ic-skbrat2-s40pxtg kernel: [  284.313697] md/raid1:md1: not clean -- starting background reconstruction
> Jun 29 14:42:42 ic-skbrat2-s40pxtg kernel: [  284.313701] md/raid1:md1: active with 2 out of 2 mirrors
> Jun 29 14:42:42 ic-skbrat2-s40pxtg kernel: [  284.313774] created bitmap (2 pages) for device md1
> Jun 29 14:42:42 ic-skbrat2-s40pxtg kernel: [  284.314044] md1: bitmap initialized from disk: read 1 pages, set 3515 of 3515 bits
> Jun 29 14:42:42 ic-skbrat2-s40pxtg kernel: [  284.314138] md1: detected capacity change from 0 to 235862491136
> Jun 29 14:42:42 ic-skbrat2-s40pxtg kernel: [  284.314228] md: delaying resync of md1 until md0 has finished (they share one or more physical units)
> Jun 29 14:42:42 ic-skbrat2-s40pxtg kernel: [  284.412570] bcache: bch_journal_replay() journal replay done, 0 keys in 2 entries, seq 78030
> Jun 29 14:42:42 ic-skbrat2-s40pxtg kernel: [  284.437013] bcache: bch_cached_dev_attach() Caching md2 as bcache0 on set 38d7614a-32f6-4e4f-a044-ab0f06434bf4
> Jun 29 14:42:42 ic-skbrat2-s40pxtg kernel: [  284.437033] bcache: register_cache() registered cache device md1
> Jun 29 14:42:42 ic-skbrat2-s40pxtg kernel: [  284.454171] bcache: register_bcache() error opening /dev/md1: device already registered
> Jun 29 14:42:42 ic-skbrat2-s40pxtg kernel: [  284.532188] bcache: register_bcache() error opening /dev/md1: device already registered
> Jun 29 14:42:42 ic-skbrat2-s40pxtg kernel: [  284.563413] bcache: register_bcache() error opening /dev/md1: device already registered
> Jun 29 14:42:42 ic-skbrat2-s40pxtg kernel: [  284.642738] bcache: register_bcache() error opening /dev/md2: device already registered (emitting change event)
> Jun 29 14:42:42 ic-skbrat2-s40pxtg kernel: [  284.702291] bcache: register_bcache() error opening /dev/md2: device already registered (emitting change event)
> Jun 29 14:42:42 ic-skbrat2-s40pxtg kernel: [  284.748625] bcache: register_bcache() error opening /dev/md1: device already registered
> Jun 29 14:42:42 ic-skbrat2-s40pxtg kernel: [  284.772383] bcache: register_bcache() error opening /dev/md1: device already registered
> Jun 29 14:42:42 ic-skbrat2-s40pxtg cloud-init[4053]: An error occured handling 'bcache0': RuntimeError - ('Unexpected old bcache device: %s', '/dev/md2')
> Jun 29 14:42:42 ic-skbrat2-s40pxtg cloud-init[4053]: ('Unexpected old bcache device: %s', '/dev/md2')
> Jun 29 14:42:42 ic-skbrat2-s40pxtg cloud-init[4053]: curtin: Installation failed with exception: Unexpected error while running command.
> Jun 29 14:42:42 ic-skbrat2-s40pxtg cloud-init[4053]: Command: ['curtin', 'block-meta', 'custom']
>
> --
> You received this bug notification because you are subscribed to the bug report.
> https://bugs.launchpad.net/bugs/1778704
>
> Title:
>   redeployment of node with bcache fails
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/curtin/+bug/1778704/+subscriptions

--
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs