I/O Error Test 6 (for the Cosmic kernel)
========================================

commit: 'Revert "bcache: set CACHE_SET_IO_DISABLE in bch_cached_dev_error()"'

Problem: when one backing device hits I/O errors, the cache set is disabled
(CACHE_SET_IO_DISABLE is set). If that cache device is shared by other bcache
devices, they are stopped too, even though their own backing devices are not
failing.

Original kernel: all bcache devices that share the cache device with the
failing backing device are stopped.

Modified kernel: only the bcache device with the failing backing
device is stopped.
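To make the difference concrete, here is a toy model of the two policies. It
is a hypothetical illustration, not bcache code: the function name
stopped_devices and its arguments are invented, every device passed to it is
assumed to share one cache set, and it ignores the stop_when_cache_set_failed
setting and the dirty-cache check that show up in the original-kernel log
below.

```shell
#!/usr/bin/env bash
# Toy model (hypothetical, not bcache code): which bcache devices end up
# stopped when one backing device fails, assuming every device listed
# shares the same cache set.
# Usage: stopped_devices <original|modified> <failing-device> <device>...
stopped_devices() {
  local policy=$1 failing=$2 dev
  shift 2
  for dev in "$@"; do
    case $policy in
      original)
        # The backing-device error disables the shared cache set,
        # so every device attached to it is stopped.
        echo "$dev"
        ;;
      modified)
        # Only the device whose own backing device failed is stopped.
        if [ "$dev" = "$failing" ]; then
          echo "$dev"
        fi
        ;;
      *)
        echo "unknown policy: $policy" >&2
        return 1
        ;;
    esac
  done
}

stopped_devices original bcache0 bcache0 bcache1   # prints bcache0 and bcache1
stopped_devices modified bcache0 bcache0 bcache1   # prints bcache0 only
```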


Original kernel
---------------

root@guest-bcache:~#  uname -rv
4.18.0-23-generic #24-Ubuntu SMP Wed Jun 12 18:17:39 UTC 2019

root@guest-bcache:~# lsblk -e 252
root@guest-bcache:~# 

root@guest-bcache:~# ./setup-two-bcache-one-cache.sh >/dev/null 2>&1
[   35.686002] bcache: register_bdev() registered backing device dm-0
[   35.695980] bcache: register_bdev() registered backing device dm-1
[   35.704662] bcache: run_cache_set() invalidating existing data
[   35.719046] bcache: register_cache() registered cache device dm-2
[   36.705686] bcache: bch_cached_dev_attach() Caching dm-0 as bcache0 on set fce8d558-4657-47dc-ab37-226ada14daf5
[   36.711827] bcache: bch_cached_dev_attach() Caching dm-1 as bcache1 on set fce8d558-4657-47dc-ab37-226ada14daf5
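The setup script is not included in this report. Judging from the lsblk output
(three 1 GiB loop devices, each wrapped in a device-mapper device named
fake-loopN, with fake-loop2 as the shared cache), a hypothetical equivalent
might look like the sketch below. The image directory, the function names and
the DRY_RUN switch are assumptions for illustration. make-bcache comes from
bcache-tools; formatting the backing and cache devices in one invocation
attaches the backing devices to the new cache set, and bcache-tools' udev
rules normally register them.

```shell
#!/usr/bin/env bash
# Hypothetical reconstruction of setup-two-bcache-one-cache.sh (the real
# script is not shown). With DRY_RUN set, commands are printed, not run.
set -u

run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

setup_two_bcache_one_cache() {
  local dir=${1:-/tmp/bcache-test}        # assumed location of image files
  local sectors=$(( (1 << 30) / 512 ))    # 1 GiB in 512-byte sectors
  local i
  run mkdir -p "$dir"
  for i in 0 1 2; do
    run truncate -s 1G "$dir/loop$i.img"
    run losetup "/dev/loop$i" "$dir/loop$i.img"
    # A linear dm target on top of each loop device, so that its table can
    # later be swapped to inject I/O errors.
    run dmsetup create "fake-loop$i" --table "0 $sectors linear /dev/loop$i 0"
  done
  # Two backing devices and one cache device formatted together, so the
  # backing devices attach to the new cache set once registered.
  run make-bcache -B /dev/mapper/fake-loop0 /dev/mapper/fake-loop1 \
    -C /dev/mapper/fake-loop2
}

DRY_RUN=1 setup_two_bcache_one_cache
```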

root@guest-bcache:~# lsblk -e 252
NAME         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0          7:0    0    1G  0 loop 
└─fake-loop0 253:0    0 1024M  0 dm   
  └─bcache0  251:0    0 1024M  0 disk 
loop1          7:1    0    1G  0 loop 
└─fake-loop1 253:1    0 1024M  0 dm   
  └─bcache1  251:128  0 1024M  0 disk 
loop2          7:2    0    1G  0 loop 
└─fake-loop2 253:2    0 1024M  0 dm   
  ├─bcache0  251:0    0 1024M  0 disk 
  └─bcache1  251:128  0 1024M  0 disk 

root@guest-bcache:~# echo writeback | tee /sys/block/dm-*/bcache/cache_mode
writeback

root@guest-bcache:~# cat /sys/block/dm-*/bcache/cache_mode
writethrough [writeback] writearound none
writethrough [writeback] writearound none
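The cache_mode attribute lists every mode and brackets the active one. A small
helper (hypothetical name active_mode) can confirm that both backing devices
really switched to writeback. That matters for this test: writeback leaves
dirty data in the shared cache, and the original-kernel log below stops both
devices because "cache is dirty".

```shell
#!/usr/bin/env bash
# Print the active (bracketed) entry of a bcache cache_mode line read on
# stdin, e.g. "writeback" for "writethrough [writeback] writearound none".
active_mode() {
  sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Usage on the test VM (sysfs paths as in the transcript above).
check_writeback() {
  local f mode rc=0
  for f in /sys/block/dm-*/bcache/cache_mode; do
    [ -e "$f" ] || continue          # the glob matched nothing
    mode=$(active_mode < "$f")
    echo "$f: $mode"
    [ "$mode" = writeback ] || rc=1
  done
  return "$rc"
}

echo 'writethrough [writeback] writearound none' | active_mode   # writeback
```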


root@guest-bcache:~# ./dm_fake_dev.sh /dev/loop0 bad
[   76.875749] Buffer I/O error on dev dm-0, logical block 262128, async page read
[   76.882159] Buffer I/O error on dev dm-0, logical block 262128, async page read
[   76.889453] bcache: register_bcache() error /dev/dm-0: device already registered (emitting change event)
[   76.892183] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[   76.904907] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[   76.907711] Buffer I/O error on dev bcache0, logical block 262112, async page read
[   76.912607] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[   76.916905] Buffer I/O error on dev bcache0, logical block 262112, async page read
[   76.920345] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[   76.924767] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[   76.928404] Buffer I/O error on dev bcache0, logical block 1, async page read
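dm_fake_dev.sh is not shown either. The later lsblk output, where fake-loop0
no longer sits on top of loop0, is consistent with the script replacing the
linear mapping with device-mapper's error target, which fails every I/O. Under
that assumption, a hypothetical equivalent is sketched below; the function
names are invented, while dmsetup reload/resume and blockdev --getsz are
standard commands.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for dm_fake_dev.sh <loop-device> <good|bad>.
# Assumes the device-mapper device is named fake-<loop name>.

# Print the dm table for the requested state.
dm_fake_table() {
  local dev=$1 mode=$2 sectors=$3
  case $mode in
    good) echo "0 $sectors linear $dev 0" ;;   # pass I/O through to $dev
    bad)  echo "0 $sectors error" ;;           # fail every read and write
    *)    echo "usage: dm_fake_table <dev> <good|bad> <sectors>" >&2
          return 2 ;;
  esac
}

# Swap the live table of fake-<name> (needs root).
dm_fake_dev() {
  local dev=$1 mode=$2 name table sectors
  name="fake-$(basename "$dev")"
  sectors=$(blockdev --getsz "$dev")        # size in 512-byte sectors
  table=$(dm_fake_table "$dev" "$mode" "$sectors") || return
  dmsetup reload "$name" --table "$table"   # load as the inactive table
  dmsetup resume "$name"                    # make it the live table
}

dm_fake_table /dev/loop0 bad 2097152   # prints "0 2097152 error"
```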



root@guest-bcache:~# dd if=/dev/zero of=/dev/bcache1 bs=4k & dd if=/dev/zero of=/dev/bcache0 bs=4k &
[  175.024811] Buffer I/O error on dev bcache0, logical block 0, lost async page write
[  175.029844] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[  175.034652] Buffer I/O error on dev bcache0, logical block 1, lost async page write
[  175.037465] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[  175.040373] Buffer I/O error on dev bcache0, logical block 2, lost async page write
...
[  175.092196] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[  175.096635] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[  175.101272] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[  175.105829] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
...
[  175.235700] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[  175.239457] bcache: bch_cached_dev_error() stop bcache0: too many IO errors on backing device dm-0
[  175.239457]
[  175.324069] bcache: bch_cache_set_error() CACHE_SET_IO_DISABLE already set
[  175.328998] bcache: error on fce8d558-4657-47dc-ab37-226ada14daf5:
[  175.328999] journal io error
[  175.331022] , disabling caching
[  175.334264] bcache: conditional_stop_bcache_device() stop_when_cache_set_failed of bcache0 is "auto" and cache is dirty, stop it to avoid potential data corruption.
[  175.338865] bcache: conditional_stop_bcache_device() stop_when_cache_set_failed of bcache1 is "auto" and cache is dirty, stop it to avoid potential data corruption.
[  175.344097] bcache: cached_dev_detach_finish() Caching disabled for dm-1
[  176.080139] bcache: bcache_device_free() bcache0 stopped
[  176.083928] bcache: bch_count_io_errors() dm-2: IO error on writing btree.
[  176.188371] bcache: cache_set_free() Cache set fce8d558-4657-47dc-ab37-226ada14daf5 unregistered
[  176.841497] bcache: bcache_device_free() bcache1 stopped

dd: error writing '/dev/bcache0': No space left on device
262142+0 records in
262141+0 records out
1073729536 bytes (1.1 GB, 1.0 GiB) copied, 1.81834 s, 591 MB/s

dd: error writing '/dev/bcache1': No space left on device
262142+0 records in
262141+0 records out
1073729536 bytes (1.1 GB, 1.0 GiB) copied, 2.5749 s, 417 MB/s

[1]-  Exit 1                  dd if=/dev/zero of=/dev/bcache1 bs=4k
[2]+  Exit 1                  dd if=/dev/zero of=/dev/bcache0 bs=4k
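Both dd runs reached the end of their device rather than stopping early: the
262141 full 4 KiB records written account exactly for the byte count dd
reports, which is within three 4 KiB blocks of 1 GiB. The 262142nd record was
read from /dev/zero but could not be written, hence "No space left on device".
The arithmetic, as a quick check (the function name is only illustrative):

```shell
#!/usr/bin/env bash
# Bytes written by dd = full records out x block size.
dd_bytes_written() {
  local records_out=$1 bs=$2
  echo $(( records_out * bs ))
}

bytes=$(dd_bytes_written 262141 4096)
echo "$bytes"                              # 1073729536, as reported by dd
echo $(( (1 << 30) - bytes ))              # 12288 bytes short of 1 GiB
echo $(( ((1 << 30) - bytes) / 4096 ))     # i.e. 3 blocks of 4 KiB
```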

root@guest-bcache:~# lsblk -e 252
NAME         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0          7:0    0    1G  0 loop 
loop1          7:1    0    1G  0 loop 
└─fake-loop1 253:1    0 1024M  0 dm   
loop2          7:2    0    1G  0 loop 
└─fake-loop2 253:2    0 1024M  0 dm   
fake-loop0   253:0    0    1G  0 dm


Notice that both bcache0 and bcache1 are gone, even though only the backing
device of bcache0 (dm-0) was failing.
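This check can be scripted by pulling the bcacheN names out of the lsblk tree.
The helper below (hypothetical name bcache_devices) prints each distinct bcache
device found in lsblk output read on stdin; run against the listing above it
prints nothing. On the test VM it would be used as `lsblk -e 252 | bcache_devices`.

```shell
#!/usr/bin/env bash
# List the distinct bcacheN devices that appear in `lsblk` output on stdin.
bcache_devices() {
  grep -oE 'bcache[0-9]+' | sort -u
}

# Example: the post-test listing from the original kernel (no bcache devices).
bcache_devices <<'EOF'
NAME         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0          7:0    0    1G  0 loop
loop1          7:1    0    1G  0 loop
└─fake-loop1 253:1    0 1024M  0 dm
loop2          7:2    0    1G  0 loop
└─fake-loop2 253:2    0 1024M  0 dm
fake-loop0   253:0    0    1G  0 dm
EOF
```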


Modified kernel
---------------

root@guest-bcache:~# uname -rv
4.18.0-23-generic #24+test20190627b1 SMP Thu Jun 27 13:29:22 UTC 2019

root@guest-bcache:~# lsblk -e 252
root@guest-bcache:~# 

root@guest-bcache:~# ./setup-two-bcache-one-cache.sh >/dev/null 2>&1
[  146.600391] bcache: register_bdev() registered backing device dm-0
[  146.608618] bcache: register_bdev() registered backing device dm-1
[  146.617808] bcache: run_cache_set() invalidating existing data
[  146.632355] bcache: register_cache() registered cache device dm-2
[  147.615003] bcache: bch_cached_dev_attach() Caching dm-0 as bcache0 on set 6673bcb3-7a64-4675-a82f-59bb66886d66
[  147.633610] bcache: bch_cached_dev_attach() Caching dm-1 as bcache1 on set 6673bcb3-7a64-4675-a82f-59bb66886d66

root@guest-bcache:~# lsblk -e 252
NAME         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0          7:0    0    1G  0 loop 
└─fake-loop0 253:0    0 1024M  0 dm   
  └─bcache0  251:0    0 1024M  0 disk 
loop1          7:1    0    1G  0 loop 
└─fake-loop1 253:1    0 1024M  0 dm   
  └─bcache1  251:128  0 1024M  0 disk 
loop2          7:2    0    1G  0 loop 
└─fake-loop2 253:2    0 1024M  0 dm   
  ├─bcache0  251:0    0 1024M  0 disk 
  └─bcache1  251:128  0 1024M  0 disk 

root@guest-bcache:~# echo writeback | tee /sys/block/dm-*/bcache/cache_mode
writeback
root@guest-bcache:~# cat /sys/block/dm-*/bcache/cache_mode
writethrough [writeback] writearound none
writethrough [writeback] writearound none

root@guest-bcache:~# ./dm_fake_dev.sh /dev/loop0 bad
[  174.138534] Buffer I/O error on dev dm-0, logical block 262128, async page read
[  174.145142] Buffer I/O error on dev dm-0, logical block 262128, async page read
[  174.152728] bcache: register_bcache() error /dev/dm-0: device already registered (emitting change event)
[  174.154780] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[  174.159945] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[  174.162933] Buffer I/O error on dev bcache0, logical block 262112, async page read
[  174.168696] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[  174.172368] Buffer I/O error on dev bcache0, logical block 262112, async page read
[  174.175272] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[  174.178593] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[  174.181896] Buffer I/O error on dev bcache0, logical block 1, async page read

root@guest-bcache:~# dd if=/dev/zero of=/dev/bcache1 bs=4k & dd if=/dev/zero of=/dev/bcache0 bs=4k &
[1] 1377
[2] 1378

[  183.348428] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[  183.354587] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[  183.360488] Buffer I/O error on dev bcache0, logical block 0, lost async page write
[  183.364666] Buffer I/O error on dev bcache0, logical block 1, lost async page write
[  183.368326] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
...
[  183.430652] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[  183.434399] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[  183.438198] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[  183.441991] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
...
[  183.635500] bcache: bch_cached_dev_error() stop bcache0: too many IO errors on backing device dm-0
[  183.635500]
[  184.840023] bcache: bcache_device_free() bcache0 stopped
dd: error writing '/dev/bcache0': No space left on device
dd: error writing '/dev/bcache1': No space left on device
262142+0 records in
262141+0 records out
1073729536 bytes (1.1 GB, 1.0 GiB) copied, 2.18238 s, 492 MB/s
262142+0 records in
262141+0 records out
1073729536 bytes (1.1 GB, 1.0 GiB) copied, 3.69895 s, 290 MB/s

[1]-  Exit 1                  dd if=/dev/zero of=/dev/bcache1 bs=4k
[2]+  Exit 1                  dd if=/dev/zero of=/dev/bcache0 bs=4k

root@guest-bcache:~# lsblk -e 252
NAME         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0          7:0    0    1G  0 loop 
loop1          7:1    0    1G  0 loop 
└─fake-loop1 253:1    0 1024M  0 dm   
  └─bcache1  251:128  0 1024M  0 disk 
loop2          7:2    0    1G  0 loop 
└─fake-loop2 253:2    0 1024M  0 dm   
  └─bcache1  251:128  0 1024M  0 disk 
fake-loop0   253:0    0    1G  0 dm  


Notice that only bcache0 is stopped; bcache1 is still present.
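To see at a glance which devices a run stopped, the bcacheN names from an
lsblk listing taken before the fault injection can be compared with one taken
afterwards. A sketch using comm (the function name and file names are
hypothetical); for the modified-kernel run above it would print only bcache0.

```shell
#!/usr/bin/env bash
# Print bcache devices present in the "before" lsblk listing but missing
# from the "after" one. Both arguments are files with `lsblk -e 252` output.
stopped_bcache_devices() {
  comm -23 <(grep -oE 'bcache[0-9]+' "$1" | sort -u) \
           <(grep -oE 'bcache[0-9]+' "$2" | sort -u)
}

# Typical use:
#   lsblk -e 252 > before.txt
#   ./dm_fake_dev.sh /dev/loop0 bad    # then run the dd writes
#   lsblk -e 252 > after.txt
#   stopped_bcache_devices before.txt after.txt
```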

bcache1 also remains writable, and after a reboot both bcache devices are
reattached.

root@guest-bcache:~# dd if=/dev/zero of=/dev/bcache1 bs=4k
dd: error writing '/dev/bcache1': No space left on device
262142+0 records in
262141+0 records out
1073729536 bytes (1.1 GB, 1.0 GiB) copied, 4.79076 s, 224 MB/s
root@guest-bcache:~# 

root@guest-bcache:~# reboot

root@guest-bcache:~# ./setup-two-bcache-one-cache.reboot.sh 
[  104.421020] bcache: register_bdev() registered backing device dm-0
[  104.492000] bcache: register_bdev() registered backing device dm-1
[  104.685632] bcache: bch_journal_replay() journal replay done, 97526 keys in 57 entries, seq 359
[  104.695263] bcache: bch_cached_dev_attach() Caching dm-1 as bcache1 on set 6673bcb3-7a64-4675-a82f-59bb66886d66
[  104.704708] bcache: bch_cached_dev_attach() Caching dm-0 as bcache0 on set 6673bcb3-7a64-4675-a82f-59bb66886d66
[  104.709640] bcache: register_cache() registered cache device dm-2
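setup-two-bcache-one-cache.reboot.sh is not shown. After a reboot the loop and
dm devices have to be recreated without reformatting, and the existing bcache
devices registered again; the kernel's manual registration interface is
writing a device path to /sys/fs/bcache/register. The log above registers the
two backing devices first and the cache device last, after which the journal
is replayed and both backing devices reattach. A hypothetical sketch of that
registration step (function name and DRY_RUN switch are assumptions):

```shell
#!/usr/bin/env bash
# Hypothetical registration step for the reboot case: hand each existing
# bcache device to the kernel through sysfs (needs root unless DRY_RUN).
register_bcache_devices() {
  local dev
  for dev in "$@"; do
    if [ -n "${DRY_RUN:-}" ]; then
      echo "echo $dev > /sys/fs/bcache/register"
    elif ! echo "$dev" > /sys/fs/bcache/register; then
      # The write fails e.g. if the device is already registered.
      echo "could not register $dev" >&2
    fi
  done
}

# Backing devices first, then the shared cache device, as in the log above.
DRY_RUN=1 register_bcache_devices /dev/mapper/fake-loop0 /dev/mapper/fake-loop1 /dev/mapper/fake-loop2
```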

root@guest-bcache:~# lsblk -e 252
NAME         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0          7:0    0    1G  0 loop 
└─fake-loop0 253:0    0 1024M  0 dm   
  └─bcache0  251:0    0 1024M  0 disk 
loop1          7:1    0    1G  0 loop 
└─fake-loop1 253:1    0 1024M  0 dm   
  └─bcache1  251:128  0 1024M  0 disk 
loop2          7:2    0    1G  0 loop 
└─fake-loop2 253:2    0 1024M  0 dm   
  ├─bcache0  251:0    0 1024M  0 disk 
  └─bcache1  251:128  0 1024M  0 disk

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1829563

Title:
  bcache: risk of data loss on I/O errors in backing or caching devices
