Hello all, I'm currently running Lustre 2.13.0 on Debian Buster with ZFS OSDs. My current setup is a simple cluster with all the components on the same node. Even though the OST is marked "failout", operations still hang indefinitely when they should fail after a timeout.
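In case it matters, this is how I have been checking that the setting actually took (`ZFS/DATASET` is a placeholder for my actual pool/dataset, and I am assuming --dryrun is the right way to inspect the stored configuration):

    # Print the configuration stored on the OST without changing anything.
    # I believe "Parameters: failover.mode=failout" should appear in the output.
    tunefs.lustre --dryrun ZFS/DATASET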
Predictably, I get the following error when I try to `touch` a file on the missing OST:

Lustre: 16507:0:(client.c:2219:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1588114247/real 1588114247] req@00000000978c5ab1 x1665257677278528/t0(0) o2->foobar-OST0000-osc-ffff8e9de9263800@192.168.7.229@tcp1:28/4 lens 440/432 e 0 to 1 dl 1588114254 ref 1 fl Rpc:XQr/0/ffffffff rc 0/-1 job:''

The kernel also reports the hung `touch` task:

[ 1177.541894] LustreError: 49487:0:(mgs_llog.c:4313:mgs_write_log_param()) err -22 on param 'sys.timeout'
[ 1177.542627] LustreError: 49487:0:(mgs_handler.c:1032:mgs_iocontrol()) MGS: setparam err: rc = -22
[ 1209.388728] INFO: task touch:48422 blocked for more than 120 seconds.
[ 1209.389779]       Tainted: P           O      4.19.0-9-amd64 #1 Debian 4.19.98-1
[ 1209.390636] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1209.391482] touch           D    0 48422  48421 0x00000000
[ 1209.391485] Call Trace:
[ 1209.391495]  ? __schedule+0x2a2/0x870
[ 1209.391497]  schedule+0x28/0x80
[ 1209.391499]  schedule_timeout+0x26d/0x390
[ 1209.391646]  ? ptlrpc_set_add_new_req+0x100/0x180 [ptlrpc]
[ 1209.391649]  wait_for_completion+0x11f/0x190
[ 1209.391655]  ? wake_up_q+0x70/0x70
[ 1209.391688]  osc_io_setattr_end+0xcf/0x1f0 [osc]
[ 1209.391710]  ? lov_io_iter_fini_wrapper+0x40/0x40 [lov]
[ 1209.391771]  cl_io_end+0x53/0x130 [obdclass]
[ 1209.391781]  lov_io_end_wrapper+0xc3/0xd0 [lov]
[ 1209.391787]  lov_io_call.isra.10+0x7d/0x130 [lov]
[ 1209.391793]  lov_io_end+0x32/0xd0 [lov]
[ 1209.391822]  cl_io_end+0x53/0x130 [obdclass]
[ 1209.391851]  cl_io_loop+0xea/0x1b0 [obdclass]
[ 1209.391917]  cl_setattr_ost+0x278/0x300 [lustre]
[ 1209.391931]  ll_setattr_raw+0xe9b/0xf50 [lustre]
[ 1209.391936]  notify_change+0x2df/0x440
[ 1209.391939]  utimes_common.isra.1+0xdf/0x1b0
[ 1209.391942]  ? __check_object_size+0x162/0x173
[ 1209.391943]  do_utimes+0x13c/0x160
[ 1209.391945]  __x64_sys_utimensat+0x7a/0xc0
[ 1209.391952]  ? lov_read_and_clear_async_rc+0x178/0x310 [lov]
[ 1209.391957]  do_syscall_64+0x53/0x110
[ 1209.391961]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1209.391963] RIP: 0033:0x7f290224a2d3
[ 1209.391968] Code: Bad RIP value.
[ 1209.391969] RSP: 002b:00007ffc62383408 EFLAGS: 00000246 ORIG_RAX: 0000000000000118
[ 1209.391971] RAX: ffffffffffffffda RBX: 00007ffc623848f0 RCX: 00007f290224a2d3
[ 1209.391972] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 1209.391972] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
[ 1209.391973] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffff9c
[ 1209.391974] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000

The `touch` hangs indefinitely, repeating this error until the OST is reattached.

The OST was not originally formatted with "failover.mode=failout"; I added the parameter afterwards with `tunefs.lustre --param failover.mode=failout ZFS/DATASET`, followed by a `--writeconf` after the logs errored and said one would be necessary. I have also tried bringing the cluster down entirely and then back up, but the behavior persists.

Am I missing something? Maybe an lctl parameter, or a mount option? Or is this a known issue where the OST must be formatted with this option from the start? So far the only resource I've found is pages 107-108 of the manual, which outline how to do this.

Thanks for your time and assistance,
Christian

--
<https://opendrives.com/wp-content/uploads/2020/04/OD-Anywhere.pdf>
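P.S. As a possible workaround while I sort this out: would manually deactivating the OSC on the client at least make operations fail immediately instead of hanging? Something like the following (the osc device name pattern is taken from the log above; I have not confirmed this is the right knob):

    # Mark the OST inactive on the client so new requests fail fast
    lctl set_param osc.foobar-OST0000-*.active=0
    # ...and re-enable it once the OST is back
    lctl set_param osc.foobar-OST0000-*.active=1

My understanding is that failout mode should make this unnecessary, so I would still like to understand why the timeout never kicks in.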
_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org