[ https://issues.apache.org/jira/browse/FLINK-35217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Stefan Richter closed FLINK-35217.
----------------------------------
    Resolution: Fixed

Merged to master in [{{80af4d5}}|https://github.com/apache/flink/commit/80af4d502318348ba15a8f75a2a622ce9dbdc968]

> Missing fsync in FileSystemCheckpointStorage
> --------------------------------------------
>
>                 Key: FLINK-35217
>                 URL: https://issues.apache.org/jira/browse/FLINK-35217
>             Project: Flink
>          Issue Type: Bug
>          Components: FileSystems, Runtime / Checkpointing
>    Affects Versions: 1.17.0, 1.18.0, 1.19.0
>            Reporter: Marc Aurel Fritz
>            Assignee: Stefan Richter
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 1.20.0
>
>
> While running Flink on a system with an unstable power supply, checkpoints
> were regularly corrupted: the "_metadata" file had a size of 0 bytes. In all
> cases the previous checkpoint data had already been deleted, so progress was
> lost completely.
> Further investigation revealed that "FileSystemCheckpointStorage" doesn't
> perform an "fsync" when writing a new checkpoint to disk. This means the old
> checkpoint gets removed without making sure that the new one is durably
> persisted on disk. "strace" on the jobmanager's process confirms this
> behavior:
> # The checkpoint chk-60's in-progress metadata is written at "openat"
> # The checkpoint chk-60's in-progress metadata is atomically renamed at "rename"
> # The old checkpoint chk-59 is deleted at "unlink"
> For durable persistence, an "fsync" call is missing before step 3.
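The ordering described above (write, fsync, rename, and only then delete the previous checkpoint) can be sketched in plain Java NIO. This is a minimal illustration under stated assumptions, not the Flink fix itself: the class DurableCheckpointWrite and its methods are hypothetical names, and the sketch only mirrors the file names seen in the trace. FileChannel.force(true) is the standard Java call that maps to "fsync" on Linux.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.UUID;

/** Hypothetical helper (not Flink code): write, fsync, rename, then delete the old checkpoint. */
class DurableCheckpointWrite {

    /** Writes metadata to chkDir/_metadata, forcing the bytes to disk before the rename. */
    static Path writeMetadata(Path chkDir, byte[] metadata) throws IOException {
        Files.createDirectories(chkDir);
        Path inProgress = chkDir.resolve("._metadata.inprogress." + UUID.randomUUID());
        Path target = chkDir.resolve("_metadata");

        // Step 1: write the in-progress file (the "openat" in the trace).
        try (FileChannel ch = FileChannel.open(inProgress,
                StandardOpenOption.WRITE, StandardOpenOption.CREATE_NEW)) {
            ByteBuffer buf = ByteBuffer.wrap(metadata);
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            // The step reported as missing: flush file content and metadata to the device.
            ch.force(true);
        }

        // Step 2: atomically publish the file under its final name (the "rename").
        Files.move(inProgress, target, StandardCopyOption.ATOMIC_MOVE);
        return target;
    }

    /** Step 3: remove the previous checkpoint, called only after the new one was synced. */
    static void deleteCheckpoint(Path oldChkDir) throws IOException {
        Files.deleteIfExists(oldChkDir.resolve("_metadata"));
        Files.deleteIfExists(oldChkDir);
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("statestore");
        writeMetadata(root.resolve("chk-59"), "old".getBytes(StandardCharsets.UTF_8));
        Path meta = writeMetadata(root.resolve("chk-60"), "new".getBytes(StandardCharsets.UTF_8));
        deleteCheckpoint(root.resolve("chk-59"));
        System.out.println(Files.size(meta) + " bytes in " + meta.getFileName());
    }
}
```

The sketch syncs only the file itself. Whether the parent directory also needs to be synced for the rename to survive a power loss depends on the file system and is outside what this example covers.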
> Full "strace" log:
> {code:java}
> [pid 51618] 11:44:30 stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60", 0x7fd2ad5fc970) = -1 ENOENT (No such file or directory)
> [pid 51618] 11:44:30 stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60", 0x7fd2ad5fca00) = -1 ENOENT (No such file or directory)
> [pid 51618] 11:44:30 stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc", {st_mode=S_IFDIR|0755, st_size=42, ...}) = 0
> [pid 51618] 11:44:30 mkdir("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60", 0777) = 0
> [pid 51618] 11:44:30 stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60/_metadata", 0x7fd2ad5fc860) = -1 ENOENT (No such file or directory)
> [pid 51618] 11:44:30 stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60/_metadata", 0x7fd2ad5fc740) = -1 ENOENT (No such file or directory)
> [pid 51618] 11:44:30 stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60/._metadata.inprogress.bf9518dc-2100-4524-9e67-e42913c2b8e8", 0x7fd2ad5fc7d0) = -1 ENOENT (No such file or directory)
> [pid 51618] 11:44:30 stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> [pid 51618] 11:44:30 stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> [pid 51618] 11:44:30 openat(AT_FDCWD, "/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60/._metadata.inprogress.bf9518dc-2100-4524-9e67-e42913c2b8e8", O_WRONLY|O_CREAT|O_EXCL, 0666) = 168
> [pid 51618] 11:44:30 stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60/._metadata.inprogress.bf9518dc-2100-4524-9e67-e42913c2b8e8", {st_mode=S_IFREG|0644, st_size=23378, ...}) = 0
> [pid 51618] 11:44:30 rename("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60/._metadata.inprogress.bf9518dc-2100-4524-9e67-e42913c2b8e8", "/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60/_metadata") = 0
> [pid 51644] 11:44:30 stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-59/_metadata", {st_mode=S_IFREG|0644, st_size=23378, ...}) = 0
> [pid 51644] 11:44:30 unlink("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-59/_metadata") = 0
> [pid 51644] 11:44:30 stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-59", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> [pid 51644] 11:44:30 stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-59", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> [pid 51644] 11:44:30 openat(AT_FDCWD, "/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-59", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 168
> [pid 51644] 11:44:30 newfstatat(168, "", {st_mode=S_IFDIR|0755, st_size=0, ...}, AT_EMPTY_PATH) = 0
> [pid 51644] 11:44:30 stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-59", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> [pid 51644] 11:44:30 openat(AT_FDCWD, "/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-59", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 168
> [pid 51644] 11:44:30 newfstatat(168, "", {st_mode=S_IFDIR|0755, st_size=0, ...}, AT_EMPTY_PATH) = 0
> [pid 51644] 11:44:30 unlink("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-59") = -1 EISDIR (Is a directory)
> [pid 51644] 11:44:30 rmdir("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-59") = 0
> {code}
> To fix this I'm currently testing the following commit:
> [https://github.com/Planet-X/flink/commit/24196cc897533b654f44e2b612543ff023cdb123]
> "strace" can confirm that "fsync" is now called before the previous
> checkpoint is removed at "unlink":
> {code:java}
> [pid 40393] 11:30:17 stat("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-50", <unfinished ...>
> [pid 40393] 11:30:17 <... stat resumed>0x7fc887efc970) = -1 ENOENT (No such file or directory)
> [pid 40393] 11:30:17 stat("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-50", 0x7fc887efca00) = -1 ENOENT (No such file or directory)
> [pid 40393] 11:30:17 stat("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e", {st_mode=S_IFDIR|0755, st_size=42, ...}) = 0
> [pid 40393] 11:30:17 mkdir("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-50", 0777) = 0
> [pid 40393] 11:30:17 stat("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-50/_metadata", 0x7fc887efc870) = -1 ENOENT (No such file or directory)
> [pid 40393] 11:30:17 stat("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-50/_metadata", 0x7fc887efc750) = -1 ENOENT (No such file or directory)
> [pid 40393] 11:30:17 stat("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-50/._metadata.inprogress.24b0ea02-a05c-4297-89ff-08340e8cfa90", 0x7fc887efc7e0) = -1 ENOENT (No such file or directory)
> [pid 40393] 11:30:17 stat("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-50", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> [pid 40393] 11:30:17 stat("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-50", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> [pid 40393] 11:30:17 openat(AT_FDCWD, "/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-50/._metadata.inprogress.24b0ea02-a05c-4297-89ff-08340e8cfa90", O_WRONLY|O_CREAT|O_EXCL, 0666) = 194
> [pid 40393] 11:30:17 fsync(194) = 0
> [pid 40393] 11:30:17 stat("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-50/._metadata.inprogress.24b0ea02-a05c-4297-89ff-08340e8cfa90", {st_mode=S_IFREG|0644, st_size=23366, ...}) = 0
> [pid 40393] 11:30:17 rename("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-50/._metadata.inprogress.24b0ea02-a05c-4297-89ff-08340e8cfa90", "/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-50/_metadata") = 0
> [pid 39230] 11:30:17 stat("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-49/_metadata", {st_mode=S_IFREG|0644, st_size=23366, ...}) = 0
> [pid 39230] 11:30:17 unlink("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-49/_metadata") = 0
> [pid 39230] 11:30:17 stat("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-49", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> [pid 39230] 11:30:17 stat("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-49", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> [pid 39230] 11:30:17 openat(AT_FDCWD, "/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-49", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 194
> [pid 39230] 11:30:17 newfstatat(194, "", {st_mode=S_IFDIR|0755, st_size=0, ...}, AT_EMPTY_PATH) = 0
> [pid 39230] 11:30:17 stat("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-49", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> [pid 39230] 11:30:17 openat(AT_FDCWD, "/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-49", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 194
> [pid 39230] 11:30:17 newfstatat(194, "", {st_mode=S_IFDIR|0755, st_size=0, ...}, AT_EMPTY_PATH) = 0
> [pid 39230] 11:30:17 unlink("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-49") = -1 EISDIR (Is a directory)
> [pid 39230] 11:30:17 rmdir("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-49") = 0
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)