[jira] [Updated] (ZOOKEEPER-4744) Zookeeper fails to start after power failure

Maria Ramos (Jira) Fri, 15 Sep 2023 07:19:06 -0700


     [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Maria Ramos updated ZOOKEEPER-4744:
-----------------------------------
    Description: 
The underlying issue stems from consecutive writes to the log file that are not
interleaved with {{fsync}} operations. Without an intervening {{fsync}}, the
operating system may persist such writes to disk out of order. This is
well-documented behavior, and several references address it:
 - [https://www.usenix.org/conference/osdi14/technical-sessions/presentation/pillai]
 - [https://dl.acm.org/doi/pdf/10.1145/2872362.2872406]
 - [https://mariadb.com/kb/en/atomic-write-support/]
 - [https://pages.cs.wisc.edu/~remzi/OSTEP/file-journaling.pdf] (Page 9)
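
As a minimal sketch of what interleaving writes with {{fsync}} looks like (this is not ZooKeeper's actual logging code), the following Python snippet flushes and syncs after every record, so that each record is durable before the next write is issued. The function name and the file name are hypothetical.

```python
import os
import tempfile


def append_durably(path, records):
    """Append each record and fsync it before issuing the next write."""
    with open(path, "ab") as f:
        for record in records:
            f.write(record)
            # Push the data from Python's buffer to the OS page cache.
            f.flush()
            # Ask the OS to persist it; without this call, buffered writes
            # may reach the disk in a different order than they were issued.
            os.fsync(f.fileno())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, "example.log")  # hypothetical file name
        append_durably(log_path, [b"header", b"record-1", b"record-2"])
        with open(log_path, "rb") as f:
            print(f.read())
```

Syncing after every write trades throughput for ordering guarantees; the sketch only illustrates the pattern, not a recommended fix.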

This issue can be replicated using 
[LazyFS|https://github.com/dsrhaslab/lazyfs], a file system capable of 
simulating power failures and exhibiting the OS behavior mentioned above, i.e., 
the out-of-order file writes at the disk level. LazyFS persists these writes 
out of order and then crashes to simulate a power failure.

To reproduce this problem, one can follow these steps:

{*}1{*}. Mount LazyFS on the directory where ZooKeeper data will be saved, with a
specified root directory. Assuming the data path for ZooKeeper is
{{/home/data/zk}} and the root directory is {{/home/data/zk-root}}, add the
following lines to the default configuration file ({{config/default.toml}}):

{{[[injection]] }}
{{type="reorder" }}
{{occurrence=1 }}
{{op="write" }}
{{file="/home/data/zk-root/version-2/log.100000001" }}
{{persist=[3]}}

These lines define the fault to be injected. A power failure will be simulated
after the third write to the {{/home/data/zk-root/version-2/log.100000001}}
file. The {{occurrence}} parameter selects which group of consecutive writes
the fault applies to (here, the first), as there may be more than one such group.
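
When trying other files or other values of {{persist}} and {{occurrence}}, it can help to generate this section instead of editing it by hand. The sketch below only reproduces the keys shown above; the function name is hypothetical and no other LazyFS options are assumed.

```python
def reorder_fault(file, persist, occurrence=1, op="write"):
    """Render an [[injection]] section for a "reorder" fault as TOML text."""
    persist_list = ", ".join(str(n) for n in persist)
    lines = [
        "[[injection]]",
        'type="reorder"',
        f"occurrence={occurrence}",
        f'op="{op}"',
        f'file="{file}"',
        f"persist=[{persist_list}]",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Same values as the section above.
    section = reorder_fault("/home/data/zk-root/version-2/log.100000001", [3])
    print(section, end="")
```

The output can be appended to {{config/default.toml}} before mounting LazyFS.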

{*}2{*}. Start LazyFS as the underlying file system of a node in the cluster
with the following command:

{{./scripts/mount-lazyfs.sh -c config/default.toml -m /home/data/zk -r /home/data/zk-root -f}}

{*}3{*}. Start ZooKeeper with the command:

{{apache-zookeeper-3.7.1-bin/bin/zkServer.sh start-foreground}}

{*}4{*}. Connect a client to the node that has LazyFS as the underlying file
system:

{{apache-zookeeper-3.7.1-bin/bin/zkCli.sh -server 127.0.0.1:2181}}

Immediately after this step, LazyFS is unmounted, simulating a power failure,
and ZooKeeper keeps printing error messages in the terminal until it is
forcibly shut down.

At this point, one can analyze the logs produced by LazyFS to examine the
system calls issued up to the moment of the fault. Here is a simplified version
of the log:

{'syscall': 'create', 'path': 
'/home/gsd/data/zk37-root/version-2/log.100000001', 'mode': 'O_TRUNC'}

{'syscall': 'write', 'path': '/home/data/zk37-root/version-2/log.100000001', 
'size': '16', 'off': '0'}

{'syscall': 'write', 'path': '/home/data/zk37-root/version-2/log.100000001', 
'size': '1', 'off': '67108879'}

{'syscall': 'write', 'path': '/home/data/zk37-root/version-2/log.100000001', 
'size': '67108863', 'off': '16'}

{'syscall': 'write', 'path': '/home/data/zk37-root/version-2/log.100000001', 
'size': '61', 'off': '16'}

Note that the third write is issued by LazyFS for padding.
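
The byte ranges touched by these writes can be checked with a short script. The sketch below assumes one entry per line in the Python-style dict format shown above (the wrapped lines are re-joined here); the helper names are illustrative.

```python
import ast

# The simplified log from above, one entry per line.
LOG = """\
{'syscall': 'create', 'path': '/home/gsd/data/zk37-root/version-2/log.100000001', 'mode': 'O_TRUNC'}
{'syscall': 'write', 'path': '/home/data/zk37-root/version-2/log.100000001', 'size': '16', 'off': '0'}
{'syscall': 'write', 'path': '/home/data/zk37-root/version-2/log.100000001', 'size': '1', 'off': '67108879'}
{'syscall': 'write', 'path': '/home/data/zk37-root/version-2/log.100000001', 'size': '67108863', 'off': '16'}
{'syscall': 'write', 'path': '/home/data/zk37-root/version-2/log.100000001', 'size': '61', 'off': '16'}
"""


def write_ranges(log_text):
    """Return the (start, end) byte range of each write, end exclusive."""
    ranges = []
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue
        entry = ast.literal_eval(line)  # each entry is a dict literal
        if entry["syscall"] == "write":
            start = int(entry["off"])
            ranges.append((start, start + int(entry["size"])))
    return ranges


def overlapping_pairs(ranges):
    """Return 1-based index pairs of writes whose byte ranges overlap."""
    pairs = []
    for i in range(len(ranges)):
        for j in range(i + 1, len(ranges)):
            a, b = ranges[i], ranges[j]
            if a[0] < b[1] and b[0] < a[1]:
                pairs.append((i + 1, j + 1))
    return pairs


if __name__ == "__main__":
    ranges = write_ranges(LOG)
    for n, (start, end) in enumerate(ranges, start=1):
        print(f"write {n}: bytes [{start}, {end})")
    print("overlapping writes:", overlapping_pairs(ranges))
    print("highest offset written:", max(end for _, end in ranges))
```

Under these assumptions, the only overlapping pair is the third and fourth writes, which both start at offset 16, and the highest offset written is 67108880, i.e. 16 bytes plus 64 MiB.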

 
{*}5{*}. Remove the fault from the configuration file, then unmount the file
system with:

{{fusermount -uz /home/data/zk}}

{*}6{*}. Mount LazyFS again with the command from step 2.

{*}7{*}. Attempt to start ZooKeeper (it fails).

By following these steps, one can replicate the issue and analyze the effects 
of the power failure on ZooKeeper's restart process.
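
One way to inspect the log file after the fault is to look at its first 16 bytes, which match the size of the first write in the log above. The sketch below assumes that these bytes hold ZooKeeper's transaction log header: the ASCII magic {{ZKLG}}, a 4-byte version, and an 8-byte database id, all big-endian. This layout is an assumption about ZooKeeper's on-disk format, not something stated in this report, and the function name is hypothetical.

```python
import os
import struct
import tempfile

TXNLOG_MAGIC = b"ZKLG"  # assumed magic bytes at the start of a transaction log
HEADER_SIZE = 16        # 4-byte magic + 4-byte version + 8-byte database id


def check_txnlog_header(path):
    """Return a short description of the first 16 bytes of the file."""
    with open(path, "rb") as f:
        header = f.read(HEADER_SIZE)
    if len(header) < HEADER_SIZE:
        return f"truncated header: {len(header)} of {HEADER_SIZE} bytes"
    magic, version, dbid = struct.unpack(">4siq", header)
    if magic != TXNLOG_MAGIC:
        return f"bad magic: {magic!r}"
    return f"ok: version={version}, dbid={dbid}"


if __name__ == "__main__":
    # Demonstrate on two synthetic files rather than a real ZooKeeper log.
    with tempfile.TemporaryDirectory() as tmp:
        good = os.path.join(tmp, "good.log")
        with open(good, "wb") as f:
            f.write(struct.pack(">4siq", TXNLOG_MAGIC, 2, 0))
        zeroed = os.path.join(tmp, "zeroed.log")
        with open(zeroed, "wb") as f:
            f.write(b"\x00" * HEADER_SIZE)
        print(check_txnlog_header(good))
        print(check_txnlog_header(zeroed))
```

To inspect a real file, pass its path to {{check_txnlog_header}}, for example the log file under the LazyFS root directory.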


> Zookeeper fails to start after power failure
> --------------------------------------------
>
>                 Key: ZOOKEEPER-4744
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4744
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.7.1
>         Environment: These are the configurations of the ZooKeeper cluster 
> (omitting IPs):
> {{tickTime=2000}}
> {{dataDir=/home/data/zk37}}
> {{clientPort=2181}}
> {{maxClientCnxns=60}}
> {{initLimit=100}}
> {{syncLimit=100}}
> {{server.1=[IP1]:2888:3888}}
> {{server.2=[IP2]:2888:3888}}
> {{server.3=[IP3]:2888:3888}}
>            Reporter: Maria Ramos
>            Priority: Critical
>         Attachments: reported_error.txt
>
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
