Systemd-devel,

Below is a write-up explaining a proposed new service for archiving pstore contents. I've attached the files for pstore.service (/lib/systemd/system/pstore.service and bin/pstore-tool). They are trivial right now, but easy to build upon if periodic, rather than just on-boot, examination of the pstore is desirable.

The questions I have for you are:

- Is a new unit pstore.service the right approach for this? If not, what unit do you recommend augmenting with these actions?

- What are your thoughts/comments/feedback on such a service?

Thank you in advance for your time,
Eric

==== Oracle ERST usage ====
The BIOS ACPI Error Record Serialization Table (ERST) is an API for storing data, such as hardware error records, in non-volatile storage [1, Section 18.5 Error Serialization]. The ERST non-volatile storage on Oracle servers tends to be small, on the order of 64 KiB.

The Linux persistent storage subsystem, pstore, supports using the ERST as a backend for persistent storage [2].

With the crash_kexec_post_notifiers command line option, the kernel stores the dmesg into pstore on a panic, before running kdump [3]. This action is independent of kdump; as such, the crash backtrace is captured in pstore for post-mortem analysis regardless of whether kdump is enabled or working properly.
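
As a quick check that a machine is set up this way, here is a minimal sketch in the same POSIX shell as the attached tool. has_post_notifiers is a hypothetical helper; it treats a bare crash_kexec_post_notifiers, or a value starting with y, Y, 1 or on, as enabled and lets the last occurrence win, which approximates how the kernel parses boolean parameters.

```sh
#!/bin/sh
# Sketch: report whether crash_kexec_post_notifiers is enabled.
# has_post_notifiers is a hypothetical helper name.

# has_post_notifiers [CMDLINE]
# CMDLINE defaults to the running kernel's /proc/cmdline.
# Returns 0 (true) if the option is enabled, 1 otherwise.
has_post_notifiers()
{
    cmdline=${1-$(cat /proc/cmdline)}
    enabled=1    # shell convention: 1 means false
    for word in $cmdline
    do
        case $word in
            crash_kexec_post_notifiers|crash_kexec_post_notifiers=[yY1]*|crash_kexec_post_notifiers=[oO][nN]*)
                enabled=0 ;;    # bare flag or a "true" value
            crash_kexec_post_notifiers=*)
                enabled=1 ;;    # any other value, e.g. =0 or =off
        esac
    done
    return $enabled
}

# Example: check the running kernel
if has_post_notifiers
then
    echo "crash_kexec_post_notifiers: enabled"
else
    echo "crash_kexec_post_notifiers: not enabled"
fi
```

Passing a string instead, e.g. has_post_notifiers "ro quiet crash_kexec_post_notifiers", makes the helper easy to exercise without rebooting.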

Since the ERST area is typically small, it is easily filled by the dmesg of a single kernel panic. As such, there is a need to archive the kernel dmesg items in the pstore to a normal filesystem, and then free those items in the pstore to make room for the dmesg of a subsequent kernel panic.

Therefore, this is a proposal for a new service, pstore.service, that archives the dmesg contents of the pstore to a regular filesystem and removes those dmesg entries from the pstore. Since Linux exposes the persistent storage subsystem as a filesystem [2], and the items in the pstore are available as regular files, archiving and removing the entries is trivial.

This proposal is for a new service instead of augmenting kdump.service, since this is independent of kdump, though both are related to a kernel crash. Conceivably, other items stored in pstore, like hardware errors, could have their own rules for archiving. The goal of pstore.service is to attempt to keep the pstore empty and available for new events like hardware errors and kernel crashes.
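
Since the tool relies on pstore being mounted as a filesystem, a pre-flight check along these lines could be added to it. This is a sketch under assumptions: is_pstore_mounted and ensure_pstore_mounted are hypothetical helper names, the mount point is assumed to be /sys/fs/pstore as in the attached script, and the mount itself needs root.

```sh
#!/bin/sh
# Sketch: check that pstore is mounted before archiving.
# is_pstore_mounted and ensure_pstore_mounted are hypothetical helpers.

# is_pstore_mounted DIR [MOUNTS_FILE]
# Succeeds if DIR is listed in the mount table with filesystem type pstore.
is_pstore_mounted()
{
    dir=$1
    mounts=${2:-/proc/mounts}
    # Each line of /proc/mounts: device mountpoint fstype options dump pass
    while read -r dev mnt fstype rest
    do
        if [ "$mnt" = "$dir" ] && [ "$fstype" = "pstore" ]
        then
            return 0
        fi
    done < "$mounts"
    return 1
}

# ensure_pstore_mounted [DIR]
# Mounts pstore on DIR (default /sys/fs/pstore) unless it already is;
# requires root.
ensure_pstore_mounted()
{
    target=${1:-/sys/fs/pstore}
    is_pstore_mounted "$target" || mount -t pstore pstore "$target"
}
```

The optional second argument to is_pstore_mounted lets it be tested against a saved copy of the mount table rather than the live /proc/mounts.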

Initially the service could be as simple as looking for items upon boot, but I could see it being extended to periodically check for events like hardware errors in the pstore. Kernel crash dmesg items are named in a regular fashion, such as:

-r--r--r-- 1 root root 17716 Nov 20 11:08 dmesg-erst-6625975467788730369
-r--r--r-- 1 root root 17731 Nov 20 11:08 dmesg-erst-6625975467788730370
-r--r--r-- 1 root root 17679 Nov 20 11:08 dmesg-erst-6625975467788730371

A simple bit of filename manipulation can then be used to create archive subdirectories, say in /var/pstore, holding the archived data.
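
The filename manipulation can be done with plain POSIX parameter expansion. A minimal sketch, assuming names of the form <type>-<backend>-<id> as in the listings above; parse_pstore_name, archive_subdir and the per-backend /var/pstore/<backend> layout are hypothetical choices for illustration.

```sh
#!/bin/sh
# Sketch: split a pstore record name such as dmesg-erst-6625975467788730369
# into type, backend and record id.
# parse_pstore_name, archive_subdir and the /var/pstore/<backend> layout
# are hypothetical.

# parse_pstore_name PATH: prints "<type> <backend> <id>"
parse_pstore_name()
{
    name=${1##*/}          # strip any leading directory
    type=${name%%-*}       # text before the first "-", e.g. "dmesg"
    rest=${name#*-}        # text after the first "-", e.g. "erst-6625..."
    backend=${rest%%-*}    # e.g. "erst"
    id=${rest#*-}          # e.g. "6625975467788730369"
    echo "$type $backend $id"
}

# archive_subdir PATH: prints the archive directory for that record
archive_subdir()
{
    set -- $(parse_pstore_name "$1")
    echo "/var/pstore/$2"
}
```

For example, parse_pstore_name /sys/fs/pstore/dmesg-erst-6625975467788730369 prints "dmesg erst 6625975467788730369", and archive_subdir dmesg-efi-154506148323001 prints "/var/pstore/efi".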

[1] "Advanced Configuration and Power Interface Specification",
     version 6.2, May 2017.
     https://www.uefi.org/sites/default/files/resources/ACPI_6_2.pdf

[2] "Persistent storage for a kernel's dying breath",
     March 23, 2011.
     https://lwn.net/Articles/434821/

[3] "The kernel’s command-line parameters",
     https://static.lwn.net/kerneldoc/admin-guide/kernel-parameters.html

[Unit]
Description=pstore archive service
# The archive lives under /var, so its filesystem must be mounted;
# the network is not needed.
After=local-fs.target
RequiresMountsFor=/var/pstore

[Service]
Type=oneshot
# "syslog" output is deprecated in newer systemd; use the journal
StandardOutput=journal+console
ExecStart=/root/pstore-tool start
ExecStop=/root/pstore-tool stop
RemainAfterExit=no

[Install]
WantedBy=multi-user.target

#!/bin/sh
# Utility script to archive contents of pstore

#-r--r--r--. 1 root root 1826 Dec 17 10:44 dmesg-efi-154506148323001
#-r--r--r--. 1 root root 1826 Dec 17 10:44 dmesg-efi-154506148324001

pstorefs=/sys/fs/pstore
archivedir=/var/pstore/$(date +"%Y-%m-%d-%H:%M")

pstore_start()
{
    echo "PSTORE manager started"
    # Note: The -r is essential for dmesg reconstruction
    files=$(ls -r "$pstorefs"/dmesg-* 2>/dev/null)
    if [ -n "$files" ]
    then
        # Archive files
        mkdir -p "$archivedir" || return 1
        for f in $files
        do
            # Reconstruct dmesg; only move the record out of pstore
            # (which frees its backend storage) if the copy succeeded
            cat "$f" >> "$archivedir/dmesg.txt" && mv -f "$f" "$archivedir"
        done
    fi
}

pstore_stop()
{
    echo "PSTORE manager stopped"
}

while [ $# -gt 0 ]
do
    case $1 in
        start)
            pstore_start
            ;;
        stop)
            pstore_stop
            ;;
        *)
            echo "pstore-tool: unrecognized option: $1" >&2
            exit 1
            ;;
    esac
    shift # on to next argument
done

_______________________________________________
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel
