Module Name:    src
Committed By:   martin
Date:           Mon Aug 17 10:30:23 UTC 2020

Modified Files:
        src/lib/libp2k [netbsd-9]: p2k.c
        src/sbin/fsck_lfs [netbsd-9]: pass1.c
        src/sys/arch/i386/stand/efiboot/bootx64 [netbsd-9]: Makefile
        src/sys/rump/fs/lib/liblfs [netbsd-9]: Makefile
        src/sys/ufs/lfs [netbsd-9]: lfs.h lfs_accessors.h lfs_alloc.c
            lfs_balloc.c lfs_bio.c lfs_debug.c lfs_extern.h lfs_inode.c
            lfs_inode.h lfs_pages.c lfs_rename.c lfs_segment.c lfs_subr.c
            lfs_vfsops.c lfs_vnops.c
        src/usr.sbin/dumplfs [netbsd-9]: dumplfs.c

Log Message:
Pull up following revision(s) (requested by riastradh in ticket #1050):

        sys/ufs/lfs/lfs_subr.c: revision 1.101
        sys/ufs/lfs/lfs_subr.c: revision 1.102
        sys/ufs/lfs/lfs_inode.c: revision 1.158
        sys/ufs/lfs/lfs_inode.h: revision 1.25
        sys/ufs/lfs/lfs_balloc.c: revision 1.95
        sys/ufs/lfs/lfs_pages.c: revision 1.21
        sys/ufs/lfs/lfs_vnops.c: revision 1.330
        sys/ufs/lfs/lfs_alloc.c: revision 1.140 (patch)
        sys/ufs/lfs/lfs_alloc.c: revision 1.141 (patch)
        lib/libp2k/p2k.c: revision 1.72
        sys/ufs/lfs/lfs.h: revision 1.205
        sys/ufs/lfs/lfs.h: revision 1.206
        sys/ufs/lfs/lfs_segment.c: revision 1.284
        sys/ufs/lfs/lfs.h: revision 1.207
        sys/ufs/lfs/lfs_segment.c: revision 1.285
        sys/ufs/lfs/lfs_debug.c: revision 1.55
        sys/ufs/lfs/lfs_rename.c: revision 1.23
        usr.sbin/dumplfs/dumplfs.c: revision 1.65
        sys/ufs/lfs/lfs_vfsops.c: revision 1.371
        sys/arch/i386/stand/efiboot/bootx64/Makefile: revision 1.3
        sys/ufs/lfs/lfs_vfsops.c: revision 1.372
        sys/ufs/lfs/lfs_vfsops.c: revision 1.373
        sbin/fsck_lfs/pass1.c: revision 1.46
        sys/ufs/lfs/lfs_vnops.c: revision 1.326
        sys/ufs/lfs/lfs_vnops.c: revision 1.327
        sys/ufs/lfs/lfs_vfsops.c: revision 1.375 (patch)
        sys/ufs/lfs/lfs_vnops.c: revision 1.328
        sys/ufs/lfs/lfs_subr.c: revision 1.98
        sys/ufs/lfs/lfs_extern.h: revision 1.116
        sys/ufs/lfs/lfs_vnops.c: revision 1.329
        sys/ufs/lfs/lfs_subr.c: revision 1.99
        sys/ufs/lfs/lfs_extern.h: revision 1.117
        sys/ufs/lfs/lfs_accessors.h: revision 1.49
        sys/ufs/lfs/lfs_extern.h: revision 1.118
        sys/rump/fs/lib/liblfs/Makefile: revision 1.15
        sys/ufs/lfs/lfs_bio.c: revision 1.146 (patch)
        sys/ufs/lfs/lfs_bio.c: revision 1.147
        sys/ufs/lfs/lfs_subr.c: revision 1.100

Fix KASSERT in lfs by initializing vp first.

Use a marker node to iterate lfs_dchainhd / i_lfs_dchain.

I believe elements can be removed while the lock is dropped,
including the next node we're hanging on to.
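
A minimal userland sketch of the marker-node idea, under stated assumptions. A hand-rolled circular doubly-linked list stands in for the dchain. Comments mark where the real code would drop and retake lfs_lock. All names (struct node, walk_with_marker, demo_visit, demo_sum) are hypothetical. The marker is private to the walker, so the callback may unlink the current node and its old successor, and the walk still resumes safely from the marker's successor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical list node; real dchain entries are inodes. */
struct node {
	struct node *prev, *next;
	bool marker;		/* placeholder, not a real entry */
	int id;
};

static void
list_init(struct node *head)
{
	head->prev = head->next = head;
	head->marker = true;
}

static void
list_insert_after(struct node *pos, struct node *n)
{
	n->prev = pos;
	n->next = pos->next;
	pos->next->prev = n;
	pos->next = n;
}

static void
list_remove(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = NULL;
}

typedef void visit_fn(struct node *, void *);

static void
walk_with_marker(struct node *head, visit_fn *visit, void *arg)
{
	struct node marker = { .marker = true };
	struct node *cur = head->next;

	while (cur != head) {
		if (cur->marker) {	/* another walker's marker: skip */
			cur = cur->next;
			continue;
		}
		list_insert_after(cur, &marker);
		/* Real code drops the lock here; visit() may unlink any
		 * non-marker node, including cur and its old successor. */
		visit(cur, arg);
		/* Real code retakes the lock; our marker is still linked. */
		cur = marker.next;
		list_remove(&marker);
	}
}

struct demo_state {
	int sum;
};

static void
demo_visit(struct node *cur, void *arg)
{
	struct demo_state *st = arg;

	st->sum += cur->id;
	if (cur->id == 1) {
		/* Simulate a racing removal of cur and the node after it;
		 * cur->next is the walker's marker here. */
		list_remove(cur->next->next);
		list_remove(cur);
	}
}

int
demo_sum(void)
{
	struct node head, n[5];
	struct demo_state st = { 0 };
	int i;

	list_init(&head);
	for (i = 0; i < 5; i++) {
		n[i].marker = false;
		n[i].id = i;
		list_insert_after(head.prev, &n[i]);	/* append */
	}
	walk_with_marker(&head, demo_visit, &st);
	return st.sum;
}
```

With five nodes 0..4, demo_sum() visits 0, 1, 3 and 4 and returns 8. Node 2 is unlinked during the "unlocked" window, so the walk skips it instead of following a stale saved next pointer.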

Just use VOP_BWRITE for lfs_bwrite_log.
Hope this doesn't cause trouble with vfs_suspend.

Teach lfs to transition ro<->rw.

Prevent new dirops while we issue lfs_flush_dirops.

lfs_flush_dirops assumes (by KASSERT((ip->i_state & IN_ADIROP) == 0))
that vnodes on the dchain will not become involved in active dirops
even while holding no other locks (lfs_lock, v_interlock), so we must
set lfs_writer here.  All other callers already set lfs_writer.

We set fs->lfs_writer++ without explicitly doing lfs_writer_enter
because
(a) we already waited for the dirops to drain, and
(b) we hold lfs_lock and cannot drop it before setting lfs_writer.

Assert lfs_writer where I think we can now prove it.

Serialize access to the splay tree with lfs_lock.

Change some cheap KDASSERT into KASSERT.

Take a reference and fix assertions in lfs_flush_dirops.
Fixes panic:
KASSERT((ip->i_state & IN_ADIROP) == 0) at lfs_vnops.c:1670
lfs_flush_dirops
lfs_check
lfs_setattr
VOP_SETATTR
change_mode
sys_fchmod
syscall

This assertion -- and the assertion that vp->v_uflag has VU_DIROP set
-- is valid only until we release lfs_lock, because we may race with
lfs_unmark_dirop which will remove the nodes and change the flags.

Further, vp itself is valid only as long as it is referenced, which it
is as long as it's on the dchain, but lfs_unmark_dirop drops the
dchain's reference.

Don't lfs_writer_enter while holding v_interlock.

There's no need to lfs_writer_enter at all here, as far as I can see.
lfs_flush_fs will do it for us.

Break deadlock in PR kern/52301.

The lock order is lfs_writer -> lfs_seglock.  The problem in 52301 is
that lfs_segwrite violates this lock order by sometimes doing
lfs_seglock -> lfs_writer, either (a) when doing a checkpoint or (b),
opportunistically, when there are no dirops pending.  Both cases can
deadlock, because dirops sometimes take the seglock (lfs_truncate,
lfs_valloc, lfs_vfree):
(a) There may be dirops pending, and they may be waiting for the
seglock, so we can't wait for them to complete while holding the
seglock.
(b) The test for fs->lfs_dirops == 0 happens unlocked, and the state
may change by the time lfs_writer_enter acquires lfs_lock.

To resolve this in each case:
(a) Do lfs_writer_enter before lfs_seglock, since we will need it
unconditionally anyway.  The worst performance impact of this should
be that some dirops get delayed a little bit.
(b) Create a new lfs_writer_tryenter to use at this point so that the
test for fs->lfs_dirops == 0 and the acquisition of lfs_writer happen
atomically under lfs_lock.
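
The writer/dirop handshake and the new non-blocking entry point can be sketched in userland with POSIX threads. This illustrates the idea under assumptions; it is not the kernel code. pthread_mutex_t and pthread_cond_t stand in for lfs_lock and lfs_diropscv. struct fs_sketch and the writer_* functions are hypothetical, and dirops are modeled as a bare counter. The key property of writer_tryenter is that the dirops == 0 test and the writer increment happen under a single lock hold, so no dirop can start in between.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant fields of struct lfs. */
struct fs_sketch {
	pthread_mutex_t lock;		/* plays the role of lfs_lock */
	pthread_cond_t diropscv;	/* plays the role of lfs_diropscv */
	int dirops;			/* directory operations in progress */
	int writer;			/* active writers */
};

/* Blocking entry: wait for dirops to drain, then count ourselves. */
void
writer_enter(struct fs_sketch *fs)
{
	pthread_mutex_lock(&fs->lock);
	while (fs->dirops > 0)
		pthread_cond_wait(&fs->diropscv, &fs->lock);
	fs->writer++;
	pthread_mutex_unlock(&fs->lock);
}

/* Non-blocking entry: test and increment atomically under the lock. */
bool
writer_tryenter(struct fs_sketch *fs)
{
	bool ok;

	pthread_mutex_lock(&fs->lock);
	ok = (fs->dirops == 0);
	if (ok)
		fs->writer++;
	pthread_mutex_unlock(&fs->lock);
	return ok;
}

/* Leave; wake anyone waiting for the last writer to go away. */
void
writer_exit(struct fs_sketch *fs)
{
	pthread_mutex_lock(&fs->lock);
	assert(fs->writer > 0);
	if (--fs->writer == 0)
		pthread_cond_broadcast(&fs->diropscv);
	pthread_mutex_unlock(&fs->lock);
}

/* Single-threaded demo, so touching dirops without the lock is fine. */
bool
demo_tryenter(void)
{
	static struct fs_sketch fs = {
		PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, 0
	};
	bool busy_fails, idle_succeeds;

	fs.dirops = 1;			/* a dirop is in flight */
	busy_fails = !writer_tryenter(&fs);
	fs.dirops = 0;			/* ...and has drained */
	idle_succeeds = writer_tryenter(&fs);
	writer_exit(&fs);
	writer_enter(&fs);		/* does not block: no dirops */
	writer_exit(&fs);
	return busy_fails && idle_succeeds && fs.writer == 0;
}
```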

Initialize/destroy lfs_allclean_wakeup in modcmd, not lfs_mountfs.

Fixes reloading lfs.kmod.

In lfs_update, hold lfs_writer around lfs_vflush.

Otherwise, we might do
lfs_vflush
-> lfs_seglock
-> lfs_segwait(SEGM_CKP)
   -> lfs_writer_enter
which is the reverse of the lfs_writer -> lfs_seglock ordering.

Call lfs_orphan in lfs_rename while we're still in the dirop.

lfs_writer_enter can't fail; keep it simple and don't pretend it can.

Assert that mtsleep can't fail either -- it doesn't catch signals and
there's no timeout.

Teach LFS_ORPHAN_NEXTFREE about lfs64.
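
This small standalone check shows what the new macro evaluates to. struct lfs_sketch is a hypothetical stand-in for struct lfs that carries only lfs_is64. ORPHAN_NEXTFREE mirrors the new LFS_ORPHAN_NEXTFREE(fs) definition from lfs.h, and ORPHAN_NEXTFREE_OLD mirrors the old one. Both arms of ?: undergo the usual arithmetic conversions, so the expression always has type uint64_t and the lfs32 arm yields the zero-extended value 0xffffffff. The macro now takes an fs argument, which is why the userland callers in fsck_lfs and dumplfs also had to change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct lfs: only the format flag matters. */
struct lfs_sketch {
	bool lfs_is64;
};

/* Old definition: one 32-bit sentinel for both formats. */
#define ORPHAN_NEXTFREE_OLD	(~(uint32_t)0)

/* New definition, mirroring LFS_ORPHAN_NEXTFREE(fs) in lfs.h. */
#define ORPHAN_NEXTFREE(fs) \
	((fs)->lfs_is64 ? ~(uint64_t)0 : ~(uint32_t)0)

/*
 * Both arms of ?: are converted to uint64_t, so the result is always
 * uint64_t; for lfs32 it is the zero-extended value 0xffffffff.
 */
uint64_t
orphan_marker(bool is64)
{
	struct lfs_sketch fs = { .lfs_is64 = is64 };

	return ORPHAN_NEXTFREE(&fs);
}
```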

Dust off the orphan detection code and try to make it work.
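
The revived lfs_order_freelist collects orphaned inode numbers into an array with the following growth policy:

- It starts at 32 entries.
- It doubles until it reaches 4096 entries.
- After that, it grows by 4096 entries at a time.
- Once the scan is done, it is shrunk to fit, so callers only need the count.

A userland sketch of that policy follows. It assumes realloc/free in place of the kernel's kmem_zalloc, memcpy and kmem_free sequence, and a hypothetical ino_sketch_t in place of ino_t.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t ino_sketch_t;	/* hypothetical stand-in for ino_t */

/* Capacity after cur: 32, then doubling up to 4096, then +4096 steps. */
size_t
orphan_next_alloc(size_t cur)
{
	if (cur == 0)
		return 32;
	if (cur >= 4096)
		return cur + 4096;
	return cur * 2;
}

/* Append one inode number; returns -1 if the array cannot grow. */
int
orphan_append(ino_sketch_t **orphan, size_t *n, size_t *nalloc,
    ino_sketch_t ino)
{
	if (*n == *nalloc) {
		size_t newalloc = orphan_next_alloc(*nalloc);
		ino_sketch_t *p = realloc(*orphan, newalloc * sizeof(**orphan));

		if (p == NULL)
			return -1;
		*orphan = p;
		*nalloc = newalloc;
	}
	(*orphan)[(*n)++] = ino;
	return 0;
}

/*
 * Collect count fake inode numbers, then shrink to fit.  Returns the
 * capacity reached before the shrink, to show the growth steps.
 */
size_t
demo_collect(size_t count)
{
	ino_sketch_t *orphan = NULL;
	size_t n = 0, nalloc = 0, peak, i;

	for (i = 0; i < count; i++)
		if (orphan_append(&orphan, &n, &nalloc, (ino_sketch_t)i))
			abort();
	peak = nalloc;
	if (n > 0 && n < nalloc) {
		ino_sketch_t *p = realloc(orphan, n * sizeof(*orphan));

		if (p != NULL)
			orphan = p;
	}
	free(orphan);
	return peak;
}
```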

Fix !DIAGNOSTIC build.

Fix userland references to LFS_ORPHAN_NEXTFREE.

Forgot to grep for these or do a full distribution build, oops!

Fix missing <sys/evcnt.h> by removing the evcnts instead.

Just wanted to confirm that a race might happen, and indeed it did.
Otherwise these have little diagnostic value.

OR into bp->b_cflags; don't overwrite.
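
The difference comes down to one operator. Assignment discards any bits already present in b_cflags, while |= adds the two flags and keeps everything else. The flag names and values below are hypothetical placeholders, not the real BC_* constants from <sys/buf.h>.

```c
#include <assert.h>

/* Hypothetical bit values; the real BC_* flags live in <sys/buf.h>. */
enum {
	SKETCH_BC_BUSY = 0x1,
	SKETCH_BC_NOCACHE = 0x2,
	SKETCH_BC_OTHER = 0x4	/* some bit set earlier by another path */
};

/* Old behavior: plain assignment clobbers existing flags. */
unsigned
cflags_assign(unsigned cflags)
{
	cflags = SKETCH_BC_BUSY | SKETCH_BC_NOCACHE;
	return cflags;
}

/* New behavior: OR in the flags, preserving whatever was already set. */
unsigned
cflags_or(unsigned cflags)
{
	cflags |= SKETCH_BC_BUSY | SKETCH_BC_NOCACHE;
	return cflags;
}
```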

CTASSERT lfs on-disk structure sizes.

Avoid misaligned access to lfs64 on-disk records in memory.
lfs64 directory entries are only 32-bit aligned in order to conserve
space in directory blocks, and we had a hack to stuff a 64-bit inode
number into them.  This replaces the hack with __aligned(4) __packed,
and goes further:

1. It's not clear that all the other lfs64 data structures are 64-bit
   aligned on disk to begin with.  We can go through these later and
   upgrade them from
        struct foo64 {
                ...
        } __aligned(4) __packed;
        union foo {
                struct foo64 f64;
                ...
        };
   to
        struct foo64 {
                ...
        };
        union foo {
                struct foo64 f64 __aligned(8);
                ...
        } __aligned(4) __packed;
   if we really want to take advantage of 64-bit memory accesses.
   However, the __aligned(4) __packed must remain on the union
   because:
2. We access even the lfs32 data structures via a union that has
   lfs64 members, and it turns out that compilers will assume access
   through a union with 64-bit aligned members implies the whole
   union has 64-bit alignment, even if we're only accessing a 32-bit
   aligned member.
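
Below is a standalone sketch of the resulting directory-header layout. It assumes GCC or Clang, where the __attribute__ spelling used here corresponds to NetBSD's __aligned(4) __packed. The names are hypothetical and modeled on struct lfs_dirheader32/64, and byte swapping (LFS_SWAP_*) is omitted. The static assertions restate the new __CTASSERTs. The round trip stores and reloads a 64-bit inode number through plain member access in entries that sit 12 bytes apart, so at most one of them can be 8-byte aligned.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirrors of struct lfs_dirheader32/64 (no byte swapping). */
struct sk_dirheader32 {
	uint32_t dh_ino;
	uint16_t dh_reclen;
	uint8_t dh_type;
	uint8_t dh_namlen;
};

struct sk_dirheader64 {
	uint64_t dh_ino;
	uint16_t dh_reclen;
	uint8_t dh_type;
	uint8_t dh_namlen;
} __attribute__((__aligned__(4))) __attribute__((__packed__));

union sk_dirheader {
	struct sk_dirheader64 u_64;
	struct sk_dirheader32 u_32;
};

/* Restate the layout guarantees, as the new __CTASSERTs do. */
_Static_assert(sizeof(struct sk_dirheader32) == 8, "lfs32 header size");
_Static_assert(sizeof(struct sk_dirheader64) == 12, "lfs64 header size");
_Static_assert(_Alignof(union sk_dirheader) ==
    _Alignof(struct sk_dirheader32), "union is only 4-byte aligned");

uint64_t
sk_dir_getino(const union sk_dirheader *dh, int is64)
{
	/* Plain member access: the compiler knows dh_ino may be only
	 * 4-byte aligned and emits suitable loads. */
	return is64 ? dh->u_64.dh_ino : dh->u_32.dh_ino;
}

void
sk_dir_setino(union sk_dirheader *dh, int is64, uint64_t ino)
{
	if (is64)
		dh->u_64.dh_ino = ino;
	else
		dh->u_32.dh_ino = (uint32_t)ino;
}

/* Entries sit 12 bytes apart, so at most one is 8-byte aligned. */
int
demo_roundtrip(void)
{
	union sk_dirheader ents[2];
	const uint64_t big = UINT64_C(0x0123456789abcdef);

	sk_dir_setino(&ents[0], 1, big);
	sk_dir_setino(&ents[1], 1, big + 1);
	if (sk_dir_getino(&ents[0], 1) != big ||
	    sk_dir_getino(&ents[1], 1) != big + 1)
		return 0;
	sk_dir_setino(&ents[0], 0, 42);
	return sk_dir_getino(&ents[0], 0) == 42;
}
```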

Fix clang build after packed lfs64 accessor change.

Suppress spurious address-of-packed error in rump lfs too.


To generate a diff of this commit:
cvs rdiff -u -r1.70 -r1.70.14.1 src/lib/libp2k/p2k.c
cvs rdiff -u -r1.45 -r1.45.18.1 src/sbin/fsck_lfs/pass1.c
cvs rdiff -u -r1.1.26.1 -r1.1.26.2 \
    src/sys/arch/i386/stand/efiboot/bootx64/Makefile
cvs rdiff -u -r1.14 -r1.14.22.1 src/sys/rump/fs/lib/liblfs/Makefile
cvs rdiff -u -r1.204 -r1.204.4.1 src/sys/ufs/lfs/lfs.h
cvs rdiff -u -r1.48 -r1.48.12.1 src/sys/ufs/lfs/lfs_accessors.h
cvs rdiff -u -r1.137 -r1.137.8.1 src/sys/ufs/lfs/lfs_alloc.c
cvs rdiff -u -r1.94 -r1.94.10.1 src/sys/ufs/lfs/lfs_balloc.c
cvs rdiff -u -r1.142 -r1.142.6.1 src/sys/ufs/lfs/lfs_bio.c
cvs rdiff -u -r1.54 -r1.54.22.1 src/sys/ufs/lfs/lfs_debug.c
cvs rdiff -u -r1.114 -r1.114.4.1 src/sys/ufs/lfs/lfs_extern.h
cvs rdiff -u -r1.157 -r1.157.10.1 src/sys/ufs/lfs/lfs_inode.c
cvs rdiff -u -r1.23 -r1.23.10.1 src/sys/ufs/lfs/lfs_inode.h
cvs rdiff -u -r1.15 -r1.15.8.1 src/sys/ufs/lfs/lfs_pages.c
cvs rdiff -u -r1.22 -r1.22.10.1 src/sys/ufs/lfs/lfs_rename.c
cvs rdiff -u -r1.278 -r1.278.4.1 src/sys/ufs/lfs/lfs_segment.c
cvs rdiff -u -r1.97 -r1.97.8.1 src/sys/ufs/lfs/lfs_subr.c
cvs rdiff -u -r1.365 -r1.365.2.1 src/sys/ufs/lfs/lfs_vfsops.c
cvs rdiff -u -r1.324 -r1.324.2.1 src/sys/ufs/lfs/lfs_vnops.c
cvs rdiff -u -r1.64 -r1.64.4.1 src/usr.sbin/dumplfs/dumplfs.c

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.

Modified files:

Index: src/lib/libp2k/p2k.c
diff -u src/lib/libp2k/p2k.c:1.70 src/lib/libp2k/p2k.c:1.70.14.1
--- src/lib/libp2k/p2k.c:1.70	Wed Apr 26 03:02:48 2017
+++ src/lib/libp2k/p2k.c	Mon Aug 17 10:30:22 2020
@@ -1,4 +1,4 @@
-/*	$NetBSD: p2k.c,v 1.70 2017/04/26 03:02:48 riastradh Exp $	*/
+/*	$NetBSD: p2k.c,v 1.70.14.1 2020/08/17 10:30:22 martin Exp $	*/
 
 /*
  * Copyright (c) 2007, 2008, 2009  Antti Kantee.  All Rights Reserved.
@@ -789,7 +789,7 @@ do_makenode(struct puffs_usermount *pu, 
 	struct p2k_node *p2n;
 	struct componentname *cn;
 	struct vattr *va_x;
-	struct vnode *vp;
+	struct vnode *vp = NULL;
 	int rv;
 
 	p2n = malloc(sizeof(*p2n));

Index: src/sbin/fsck_lfs/pass1.c
diff -u src/sbin/fsck_lfs/pass1.c:1.45 src/sbin/fsck_lfs/pass1.c:1.45.18.1
--- src/sbin/fsck_lfs/pass1.c:1.45	Sat Oct  3 08:30:13 2015
+++ src/sbin/fsck_lfs/pass1.c	Mon Aug 17 10:30:22 2020
@@ -1,4 +1,4 @@
-/* $NetBSD: pass1.c,v 1.45 2015/10/03 08:30:13 dholland Exp $	 */
+/* $NetBSD: pass1.c,v 1.45.18.1 2020/08/17 10:30:22 martin Exp $	 */
 
 /*
  * Copyright (c) 1980, 1986, 1993
@@ -307,7 +307,7 @@ checkinode(ino_t inumber, struct inodesc
 	 */
 	if (lfs_dino_getnlink(fs, dp) <= 0) {
 		LFS_IENTRY(ifp, fs, inumber, bp);
-		if (lfs_if_getnextfree(fs, ifp) == LFS_ORPHAN_NEXTFREE) {
+		if (lfs_if_getnextfree(fs, ifp) == LFS_ORPHAN_NEXTFREE(fs)) {
 			statemap[inumber] = (mode == LFS_IFDIR ? DCLEAR : FCLEAR);
 			/* Add this to our list of orphans */
 			zlnp = emalloc(sizeof *zlnp);

Index: src/sys/arch/i386/stand/efiboot/bootx64/Makefile
diff -u src/sys/arch/i386/stand/efiboot/bootx64/Makefile:1.1.26.1 src/sys/arch/i386/stand/efiboot/bootx64/Makefile:1.1.26.2
--- src/sys/arch/i386/stand/efiboot/bootx64/Makefile:1.1.26.1	Tue Sep 17 19:32:00 2019
+++ src/sys/arch/i386/stand/efiboot/bootx64/Makefile	Mon Aug 17 10:30:22 2020
@@ -1,4 +1,4 @@
-#	$NetBSD: Makefile,v 1.1.26.1 2019/09/17 19:32:00 martin Exp $
+#	$NetBSD: Makefile,v 1.1.26.2 2020/08/17 10:30:22 martin Exp $
 
 PROG=		bootx64.efi
 OBJFMT=		pei-x86-64
@@ -9,4 +9,9 @@ EXTRA_SOURCES=	efibootx64.c startprog64.
 COPTS+=		-mno-red-zone
 CPPFLAGS+=	-DEFI_FUNCTION_WRAPPER
 
+# Follow the suit of Makefile.kern.inc; needed for the lfs64 union
+# accessors -- they don't actually dereference the resulting pointer,
+# just use it for type-checking.
+CWARNFLAGS.clang+=	-Wno-error=address-of-packed-member
+
 .include "${.CURDIR}/../Makefile.efiboot"

Index: src/sys/rump/fs/lib/liblfs/Makefile
diff -u src/sys/rump/fs/lib/liblfs/Makefile:1.14 src/sys/rump/fs/lib/liblfs/Makefile:1.14.22.1
--- src/sys/rump/fs/lib/liblfs/Makefile:1.14	Wed Mar 23 21:38:51 2016
+++ src/sys/rump/fs/lib/liblfs/Makefile	Mon Aug 17 10:30:22 2020
@@ -1,4 +1,4 @@
-#	$NetBSD: Makefile,v 1.14 2016/03/23 21:38:51 christos Exp $
+#	$NetBSD: Makefile,v 1.14.22.1 2020/08/17 10:30:22 martin Exp $
 #
 
 .PATH:  ${.CURDIR}/../../../../ufs/lfs
@@ -21,5 +21,10 @@ CFLAGS+=        -DLFS_KERNEL_RFW
 COPTS.lfs_inode.c+=-O0
 .endif
 
+# Follow the suit of Makefile.kern.inc; needed for the lfs64 union
+# accessors -- they don't actually dereference the resulting pointer,
+# just use it for type-checking.
+CWARNFLAGS.clang+=	-Wno-error=address-of-packed-member
+
 .include <bsd.lib.mk>
 .include <bsd.klinks.mk>

Index: src/sys/ufs/lfs/lfs.h
diff -u src/sys/ufs/lfs/lfs.h:1.204 src/sys/ufs/lfs/lfs.h:1.204.4.1
--- src/sys/ufs/lfs/lfs.h:1.204	Thu Jan 10 06:31:04 2019
+++ src/sys/ufs/lfs/lfs.h	Mon Aug 17 10:30:22 2020
@@ -1,4 +1,4 @@
-/*	$NetBSD: lfs.h,v 1.204 2019/01/10 06:31:04 martin Exp $	*/
+/*	$NetBSD: lfs.h,v 1.204.4.1 2020/08/17 10:30:22 martin Exp $	*/
 
 /*  from NetBSD: dinode.h,v 1.25 2016/01/22 23:06:10 dholland Exp  */
 /*  from NetBSD: dir.h,v 1.25 2015/09/01 06:16:03 dholland Exp  */
@@ -355,19 +355,22 @@ struct lfs_dirheader32 {
 	uint8_t  dh_type; 		/* file type, see below */
 	uint8_t  dh_namlen;		/* length of string in d_name */
 };
+__CTASSERT(sizeof(struct lfs_dirheader32) == 8);
 
 struct lfs_dirheader64 {
-	uint32_t dh_inoA;		/* inode number of entry */
-	uint32_t dh_inoB;		/* inode number of entry */
+	uint64_t dh_ino;		/* inode number of entry */
 	uint16_t dh_reclen;		/* length of this record */
 	uint8_t  dh_type; 		/* file type, see below */
 	uint8_t  dh_namlen;		/* length of string in d_name */
-};
+} __aligned(4) __packed;
+__CTASSERT(sizeof(struct lfs_dirheader64) == 12);
 
 union lfs_dirheader {
 	struct lfs_dirheader64 u_64;
 	struct lfs_dirheader32 u_32;
 };
+__CTASSERT(__alignof(union lfs_dirheader) == __alignof(struct lfs_dirheader64));
+__CTASSERT(__alignof(union lfs_dirheader) == __alignof(struct lfs_dirheader32));
 
 typedef union lfs_dirheader LFS_DIRHEADER;
 
@@ -381,6 +384,7 @@ struct lfs_dirtemplate32 {
 	struct lfs_dirheader32	dotdot_header;
 	char			dotdot_name[4];	/* ditto */
 };
+__CTASSERT(sizeof(struct lfs_dirtemplate32) == 2*(8 + 4));
 
 struct lfs_dirtemplate64 {
 	struct lfs_dirheader64	dot_header;
@@ -388,6 +392,7 @@ struct lfs_dirtemplate64 {
 	struct lfs_dirheader64	dotdot_header;
 	char			dotdot_name[4];	/* ditto */
 };
+__CTASSERT(sizeof(struct lfs_dirtemplate64) == 2*(12 + 4));
 
 union lfs_dirtemplate {
 	struct lfs_dirtemplate64 u_64;
@@ -408,6 +413,7 @@ struct lfs_odirtemplate {
 	uint16_t	dotdot_namlen;
 	char		dotdot_name[4];	/* ditto */
 };
+__CTASSERT(sizeof(struct lfs_odirtemplate) == 2*(8 + 4));
 #endif
 
 /*
@@ -441,6 +447,7 @@ struct lfs32_dinode {
 	uint32_t	di_gid;		/* 116: File group. */
 	uint64_t	di_modrev;	/* 120: i_modrev for NFSv4 */
 };
+__CTASSERT(sizeof(struct lfs32_dinode) == 128);
 
 struct lfs64_dinode {
 	uint16_t	di_mode;	/*   0: IFMT, permissions; see below. */
@@ -469,11 +476,14 @@ struct lfs64_dinode {
 	uint64_t	di_inumber;	/* 240: Inode number */
 	uint64_t	di_spare[1];	/* 248: Reserved; currently unused */
 };
+__CTASSERT(sizeof(struct lfs64_dinode) == 256);
 
 union lfs_dinode {
 	struct lfs64_dinode u_64;
 	struct lfs32_dinode u_32;
 };
+__CTASSERT(__alignof(union lfs_dinode) == __alignof(struct lfs64_dinode));
+__CTASSERT(__alignof(union lfs_dinode) == __alignof(struct lfs32_dinode));
 
 /*
  * The di_db fields may be overlaid with other information for
@@ -529,6 +539,7 @@ struct segusage {
 	uint32_t su_flags;		/* 12: segment flags */
 	uint64_t su_lastmod;		/* 16: last modified timestamp */
 };
+__CTASSERT(sizeof(struct segusage) == 24);
 
 typedef struct segusage_v1 SEGUSE_V1;
 struct segusage_v1 {
@@ -538,6 +549,7 @@ struct segusage_v1 {
 	uint16_t su_ninos;		/* 10: number of inode blocks in seg */
 	uint32_t su_flags;		/* 12: segment flags  */
 };
+__CTASSERT(sizeof(struct segusage_v1) == 16);
 
 /*
  * On-disk file information.  One per file with data blocks in the segment.
@@ -554,7 +566,8 @@ struct finfo64 {
 	uint64_t fi_ino;		/* inode number */
 	uint32_t fi_lastlength;		/* length of last block in array */
 	uint32_t fi_pad;		/* unused */
-};
+} __aligned(4) __packed;
+__CTASSERT(sizeof(struct finfo64) == 24);
 
 typedef struct finfo32 FINFO32;
 struct finfo32 {
@@ -563,11 +576,14 @@ struct finfo32 {
 	uint32_t fi_ino;		/* inode number */
 	uint32_t fi_lastlength;		/* length of last block in array */
 };
+__CTASSERT(sizeof(struct finfo32) == 16);
 
 typedef union finfo {
 	struct finfo64 u_64;
 	struct finfo32 u_32;
 } FINFO;
+__CTASSERT(__alignof(union finfo) == __alignof(struct finfo64));
+__CTASSERT(__alignof(union finfo) == __alignof(struct finfo32));
 
 /*
  * inode info (part of the segment summary)
@@ -579,16 +595,20 @@ typedef union finfo {
 
 typedef struct iinfo64 {
 	uint64_t ii_block;		/* block number */
-} IINFO64;
+} __aligned(4) __packed IINFO64;
+__CTASSERT(sizeof(struct iinfo64) == 8);
 
 typedef struct iinfo32 {
 	uint32_t ii_block;		/* block number */
 } IINFO32;
+__CTASSERT(sizeof(struct iinfo32) == 4);
 
 typedef union iinfo {
 	struct iinfo64 u_64;
 	struct iinfo32 u_32;
 } IINFO;
+__CTASSERT(__alignof(union iinfo) == __alignof(struct iinfo64));
+__CTASSERT(__alignof(union iinfo) == __alignof(struct iinfo32));
 
 /*
  * Index file inode entries.
@@ -596,8 +616,9 @@ typedef union iinfo {
 
 /* magic value for daddrs */
 #define	LFS_UNUSED_DADDR	0	/* out-of-band daddr */
-/* magic value for if_nextfree */
-#define LFS_ORPHAN_NEXTFREE	(~(uint32_t)0) /* indicate orphaned file */
+/* magic value for if_nextfree -- indicate orphaned file */
+#define LFS_ORPHAN_NEXTFREE(fs) \
+	((fs)->lfs_is64 ? ~(uint64_t)0 : ~(uint32_t)0)
 
 typedef struct ifile64 IFILE64;
 struct ifile64 {
@@ -606,7 +627,8 @@ struct ifile64 {
 	uint64_t if_atime_sec;		/* Last access time, seconds */
 	int64_t	  if_daddr;		/* inode disk address */
 	uint64_t if_nextfree;		/* next-unallocated inode */
-};
+} __aligned(4) __packed;
+__CTASSERT(sizeof(struct ifile64) == 32);
 
 typedef struct ifile32 IFILE32;
 struct ifile32 {
@@ -616,6 +638,7 @@ struct ifile32 {
 	uint32_t if_atime_sec;		/* Last access time, seconds */
 	uint32_t if_atime_nsec;		/* and nanoseconds */
 };
+__CTASSERT(sizeof(struct ifile32) == 20);
 
 typedef struct ifile_v1 IFILE_V1;
 struct ifile_v1 {
@@ -627,6 +650,7 @@ struct ifile_v1 {
 	struct timespec if_atime;	/* Last access time */
 #endif
 };
+__CTASSERT(sizeof(struct ifile_v1) == 12);
 
 /*
  * Note: struct ifile_v1 is often handled by accessing the first three
@@ -638,6 +662,9 @@ typedef union ifile {
 	struct ifile32 u_32;
 	struct ifile_v1 u_v1;
 } IFILE;
+__CTASSERT(__alignof(union ifile) == __alignof(struct ifile64));
+__CTASSERT(__alignof(union ifile) == __alignof(struct ifile32));
+__CTASSERT(__alignof(union ifile) == __alignof(struct ifile_v1));
 
 /*
  * Cleaner information structure.  This resides in the ifile and is used
@@ -656,6 +683,7 @@ typedef struct _cleanerinfo32 {
 	uint32_t free_tail;		/* 20: tail of the inode free list */
 	uint32_t flags;			/* 24: status word from the kernel */
 } CLEANERINFO32;
+__CTASSERT(sizeof(struct _cleanerinfo32) == 28);
 
 typedef struct _cleanerinfo64 {
 	uint32_t clean;			/* 0: number of clean segments */
@@ -666,13 +694,16 @@ typedef struct _cleanerinfo64 {
 	uint64_t free_tail;		/* 32: tail of the inode free list */
 	uint32_t flags;			/* 40: status word from the kernel */
 	uint32_t pad;			/* 44: must be 64-bit aligned */
-} CLEANERINFO64;
+} __aligned(4) __packed CLEANERINFO64;
+__CTASSERT(sizeof(struct _cleanerinfo64) == 48);
 
 /* this must not go to disk directly of course */
 typedef union _cleanerinfo {
 	CLEANERINFO32 u_32;
 	CLEANERINFO64 u_64;
 } CLEANERINFO;
+__CTASSERT(__alignof(union _cleanerinfo) == __alignof(struct _cleanerinfo32));
+__CTASSERT(__alignof(union _cleanerinfo) == __alignof(struct _cleanerinfo64));
 
 /*
  * On-disk segment summary information
@@ -704,6 +735,7 @@ struct segsum_v1 {
 	uint16_t ss_pad;		/* 26: extra space */
 	/* FINFO's and inode daddr's... */
 };
+__CTASSERT(sizeof(struct segsum_v1) == 28);
 
 typedef struct segsum32 SEGSUM32;
 struct segsum32 {
@@ -720,7 +752,8 @@ struct segsum32 {
 	uint64_t ss_serial;		/* 32: serial number */
 	uint64_t ss_create;		/* 40: time stamp */
 	/* FINFO's and inode daddr's... */
-};
+} __aligned(4) __packed;
+__CTASSERT(sizeof(struct segsum32) == 48);
 
 typedef struct segsum64 SEGSUM64;
 struct segsum64 {
@@ -737,7 +770,8 @@ struct segsum64 {
 	uint64_t ss_serial;		/* 40: serial number */
 	uint64_t ss_create;		/* 48: time stamp */
 	/* FINFO's and inode daddr's... */
-};
+} __aligned(4) __packed;
+__CTASSERT(sizeof(struct segsum64) == 56);
 
 typedef union segsum SEGSUM;
 union segsum {
@@ -745,7 +779,9 @@ union segsum {
 	struct segsum32 u_32;
 	struct segsum_v1 u_v1;
 };
-
+__CTASSERT(__alignof(union segsum) == __alignof(struct segsum64));
+__CTASSERT(__alignof(union segsum) == __alignof(struct segsum32));
+__CTASSERT(__alignof(union segsum) == __alignof(struct segsum_v1));
 
 /*
  * On-disk super block.
@@ -934,6 +970,8 @@ struct dlfs64 {
 	uint32_t dlfs_cksum;	  /* 508: checksum for superblock checking */
 };
 
+__CTASSERT(__alignof(struct dlfs) == __alignof(struct dlfs64));
+
 /* Type used for the inode bitmap */
 typedef uint32_t lfs_bm_t;
 

Index: src/sys/ufs/lfs/lfs_accessors.h
diff -u src/sys/ufs/lfs/lfs_accessors.h:1.48 src/sys/ufs/lfs/lfs_accessors.h:1.48.12.1
--- src/sys/ufs/lfs/lfs_accessors.h:1.48	Sat Jun 10 05:29:36 2017
+++ src/sys/ufs/lfs/lfs_accessors.h	Mon Aug 17 10:30:22 2020
@@ -1,4 +1,4 @@
-/*	$NetBSD: lfs_accessors.h,v 1.48 2017/06/10 05:29:36 maya Exp $	*/
+/*	$NetBSD: lfs_accessors.h,v 1.48.12.1 2020/08/17 10:30:22 martin Exp $	*/
 
 /*  from NetBSD: lfs.h,v 1.165 2015/07/24 06:59:32 dholland Exp  */
 /*  from NetBSD: dinode.h,v 1.25 2016/01/22 23:06:10 dholland Exp  */
@@ -274,17 +274,7 @@ static __inline uint64_t
 lfs_dir_getino(const STRUCT_LFS *fs, const LFS_DIRHEADER *dh)
 {
 	if (fs->lfs_is64) {
-		uint64_t ino;
-
-		/*
-		 * XXX we can probably write this in a way that's both
-		 * still legal and generates better code.
-		 */
-		memcpy(&ino, &dh->u_64.dh_inoA, sizeof(dh->u_64.dh_inoA));
-		memcpy((char *)&ino + sizeof(dh->u_64.dh_inoA),
-		       &dh->u_64.dh_inoB,
-		       sizeof(dh->u_64.dh_inoB));
-		return LFS_SWAP_uint64_t(fs, ino);
+		return LFS_SWAP_uint64_t(fs, dh->u_64.dh_ino);
 	} else {
 		return LFS_SWAP_uint32_t(fs, dh->u_32.dh_ino);
 	}
@@ -331,16 +321,7 @@ static __inline void
 lfs_dir_setino(STRUCT_LFS *fs, LFS_DIRHEADER *dh, uint64_t ino)
 {
 	if (fs->lfs_is64) {
-
-		ino = LFS_SWAP_uint64_t(fs, ino);
-		/*
-		 * XXX we can probably write this in a way that's both
-		 * still legal and generates better code.
-		 */
-		memcpy(&dh->u_64.dh_inoA, &ino, sizeof(dh->u_64.dh_inoA));
-		memcpy(&dh->u_64.dh_inoB,
-		       (char *)&ino + sizeof(dh->u_64.dh_inoA),
-		       sizeof(dh->u_64.dh_inoB));
+		dh->u_64.dh_ino = LFS_SWAP_uint64_t(fs, ino);
 	} else {
 		dh->u_32.dh_ino = LFS_SWAP_uint32_t(fs, ino);
 	}

Index: src/sys/ufs/lfs/lfs_alloc.c
diff -u src/sys/ufs/lfs/lfs_alloc.c:1.137 src/sys/ufs/lfs/lfs_alloc.c:1.137.8.1
--- src/sys/ufs/lfs/lfs_alloc.c:1.137	Sat Aug 19 11:27:42 2017
+++ src/sys/ufs/lfs/lfs_alloc.c	Mon Aug 17 10:30:22 2020
@@ -1,4 +1,4 @@
-/*	$NetBSD: lfs_alloc.c,v 1.137 2017/08/19 11:27:42 maya Exp $	*/
+/*	$NetBSD: lfs_alloc.c,v 1.137.8.1 2020/08/17 10:30:22 martin Exp $	*/
 
 /*-
  * Copyright (c) 1999, 2000, 2001, 2002, 2003, 2007 The NetBSD Foundation, Inc.
@@ -60,7 +60,7 @@
  */
 
 #include <sys/cdefs.h>
-__KERNEL_RCSID(0, "$NetBSD: lfs_alloc.c,v 1.137 2017/08/19 11:27:42 maya Exp $");
+__KERNEL_RCSID(0, "$NetBSD: lfs_alloc.c,v 1.137.8.1 2020/08/17 10:30:22 martin Exp $");
 
 #if defined(_KERNEL_OPT)
 #include "opt_quota.h"
@@ -705,16 +705,16 @@ lfs_vfree(struct vnode *vp, ino_t ino, i
  * Takes the segmenet lock.
  */
 void
-lfs_order_freelist(struct lfs *fs)
+lfs_order_freelist(struct lfs *fs, ino_t **orphanp, size_t *norphanp)
 {
 	CLEANERINFO *cip;
 	IFILE *ifp = NULL;
 	struct buf *bp;
 	ino_t ino, firstino, lastino, maxino;
-#ifdef notyet
-	struct vnode *vp;
-#endif
-	
+	ino_t *orphan = NULL;
+	size_t norphan = 0;
+	size_t norphan_alloc = 0;
+
 	ASSERT_NO_SEGLOCK(fs);
 	lfs_seglock(fs, SEGM_PROT);
 
@@ -745,7 +745,6 @@ lfs_order_freelist(struct lfs *fs)
 		if (ino == LFS_UNUSED_INUM || ino == LFS_IFILE_INUM)
 			continue;
 
-#ifdef notyet
 		/*
 		 * Address orphaned files.
 		 *
@@ -757,39 +756,26 @@ lfs_order_freelist(struct lfs *fs)
 		 * but presumably it doesn't work... not sure what
 		 * happens to such files currently. -- dholland 20160806
 		 */
-		if (lfs_if_getnextfree(fs, ifp) == LFS_ORPHAN_NEXTFREE &&
-		    VFS_VGET(fs->lfs_ivnode->v_mount, ino, &vp) == 0) {
-			unsigned segno;
-
-			/* get the segment the inode in on disk  */
-			segno = lfs_dtosn(fs, lfs_if_getdaddr(fs, ifp));
-
-			/* truncate the inode */
-			lfs_truncate(vp, 0, 0, NOCRED);
-			vput(vp);
-
-			/* load the segment summary */
-			LFS_SEGENTRY(sup, fs, segno, bp);
-			/* update the number of bytes in the segment */
-			KASSERT(sup->su_nbytes >= DINOSIZE(fs));
-			sup->su_nbytes -= DINOSIZE(fs);
-			/* write the segment summary */
-			LFS_WRITESEGENTRY(sup, fs, segno, bp);
-
-			/* Drop the on-disk address */
-			lfs_if_setdaddr(fs, ifp, LFS_UNUSED_DADDR);
-			/* write the ifile entry */
-			LFS_BWRITE_LOG(bp);
-
-			/*
-			 * and reload it (XXX: why? I guess
-			 * LFS_BWRITE_LOG drops it...)
-			 */
-			LFS_IENTRY(ifp, fs, ino, bp);
-
-			/* Fall through to next if block */
+		if (lfs_if_getnextfree(fs, ifp) == LFS_ORPHAN_NEXTFREE(fs)) {
+			if (orphan == NULL) {
+				norphan_alloc = 32; /* XXX pulled from arse */
+				orphan = kmem_zalloc(sizeof(orphan[0]) *
+				    norphan_alloc, KM_SLEEP);
+			} else if (norphan == norphan_alloc) {
+				ino_t *orphan_new;
+				if (norphan_alloc >= 4096)
+					norphan_alloc += 4096;
+				else
+					norphan_alloc *= 2;
+				orphan_new = kmem_zalloc(sizeof(orphan[0]) *
+				    norphan_alloc, KM_SLEEP);
+				memcpy(orphan_new, orphan, sizeof(orphan[0]) *
+				    norphan);
+				kmem_free(orphan, sizeof(orphan[0]) * norphan);
+				orphan = orphan_new;
+			}
+			orphan[norphan++] = ino;
 		}
-#endif
 
 		if (lfs_if_getdaddr(fs, ifp) == LFS_UNUSED_DADDR) {
 
@@ -836,6 +822,22 @@ lfs_order_freelist(struct lfs *fs)
 
 	/* done */
 	lfs_segunlock(fs);
+
+	/*
+	 * Shrink the array of orphans so we don't have to carry around
+	 * the allocation size.
+	 */
+	if (norphan < norphan_alloc) {
+		ino_t *orphan_new = kmem_alloc(sizeof(orphan[0]) * norphan,
+		    KM_SLEEP);
+		memcpy(orphan_new, orphan, sizeof(orphan[0]) * norphan);
+		kmem_free(orphan, sizeof(orphan[0]) * norphan_alloc);
+		orphan = orphan_new;
+		norphan_alloc = norphan;
+	}
+
+	*orphanp = orphan;
+	*norphanp = norphan;
 }
 
 /*
@@ -851,6 +853,84 @@ lfs_orphan(struct lfs *fs, ino_t ino)
 	struct buf *bp;
 
 	LFS_IENTRY(ifp, fs, ino, bp);
-	lfs_if_setnextfree(fs, ifp, LFS_ORPHAN_NEXTFREE);
+	lfs_if_setnextfree(fs, ifp, LFS_ORPHAN_NEXTFREE(fs));
 	LFS_BWRITE_LOG(bp);
 }
+
+/*
+ * Free orphans discovered during mount.  This is a separate stage
+ * because it requires fs->lfs_suflags to be set up, which is not done
+ * by the time we run lfs_order_freelist.  It's possible that we could
+ * run lfs_order_freelist later (i.e., set up fs->lfs_suflags sooner)
+ * but that requires more thought than I can put into this at the
+ * moment.
+ */
+void
+lfs_free_orphans(struct lfs *fs, ino_t *orphan, size_t norphan)
+{
+	size_t i;
+
+	for (i = 0; i < norphan; i++) {
+		ino_t ino = orphan[i];
+		unsigned segno;
+		struct vnode *vp;
+		struct inode *ip;
+		struct buf *bp;
+		IFILE *ifp;
+		SEGUSE *sup;
+		int error;
+
+		/* Get the segment the inode is in on disk.  */
+		LFS_IENTRY(ifp, fs, ino, bp);
+		segno = lfs_dtosn(fs, lfs_if_getdaddr(fs, ifp));
+		brelse(bp, 0);
+
+		/*
+		 * Try to get the vnode.  If we can't, tough -- hope
+		 * you have backups!
+		 */
+		error = VFS_VGET(fs->lfs_ivnode->v_mount, ino, &vp);
+		if (error) {
+			printf("orphan %jd vget error %d\n", (intmax_t)ino,
+			    error);
+			continue;
+		}
+
+		/*
+		 * Sanity-check the inode.
+		 *
+		 * XXX What to do if it is still referenced?
+		 */
+		ip = VTOI(vp);
+		if (ip->i_nlink != 0)
+			printf("orphan %jd nlink %d\n", (intmax_t)ino,
+			    ip->i_nlink);
+
+		/*
+		 * Truncate the inode, to free any blocks allocated for
+		 * it, and release it, to free the inode number.
+		 *
+		 * XXX Isn't it redundant to truncate?  Won't vput do
+		 * that for us?
+		 */
+		error = lfs_truncate(vp, 0, 0, NOCRED);
+		if (error)
+			printf("orphan %jd truncate error %d", (intmax_t)ino,
+			    error);
+		vput(vp);
+
+		/* Update the number of bytes in the segment summary.  */
+		LFS_SEGENTRY(sup, fs, segno, bp);
+		KASSERT(sup->su_nbytes >= DINOSIZE(fs));
+		sup->su_nbytes -= DINOSIZE(fs);
+		LFS_WRITESEGENTRY(sup, fs, segno, bp);
+
+		/* Drop the on-disk address.  */
+		LFS_IENTRY(ifp, fs, ino, bp);
+		lfs_if_setdaddr(fs, ifp, LFS_UNUSED_DADDR);
+		LFS_BWRITE_LOG(bp);
+	}
+
+	if (orphan)
+		kmem_free(orphan, sizeof(orphan[0]) * norphan);
+}

Index: src/sys/ufs/lfs/lfs_balloc.c
diff -u src/sys/ufs/lfs/lfs_balloc.c:1.94 src/sys/ufs/lfs/lfs_balloc.c:1.94.10.1
--- src/sys/ufs/lfs/lfs_balloc.c:1.94	Sat Jun 10 05:29:36 2017
+++ src/sys/ufs/lfs/lfs_balloc.c	Mon Aug 17 10:30:22 2020
@@ -1,4 +1,4 @@
-/*	$NetBSD: lfs_balloc.c,v 1.94 2017/06/10 05:29:36 maya Exp $	*/
+/*	$NetBSD: lfs_balloc.c,v 1.94.10.1 2020/08/17 10:30:22 martin Exp $	*/
 
 /*-
  * Copyright (c) 1999, 2000, 2001, 2002, 2003 The NetBSD Foundation, Inc.
@@ -60,7 +60,7 @@
  */
 
 #include <sys/cdefs.h>
-__KERNEL_RCSID(0, "$NetBSD: lfs_balloc.c,v 1.94 2017/06/10 05:29:36 maya Exp $");
+__KERNEL_RCSID(0, "$NetBSD: lfs_balloc.c,v 1.94.10.1 2020/08/17 10:30:22 martin Exp $");
 
 #if defined(_KERNEL_OPT)
 #include "opt_quota.h"
@@ -660,9 +660,10 @@ lfs_register_block(struct vnode *vp, dad
 static void
 lfs_do_deregister(struct lfs *fs, struct inode *ip, struct lbnentry *lbp)
 {
+
+	KASSERT(mutex_owned(&lfs_lock));
 	ASSERT_MAYBE_SEGLOCK(fs);
 
-	mutex_enter(&lfs_lock);
 	--ip->i_lfs_nbtree;
 	SPLAY_REMOVE(lfs_splay, &ip->i_lfs_lbtree, lbp);
 	if (fs->lfs_favail > lfs_btofsb(fs, (1 << lfs_sb_getbshift(fs))))
@@ -671,9 +672,12 @@ lfs_do_deregister(struct lfs *fs, struct
 	if (locked_fakequeue_count > 0)
 		--locked_fakequeue_count;
 	lfs_subsys_pages -= lfs_sb_getbsize(fs) >> PAGE_SHIFT;
-	mutex_exit(&lfs_lock);
 
+	mutex_exit(&lfs_lock);
 	pool_put(&lfs_lbnentry_pool, lbp);
+	mutex_enter(&lfs_lock);
+
+	KASSERT(mutex_owned(&lfs_lock));
 }
 
 void
@@ -690,19 +694,18 @@ lfs_deregister_block(struct vnode *vp, d
 	if (lbn < 0 || vp->v_type != VREG || ip->i_number == LFS_IFILE_INUM)
 		return;
 
+	mutex_enter(&lfs_lock);
 	fs = ip->i_lfs;
 	tmp.lbn = lbn;
-	lbp = SPLAY_FIND(lfs_splay, &ip->i_lfs_lbtree, &tmp);
-	if (lbp == NULL)
-		return;
-
-	lfs_do_deregister(fs, ip, lbp);
+	if ((lbp = SPLAY_FIND(lfs_splay, &ip->i_lfs_lbtree, &tmp)) != NULL)
+		lfs_do_deregister(fs, ip, lbp);
+	mutex_exit(&lfs_lock);
 }
 
 void
 lfs_deregister_all(struct vnode *vp)
 {
-	struct lbnentry *lbp, *nlbp;
+	struct lbnentry *lbp;
 	struct lfs_splay *hd;
 	struct lfs *fs;
 	struct inode *ip;
@@ -711,8 +714,8 @@ lfs_deregister_all(struct vnode *vp)
 	fs = ip->i_lfs;
 	hd = &ip->i_lfs_lbtree;
 
-	for (lbp = SPLAY_MIN(lfs_splay, hd); lbp != NULL; lbp = nlbp) {
-		nlbp = SPLAY_NEXT(lfs_splay, hd, lbp);
+	mutex_enter(&lfs_lock);
+	while ((lbp = SPLAY_MIN(lfs_splay, hd)) != NULL)
 		lfs_do_deregister(fs, ip, lbp);
-	}
+	mutex_exit(&lfs_lock);
 }

Index: src/sys/ufs/lfs/lfs_bio.c
diff -u src/sys/ufs/lfs/lfs_bio.c:1.142 src/sys/ufs/lfs/lfs_bio.c:1.142.6.1
--- src/sys/ufs/lfs/lfs_bio.c:1.142	Sat Jun  9 18:48:31 2018
+++ src/sys/ufs/lfs/lfs_bio.c	Mon Aug 17 10:30:22 2020
@@ -1,4 +1,4 @@
-/*	$NetBSD: lfs_bio.c,v 1.142 2018/06/09 18:48:31 zafer Exp $	*/
+/*	$NetBSD: lfs_bio.c,v 1.142.6.1 2020/08/17 10:30:22 martin Exp $	*/
 
 /*-
  * Copyright (c) 1999, 2000, 2001, 2002, 2003, 2008 The NetBSD Foundation, Inc.
@@ -60,7 +60,7 @@
  */
 
 #include <sys/cdefs.h>
-__KERNEL_RCSID(0, "$NetBSD: lfs_bio.c,v 1.142 2018/06/09 18:48:31 zafer Exp $");
+__KERNEL_RCSID(0, "$NetBSD: lfs_bio.c,v 1.142.6.1 2020/08/17 10:30:22 martin Exp $");
 
 #include <sys/param.h>
 #include <sys/systm.h>
@@ -653,9 +653,14 @@ lfs_check(struct vnode *vp, daddr_t blkn
 	/* If there are too many pending dirops, we have to flush them. */
 	if (fs->lfs_dirvcount > LFS_MAX_FSDIROP(fs) ||
 	    lfs_dirvcount > LFS_MAX_DIROP || fs->lfs_diropwait > 0) {
+		KASSERT(fs->lfs_dirops == 0);
+		fs->lfs_writer++;
 		mutex_exit(&lfs_lock);
 		lfs_flush_dirops(fs);
 		mutex_enter(&lfs_lock);
+		if (--fs->lfs_writer == 0)
+			cv_broadcast(&fs->lfs_diropscv);
+		KASSERT(fs->lfs_dirops == 0);
 	} else if (locked_queue_count + INOCOUNT(fs) > LFS_MAX_BUFS ||
 	    locked_queue_bytes + INOBYTES(fs) > LFS_MAX_BYTES ||
 	    lfs_subsys_pages > LFS_MAX_PAGES ||
@@ -730,7 +735,7 @@ lfs_newbuf(struct lfs *fs, struct vnode 
 	bp->b_error = 0;
 	bp->b_resid = 0;
 	bp->b_iodone = lfs_callback;
-	bp->b_cflags = BC_BUSY | BC_NOCACHE;
+	bp->b_cflags |= BC_BUSY | BC_NOCACHE;
 	bp->b_private = fs;
 
 	mutex_enter(&bufcache_lock);

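lfs_newbuf above, and lfs_gop_write in lfs_vfsops.c, change a plain assignment to b_cflags into |=. With =, any bits already present in b_cflags are discarded. With |=, BC_BUSY and the other flags are added while whatever the buffer already carried is preserved.

The flag values in this sketch are hypothetical placeholders, not NetBSD's real BC_* constants.

```c
#include <assert.h>

/* Hypothetical flag values; the real BC_* constants live in <sys/buf.h>. */
#define HYP_BC_BUSY	0x01u
#define HYP_BC_NOCACHE	0x02u
#define HYP_BC_OTHER	0x80u	/* some bit set earlier by another owner */

/* Old style: overwrites every bit that was already set. */
static unsigned
set_flags_assign(unsigned cflags)
{
	cflags = HYP_BC_BUSY | HYP_BC_NOCACHE;
	return cflags;
}

/* New style: adds the bits and keeps the existing ones. */
static unsigned
set_flags_or(unsigned cflags)
{
	cflags |= HYP_BC_BUSY | HYP_BC_NOCACHE;
	return cflags;
}
```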
Index: src/sys/ufs/lfs/lfs_debug.c
diff -u src/sys/ufs/lfs/lfs_debug.c:1.54 src/sys/ufs/lfs/lfs_debug.c:1.54.22.1
--- src/sys/ufs/lfs/lfs_debug.c:1.54	Tue Sep  1 06:12:04 2015
+++ src/sys/ufs/lfs/lfs_debug.c	Mon Aug 17 10:30:22 2020
@@ -1,4 +1,4 @@
-/*	$NetBSD: lfs_debug.c,v 1.54 2015/09/01 06:12:04 dholland Exp $	*/
+/*	$NetBSD: lfs_debug.c,v 1.54.22.1 2020/08/17 10:30:22 martin Exp $	*/
 
 /*-
  * Copyright (c) 1999, 2000, 2001, 2002, 2003 The NetBSD Foundation, Inc.
@@ -60,7 +60,7 @@
  */
 
 #include <sys/cdefs.h>
-__KERNEL_RCSID(0, "$NetBSD: lfs_debug.c,v 1.54 2015/09/01 06:12:04 dholland Exp $");
+__KERNEL_RCSID(0, "$NetBSD: lfs_debug.c,v 1.54.22.1 2020/08/17 10:30:22 martin Exp $");
 
 #ifdef DEBUG
 
@@ -84,16 +84,12 @@ struct lfs_log_entry lfs_log[LFS_LOGLENG
 int
 lfs_bwrite_log(struct buf *bp, const char *file, int line)
 {
-	struct vop_bwrite_args a;
-
-	a.a_desc = VDESC(vop_bwrite);
-	a.a_bp = bp;
 
 	if (!(bp->b_flags & B_GATHERED) && !(bp->b_oflags & BO_DELWRI)) {
 		LFS_ENTER_LOG("write", file, line, bp->b_lblkno, bp->b_flags,
 			curproc->p_pid);
 	}
-	return (VCALL(bp->b_vp, VOFFSET(vop_bwrite), &a));
+	return VOP_BWRITE(bp->b_vp, bp);
 }
 
 void

Index: src/sys/ufs/lfs/lfs_extern.h
diff -u src/sys/ufs/lfs/lfs_extern.h:1.114 src/sys/ufs/lfs/lfs_extern.h:1.114.4.1
--- src/sys/ufs/lfs/lfs_extern.h:1.114	Wed Aug 22 01:05:24 2018
+++ src/sys/ufs/lfs/lfs_extern.h	Mon Aug 17 10:30:22 2020
@@ -1,4 +1,4 @@
-/*	$NetBSD: lfs_extern.h,v 1.114 2018/08/22 01:05:24 msaitoh Exp $	*/
+/*	$NetBSD: lfs_extern.h,v 1.114.4.1 2020/08/17 10:30:22 martin Exp $	*/
 
 /*-
  * Copyright (c) 1999, 2000, 2001, 2002, 2003 The NetBSD Foundation, Inc.
@@ -127,9 +127,10 @@ extern kcondvar_t locked_queue_cv;
 int lfs_valloc(struct vnode *, int, kauth_cred_t, ino_t *, int *);
 int lfs_valloc_fixed(struct lfs *, ino_t, int);
 int lfs_vfree(struct vnode *, ino_t, int);
-void lfs_order_freelist(struct lfs *);
+void lfs_order_freelist(struct lfs *, ino_t **, size_t *);
 int lfs_extend_ifile(struct lfs *, kauth_cred_t);
 void lfs_orphan(struct lfs *, ino_t);
+void lfs_free_orphans(struct lfs *, ino_t *, size_t);
 
 /* lfs_balloc.c */
 int lfs_balloc(struct vnode *, off_t, int, kauth_cred_t, int, struct buf **);
@@ -210,7 +211,8 @@ void lfs_free(struct lfs *, void *, int)
 int lfs_seglock(struct lfs *, unsigned long);
 void lfs_segunlock(struct lfs *);
 void lfs_segunlock_relock(struct lfs *);
-int lfs_writer_enter(struct lfs *, const char *);
+void lfs_writer_enter(struct lfs *, const char *);
+int lfs_writer_tryenter(struct lfs *);
 void lfs_writer_leave(struct lfs *);
 void lfs_wakeup_cleaner(struct lfs *);
 

Index: src/sys/ufs/lfs/lfs_inode.c
diff -u src/sys/ufs/lfs/lfs_inode.c:1.157 src/sys/ufs/lfs/lfs_inode.c:1.157.10.1
--- src/sys/ufs/lfs/lfs_inode.c:1.157	Sat Jun 10 05:29:36 2017
+++ src/sys/ufs/lfs/lfs_inode.c	Mon Aug 17 10:30:22 2020
@@ -1,4 +1,4 @@
-/*	$NetBSD: lfs_inode.c,v 1.157 2017/06/10 05:29:36 maya Exp $	*/
+/*	$NetBSD: lfs_inode.c,v 1.157.10.1 2020/08/17 10:30:22 martin Exp $	*/
 
 /*-
  * Copyright (c) 1999, 2000, 2001, 2002, 2003 The NetBSD Foundation, Inc.
@@ -60,7 +60,7 @@
  */
 
 #include <sys/cdefs.h>
-__KERNEL_RCSID(0, "$NetBSD: lfs_inode.c,v 1.157 2017/06/10 05:29:36 maya Exp $");
+__KERNEL_RCSID(0, "$NetBSD: lfs_inode.c,v 1.157.10.1 2020/08/17 10:30:22 martin Exp $");
 
 #if defined(_KERNEL_OPT)
 #include "opt_quota.h"
@@ -133,6 +133,7 @@ lfs_update(struct vnode *vp, const struc
 	struct inode *ip;
 	struct lfs *fs = VFSTOULFS(vp->v_mount)->um_lfs;
 	int flags;
+	int error;
 
 	ASSERT_NO_SEGLOCK(fs);
 	if (vp->v_mount->mnt_flag & MNT_RDONLY)
@@ -175,7 +176,7 @@ lfs_update(struct vnode *vp, const struc
 			      vp->v_iflag | vp->v_vflag | vp->v_uflag,
 			      ip->i_state));
 			if (fs->lfs_dirops == 0)
-				lfs_flush_fs(fs, SEGM_SYNC);
+				break;
 			else
 				mtsleep(&fs->lfs_writer, PRIBIO+1, "lfs_fsync",
 					0, &lfs_lock);
@@ -183,8 +184,18 @@ lfs_update(struct vnode *vp, const struc
 			twice? */
 		}
 		--fs->lfs_diropwait;
+		fs->lfs_writer++;
+		if (vp->v_uflag & VU_DIROP) {
+			KASSERT(fs->lfs_dirops == 0);
+			lfs_flush_fs(fs, SEGM_SYNC);
+		}
+		mutex_exit(&lfs_lock);
+		error = lfs_vflush(vp);
+		mutex_enter(&lfs_lock);
+		if (--fs->lfs_writer == 0)
+			cv_broadcast(&fs->lfs_diropscv);
 		mutex_exit(&lfs_lock);
-		return lfs_vflush(vp);
+		return error;
 	}
 	return 0;
 }

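lfs_flush_dirops now asserts that fs->lfs_writer is nonzero. Accordingly, lfs_update above and lfs_check in lfs_bio.c increment the count around their flush. Only the caller that brings the count back to zero broadcasts on lfs_diropscv, which presumably wakes threads waiting to start a new dirop.

The single-threaded model below shows only that counting rule. The names struct fsmodel, writer_hold and writer_release are hypothetical, and the wakeups counter stands in for cv_broadcast. The kernel performs these steps under lfs_lock.

```c
#include <assert.h>

/* Hypothetical model of the two fields the bracket touches. */
struct fsmodel {
	int writer;	/* like fs->lfs_writer */
	int wakeups;	/* counts cv_broadcast(&fs->lfs_diropscv) calls */
};

static void
writer_hold(struct fsmodel *fs)
{
	fs->writer++;
}

/* Only the last holder wakes the waiters. */
static void
writer_release(struct fsmodel *fs)
{
	assert(fs->writer > 0);
	if (--fs->writer == 0)
		fs->wakeups++;
}
```

With two nested holds, the first release leaves the count at 1 and wakes nobody. The second release drops it to 0 and performs the single broadcast.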
Index: src/sys/ufs/lfs/lfs_inode.h
diff -u src/sys/ufs/lfs/lfs_inode.h:1.23 src/sys/ufs/lfs/lfs_inode.h:1.23.10.1
--- src/sys/ufs/lfs/lfs_inode.h:1.23	Sat Jun 10 05:29:36 2017
+++ src/sys/ufs/lfs/lfs_inode.h	Mon Aug 17 10:30:22 2020
@@ -1,4 +1,4 @@
-/*	$NetBSD: lfs_inode.h,v 1.23 2017/06/10 05:29:36 maya Exp $	*/
+/*	$NetBSD: lfs_inode.h,v 1.23.10.1 2020/08/17 10:30:22 martin Exp $	*/
 /*  from NetBSD: ulfs_inode.h,v 1.5 2013/06/06 00:51:50 dholland Exp  */
 /*  from NetBSD: inode.h,v 1.72 2016/06/03 15:36:03 christos Exp  */
 
@@ -123,6 +123,7 @@ struct inode {
 /* 	   unused	0x0400 */	/* was FFS-only IN_SPACECOUNTED */
 #define	IN_PAGING       0x1000		/* LFS: file is on paging queue */
 #define IN_CDIROP       0x4000          /* LFS: dirop completed pending i/o */
+#define	IN_MARKER	0x00010000	/* LFS: marker inode for iteration */
 
 /* XXX this is missing some of the flags */
 #define IN_ALLMOD (IN_MODIFIED|IN_ACCESS|IN_CHANGE|IN_UPDATE|IN_MODIFY|IN_ACCESSED|IN_CLEANING)

Index: src/sys/ufs/lfs/lfs_pages.c
diff -u src/sys/ufs/lfs/lfs_pages.c:1.15 src/sys/ufs/lfs/lfs_pages.c:1.15.8.1
--- src/sys/ufs/lfs/lfs_pages.c:1.15	Sat Aug 19 14:22:49 2017
+++ src/sys/ufs/lfs/lfs_pages.c	Mon Aug 17 10:30:22 2020
@@ -1,4 +1,4 @@
-/*	$NetBSD: lfs_pages.c,v 1.15 2017/08/19 14:22:49 maya Exp $	*/
+/*	$NetBSD: lfs_pages.c,v 1.15.8.1 2020/08/17 10:30:22 martin Exp $	*/
 
 /*-
  * Copyright (c) 1999, 2000, 2001, 2002, 2003 The NetBSD Foundation, Inc.
@@ -60,7 +60,7 @@
  */
 
 #include <sys/cdefs.h>
-__KERNEL_RCSID(0, "$NetBSD: lfs_pages.c,v 1.15 2017/08/19 14:22:49 maya Exp $");
+__KERNEL_RCSID(0, "$NetBSD: lfs_pages.c,v 1.15.8.1 2020/08/17 10:30:22 martin Exp $");
 
 #ifdef _KERNEL_OPT
 #include "opt_compat_netbsd.h"
@@ -710,29 +710,30 @@ retry:
 	    (vp->v_uflag & VU_DIROP)) {
 		DLOG((DLOG_PAGE, "lfs_putpages: flushing VU_DIROP\n"));
 
- 		lfs_writer_enter(fs, "ppdirop");
+		/*
+		 * NB: lfs_flush_fs can recursively call lfs_putpages,
+		 * but it won't reach this branch because it passes
+		 * PGO_LOCKED.
+		 */
 
-		/* Note if we hold the vnode locked */
-		if (VOP_ISLOCKED(vp) == LK_EXCLUSIVE)
-		{
-		    DLOG((DLOG_PAGE, "lfs_putpages: dirop inode already locked\n"));
-		} else {
-		    DLOG((DLOG_PAGE, "lfs_putpages: dirop inode not locked\n"));
-		}
 		mutex_exit(vp->v_interlock);
-
 		mutex_enter(&lfs_lock);
 		lfs_flush_fs(fs, sync ? SEGM_SYNC : 0);
 		mutex_exit(&lfs_lock);
-
 		mutex_enter(vp->v_interlock);
-		lfs_writer_leave(fs);
 
 		/*
 		 * The flush will have cleaned out this vnode as well,
 		 *  no need to do more to it.
 		 *  XXX then why are we falling through and continuing?
 		 */
+
+		/*
+		 * XXX State may have changed while we dropped the
+		 * lock; start over just in case.  The above comment
+		 * suggests this should maybe instead be goto out.
+		 */
+		goto retry;
 	}
 
 	/*

Index: src/sys/ufs/lfs/lfs_rename.c
diff -u src/sys/ufs/lfs/lfs_rename.c:1.22 src/sys/ufs/lfs/lfs_rename.c:1.22.10.1
--- src/sys/ufs/lfs/lfs_rename.c:1.22	Sat Jun 10 05:29:36 2017
+++ src/sys/ufs/lfs/lfs_rename.c	Mon Aug 17 10:30:22 2020
@@ -1,4 +1,4 @@
-/*	$NetBSD: lfs_rename.c,v 1.22 2017/06/10 05:29:36 maya Exp $	*/
+/*	$NetBSD: lfs_rename.c,v 1.22.10.1 2020/08/17 10:30:22 martin Exp $	*/
 /*  from NetBSD: ufs_rename.c,v 1.12 2015/03/27 17:27:56 riastradh Exp  */
 
 /*-
@@ -89,7 +89,7 @@
  */
 
 #include <sys/cdefs.h>
-__KERNEL_RCSID(0, "$NetBSD: lfs_rename.c,v 1.22 2017/06/10 05:29:36 maya Exp $");
+__KERNEL_RCSID(0, "$NetBSD: lfs_rename.c,v 1.22.10.1 2020/08/17 10:30:22 martin Exp $");
 
 #include <sys/param.h>
 #include <sys/systm.h>
@@ -1061,6 +1061,9 @@ lfs_gro_rename(struct mount *mp, kauth_c
 	    fdvp, fcnp, fde, fvp,
 	    tdvp, tcnp, tde, tvp);
 
+	if (tvp && VTOI(tvp)->i_nlink == 0)
+		lfs_orphan(VTOI(tvp)->i_lfs, VTOI(tvp)->i_number);
+
 	UNMARK_VNODE(fdvp);
 	UNMARK_VNODE(fvp);
 	UNMARK_VNODE(tdvp);

Index: src/sys/ufs/lfs/lfs_segment.c
diff -u src/sys/ufs/lfs/lfs_segment.c:1.278 src/sys/ufs/lfs/lfs_segment.c:1.278.4.1
--- src/sys/ufs/lfs/lfs_segment.c:1.278	Mon Sep  3 16:29:37 2018
+++ src/sys/ufs/lfs/lfs_segment.c	Mon Aug 17 10:30:22 2020
@@ -1,4 +1,4 @@
-/*	$NetBSD: lfs_segment.c,v 1.278 2018/09/03 16:29:37 riastradh Exp $	*/
+/*	$NetBSD: lfs_segment.c,v 1.278.4.1 2020/08/17 10:30:22 martin Exp $	*/
 
 /*-
  * Copyright (c) 1999, 2000, 2001, 2002, 2003 The NetBSD Foundation, Inc.
@@ -60,7 +60,7 @@
  */
 
 #include <sys/cdefs.h>
-__KERNEL_RCSID(0, "$NetBSD: lfs_segment.c,v 1.278 2018/09/03 16:29:37 riastradh Exp $");
+__KERNEL_RCSID(0, "$NetBSD: lfs_segment.c,v 1.278.4.1 2020/08/17 10:30:22 martin Exp $");
 
 #ifdef DEBUG
 # define vndebug(vp, str) do {						\
@@ -399,7 +399,7 @@ lfs_vflush(struct vnode *vp)
 					 * still not done with this vnode.
 					 * XXX we can do better than this.
 					 */
-					KDASSERT(ip->i_number != LFS_IFILE_INUM);
+					KASSERT(ip->i_number != LFS_IFILE_INUM);
 					lfs_writeinode(fs, sp, ip);
 					mutex_enter(&lfs_lock);
 					LFS_SET_UINO(ip, IN_MODIFIED);
@@ -624,6 +624,15 @@ lfs_segwrite(struct mount *mp, int flags
 	 */
 	do_ckp = LFS_SHOULD_CHECKPOINT(fs, flags);
 
+	/*
+	 * If we know we're gonna need the writer lock, take it now to
+	 * preserve the lock order lfs_writer -> lfs_seglock.
+	 */
+	if (do_ckp) {
+		lfs_writer_enter(fs, "ckpwriter");
+		writer_set = 1;
+	}
+
 	/* We can't do a partial write and checkpoint at the same time. */
 	if (do_ckp)
 		flags &= ~SEGM_SINGLE;
@@ -653,11 +662,10 @@ lfs_segwrite(struct mount *mp, int flags
 				break;
 			}
 
-			if (do_ckp || fs->lfs_dirops == 0) {
-				if (!writer_set) {
-					lfs_writer_enter(fs, "lfs writer");
-					writer_set = 1;
-				}
+			if (do_ckp ||
+			    (writer_set = lfs_writer_tryenter(fs)) != 0) {
+				KASSERT(writer_set);
+				KASSERT(fs->lfs_writer);
 				error = lfs_writevnodes(fs, mp, sp, VN_DIROP);
 				if (um_error == 0)
 					um_error = error;

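lfs_writer_enter may sleep until running dirops drain, and it now asserts that the segment lock is not held. For that reason lfs_segwrite takes the writer before the segment lock when it knows a checkpoint is coming, preserving the order lfs_writer before lfs_seglock. Otherwise it calls the new non-blocking lfs_writer_tryenter, which takes a writer reference only if fs->lfs_dirops is zero.

Below is a hypothetical single-threaded model of that decision. The names struct fsmodel, writer_tryenter and should_write_dirops are illustrative only.

```c
#include <assert.h>

/* Hypothetical model: just the counters lfs_writer_tryenter looks at. */
struct fsmodel {
	int dirops;	/* like fs->lfs_dirops: dirops in progress */
	int writer;	/* like fs->lfs_writer */
};

/* Non-blocking: take a writer reference only if no dirop is running. */
static int
writer_tryenter(struct fsmodel *fs)
{
	int writer_set = (fs->dirops == 0);

	if (writer_set)
		fs->writer++;
	return writer_set;
}

/*
 * Shape of the VN_DIROP test in lfs_segwrite: a checkpoint already holds
 * the writer (taken before the segment lock); otherwise try once without
 * sleeping.  Returns 1 if dirop vnodes would be written on this pass.
 */
static int
should_write_dirops(struct fsmodel *fs, int do_ckp, int *writer_set)
{
	if (do_ckp)
		return 1;
	*writer_set = writer_tryenter(fs);
	return *writer_set;
}
```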
Index: src/sys/ufs/lfs/lfs_subr.c
diff -u src/sys/ufs/lfs/lfs_subr.c:1.97 src/sys/ufs/lfs/lfs_subr.c:1.97.8.1
--- src/sys/ufs/lfs/lfs_subr.c:1.97	Wed Jul 26 16:42:37 2017
+++ src/sys/ufs/lfs/lfs_subr.c	Mon Aug 17 10:30:22 2020
@@ -1,4 +1,4 @@
-/*	$NetBSD: lfs_subr.c,v 1.97 2017/07/26 16:42:37 maya Exp $	*/
+/*	$NetBSD: lfs_subr.c,v 1.97.8.1 2020/08/17 10:30:22 martin Exp $	*/
 
 /*-
  * Copyright (c) 1999, 2000, 2001, 2002, 2003 The NetBSD Foundation, Inc.
@@ -60,7 +60,7 @@
  */
 
 #include <sys/cdefs.h>
-__KERNEL_RCSID(0, "$NetBSD: lfs_subr.c,v 1.97 2017/07/26 16:42:37 maya Exp $");
+__KERNEL_RCSID(0, "$NetBSD: lfs_subr.c,v 1.97.8.1 2020/08/17 10:30:22 martin Exp $");
 
 #include <sys/param.h>
 #include <sys/systm.h>
@@ -340,7 +340,7 @@ static void lfs_unmark_dirop(struct lfs 
 static void
 lfs_unmark_dirop(struct lfs *fs)
 {
-	struct inode *ip, *nip;
+	struct inode *ip, *marker;
 	struct vnode *vp;
 	int doit;
 
@@ -349,13 +349,26 @@ lfs_unmark_dirop(struct lfs *fs)
 	doit = !(fs->lfs_flags & LFS_UNDIROP);
 	if (doit)
 		fs->lfs_flags |= LFS_UNDIROP;
-	if (!doit) {
-		mutex_exit(&lfs_lock);
+	mutex_exit(&lfs_lock);
+
+	if (!doit)
 		return;
-	}
 
-	for (ip = TAILQ_FIRST(&fs->lfs_dchainhd); ip != NULL; ip = nip) {
-		nip = TAILQ_NEXT(ip, i_lfs_dchain);
+	marker = pool_get(&lfs_inode_pool, PR_WAITOK);
+	KASSERT(fs != NULL);
+	memset(marker, 0, sizeof(*marker));
+	marker->inode_ext.lfs = pool_get(&lfs_inoext_pool, PR_WAITOK);
+	memset(marker->inode_ext.lfs, 0, sizeof(*marker->inode_ext.lfs));
+	marker->i_state |= IN_MARKER;
+
+	mutex_enter(&lfs_lock);
+	TAILQ_INSERT_HEAD(&fs->lfs_dchainhd, marker, i_lfs_dchain);
+	while ((ip = TAILQ_NEXT(marker, i_lfs_dchain)) != NULL) {
+		TAILQ_REMOVE(&fs->lfs_dchainhd, marker, i_lfs_dchain);
+		TAILQ_INSERT_AFTER(&fs->lfs_dchainhd, ip, marker,
+		    i_lfs_dchain);
+		if (ip->i_state & IN_MARKER)
+			continue;
 		vp = ITOV(ip);
 		if ((ip->i_state & (IN_ADIROP | IN_CDIROP)) == IN_CDIROP) {
 			--lfs_dirvcount;
@@ -371,10 +384,13 @@ lfs_unmark_dirop(struct lfs *fs)
 			ip->i_state &= ~IN_CDIROP;
 		}
 	}
-
+	TAILQ_REMOVE(&fs->lfs_dchainhd, marker, i_lfs_dchain);
 	fs->lfs_flags &= ~LFS_UNDIROP;
 	wakeup(&fs->lfs_flags);
 	mutex_exit(&lfs_lock);
+
+	pool_put(&lfs_inoext_pool, marker->inode_ext.lfs);
+	pool_put(&lfs_inode_pool, marker);
 }
 
 static void
@@ -539,6 +555,7 @@ lfs_segunlock(struct lfs *fs)
 			lfs_unmark_dirop(fs);
 	} else {
 		--fs->lfs_seglock;
+		KASSERT(fs->lfs_seglock != 0);
 		mutex_exit(&lfs_lock);
 	}
 }
@@ -548,12 +565,12 @@ lfs_segunlock(struct lfs *fs)
  *
  * No simple_locks are held when we enter and none are held when we return.
  */
-int
+void
 lfs_writer_enter(struct lfs *fs, const char *wmesg)
 {
-	int error = 0;
+	int error __diagused;
 
-	ASSERT_MAYBE_SEGLOCK(fs);
+	ASSERT_NO_SEGLOCK(fs);
 	mutex_enter(&lfs_lock);
 
 	/* disallow dirops during flush */
@@ -563,15 +580,26 @@ lfs_writer_enter(struct lfs *fs, const c
 		++fs->lfs_diropwait;
 		error = mtsleep(&fs->lfs_writer, PRIBIO+1, wmesg, 0,
 				&lfs_lock);
+		KASSERT(error == 0);
 		--fs->lfs_diropwait;
 	}
 
-	if (error)
-		fs->lfs_writer--;
+	mutex_exit(&lfs_lock);
+}
 
+int
+lfs_writer_tryenter(struct lfs *fs)
+{
+	int writer_set;
+
+	ASSERT_MAYBE_SEGLOCK(fs);
+	mutex_enter(&lfs_lock);
+	writer_set = (fs->lfs_dirops == 0);
+	if (writer_set)
+		fs->lfs_writer++;
 	mutex_exit(&lfs_lock);
 
-	return error;
+	return writer_set;
 }
 
 void

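lfs_unmark_dirop above, like lfs_flush_dirops in lfs_vnops.c, no longer caches the next inode in nip. Instead it inserts a marker inode flagged IN_MARKER and steps the marker past each element before processing it. If lfs_lock is dropped and other threads unlink neighbouring nodes, the walk resumes from the marker, which is still on the list. Another walker's marker is simply skipped.

The userland sketch below uses the <sys/queue.h> TAILQ macros. The names struct item, walk_with_marker, visit_drop_neighbour and visited_sum are hypothetical. The callback is where the kernel would drop and retake lfs_lock.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

struct item {
	int value;
	int is_marker;			/* like IN_MARKER in i_state */
	TAILQ_ENTRY(item) link;		/* like i_lfs_dchain */
};
TAILQ_HEAD(itemlist, item);

/* Visit every non-marker item once, tolerating removals of its neighbours. */
static void
walk_with_marker(struct itemlist *head, struct item *marker,
    void (*visit)(struct itemlist *, struct item *))
{
	struct item *ip;

	marker->is_marker = 1;
	TAILQ_INSERT_HEAD(head, marker, link);
	while ((ip = TAILQ_NEXT(marker, link)) != NULL) {
		/* Step the marker past ip before anything that may drop the lock. */
		TAILQ_REMOVE(head, marker, link);
		TAILQ_INSERT_AFTER(head, ip, marker, link);
		if (ip->is_marker)	/* another walker's marker */
			continue;
		visit(head, ip);
	}
	TAILQ_REMOVE(head, marker, link);
}

static int visited_sum;

/* Demo callback: record ip, then unlink the node after the marker, as a
 * concurrent thread might while the lock is dropped. */
static void
visit_drop_neighbour(struct itemlist *head, struct item *ip)
{
	struct item *marker = TAILQ_NEXT(ip, link);
	struct item *victim = TAILQ_NEXT(marker, link);

	visited_sum += ip->value;
	if (victim != NULL && !victim->is_marker)
		TAILQ_REMOVE(head, victim, link);
}
```

On a list holding 1 to 5, the walk visits 1, 3 and 5. Items 2 and 4 are unlinked by the callback before the walk reaches them, and it never follows a stale pointer.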
Index: src/sys/ufs/lfs/lfs_vfsops.c
diff -u src/sys/ufs/lfs/lfs_vfsops.c:1.365 src/sys/ufs/lfs/lfs_vfsops.c:1.365.2.1
--- src/sys/ufs/lfs/lfs_vfsops.c:1.365	Tue May 28 08:59:35 2019
+++ src/sys/ufs/lfs/lfs_vfsops.c	Mon Aug 17 10:30:22 2020
@@ -1,4 +1,4 @@
-/*	$NetBSD: lfs_vfsops.c,v 1.365 2019/05/28 08:59:35 msaitoh Exp $	*/
+/*	$NetBSD: lfs_vfsops.c,v 1.365.2.1 2020/08/17 10:30:22 martin Exp $	*/
 
 /*-
  * Copyright (c) 1999, 2000, 2001, 2002, 2003, 2007, 2007
@@ -61,7 +61,7 @@
  */
 
 #include <sys/cdefs.h>
-__KERNEL_RCSID(0, "$NetBSD: lfs_vfsops.c,v 1.365 2019/05/28 08:59:35 msaitoh Exp $");
+__KERNEL_RCSID(0, "$NetBSD: lfs_vfsops.c,v 1.365.2.1 2020/08/17 10:30:22 martin Exp $");
 
 #if defined(_KERNEL_OPT)
 #include "opt_lfs.h"
@@ -120,6 +120,7 @@ MODULE(MODULE_CLASS_VFS, lfs, NULL);
 
 static int lfs_gop_write(struct vnode *, struct vm_page **, int, int);
 static int lfs_mountfs(struct vnode *, struct mount *, struct lwp *);
+static int lfs_flushfiles(struct mount *, int);
 
 static struct sysctllog *lfs_sysctl_log;
 
@@ -355,6 +356,7 @@ lfs_modcmd(modcmd_t cmd, void *arg)
 			break;
 		}
 		lfs_sysctl_setup(&lfs_sysctl_log);
+		cv_init(&lfs_allclean_wakeup, "segment");
 		break;
 	case MODULE_CMD_FINI:
 		error = vfs_detach(&lfs_vfsops);
@@ -362,6 +364,7 @@ lfs_modcmd(modcmd_t cmd, void *arg)
 			break;
 		syscall_disestablish(NULL, lfs_syscalls);
 		sysctl_teardown(&lfs_sysctl_log);
+		cv_destroy(&lfs_allclean_wakeup);
 		break;
 	default:
 		error = ENOTTY;
@@ -755,23 +758,18 @@ lfs_mount(struct mount *mp, const char *
 		ump = VFSTOULFS(mp);
 		fs = ump->um_lfs;
 
-		if (fs->lfs_ronly == 0 && (mp->mnt_flag & MNT_RDONLY)) {
+		if (!fs->lfs_ronly && (mp->mnt_iflag & IMNT_WANTRDONLY)) {
 			/*
 			 * Changing from read/write to read-only.
-			 * XXX: shouldn't we sync here? or does vfs do that?
 			 */
-#ifdef LFS_QUOTA2
-			/* XXX: quotas should remain on when readonly */
-			if (fs->lfs_use_quota2) {
-				error = lfsquota2_umount(mp, 0);
-				if (error) {
-					return error;
-				}
-			}
-#endif
-		}
-
-		if (fs->lfs_ronly && (mp->mnt_iflag & IMNT_WANTRDWR)) {
+			int flags = WRITECLOSE;
+			if (mp->mnt_flag & MNT_FORCE)
+				flags |= FORCECLOSE;
+			error = lfs_flushfiles(mp, flags);
+			if (error)
+				return error;
+			fs->lfs_ronly = 1;
+		} else if (fs->lfs_ronly && (mp->mnt_iflag & IMNT_WANTRDWR)) {
 			/*
 			 * Changing from read-only to read/write.
 			 * Note in the superblocks that we're writing.
@@ -805,8 +803,9 @@ lfs_mount(struct mount *mp, const char *
 				lfs_writesuper(fs, lfs_sb_getsboff(fs, 1));
 			}
 		}
+
 		if (args->fspec == NULL)
-			return EINVAL;
+			return 0;
 	}
 
 	error = set_statvfs_info(path, UIO_USERSPACE, args->fspec,
@@ -860,7 +859,6 @@ lfs_checkmagic(struct lfs *fs)
 int
 lfs_mountfs(struct vnode *devvp, struct mount *mp, struct lwp *l)
 {
-	static bool lfs_mounted_once = false;
 	struct lfs *primarysb, *altsb, *thesb;
 	struct buf *primarybuf, *altbuf;
 	struct lfs *fs;
@@ -872,6 +870,8 @@ lfs_mountfs(struct vnode *devvp, struct 
 	CLEANERINFO *cip;
 	SEGUSE *sup;
 	daddr_t sb_addr;
+	ino_t *orphan;
+	size_t norphan;
 
 	cred = l ? l->l_cred : NOCRED;
 
@@ -1094,12 +1094,6 @@ lfs_mountfs(struct vnode *devvp, struct 
 	cv_init(&fs->lfs_stopcv, "lfsstop");
 	cv_init(&fs->lfs_nextsegsleep, "segment");
 
-	/* Initialize values for all LFS mounts */
-	if (!lfs_mounted_once) {
-		cv_init(&lfs_allclean_wakeup, "segment");
-		lfs_mounted_once = true;
-	}
-
 	/* Set the file system readonly/modify bits. */
 	fs->lfs_ronly = ronly;
 	if (ronly == 0)
@@ -1137,6 +1131,7 @@ lfs_mountfs(struct vnode *devvp, struct 
 	mp->mnt_stat.f_iosize = lfs_sb_getbsize(fs);
 	mp->mnt_flag |= MNT_LOCAL;
 	mp->mnt_fs_bshift = lfs_sb_getbshift(fs);
+	mp->mnt_iflag |= IMNT_CAN_RWTORO;
 	if (fs->um_maxsymlinklen > 0)
 		mp->mnt_iflag |= IMNT_DTYPE;
 	else
@@ -1169,8 +1164,8 @@ lfs_mountfs(struct vnode *devvp, struct 
 	fs->lfs_ivnode = vp;
 	vref(vp);
 
-	/* Set up inode bitmap and order free list */
-	lfs_order_freelist(fs);
+	/* Set up inode bitmap, order free list, and gather orphans.  */
+	lfs_order_freelist(fs, &orphan, &norphan);
 
 	/* Set up segment usage flags for the autocleaner. */
 	fs->lfs_nactive = 0;
@@ -1209,6 +1204,9 @@ lfs_mountfs(struct vnode *devvp, struct 
 			brelse(bp, 0);
 	}
 
+	/* Free the orphans we discovered while ordering the freelist.  */
+	lfs_free_orphans(fs, orphan, norphan);
+
 	/*
 	 * XXX: if the fs has quotas, quotas should be on even if
 	 * readonly. Otherwise you can't query the quota info!
@@ -1328,22 +1326,72 @@ out:
 int
 lfs_unmount(struct mount *mp, int mntflags)
 {
-	struct lwp *l = curlwp;
 	struct ulfsmount *ump;
 	struct lfs *fs;
-	int error, flags, ronly;
-	vnode_t *vp;
+	int error, ronly;
+
+	ump = VFSTOULFS(mp);
+	fs = ump->um_lfs;
+
+	error = lfs_flushfiles(mp, mntflags & MNT_FORCE ? FORCECLOSE : 0);
+	if (error)
+		return error;
+
+	/* Finish with the Ifile, now that we're done with it */
+	vgone(fs->lfs_ivnode);
+
+	ronly = !fs->lfs_ronly;
+	if (fs->lfs_devvp->v_type != VBAD)
+		spec_node_setmountedfs(fs->lfs_devvp, NULL);
+	vn_lock(fs->lfs_devvp, LK_EXCLUSIVE | LK_RETRY);
+	error = VOP_CLOSE(fs->lfs_devvp,
+	    ronly ? FREAD : FREAD|FWRITE, NOCRED);
+	vput(fs->lfs_devvp);
+
+	/* Complain about page leakage */
+	if (fs->lfs_pages > 0)
+		printf("lfs_unmount: still claim %d pages (%d in subsystem)\n",
+			fs->lfs_pages, lfs_subsys_pages);
+
+	/* Free per-mount data structures */
+	free(fs->lfs_ino_bitmap, M_SEGMENT);
+	free(fs->lfs_suflags[0], M_SEGMENT);
+	free(fs->lfs_suflags[1], M_SEGMENT);
+	free(fs->lfs_suflags, M_SEGMENT);
+	lfs_free_resblks(fs);
+	cv_destroy(&fs->lfs_sleeperscv);
+	cv_destroy(&fs->lfs_diropscv);
+	cv_destroy(&fs->lfs_stopcv);
+	cv_destroy(&fs->lfs_nextsegsleep);
+
+	rw_destroy(&fs->lfs_fraglock);
+	rw_destroy(&fs->lfs_iflock);
+
+	kmem_free(fs, sizeof(struct lfs));
+	kmem_free(ump, sizeof(*ump));
 
-	flags = 0;
-	if (mntflags & MNT_FORCE)
-		flags |= FORCECLOSE;
+	mp->mnt_data = NULL;
+	mp->mnt_flag &= ~MNT_LOCAL;
+	return (error);
+}
+
+static int
+lfs_flushfiles(struct mount *mp, int flags)
+{
+	struct lwp *l = curlwp;
+	struct ulfsmount *ump;
+	struct lfs *fs;
+	struct vnode *vp;
+	int error;
 
 	ump = VFSTOULFS(mp);
 	fs = ump->um_lfs;
 
 	/* Two checkpoints */
-	lfs_segwrite(mp, SEGM_CKP | SEGM_SYNC);
-	lfs_segwrite(mp, SEGM_CKP | SEGM_SYNC);
+	if (!fs->lfs_ronly) {
+		lfs_segwrite(mp, SEGM_CKP | SEGM_SYNC);
+		lfs_segwrite(mp, SEGM_CKP | SEGM_SYNC);
+	}
 
 	/* wake up the cleaner so it can die */
 	/* XXX: shouldn't this be *after* the error cases below? */
@@ -1383,51 +1431,18 @@ lfs_unmount(struct mount *mp, int mntfla
 	mutex_exit(vp->v_interlock);
 
 	/* Explicitly write the superblock, to update serial and pflags */
-	lfs_sb_setpflags(fs, lfs_sb_getpflags(fs) | LFS_PF_CLEAN);
-	lfs_writesuper(fs, lfs_sb_getsboff(fs, 0));
-	lfs_writesuper(fs, lfs_sb_getsboff(fs, 1));
+	if (!fs->lfs_ronly) {
+		lfs_sb_setpflags(fs, lfs_sb_getpflags(fs) | LFS_PF_CLEAN);
+		lfs_writesuper(fs, lfs_sb_getsboff(fs, 0));
+		lfs_writesuper(fs, lfs_sb_getsboff(fs, 1));
+	}
 	mutex_enter(&lfs_lock);
 	while (fs->lfs_iocount)
 		mtsleep(&fs->lfs_iocount, PRIBIO + 1, "lfs_umount", 0,
 			&lfs_lock);
 	mutex_exit(&lfs_lock);
 
-	/* Finish with the Ifile, now that we're done with it */
-	vgone(fs->lfs_ivnode);
-
-	ronly = !fs->lfs_ronly;
-	if (fs->lfs_devvp->v_type != VBAD)
-		spec_node_setmountedfs(fs->lfs_devvp, NULL);
-	vn_lock(fs->lfs_devvp, LK_EXCLUSIVE | LK_RETRY);
-	error = VOP_CLOSE(fs->lfs_devvp,
-	    ronly ? FREAD : FREAD|FWRITE, NOCRED);
-	vput(fs->lfs_devvp);
-
-	/* Complain about page leakage */
-	if (fs->lfs_pages > 0)
-		printf("lfs_unmount: still claim %d pages (%d in subsystem)\n",
-			fs->lfs_pages, lfs_subsys_pages);
-
-	/* Free per-mount data structures */
-	free(fs->lfs_ino_bitmap, M_SEGMENT);
-	free(fs->lfs_suflags[0], M_SEGMENT);
-	free(fs->lfs_suflags[1], M_SEGMENT);
-	free(fs->lfs_suflags, M_SEGMENT);
-	lfs_free_resblks(fs);
-	cv_destroy(&fs->lfs_sleeperscv);
-	cv_destroy(&fs->lfs_diropscv);
-	cv_destroy(&fs->lfs_stopcv);
-	cv_destroy(&fs->lfs_nextsegsleep);
-
-	rw_destroy(&fs->lfs_fraglock);
-	rw_destroy(&fs->lfs_iflock);
-
-	kmem_free(fs, sizeof(struct lfs));
-	kmem_free(ump, sizeof(*ump));
-
-	mp->mnt_data = NULL;
-	mp->mnt_flag &= ~MNT_LOCAL;
-	return (error);
+	return 0;
 }
 
 /*
@@ -2112,7 +2127,7 @@ lfs_gop_write(struct vnode *vp, struct v
 	mbp->b_bufsize = npages << PAGE_SHIFT;
 	mbp->b_data = (void *)kva;
 	mbp->b_resid = mbp->b_bcount = bytes;
-	mbp->b_cflags = BC_BUSY|BC_AGE;
+	mbp->b_cflags |= BC_BUSY|BC_AGE;
 	mbp->b_iodone = uvm_aio_biodone;
 
 	bp = NULL;

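lfs_order_freelist now reports, through two out-parameters, the orphaned inodes it finds while scanning. lfs_mountfs releases them with lfs_free_orphans only after the segment usage flags have been set up. An orphaned inode is an in-use Ifile entry whose nextfree field holds the LFS_ORPHAN_NEXTFREE sentinel, as the dumplfs change below also shows.

The sketch shows the collect-then-process split on a plain array. The names struct ientry, ORPHAN_SENTINEL, gather_orphans and free_orphans are hypothetical. The sketch does not model the Ifile, inode numbers as ino_t, or any locking.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define ORPHAN_SENTINEL UINT32_MAX	/* hypothetical stand-in for LFS_ORPHAN_NEXTFREE(fs) */

struct ientry {			/* hypothetical, loosely like an Ifile entry */
	int in_use;
	uint32_t nextfree;
};

/* Pass 1: scan and remember orphan numbers; do not act on them yet. */
static void
gather_orphans(const struct ientry *tab, size_t n,
    size_t **orphan, size_t *norphan)
{
	size_t i, k = 0;

	*orphan = malloc((n ? n : 1) * sizeof(**orphan));
	if (*orphan == NULL)
		abort();
	for (i = 0; i < n; i++)
		if (tab[i].in_use && tab[i].nextfree == ORPHAN_SENTINEL)
			(*orphan)[k++] = i;
	*norphan = k;
}

/* Pass 2: once the rest of the state is ready, release them and the array. */
static void
free_orphans(struct ientry *tab, size_t *orphan, size_t norphan)
{
	size_t i;

	for (i = 0; i < norphan; i++) {
		tab[orphan[i]].in_use = 0;	/* stands in for freeing the inode */
		tab[orphan[i]].nextfree = 0;
	}
	free(orphan);
}
```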
Index: src/sys/ufs/lfs/lfs_vnops.c
diff -u src/sys/ufs/lfs/lfs_vnops.c:1.324 src/sys/ufs/lfs/lfs_vnops.c:1.324.2.1
--- src/sys/ufs/lfs/lfs_vnops.c:1.324	Thu Jun 20 00:49:11 2019
+++ src/sys/ufs/lfs/lfs_vnops.c	Mon Aug 17 10:30:22 2020
@@ -1,4 +1,4 @@
-/*	$NetBSD: lfs_vnops.c,v 1.324 2019/06/20 00:49:11 christos Exp $	*/
+/*	$NetBSD: lfs_vnops.c,v 1.324.2.1 2020/08/17 10:30:22 martin Exp $	*/
 
 /*-
  * Copyright (c) 1999, 2000, 2001, 2002, 2003 The NetBSD Foundation, Inc.
@@ -125,7 +125,7 @@
  */
 
 #include <sys/cdefs.h>
-__KERNEL_RCSID(0, "$NetBSD: lfs_vnops.c,v 1.324 2019/06/20 00:49:11 christos Exp $");
+__KERNEL_RCSID(0, "$NetBSD: lfs_vnops.c,v 1.324.2.1 2020/08/17 10:30:22 martin Exp $");
 
 #ifdef _KERNEL_OPT
 #include "opt_compat_netbsd.h"
@@ -1602,7 +1602,7 @@ lfs_strategy(void *v)
 int
 lfs_flush_dirops(struct lfs *fs)
 {
-	struct inode *ip, *nip;
+	struct inode *ip, *marker;
 	struct vnode *vp;
 	extern int lfs_dostats; /* XXX this does not belong here */
 	struct segment *sp;
@@ -1611,7 +1611,8 @@ lfs_flush_dirops(struct lfs *fs)
 	int error = 0;
 
 	ASSERT_MAYBE_SEGLOCK(fs);
-	KASSERT(fs->lfs_nadirop == 0);
+	KASSERT(fs->lfs_nadirop == 0); /* stable during lfs_writer */
+	KASSERT(fs->lfs_dirops == 0);  /* stable during lfs_writer */
 
 	if (fs->lfs_ronly)
 		return EROFS;
@@ -1626,6 +1627,12 @@ lfs_flush_dirops(struct lfs *fs)
 	if (lfs_dostats)
 		++lfs_stats.flush_invoked;
 
+	marker = pool_get(&lfs_inode_pool, PR_WAITOK);
+	memset(marker, 0, sizeof(*marker));
+	marker->inode_ext.lfs = pool_get(&lfs_inoext_pool, PR_WAITOK);
+	memset(marker->inode_ext.lfs, 0, sizeof(*marker->inode_ext.lfs));
+	marker->i_state = IN_MARKER;
+
 	lfs_imtime(fs);
 	lfs_seglock(fs, flags);
 	sp = fs->lfs_sp;
@@ -1644,15 +1651,41 @@ lfs_flush_dirops(struct lfs *fs)
 	 *
 	 */
 	mutex_enter(&lfs_lock);
-	for (ip = TAILQ_FIRST(&fs->lfs_dchainhd); ip != NULL; ip = nip) {
-		nip = TAILQ_NEXT(ip, i_lfs_dchain);
-		mutex_exit(&lfs_lock);
+	KASSERT(fs->lfs_writer);
+	TAILQ_INSERT_HEAD(&fs->lfs_dchainhd, marker, i_lfs_dchain);
+	while ((ip = TAILQ_NEXT(marker, i_lfs_dchain)) != NULL) {
+		TAILQ_REMOVE(&fs->lfs_dchainhd, marker, i_lfs_dchain);
+		TAILQ_INSERT_AFTER(&fs->lfs_dchainhd, ip, marker,
+		    i_lfs_dchain);
+		if (ip->i_state & IN_MARKER)
+			continue;
 		vp = ITOV(ip);
-		mutex_enter(vp->v_interlock);
 
+		/*
+		 * Prevent the vnode from going away if it's just been
+		 * put out in the segment and lfs_unmark_dirop is about
+		 * to release it.  While it is on the list it is always
+		 * referenced, so it cannot be reclaimed until we
+		 * release it.
+		 */
+		vref(vp);
+
+		/*
+		 * Since we hold lfs_writer, the node can't be in an
+		 * active dirop.  Since it's on the list and we hold a
+		 * reference to it, it can't be reclaimed now.
+		 */
 		KASSERT((ip->i_state & IN_ADIROP) == 0);
 		KASSERT(vp->v_uflag & VU_DIROP);
-		KASSERT(vdead_check(vp, VDEAD_NOWAIT) == 0);
+
+		/*
+		 * After we release lfs_lock, if we were in the middle
+		 * of writing a segment, lfs_unmark_dirop may end up
+		 * clearing VU_DIROP, and we have no way to stop it.
+		 * That should be OK -- we'll just have less to do
+		 * here.
+		 */
+		mutex_exit(&lfs_lock);
 
 		/*
 		 * All writes to directories come from dirops; all
@@ -1662,15 +1695,6 @@ lfs_flush_dirops(struct lfs *fs)
 		 * directory blocks inodes and file inodes.  So we don't
 		 * really need to lock.
 		 */
-		if (vdead_check(vp, VDEAD_NOWAIT) != 0) {
-			mutex_exit(vp->v_interlock);
-			mutex_enter(&lfs_lock);
-			continue;
-		}
-		mutex_exit(vp->v_interlock);
-		/* XXX see below
-		 * waslocked = VOP_ISLOCKED(vp);
-		 */
 		if (vp->v_type != VREG &&
 		    ((ip->i_state & IN_ALLMOD) || !VPISEMPTY(vp))) {
 			error = lfs_writefile(fs, sp, vp);
@@ -1681,15 +1705,17 @@ lfs_flush_dirops(struct lfs *fs)
 			    	mutex_exit(&lfs_lock);
 			}
 			if (error && (sp->seg_flags & SEGM_SINGLE)) {
+				vrele(vp);
 				mutex_enter(&lfs_lock);
 				error = EAGAIN;
 				break;
 			}
 		}
-		KDASSERT(ip->i_number != LFS_IFILE_INUM);
+		KASSERT(ip->i_number != LFS_IFILE_INUM);
 		error = lfs_writeinode(fs, sp, ip);
-		mutex_enter(&lfs_lock);
 		if (error && (sp->seg_flags & SEGM_SINGLE)) {
+			vrele(vp);
+			mutex_enter(&lfs_lock);
 			error = EAGAIN;
 			break;
 		}
@@ -1702,9 +1728,16 @@ lfs_flush_dirops(struct lfs *fs)
 		 * write them.
 		 */
 		/* XXX only for non-directories? --KS */
+		mutex_enter(&lfs_lock);
 		LFS_SET_UINO(ip, IN_MODIFIED);
+		mutex_exit(&lfs_lock);
+
+		vrele(vp);
+		mutex_enter(&lfs_lock);
 	}
+	TAILQ_REMOVE(&fs->lfs_dchainhd, marker, i_lfs_dchain);
 	mutex_exit(&lfs_lock);
+
 	/* We've written all the dirops there are */
 	ssp = (SEGSUM *)sp->segsum;
 	lfs_ss_setflags(fs, ssp, lfs_ss_getflags(fs, ssp) & ~(SS_CONT));
@@ -1712,6 +1745,9 @@ lfs_flush_dirops(struct lfs *fs)
 	(void) lfs_writeseg(fs, sp);
 	lfs_segunlock(fs);
 
+	pool_put(&lfs_inoext_pool, marker->inode_ext.lfs);
+	pool_put(&lfs_inode_pool, marker);
+
 	return error;
 }
 
@@ -1732,6 +1768,7 @@ lfs_flush_pchain(struct lfs *fs)
 	int error, error2;
 
 	ASSERT_NO_SEGLOCK(fs);
+	KASSERT(fs->lfs_writer);
 
 	if (fs->lfs_ronly)
 		return EROFS;
@@ -1802,7 +1839,7 @@ lfs_flush_pchain(struct lfs *fs)
 			LFS_SET_UINO(ip, IN_MODIFIED);
 		    	mutex_exit(&lfs_lock);
 		}
-		KDASSERT(ip->i_number != LFS_IFILE_INUM);
+		KASSERT(ip->i_number != LFS_IFILE_INUM);
 		error2 = lfs_writeinode(fs, sp, ip);
 
 		VOP_UNLOCK(vp);

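lfs_flush_dirops now takes a vnode reference with vref while it still holds lfs_lock, and releases it with vrele only after finishing with the inode. This replaces the old vdead_check tests. As the added comments explain, the reference keeps the vnode from being reclaimed even if lfs_unmark_dirop takes it off the dirop chain while lfs_lock is dropped.

A minimal single-threaded model of that rule follows. The names struct obj, obj_ref, obj_rele and unmark_from_list are hypothetical, and the last release stands in for reclamation.

```c
#include <assert.h>

struct obj {
	int refs;	/* like the vnode use count */
	int on_list;	/* like being on lfs_dchainhd with VU_DIROP set */
	int freed;	/* set when the last reference goes away */
};

static void
obj_ref(struct obj *o)
{
	assert(!o->freed);
	o->refs++;
}

static void
obj_rele(struct obj *o)
{
	assert(o->refs > 0);
	if (--o->refs == 0)
		o->freed = 1;	/* reclamation would happen here */
}

/* What another thread might do while the flusher has dropped the lock. */
static void
unmark_from_list(struct obj *o)
{
	if (o->on_list) {
		o->on_list = 0;
		obj_rele(o);	/* drop the reference the list held */
	}
}
```

Without the flusher's own reference, the concurrent unmark would drop the count to zero while the flusher is still writing the inode.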
Index: src/usr.sbin/dumplfs/dumplfs.c
diff -u src/usr.sbin/dumplfs/dumplfs.c:1.64 src/usr.sbin/dumplfs/dumplfs.c:1.64.4.1
--- src/usr.sbin/dumplfs/dumplfs.c:1.64	Fri Jun 15 15:16:05 2018
+++ src/usr.sbin/dumplfs/dumplfs.c	Mon Aug 17 10:30:22 2020
@@ -1,4 +1,4 @@
-/*	$NetBSD: dumplfs.c,v 1.64 2018/06/15 15:16:05 christos Exp $	*/
+/*	$NetBSD: dumplfs.c,v 1.64.4.1 2020/08/17 10:30:22 martin Exp $	*/
 
 /*-
  * Copyright (c) 1991, 1993
@@ -40,7 +40,7 @@ __COPYRIGHT("@(#) Copyright (c) 1991, 19
 #if 0
 static char sccsid[] = "@(#)dumplfs.c	8.5 (Berkeley) 5/24/95";
 #else
-__RCSID("$NetBSD: dumplfs.c,v 1.64 2018/06/15 15:16:05 christos Exp $");
+__RCSID("$NetBSD: dumplfs.c,v 1.64.4.1 2020/08/17 10:30:22 martin Exp $");
 #endif
 #endif /* not lint */
 
@@ -133,7 +133,7 @@ print_ientry(int i, struct lfs *lfsp, IF
 	else
 		printf("%d\tINUSE\t%u\t%8jX\t%s\n",
 		    i, version, (intmax_t)daddr,
-		    nextfree == LFS_ORPHAN_NEXTFREE ? "FFFFFFFF" : "-");
+		    nextfree == LFS_ORPHAN_NEXTFREE(lfsp) ? "orphan" : "-");
 }
 
 /*

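dumplfs now passes the superblock to LFS_ORPHAN_NEXTFREE and prints "orphan" instead of a hard-coded "FFFFFFFF". This suggests the sentinel's value now depends on the file system format, 32-bit versus 64-bit Ifile fields. The macro's definition is not shown in this diff.

The sketch below assumes the sentinel is all ones in the field's width. The names orphan_nextfree, nextfree_label and is64 are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: the orphan marker is "all ones" in the on-disk field width. */
static uint64_t
orphan_nextfree(int is64)
{
	return is64 ? UINT64_MAX : (uint64_t)UINT32_MAX;
}

/* Label an in-use entry the way the new print_ientry does. */
static const char *
nextfree_label(uint64_t nextfree, int is64)
{
	return nextfree == orphan_nextfree(is64) ? "orphan" : "-";
}
```

Under this assumption, a 32-bit all-ones value is an orphan marker only on a 32-bit file system, which is why a fixed "FFFFFFFF" string could not serve both formats.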