Originally I thought that extracting ZFS out of the kernel
as a shared library would be harder than it has turned out to
be, at least once I had figured out a couple of important gotchas,
which I describe below and in the code comments.

The advantages of moving ZFS to a separate library are the following:
- the kernel becomes ~900K smaller
- at least 10 fewer threads are needed to run a non-ZFS image
  (running a ROFS image on 1 CPU requires only 25 threads)

I also hope this patch provides a blueprint for how we could implement
another filesystem driver such as ext2/3/4 (see #1179) or other true
kernel modules.

The essence of this patch is the changes to the main makefile to build
the new libsolaris.so, plus changes to various ZFS-related parts of the
kernel like pagecache, arc_shrinker and the ZFS dev driver to make them
call into libsolaris.so through a handful of dynamically registered
callbacks.

The new libsolaris.so is mainly composed of the solaris and zfs sets
as defined in the makefile (and no longer part of the kernel),
plus the bsd RPC code (xdr*), kobj and finally the new
fs/zfs/zfs_initialize.c, which provides the main INIT function,
zfs_initialize(). The zfs_initialize() function initializes various ZFS
resources like threads and memory and registers various callback
functions with the main kernel (see comments in zfs_initialize.c).
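
The registration scheme described above can be sketched in plain C roughly
as follows. This is a minimal single-file illustration, not the code from
this patch: the names lowmem_fun, register_lowmem_fun(), request_memory(),
driver_lowmem() and driver_initialize() are hypothetical, and the NULL check
is an assumption added here for the case where the library is never loaded.

```c
#include <stddef.h>

/* Kernel side (hypothetical names): a callback slot that stays NULL
 * until the driver library registers its implementation. */
static size_t (*lowmem_fun)(void *arg, int howto) = NULL;

void register_lowmem_fun(size_t (*fun)(void *, int))
{
    lowmem_fun = fun;
}

/* Kernel code calls through the slot. The NULL check is an addition of
 * this sketch; it returns 0 when no driver has registered itself. */
size_t request_memory(void)
{
    return lowmem_fun ? lowmem_fun(NULL, 0) : 0;
}

/* Library side: the real implementation ... */
static size_t driver_lowmem(void *arg, int howto)
{
    (void)arg;
    (void)howto;
    return 4096; /* pretend that one page was reclaimed */
}

/* ... and a constructor, which the dynamic linker runs when the library
 * is loaded (or before main() when everything is in one program). */
static void __attribute__((constructor)) driver_initialize(void)
{
    register_lowmem_fun(driver_lowmem);
}
```

The point of the indirection is that the kernel no longer references the
driver's symbols at link time; it only exposes the register function.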

Two important gotchas I have discovered are:
1) libsolaris.so needs to be built with BIND_NOW so that all symbols are
   resolved eagerly. Otherwise, resolving a symbol later could trigger a
   page fault while the ZFS code in libsolaris.so is itself being called
   to handle another fault, which would cause deadlocks.
2) libsolaris.so needs the osv-mlock note so that the dynamic linker
   populates the mappings on load. As above, this avoids later page
   faults that would lead to deadlocks.
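
One way to double-check the first gotcha is to look for the eager-binding
markers in the dynamic section of the built library. The helper below is a
hedged sketch, not part of this patch: it assumes a 64-bit ELF target and
the constants from glibc's <elf.h>, and dynamic_section_is_bind_now() is a
hypothetical name.

```c
#include <elf.h>
#include <stdbool.h>

/* Return true if a dynamic section (an array terminated by DT_NULL)
 * requests eager symbol binding. Linking with -z now is typically
 * recorded as DF_BIND_NOW in DT_FLAGS and/or DF_1_NOW in DT_FLAGS_1;
 * the older DT_BIND_NOW tag is checked as well. */
bool dynamic_section_is_bind_now(const Elf64_Dyn *dyn)
{
    for (; dyn->d_tag != DT_NULL; dyn++) {
        if (dyn->d_tag == DT_BIND_NOW) {
            return true;
        }
        if (dyn->d_tag == DT_FLAGS && (dyn->d_un.d_val & DF_BIND_NOW)) {
            return true;
        }
        if (dyn->d_tag == DT_FLAGS_1 && (dyn->d_un.d_val & DF_1_NOW)) {
            return true;
        }
    }
    return false;
}
```

For a file on disk, readelf -d libsolaris.so prints the same tags without
any code.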

Please note that libsolaris.so is built with most symbols hidden
and code garbage collection on to help minimize its size (804K)
and expose only the minimum number of symbols (< 100) needed by libzfs.so.
The latter also helps avoid possible symbol collisions with other apps.
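
As a generic illustration of the hidden-by-default approach (a sketch only,
not how this patch marks its exports): when a file is compiled with
-fvisibility=hidden, only symbols explicitly given default visibility end up
in the dynamic symbol table. The LIB_EXPORT macro and both function names
below are hypothetical.

```c
/* Assumed to be compiled with -fvisibility=hidden, so symbols are hidden
 * by default and only the ones marked LIB_EXPORT stay visible. */
#define LIB_EXPORT __attribute__((visibility("default")))

/* Hidden under -fvisibility=hidden: callable inside the library only. */
long pages_to_bytes(long pages)
{
    return pages * 4096L;
}

/* Explicitly exported: part of the small public surface of the library. */
LIB_EXPORT long cache_size_bytes(long pages)
{
    return pages_to_bytes(pages);
}
```

Running nm -D --defined-only on a library built this way should list
cache_size_bytes but not pages_to_bytes.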

We also make changes to loader.cc to dlopen() libsolaris.so before
we mount the ZFS filesystem (for that reason libsolaris.so needs to be part
of the bootfs for ZFS images). Because ZFS is the root filesystem, we cannot
use the same approach we used for nfs, which is also implemented as a
shared library but is loaded in pivot_rootfs(), which happens much later.

In theory we could build a mixed disk with two partitions: the 1st a ROFS
one with libsolaris.so on it, and the 2nd a ZFS one which would be mounted
after we mount ROFS and load and initialize libsolaris.so from it.

I have tested this patch by running the unit tests (all pass), running
tests/misc-zfs-io.cc, and running a MySQL stress test on a ZFS
image.

Fixes #1009

Signed-off-by: Waldemar Kozaczuk <jwkozac...@gmail.com>
---
 Makefile                  | 51 ++++++++++++++++----
 bootfs.manifest.skel      |  1 +
 bsd/init.cc               |  7 ---
 bsd/porting/shrinker.cc   | 22 +++++++--
 core/pagecache.cc         | 45 +++++++++++++-----
 drivers/zfs.cc            | 12 ++++-
 fs/zfs/zfs_initialize.c   | 97 +++++++++++++++++++++++++++++++++++++++
 fs/zfs/zfs_null_vfsops.cc | 54 ++++++++++++++++++++++
 libc/misc/uname.c         |  2 +-
 loader.cc                 | 50 ++++++++++++--------
 usr.manifest.skel         |  1 +
 11 files changed, 289 insertions(+), 53 deletions(-)
 create mode 100644 fs/zfs/zfs_initialize.c
 create mode 100644 fs/zfs/zfs_null_vfsops.cc

diff --git a/Makefile b/Makefile
index 7acf130c..d88efdb9 100644
--- a/Makefile
+++ b/Makefile
@@ -568,7 +568,6 @@ bsd += bsd/porting/kthread.o
 bsd += bsd/porting/mmu.o
 bsd += bsd/porting/pcpu.o
 bsd += bsd/porting/bus_dma.o
-bsd += bsd/porting/kobj.o
 bsd += bsd/sys/netinet/if_ether.o
 bsd += bsd/sys/compat/linux/linux_socket.o
 bsd += bsd/sys/compat/linux/linux_ioctl.o
@@ -618,9 +617,6 @@ bsd += bsd/sys/netinet/cc/cc_cubic.o
 bsd += bsd/sys/netinet/cc/cc_htcp.o
 bsd += bsd/sys/netinet/cc/cc_newreno.o
 bsd += bsd/sys/netinet/arpcache.o
-bsd += bsd/sys/xdr/xdr.o
-bsd += bsd/sys/xdr/xdr_array.o
-bsd += bsd/sys/xdr/xdr_mem.o
 bsd += bsd/sys/xen/evtchn.o
 
 ifeq ($(arch),x64)
@@ -644,6 +640,11 @@ bsd += bsd/sys/dev/random/live_entropy_sources.o
 
 $(out)/bsd/sys/%.o: COMMON += -Wno-sign-compare -Wno-narrowing -Wno-write-strings -Wno-parentheses -Wno-unused-but-set-variable
 
+xdr :=
+xdr += bsd/sys/xdr/xdr.o
+xdr += bsd/sys/xdr/xdr_array.o
+xdr += bsd/sys/xdr/xdr_mem.o
+
 solaris :=
 solaris += bsd/sys/cddl/compat/opensolaris/kern/opensolaris.o
 solaris += bsd/sys/cddl/compat/opensolaris/kern/opensolaris_atomic.o
@@ -799,7 +800,7 @@ libtsm += drivers/libtsm/tsm_screen.o
 libtsm += drivers/libtsm/tsm_vte.o
 libtsm += drivers/libtsm/tsm_vte_charsets.o
 
-drivers := $(bsd) $(solaris)
+drivers := $(bsd)
 drivers += core/mmu.o
 drivers += arch/$(arch)/early-console.o
 drivers += drivers/console.o
@@ -1849,6 +1850,7 @@ fs_objs += virtiofs/virtiofs_vfsops.o \
 fs_objs += pseudofs/pseudofs.o
 fs_objs += procfs/procfs_vnops.o
 fs_objs += sysfs/sysfs_vnops.o
+fs_objs += zfs/zfs_null_vfsops.o
 
 objects += $(addprefix fs/, $(fs_objs))
 objects += $(addprefix libc/, $(libc))
@@ -2035,11 +2037,11 @@ $(out)/empty_bootfs.o: ASFLAGS += -I$(out)
 
 $(out)/tools/mkfs/mkfs.so: $(out)/tools/mkfs/mkfs.o $(out)/libzfs.so
        $(makedir)
-       $(call quiet, $(CC) $(CFLAGS) -o $@ $(out)/tools/mkfs/mkfs.o -L$(out) -lzfs, LINK mkfs.so)
+       $(call quiet, $(CC) $(CFLAGS) -o $@ $(out)/tools/mkfs/mkfs.o -L$(out) -lzfs -lstdc++, LINK mkfs.so)
 
 $(out)/tools/cpiod/cpiod.so: $(out)/tools/cpiod/cpiod.o $(out)/tools/cpiod/cpio.o $(out)/libzfs.so
        $(makedir)
-       $(call quiet, $(CC) $(CFLAGS) -o $@ $(out)/tools/cpiod/cpiod.o $(out)/tools/cpiod/cpio.o -L$(out) -lzfs, LINK cpiod.so)
+       $(call quiet, $(CC) $(CFLAGS) -o $@ $(out)/tools/cpiod/cpiod.o $(out)/tools/cpiod/cpio.o -L$(out) -lzfs -lstdc++, LINK cpiod.so)
 
 ################################################################################
 # The dependencies on header files are automatically generated only after the
@@ -2117,6 +2119,34 @@ libzfs-objects = $(foreach file, $(libzfs-file-list), $(out)/bsd/cddl/contrib/op
 libzpool-file-list = util kernel
 libzpool-objects = $(foreach file, $(libzpool-file-list), $(out)/bsd/cddl/contrib/opensolaris/lib/libzpool/common/$(file).o)
 
+libsolaris-objects = $(foreach file, $(solaris) $(xdr), $(out)/$(file))
+libsolaris-objects += $(out)/bsd/porting/kobj.o $(out)/fs/zfs/zfs_initialize.o
+
+$(libsolaris-objects): kernel-defines = -D_KERNEL $(source-dialects) -fvisibility=hidden -ffunction-sections -fdata-sections
+
+$(out)/fs/zfs/zfs_initialize.o: CFLAGS+= \
+       -DBUILDING_ZFS \
+       -Ibsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs \
+       -Ibsd/sys/cddl/contrib/opensolaris/common/zfs \
+       -Ibsd/sys/cddl/compat/opensolaris \
+       -Ibsd/sys/cddl/contrib/opensolaris/common \
+       -Ibsd/sys/cddl/contrib/opensolaris/uts/common \
+       -Ibsd/sys \
+       -Wno-array-bounds \
+       -fno-strict-aliasing \
+       -Wno-unknown-pragmas \
+       -Wno-unused-variable \
+       -Wno-switch \
+       -Wno-maybe-uninitialized
+
+#build libsolaris.so with -z,now so that all symbols get resolved eagerly (BIND_NOW)
+#also make sure libsolaris.so has osv-mlock note (see zfs_initialize.c) so that
+# the file segments get loaded eagerly as well when mmapped
+comma:=,
+$(out)/libsolaris.so: $(libsolaris-objects)
+       $(makedir)
+       $(call quiet, $(CC) $(CFLAGS) -Wl$(comma)-z$(comma)now -Wl$(comma)--gc-sections -o $@ $(libsolaris-objects) -L$(out), LINK libsolaris.so)
+
 libzfs-objects += $(libzpool-objects)
 libzfs-objects += $(out)/bsd/cddl/compat/opensolaris/misc/mkdirp.o
 libzfs-objects += $(out)/bsd/cddl/compat/opensolaris/misc/zmount.o
@@ -2158,6 +2188,9 @@ $(libzfs-objects): CFLAGS += -Wno-switch -D__va_list=__builtin_va_list '-DTEXT_D
                        -Wno-maybe-uninitialized -Wno-unused-variable -Wno-unknown-pragmas -Wno-unused-function \
                        -D_OPENSOLARIS_SYS_UIO_H_
 
+$(out)/bsd/cddl/contrib/opensolaris/lib/libzpool/common/kernel.o: CFLAGS += -fvisibility=hidden
+$(out)/bsd/cddl/contrib/opensolaris/lib/libzfs/common/zfs_prop.o: CFLAGS += -fvisibility=hidden
+
 # Note: zfs_prop.c and zprop_common.c are also used by the kernel, thus the manual targets.
 $(out)/bsd/cddl/contrib/opensolaris/lib/libzfs/common/zfs_prop.o: bsd/sys/cddl/contrib/opensolaris/common/zfs/zfs_prop.c | generated-headers
        $(makedir)
@@ -2167,9 +2200,9 @@ $(out)/bsd/cddl/contrib/opensolaris/lib/libzfs/common/zprop_common.o: bsd/sys/cd
        $(makedir)
        $(call quiet, $(CC) $(CFLAGS) -c -o $@ $<, CC $<)
 
-$(out)/libzfs.so: $(libzfs-objects) $(out)/libuutil.so
+$(out)/libzfs.so: $(libzfs-objects) $(out)/libuutil.so $(out)/libsolaris.so
        $(makedir)
-       $(call quiet, $(CC) $(CFLAGS) -o $@ $(libzfs-objects) -L$(out) -luutil, LINK libzfs.so)
+       $(call quiet, $(CC) $(CFLAGS) -o $@ $(libzfs-objects) -L$(out) -luutil -lsolaris, LINK libzfs.so)
 
 #include $(src)/bsd/cddl/contrib/opensolaris/cmd/zpool/build.mk:
 zpool-cmd-file-list = zpool_iter  zpool_main  zpool_util  zpool_vdev
diff --git a/bootfs.manifest.skel b/bootfs.manifest.skel
index a819f1a1..aad2bbeb 100644
--- a/bootfs.manifest.skel
+++ b/bootfs.manifest.skel
@@ -3,6 +3,7 @@
 /libuutil.so: libuutil.so
 /zpool.so: zpool.so
 /libzfs.so: libzfs.so
+/libsolaris.so: libsolaris.so
 /zfs.so: zfs.so
 /tools/mkfs.so: tools/mkfs/mkfs.so
 /tools/cpiod.so: tools/cpiod/cpiod.so
diff --git a/bsd/init.cc b/bsd/init.cc
index f0e8e32c..e2c8c564 100644
--- a/bsd/init.cc
+++ b/bsd/init.cc
@@ -15,10 +15,6 @@
 #include <bsd/sys/sys/eventhandler.h>
 
 extern "C" {
-    extern void system_taskq_init(void *arg);
-    extern void opensolaris_load(void *arg);
-    extern void callb_init(void *arg);
-
     // taskqueue
     #include <bsd/sys/sys/taskqueue.h>
     #include <bsd/sys/sys/priority.h>
@@ -49,9 +45,6 @@ void bsd_init(void)
 
     arc4_init();
     eventhandler_init(NULL);
-    opensolaris_load(NULL);
-    callb_init(NULL);
-    system_taskq_init(NULL);
 
     debug(" - done\n");
 }
diff --git a/bsd/porting/shrinker.cc b/bsd/porting/shrinker.cc
index 3fb7aff1..b6e83d7f 100644
--- a/bsd/porting/shrinker.cc
+++ b/bsd/porting/shrinker.cc
@@ -45,14 +45,17 @@ arc_shrinker::arc_shrinker()
 {
 }
 
-extern "C" size_t arc_lowmem(void *arg, int howto);
-extern "C" size_t arc_sized_adjust(int64_t to_reclaim);
+//These two function pointers will be set dynamically in INIT function of
+//libsolaris.so by calling register_shrinker_arc_funs() below. The arc_lowmem()
+//and arc_sized_adjust() are functions defined in libsolaris.so.
+size_t (*arc_lowmem_fun)(void *arg, int howto);
+size_t (*arc_sized_adjust_fun)(int64_t to_reclaim);
 
 size_t arc_shrinker::request_memory(size_t s, bool hard)
 {
     size_t ret = 0;
     if (hard) {
-        ret = arc_lowmem(nullptr, 0);
+        ret = (*arc_lowmem_fun)(nullptr, 0);
         // ARC's aggressive mode will call arc_adjust, which will reduce the size of the
         // cache, but won't necessarily free as much memory as we need. If it doesn't,
         // keep going in soft mode. This is better than calling arc_lowmem() again, since
@@ -67,7 +70,7 @@ size_t arc_shrinker::request_memory(size_t s, bool hard)
     // minimum of 16 M.
     s = std::max(s, (16ul << 20));
     do {
-        size_t r = arc_sized_adjust(s);
+        size_t r = (*arc_sized_adjust_fun)(s);
         if (r == 0) {
             break;
         }
@@ -89,7 +92,7 @@ void bsd_shrinker_init(void)
 
         auto *_ee = (struct eventhandler_entry_generic *)ep;
 
-        if ((void *)_ee->func == (void *)arc_lowmem) {
+        if ((void *)_ee->func == (void *)arc_lowmem_fun) {
             new arc_shrinker();
         } else {
             new bsd_shrinker(_ee);
@@ -99,3 +102,12 @@ void bsd_shrinker_init(void)
 
     debug("BSD shrinker: unlocked, running\n");
 }
+
+//This needs to be a C-style function so it can be called
+//from libsolaris.so
+extern "C" void register_shrinker_arc_funs(
+    size_t (*_arc_lowmem_fun)(void *, int),
+    size_t (*_arc_sized_adjust_fun)(int64_t)) {
+    arc_lowmem_fun = _arc_lowmem_fun;
+    arc_sized_adjust_fun = _arc_sized_adjust_fun;
+}
diff --git a/core/pagecache.cc b/core/pagecache.cc
index b58a97fb..dc0c2947 100644
--- a/core/pagecache.cc
+++ b/core/pagecache.cc
@@ -19,11 +19,26 @@
 #include <osv/prio.hh>
 #include <chrono>
 
-extern "C" {
-void arc_unshare_buf(arc_buf_t*);
-void arc_share_buf(arc_buf_t*);
-void arc_buf_accessed(const uint64_t[4]);
-void arc_buf_get_hashkey(arc_buf_t*, uint64_t[4]);
+//These four function pointers will be set dynamically in INIT function of
+//libsolaris.so by calling register_pagecache_arc_funs() below. The arc_unshare_buf(),
+//arc_share_buf(), arc_buf_accessed() and arc_buf_get_hashkey()
+//are functions defined in libsolaris.so.
+void (*arc_unshare_buf_fun)(arc_buf_t*);
+void (*arc_share_buf_fun)(arc_buf_t*);
+void (*arc_buf_accessed_fun)(const uint64_t[4]);
+void (*arc_buf_get_hashkey_fun)(arc_buf_t*, uint64_t[4]);
+
+//This needs to be a C-style function so it can be called
+//from libsolaris.so
+extern "C" void register_pagecache_arc_funs(
+    void (*_arc_unshare_buf_fun)(arc_buf_t*),
+    void (*_arc_share_buf_fun)(arc_buf_t*),
+    void (*_arc_buf_accessed_fun)(const uint64_t[4]),
+    void (*_arc_buf_get_hashkey_fun)(arc_buf_t*, uint64_t[4])) {
+    arc_unshare_buf_fun = _arc_unshare_buf_fun;
+    arc_share_buf_fun = _arc_share_buf_fun;
+    arc_buf_accessed_fun = _arc_buf_accessed_fun;
+    arc_buf_get_hashkey_fun = _arc_buf_get_hashkey_fun;
 }
 
 namespace std {
@@ -270,7 +285,7 @@ public:
     cached_page_arc(hashkey key, void* page, arc_buf_t* ab) : cached_page(key, page), _ab(ref(ab, this)) {}
     virtual ~cached_page_arc() {
         if (!_removed && unref(_ab, this)) {
-            arc_unshare_buf(_ab);
+            (*arc_unshare_buf_fun)(_ab);
         }
     }
     arc_buf_t* arcbuf() {
@@ -439,7 +454,7 @@ void map_arc_buf(hashkey *key, arc_buf_t* ab, void *page)
     SCOPE_LOCK(arc_read_lock);
     cached_page_arc* pc = new cached_page_arc(*key, page, ab);
     arc_read_cache.emplace(*key, pc);
-    arc_share_buf(ab);
+    (*arc_share_buf_fun)(ab);
 }
 
 void map_read_cached_page(hashkey *key, void *page)
@@ -656,7 +671,7 @@ void sync(vfs_file* fp, off_t start, off_t end)
 }
 
 TRACEPOINT(trace_access_scanner, "scanned=%u, cleared=%u, %%cpu=%g", unsigned, unsigned, double);
-static class access_scanner {
+class access_scanner {
     static constexpr double _max_cpu = 20;
     static constexpr double _min_cpu = 0.1;
     static constexpr unsigned _freq = 1000;
@@ -673,7 +688,7 @@ private:
             return false;
         }
         for (auto&& arc_hashkey: accessed) {
-            arc_buf_accessed(arc_hashkey.key);
+            (*arc_buf_accessed_fun)(arc_hashkey.key);
         }
         accessed.clear();
         return true;
@@ -708,7 +723,7 @@ private:
                         auto cp = p.second;
                         if (cp->clear_accessed()) {
                             arc_hashkey arc_hashkey;
-                            arc_buf_get_hashkey(arcbuf, arc_hashkey.key);
+                            (*arc_buf_get_hashkey_fun)(arcbuf, arc_hashkey.key);
                             accessed.emplace(arc_hashkey);
                             cleared++;
                         }
@@ -746,10 +761,18 @@ private:
             cleared /= 2;
         }
     }
-} s_access_scanner;
+};
+
+static access_scanner *s_access_scanner = nullptr;
 
 constexpr double access_scanner::_max_cpu;
 constexpr double access_scanner::_min_cpu;
 
+}
 
+//The access_scanner thread is ZFS specific so it
+//is initialized by calling the function below if libsolaris.so
+//is loaded.
+extern "C" void start_pagecache_access_scanner() {
+    pagecache::s_access_scanner = new pagecache::access_scanner();
 }
diff --git a/drivers/zfs.cc b/drivers/zfs.cc
index ef7f7812..6fad299b 100644
--- a/drivers/zfs.cc
+++ b/drivers/zfs.cc
@@ -11,7 +11,10 @@
 
 namespace zfsdev {
 
-extern "C" int osv_zfs_ioctl(unsigned long req, void* buffer);
+//The osv_zfs_ioctl_fun will be set dynamically in INIT function of
+//libsolaris.so by calling register_osv_zfs_ioctl() below. The osv_zfs_ioctl()
+//is a function defined in libsolaris.so.
+int (*osv_zfs_ioctl_fun)(unsigned long req, void* buffer);
 
 struct zfs_device_priv {
     zfs_device* drv;
@@ -24,7 +27,7 @@ static zfs_device_priv *to_priv(device *dev)
 
 static int zfs_ioctl(device* dev, ulong req, void* buffer)
 {
-    return osv_zfs_ioctl(req, buffer);
+    return (*osv_zfs_ioctl_fun)(req, buffer);
 }
 
 static devops zfs_device_devops = {
@@ -63,3 +66,8 @@ void zfsdev_init(void)
 }
 
 }
+
+//Needs to be a C-style function so it can be called from libsolaris.so
+extern "C" void register_osv_zfs_ioctl( int (*osv_zfs_ioctl_fun)(unsigned long, void*)) {
+    zfsdev::osv_zfs_ioctl_fun = osv_zfs_ioctl_fun;
+}
diff --git a/fs/zfs/zfs_initialize.c b/fs/zfs/zfs_initialize.c
new file mode 100644
index 00000000..b6336665
--- /dev/null
+++ b/fs/zfs/zfs_initialize.c
@@ -0,0 +1,97 @@
+/*
+ * Copyright (C) 2021 Waldemar Kozaczuk
+ *
+ * This work is open source software, licensed under the terms of the
+ * BSD license as described in the LICENSE file in the top-level directory.
+ */
+
+#include <stddef.h>
+#include <stdio.h>
+#include <osv/mount.h>
+#include <osv/debug.h>
+#include <sys/arc.h>
+
+//This file gets linked as part of libsolaris.so to
+//provide an INIT function to initialize ZFS filesystem
+//code
+
+extern void system_taskq_init(void *arg);
+extern void opensolaris_load(void *arg);
+extern void callb_init(void *arg);
+
+extern int osv_zfs_ioctl(unsigned long req, void* buffer);
+//The function below is part of kernel and is used to
+//register osv_zfs_ioctl() as a callback
+extern void register_osv_zfs_ioctl( int (*osv_zfs_ioctl_fun)(unsigned long, void*));
+
+extern size_t arc_lowmem(void *arg, int howto);
+extern size_t arc_sized_adjust(long to_reclaim);
+//The function below is part of kernel and is used to
+//register arc_lowmem() and arc_sized_adjust() as callbacks
+extern void register_shrinker_arc_funs(
+    size_t (*_arc_lowmem_fun)(void *, int),
+    size_t (*_arc_sized_adjust_fun)(long));
+
+extern void arc_unshare_buf(arc_buf_t*);
+extern void arc_share_buf(arc_buf_t*);
+extern void arc_buf_accessed(const uint64_t[4]);
+extern void arc_buf_get_hashkey(arc_buf_t*, uint64_t[4]);
+//The function below is part of kernel and is used to
+//register for functions above - arc_*() - as callbacks
+extern void register_pagecache_arc_funs(
+    void (*_arc_unshare_buf_fun)(arc_buf_t*),
+    void (*_arc_share_buf_fun)(arc_buf_t*),
+    void (*_arc_buf_accessed_fun)(const uint64_t[4]),
+    void (*_arc_buf_get_hashkey_fun)(arc_buf_t*, uint64_t[4]));
+
+extern struct vfsops zfs_vfsops;
+//The function below is part of kernel and is used to
+//update ZFS vfsops in the vfssw configuration struct
+extern void zfs_update_vfsops(struct vfsops* _vfsops);
+
+extern void start_pagecache_access_scanner();
+
+extern int zfs_init(void);
+
+//This init function gets called on loading of libsolaris.so
+//and it initializes all necessary resources (threads, etc) used by the code in
+//libsolaris.so. This initialization is necessary before ZFS can be mounted.
+void __attribute__((constructor)) zfs_initialize(void) {
+    // These 3 functions used to be called at the end of bsd_init()
+    // and are intended to initialize various resources, mainly thread pools
+    // (threads named 'system_taskq_*' and 'solthread-0x*')
+    opensolaris_load(NULL);
+    callb_init(NULL);
+    system_taskq_init(NULL);
+
+    //Register osv_zfs_ioctl() as callback in drivers/zfs.cc
+    register_osv_zfs_ioctl(&osv_zfs_ioctl);
+    //Register arc_lowmem() and arc_sized_adjust() as callbacks in arc_shrinker
+    //implemented as part of bsd/porting/shrinker.cc
+    register_shrinker_arc_funs(&arc_lowmem, &arc_sized_adjust);
+    //Register arc_unshare_buf(), arc_share_buf(), arc_buf_accessed() and arc_buf_get_hashkey()
+    //as callbacks in the page cache layer implemented in core/pagecache.cc
+    register_pagecache_arc_funs(&arc_unshare_buf, &arc_share_buf, &arc_buf_accessed, &arc_buf_get_hashkey);
+
+    //Register vfsops and vnops ...
+    zfs_update_vfsops(&zfs_vfsops);
+    //Start ZFS access scanner (part of pagecache)
+    start_pagecache_access_scanner();
+
+    //Finally call zfs_init() which is what would normally be called by vfs_init()
+    //The dummy zfs_init() defined in kernel does not do anything so
+    //we have to call the real one here as a last step after everything else above
+    //was called to initialize various ZFS resources and register relevant callback
+    //functions in the kernel
+    zfs_init();
+
+    debug("zfs: driver has been initialized!\n");
+}
+
+//This is important to make sure that OSv dynamic linker will
+//pre-fault (populate) all segments of libsolaris.so on load
+//before any of its code is executed. This makes it so that ZFS
+//code does not trigger any faults which is important
+//when handling map() or unmap() on ZFS files for example.
+//Without it we would encounter deadlocks in such scenarios.
+asm(".pushsection .note.osv-mlock, \"a\"; .long 0, 0, 0; .popsection");
diff --git a/fs/zfs/zfs_null_vfsops.cc b/fs/zfs/zfs_null_vfsops.cc
new file mode 100644
index 00000000..679fa40c
--- /dev/null
+++ b/fs/zfs/zfs_null_vfsops.cc
@@ -0,0 +1,54 @@
+/*
+ * Copyright (C) 2021 Waldemar Kozaczuk
+ *
+ * This work is open source software, licensed under the terms of the
+ * BSD license as described in the LICENSE file in the top-level directory.
+ */
+
+#include <osv/mount.h>
+
+#define zfs_mount   ((vfsop_mount_t)vfs_nullop)
+#define zfs_umount  ((vfsop_umount_t)vfs_nullop)
+#define zfs_sync    ((vfsop_sync_t)vfs_nullop)
+#define zfs_vget    ((vfsop_vget_t)vfs_nullop)
+#define zfs_statfs  ((vfsop_statfs_t)vfs_nullop)
+
+static int zfs_noop_mount(struct mount *mp, const char *dev, int flags,
+                          const void *data)
+{
+    printf("ZFS is inactive. Please add libsolaris.so to the image.\n");
+    return -1;
+}
+
+/*
+ * File system operations
+ *
+ * This provides dummy vfsops when libsolaris is not loaded and ZFS filesystem
+ * is not active.
+ */
+struct vfsops zfs_vfsops = {
+    zfs_noop_mount, /* mount */
+    zfs_umount,     /* umount */
+    zfs_sync,       /* sync */
+    zfs_vget,       /* vget */
+    zfs_statfs,     /* statfs */
+    nullptr,        /* vnops */
+};
+
+extern "C" int zfs_init(void)
+{
+    return 0;
+}
+
+//Normally (without ZFS enabled) the zfs_vfsops points to dummy
+//noop functions. So when libsolaris.so is loaded, we provide the
+//function below to be called to register real vfsops for ZFS
+extern "C" void zfs_update_vfsops(struct vfsops* _vfsops) {
+    zfs_vfsops.vfs_mount = _vfsops->vfs_mount;
+    zfs_vfsops.vfs_unmount = _vfsops->vfs_unmount;
+    zfs_vfsops.vfs_sync = _vfsops->vfs_sync;
+    zfs_vfsops.vfs_mount = _vfsops->vfs_mount;
+    zfs_vfsops.vfs_vget = _vfsops->vfs_vget;
+    zfs_vfsops.vfs_statfs = _vfsops->vfs_statfs;
+    zfs_vfsops.vfs_vnops = _vfsops->vfs_vnops;
+}
diff --git a/libc/misc/uname.c b/libc/misc/uname.c
index 3f1bf754..016d74a5 100644
--- a/libc/misc/uname.c
+++ b/libc/misc/uname.c
@@ -24,7 +24,7 @@ _Static_assert(KERNEL_VERSION(LINUX_MAJOR, LINUX_MINOR, LINUX_PATCH)
 #define str(s) #s
 #define str2(s) str(s)
 
-struct utsname utsname OSV_HIDDEN = {
+struct utsname utsname = {
        .sysname        = "Linux",      /* lie, to avoid confusing the payload. */
        .nodename       = "osv.local",
        .release        = str2(LINUX_MAJOR) "." str2(LINUX_MINOR) "." str2(LINUX_PATCH),
diff --git a/loader.cc b/loader.cc
index 44c0e754..da254492 100644
--- a/loader.cc
+++ b/loader.cc
@@ -57,6 +57,7 @@
 
 #include "libc/network/__dns.hh"
 #include <processor.hh>
+#include <dlfcn.h>
 
 using namespace osv;
 using namespace osv::clock::literals;
@@ -409,6 +410,7 @@ void* do_main_thread(void *_main_args)
     if (opt_mount) {
         unmount_devfs();
 
+        const auto libsolaris_file_name = "libsolaris.so";
         if (opt_rootfs.compare("rofs") == 0) {
             auto error = mount_rofs_rootfs(opt_pivot);
             if (error) {
@@ -421,14 +423,20 @@ void* do_main_thread(void *_main_args)
             }
             boot_time.event("ROFS mounted");
         } else if (opt_rootfs.compare("zfs") == 0) {
-            zfsdev::zfsdev_init();
-            auto error = mount_zfs_rootfs(opt_pivot, opt_extra_zfs_pools);
-            if (error) {
-                debug("Could not mount zfs root filesystem.\n");
-            }
+            //Initialize ZFS filesystem driver implemented in libsolaris.so
+            //TODO: Consider calling dlclose() somewhere after ZFS is unmounted
+            if (dlopen(libsolaris_file_name, RTLD_LAZY)) {
+                zfsdev::zfsdev_init();
+                auto error = mount_zfs_rootfs(opt_pivot, opt_extra_zfs_pools);
+                if (error) {
+                    debug("Could not mount zfs root filesystem.\n");
+                }
 
-            bsd_shrinker_init();
-            boot_time.event("ZFS mounted");
+                bsd_shrinker_init();
+                boot_time.event("ZFS mounted");
+            } else {
+                debug("Could not load and/or initialize %s.\n", libsolaris_file_name);
+            }
         } else if (opt_rootfs.compare("ramfs") == 0) {
             // NOTE: The ramfs is already mounted, we just need to mount fstab
             // entries. That's the only difference between this and --nomount.
@@ -454,18 +462,24 @@ void* do_main_thread(void *_main_args)
             } else if (mount_virtiofs_rootfs(opt_pivot) == 0) {
                 boot_time.event("Virtio-fs mounted");
             } else {
-                zfsdev::zfsdev_init();
-                auto error = mount_zfs_rootfs(opt_pivot, opt_extra_zfs_pools);
-                if (error) {
-                    debug("Could not mount zfs root filesystem (while "
-                          "auto-discovering).\n");
-                    // Continue with ramfs (already mounted)
-                    // TODO: Avoid the hack of using pivot_rootfs() just for
-                    // mounting the fstab entries.
-                    pivot_rootfs("/");
+                //Initialize ZFS filesystem driver implemented in libsolaris.so
+                //TODO: Consider calling dlclose() somewhere after ZFS is unmounted
+                if (dlopen("libsolaris.so", RTLD_LAZY)) {
+                    zfsdev::zfsdev_init();
+                    auto error = mount_zfs_rootfs(opt_pivot, opt_extra_zfs_pools);
+                    if (error) {
+                        debug("Could not mount zfs root filesystem (while "
+                              "auto-discovering).\n");
+                        // Continue with ramfs (already mounted)
+                        // TODO: Avoid the hack of using pivot_rootfs() just for
+                        // mounting the fstab entries.
+                        pivot_rootfs("/");
+                    } else {
+                        bsd_shrinker_init();
+                        boot_time.event("ZFS mounted");
+                    }
                 } else {
-                    bsd_shrinker_init();
-                    boot_time.event("ZFS mounted");
+                    debug("Could not load and/or initialize %s.\n", libsolaris_file_name);
                 }
             }
         }
diff --git a/usr.manifest.skel b/usr.manifest.skel
index 3c072d01..1f963f65 100644
--- a/usr.manifest.skel
+++ b/usr.manifest.skel
@@ -5,6 +5,7 @@
 /libzfs.so: libzfs.so
 /libuutil.so: libuutil.so
 /zfs.so: zfs.so
+/libsolaris.so: libsolaris.so
 /tools/mkfs.so: tools/mkfs/mkfs.so
 /tools/cpiod.so: tools/cpiod/cpiod.so
 /tools/mount-fs.so: tools/mount/mount-fs.so
-- 
2.31.1
