Hi Matt, ZFS-team,

Problem
-------
libzpool.so, in vn_rdwr(), splits each write into two pwrite(2) calls at
a random offset. This is done to simulate partial disk writes. As a side
effect, the resulting writes are usually not block aligned. Hence, when the
underlying device is a raw device, the write fails with EINVAL.

Note: ztest always runs on top of files and hence does not see this failure.

Solution
--------
Introduce a global flag, split_io, which, when set, causes writes to be
split (the current behavior). The flag is off by default and is turned on
by ztest.

The patch, built on top of build 55, is attached.

Could this patch be accepted into OpenSolaris?

Regards,
Manoj


Matthew Ahrens wrote:
> Manoj Joseph wrote:
>> Unlike what I had assumed earlier, the zio_t that is passed to 
>> vdev_file_io_start() has an aligned offset and size.
>>
>> The libzpool library, when writing data to the devices below a zpool, 
>> splits the write into two. This is done for the sake of testing. The 
>> comment in the routine, vn_rdwr() says this:
>> /*
>>  * To simulate partial disk writes, we split writes into two
>>  * system calls so that the process can be killed in between.
>>  */
>>
>> This has the effect of creating misaligned writes to raw devices which 
>> fail with errno=EINVAL.
> 
> Cool, glad you were able to figure it out!
> 
> --matt


diff -r 77d8e3c86357 usr/src/cmd/ztest/ztest.c
--- a/usr/src/cmd/ztest/ztest.c	Mon Dec 11 17:17:14 2006 -0800
+++ b/usr/src/cmd/ztest/ztest.c	Fri Aug 17 10:31:21 2007 -0600
@@ -3228,6 +3228,9 @@ main(int argc, char **argv)
 	/* Override location of zpool.cache */
 	spa_config_dir = "/tmp";
 
+	/* Split writes to simulate partial writes */
+	split_io = B_TRUE;
+
 	ztest_random_fd = open("/dev/urandom", O_RDONLY);
 
 	process_options(argc, argv);
diff -r 77d8e3c86357 usr/src/lib/libzpool/common/kernel.c
--- a/usr/src/lib/libzpool/common/kernel.c	Mon Dec 11 17:17:14 2006 -0800
+++ b/usr/src/lib/libzpool/common/kernel.c	Fri Aug 17 10:31:21 2007 -0600
@@ -36,6 +36,7 @@
 #include <sys/spa.h>
 #include <sys/processor.h>
 
+int split_io = B_FALSE;
 
 /*
  * Emulation of kernel services in userland.
@@ -373,14 +374,19 @@ vn_rdwr(int uio, vnode_t *vp, void *addr
 	if (uio == UIO_READ) {
 		iolen = pread64(vp->v_fd, addr, len, offset);
 	} else {
-		/*
-		 * To simulate partial disk writes, we split writes into two
-		 * system calls so that the process can be killed in between.
-		 */
-		split = (len > 0 ? rand() % len : 0);
-		iolen = pwrite64(vp->v_fd, addr, split, offset);
-		iolen += pwrite64(vp->v_fd, (char *)addr + split,
-		    len - split, offset + split);
+		if (split_io) {
+			/*
+			 * To simulate partial disk writes, we split writes
+			 * into two system calls so that the process can be
+			 * killed in between.
+			 */
+			split = (len > 0 ? rand() % len : 0);
+			iolen = pwrite64(vp->v_fd, addr, split, offset);
+			iolen += pwrite64(vp->v_fd, (char *)addr + split,
+			    len - split, offset + split);
+		} else {
+			iolen = pwrite64(vp->v_fd, addr, len, offset);
+		}
 	}
 
 	if (iolen == -1)
diff -r 77d8e3c86357 usr/src/lib/libzpool/common/sys/zfs_context.h
--- a/usr/src/lib/libzpool/common/sys/zfs_context.h	Mon Dec 11 17:17:14 2006 -0800
+++ b/usr/src/lib/libzpool/common/sys/zfs_context.h	Fri Aug 17 10:31:21 2007 -0600
@@ -341,6 +341,8 @@ typedef struct vattr {
 
 #define	VN_RELE(vp)	vn_close(vp)
 
+extern int split_io;
+
 extern int vn_open(char *path, int x1, int oflags, int mode, vnode_t **vpp,
     int x2, int x3);
 extern int vn_openat(char *path, int x1, int oflags, int mode, vnode_t **vpp,

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss