I've just run into something (on 4.0.1) which looks to me as though write() buffer alignment matters. This sounds to me like a bug, but it appears to have something to do with the hardware, and I'd appreciate any thoughts on how I might best track it down.
I tried to run one of my tools (a disk verifier which, as I ran it, just writes distinctive data to a disk). I ran it as disk-check -m sd1d -w /dev/rsd1d which writes stuff to /dev/rsd1d. I got messages indicating write errors. This seemed weird enough I started investigating further. ktrace indicated that disk-check was doing 1019 1 disk-check CALL open(0xbfbfeb8a,1,0) 1019 1 disk-check NAMI "/dev/rsd1d" 1019 1 disk-check RET open 3 1019 1 disk-check CALL __fstat30(3,0xbfbfe998) 1019 1 disk-check RET __fstat30 0 1019 1 disk-check CALL ioctl(3,DIOCGDINFO,0xbfbfe804) 1019 1 disk-check GIO fd 3 read 404 bytes [actual data snipped for brevity] 1019 1 disk-check RET ioctl 0 1019 1 disk-check CALL __sigaction_sigtramp(SIGINFO,0xbfbfe970,0,0xbbb730c4,2) 1019 1 disk-check RET __sigaction_sigtramp 0 1019 1 disk-check CALL lseek(3,0,0,0,0) 1019 1 disk-check RET lseek 0 1019 1 disk-check CALL write(3,0x804baa0,0x400000) 1019 1 disk-check GIO fd 3 wrote 0 bytes "" 1019 1 disk-check RET write 0 which of course isn't right. I tried with dd, which worked; the only difference I could see in kdump output that looked even barely possibly significant was that dd's buffer was aligned to a multiple of 4K. So I taught disk-check to align its write buffer and the weird write behaviour vanished. Sometimes, depending on I'm not sure what, the write writes 64K of data (but then an attempt to write 4128768 bytes from a buffer address 64K further advanced usually returns 0 immediately). It does not appear to be related to any of my kernel hacks; at least, when I boot the stock distribution GENERIC kernel, I see the same (mis)behaviour from my test program (below) as under my kernel. It does, however, appear to have something to do with the hardware (personally, I suspect the disk driver); if I try it on another 4.0.1 machine on vnd0d backed by an ordinary file, it doesn't misbehave, and if I try it on a real disk partition on that machine (but a non-RAW_PART partition on a wd drive, rather than sd1d), it doesn't misbehave either. I wrote a small test program. I'd be interested to hear if anyone else can get it to misbehave, or if anyone has any suggestions for what could possibly be behind this. The hardware on which this misbehaves is an HP NetServer LP 1000r; the disk is <HP 73.4G, ATLAS10K3_73_SCA, HP11> in an external enclosure, on the built-in SCSI interface. Full dmesg is below. /~\ The ASCII Mouse \ / Ribbon Campaign X Against HTML mo...@rodents-montreal.org / \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B Here's that test program, and the results I see: # cat z.c #include <stdio.h> #include <errno.h> #include <unistd.h> #include <strings.h> #define MISALIGN 2321 #define BUFSIZE (1<<22) #define ALIGN 4096 static unsigned char dbuf[BUFSIZE+ALIGN+MISALIGN]; int main(void); int main(void) { unsigned char *dp_align; unsigned char *dp_misalign; int i; for (i=0;i<sizeof(dbuf);i++) dbuf[i] = i & 0xff; dp_align = &dbuf[0]; if ((unsigned long int)dp_align % ALIGN) { dp_align += ALIGN - ((unsigned long int)dp_align % ALIGN); } dp_misalign = dp_align + MISALIGN; lseek(1,0,SEEK_SET); i = write(1,dp_align,BUFSIZE); if (i < 0) { fprintf(stderr,"aligned write: error %d (%s)\n",errno,strerror(errno)); } else { fprintf(stderr,"aligned write: %d\n",i); } lseek(1,0,SEEK_SET); i = write(1,dp_misalign,BUFSIZE); if (i < 0) { fprintf(stderr,"misaligned write: error %d (%s)\n",errno,strerror(errno)); } else { fprintf(stderr,"misaligned write: %d\n",i); } return(0); } # cc -o z z.c # ./z > /dev/rsd1d aligned write: 4194304 misaligned write: 65535 # And the dmesg I promised (this is from my kernel, but, as I said, the stock 4.0.1 GENERIC on this same hardware misbehaves the same way, though I need to boot -c and "disable acpi" to get it to boot at all): Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007 The NetBSD Foundation, Inc. All rights reserved. Copyright (c) 1982, 1986, 1989, 1991, 1993 The Regents of the University of California. All rights reserved. NetBSD 4.0.1 (NETSERVER) #0: Mon Jun 20 03:50:52 EDT 2011 mouse@:/home/mouse/kbuild/NETSERVER total memory = 1023 MB rbus: rbus_min_start set to 0x40000000 avail memory = 995 MB timecounter: Timecounters tick every 10.000 msec timecounter: Timecounter "i8254" frequency 1193182 Hz quality 100 BIOS32 rev. 0 found at 0xfd8e2 mainbus0 (root) mainbus0: Intel MP Specification (Version 1.4) (HP LP 1Kr/2Kr ) cpu0 at mainbus0: apid 3 (boot processor) cpu0: Intel Pentium III (686-class), 1266.79 MHz, id 0x6b1 cpu0: features 383fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR> cpu0: features 383fbff<PGE,MCA,CMOV,PAT,PSE36,MMX> cpu0: features 383fbff<FXSR,SSE> cpu0: "Intel(R) Pentium(R) III CPU family 1266MHz" cpu0: I-cache 16 KB 32B/line 4-way, D-cache 16 KB 32B/line 4-way cpu0: L2 cache 512 KB 32B/line 8-way cpu0: ITLB 32 4 KB entries 4-way, 2 4 MB entries fully associative cpu0: DTLB 64 4 KB entries 4-way, 8 4 MB entries 4-way cpu0: calibrating local timer cpu0: apic clock running at 133 MHz cpu0: 16 page colors cpu1 at mainbus0: apid 0 (application processor) cpu1: starting cpu1: Intel Pentium III (686-class), 1266.71 MHz, id 0x6b1 cpu1: features 383fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR> cpu1: features 383fbff<PGE,MCA,CMOV,PAT,PSE36,MMX> cpu1: features 383fbff<FXSR,SSE> cpu1: "Intel(R) Pentium(R) III CPU family 1266MHz" cpu1: I-cache 16 KB 32B/line 4-way, D-cache 16 KB 32B/line 4-way cpu1: L2 cache 512 KB 32B/line 8-way cpu1: ITLB 32 4 KB entries 4-way, 2 4 MB entries fully associative cpu1: DTLB 64 4 KB entries 4-way, 8 4 MB entries 4-way mpbios: bus 0 is type PCI mpbios: bus 1 is type PCI mpbios: bus 2 is type ISA ioapic0 at mainbus0 apid 1 (I/O APIC) ioapic0: pa 0xfec00000, version 11, 16 pins ioapic1 at mainbus0 apid 2 (I/O APIC) ioapic1: pa 0xfec01000, version 11, 16 pins mpbios: WARNING: pin 17 for apic 2 too high; assuming ACPI global int value pci0 at mainbus0 bus 0: configuration mode 1 pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok pchb0 at pci0 dev 0 function 0 pchb0: ServerWorks CNB20-HE PCI/AGP bridge (rev. 0x06) pchb1 at pci0 dev 0 function 1 pchb1: ServerWorks CNB20-HE PCI/AGP bridge (rev. 0x06) pci1 at pchb1 bus 1 pci1: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok esiop0 at pci1 dev 5 function 0: Symbios Logic 53c1010-33 (ultra3-wide scsi) esiop0: using on-board RAM esiop0: interrupting at ioapic1 pin 8 (irq 5) scsibus0 at esiop0: 16 targets, 8 luns per target esiop1 at pci1 dev 5 function 1: Symbios Logic 53c1010-33 (ultra3-wide scsi) esiop1: using on-board RAM esiop1: interrupting at ioapic1 pin 9 (irq 12) scsibus1 at esiop1: 16 targets, 8 luns per target fxp0 at pci0 dev 2 function 0: i82559 Ethernet, rev 8 fxp0: interrupting at ioapic1 pin 6 (irq 11) fxp0: Ethernet address 00:30:6e:11:cc:e7 inphy0 at fxp0 phy 1: i82555 10/100 media interface, rev. 4 inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto vga1 at pci0 dev 7 function 0: ATI Technologies Rage XL (rev. 0x27) wsdisplay0 at vga1 kbdmux 1: console (80x25, mterm emulation) wsmux1: connecting to wsdisplay0 fxp1 at pci0 dev 8 function 0: i82559 Ethernet, rev 8 fxp1: interrupting at ioapic1 pin 7 (irq 11) fxp1: Ethernet address 00:30:6e:11:cc:e8 inphy1 at fxp1 phy 1: i82555 10/100 media interface, rev. 4 inphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto pcib0 at pci0 dev 15 function 0 pcib0: ServerWorks OSB4 southbridge (rev. 0x50) rccide0 at pci0 dev 15 function 1 rccide0: ServerWorks OSB4 IDE Controller (rev. 0x00) rccide0: bus-master DMA support present rccide0: primary channel configured to compatibility mode rccide0: primary channel interrupting at ioapic0 pin 14 (irq 14) atabus0 at rccide0 channel 0 rccide0: secondary channel configured to compatibility mode rccide0: secondary channel interrupting at ioapic0 pin 15 (irq 15) atabus1 at rccide0 channel 1 ohci0 at pci0 dev 15 function 2: ServerWorks OSB4/CSB5 USB Host Controller (rev. 0x04) ohci0: interrupting at ioapic1 pin 17 (irq 10) ohci0: OHCI version 1.0, legacy support usb0 at ohci0: USB revision 1.0 uhub0 at usb0 uhub0: ServerWorks OHCI root hub, class 9/0, rev 1.00/1.00, addr 1 uhub0: 4 ports with 4 removable, self powered isa0 at pcib0 com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo pckbc0 at isa0 port 0x60-0x64 pckbd0 at pckbc0 (kbd slot) pckbc0: using irq 1 for kbd slot wskbd0 at pckbd0: console keyboard, using wsdisplay0 attimer0 at isa0 port 0x40-0x43: AT Timer pcppi0 at isa0 port 0x61 pcppi0: children must have an explicit unit midi0 at pcppi0: PC speaker (CPU-intensive output) sysbeep0 at pcppi0 isapnp0 at isa0 port 0x279: ISA Plug 'n Play device support npx0 at isa0 port 0xf0-0xff npx0: reported by CPUID; using exception 16 fdc0 at isa0 port 0x3f0-0x3f7 irq 6 drq 2 pcppi0: attached to attimer0 isapnp0: no ISA Plug 'n Play devices found ioapic0: enabling ioapic1: enabling timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0 fd0 at fdc0 drive 0: 1.44MB, 80 cyl, 2 head, 18 sec Kernelized RAIDframe activated scsibus0: waiting 2 seconds for devices to settle... scsibus1: waiting 2 seconds for devices to settle... sd0 at scsibus0 target 0 lun 0: <SEAGATE, ST39102LCSUN9.0G, 6329> disk fixed sd0: 8637 MB, 4926 cyl, 27 head, 133 sec, 512 bytes/sect x 17689267 sectors sd0: sync (25.00ns offset 15), 16-bit (80.000MB/s) transfers, tagged queueing ses0 at scsibus0 target 11 lun 0: <SDR, GEM318, 0> processor fixed ses0: SAF-TE Compliant Device ses0: async, 8-bit transfers sd1 at scsibus1 target 0 lun 0: <HP 73.4G, ATLAS10K3_73_SCA, HP11> disk fixed sd1: 70007 MB, 31022 cyl, 8 head, 577 sec, 512 bytes/sect x 143374738 sectors sd1: sync (25.00ns offset 31), 16-bit (80.000MB/s) transfers, tagged queueing sd2 at scsibus1 target 1 lun 0: <HP 73.4G, ATLAS10K3_73_SCA, HP11> disk fixed sd2: 70007 MB, 31022 cyl, 8 head, 577 sec, 512 bytes/sect x 143374738 sectors sd2: sync (25.00ns offset 31), 16-bit (80.000MB/s) transfers, tagged queueing sd3 at scsibus1 target 2 lun 0: <HP 73.4G, ATLAS10K3_73_SCA, HP11> disk fixed sd3: 70007 MB, 31022 cyl, 8 head, 577 sec, 512 bytes/sect x 143374738 sectors sd3: sync (25.00ns offset 31), 16-bit (80.000MB/s) transfers, tagged queueing sd4 at scsibus1 target 3 lun 0: <HP 73.4G, ATLAS10K3_73_SCA, HP11> disk fixed sd4: 70007 MB, 31022 cyl, 8 head, 577 sec, 512 bytes/sect x 143374738 sectors sd4: sync (25.00ns offset 31), 16-bit (80.000MB/s) transfers, tagged queueing sd5 at scsibus1 target 4 lun 0: <HP 73.4G, ATLAS10K3_73_SCA, HP11> disk fixed sd5: 70007 MB, 31022 cyl, 8 head, 577 sec, 512 bytes/sect x 143374738 sectors sd5: sync (25.00ns offset 31), 16-bit (80.000MB/s) transfers, tagged queueing sd6 at scsibus1 target 5 lun 0: <HP 73.4G, ATLAS10K3_73_SCA, HP11> disk fixed sd6: 70007 MB, 31022 cyl, 8 head, 577 sec, 512 bytes/sect x 143374738 sectors sd6: sync (25.00ns offset 31), 16-bit (80.000MB/s) transfers, tagged queueing sd7 at scsibus1 target 6 lun 0: <HP 73.4G, ATLAS10K3_73_SCA, HP11> disk fixed sd7: 70007 MB, 31022 cyl, 8 head, 577 sec, 512 bytes/sect x 143374738 sectors sd7: sync (25.00ns offset 31), 16-bit (80.000MB/s) transfers, tagged queueing sd8 at scsibus1 target 8 lun 0: <HP 73.4G, ATLAS10K3_73_SCA, HP11> disk fixed sd8: 70007 MB, 31022 cyl, 8 head, 577 sec, 512 bytes/sect x 143374738 sectors sd8: sync (25.00ns offset 31), 16-bit (80.000MB/s) transfers, tagged queueing sd9 at scsibus1 target 9 lun 0: <HP 73.4G, ATLAS10K3_73_SCA, HP11> disk fixed sd9: 70007 MB, 31022 cyl, 8 head, 577 sec, 512 bytes/sect x 143374738 sectors sd9: sync (25.00ns offset 31), 16-bit (80.000MB/s) transfers, tagged queueing sd10 at scsibus1 target 10 lun 0: <HP 73.4G, ATLAS10K3_73_SCA, HP11> disk fixed sd10: 70007 MB, 31022 cyl, 8 head, 577 sec, 512 bytes/sect x 143374738 sectors sd10: sync (25.00ns offset 31), 16-bit (80.000MB/s) transfers, tagged queueing sd11 at scsibus1 target 11 lun 0: <HP 73.4G, ATLAS10K3_73_SCA, HP11> disk fixed sd11: 70007 MB, 31022 cyl, 8 head, 577 sec, 512 bytes/sect x 143374738 sectors sd11: sync (25.00ns offset 31), 16-bit (80.000MB/s) transfers, tagged queueing sd12 at scsibus1 target 12 lun 0: <HP 73.4G, ATLAS10K3_73_SCA, HP11> disk fixed sd12: 70007 MB, 31022 cyl, 8 head, 577 sec, 512 bytes/sect x 143374738 sectors sd12: sync (25.00ns offset 31), 16-bit (80.000MB/s) transfers, tagged queueing sd13 at scsibus1 target 13 lun 0: <HP 73.4G, ATLAS10K3_73_SCA, HP05> disk fixed sd13: 70007 MB, 31022 cyl, 8 head, 577 sec, 512 bytes/sect x 143374738 sectors sd13: sync (25.00ns offset 31), 16-bit (80.000MB/s) transfers, tagged queueing sd14 at scsibus1 target 14 lun 0: <HP 73.4G, ATLAS10K3_73_SCA, HP05> disk fixed sd14: 70007 MB, 31022 cyl, 8 head, 577 sec, 512 bytes/sect x 143374738 sectors sd14: sync (25.00ns offset 31), 16-bit (80.000MB/s) transfers, tagged queueing ses1 at scsibus1 target 15 lun 0: <HP, A6491A, HP16> enclosure services fixed ses1: SCSI-3 SES Device ses1: async, 16-bit transfers atapibus0 at atabus0: 2 targets cd0 at atapibus0 drive 0: <CD-224E, , 1.5A> cdrom removable cd0: 32-bit data port cd0: drive supports PIO mode 4, DMA mode 2 cd0(rccide0:0:0): using PIO mode 4, DMA mode 2 (using DMA) RF autoconfig: raw component list: . RF autoconfig: sets: . RF autoconfig: parts-missing list: . RF autoconfig: dropping unconfigurable list: . boot device: sd0 root on sd0a dumps on sd0b root file system type: ffs cpu1: CPU 0 running wsdisplay0: screen 1 added (80x25, mterm emulation) wsdisplay0: screen 2 added (80x25, mterm emulation) wsdisplay0: screen 3 added (80x25, mterm emulation) wsdisplay0: screen 4 added (80x25, mterm emulation) wsdisplay0: screen 5 added (80x25, mterm emulation) wsdisplay0: screen 6 added (80x25, vt100 emulation) wsdisplay0: screen 7 added (80x25, sun emulation)