We are currently experiencing a severe performance drop on our ZFS storage server.
We have two pools: "stor", a raidz pool built from 7 iSCSI storage nodes, and "home", a local mirror pool. Recently we had some issues with one of the storage nodes, and because of that the stor pool was degraded. Since we did not succeed in bringing this storage node back online (at the ZFS level), we upgraded our NAS head from OpenSolaris b57 to b77. After the upgrade we successfully resilvered the pool (the resilver of 14 TB took a week!), and finally we upgraded the pool from version 3 to version 9.

The zpool is now healthy again, but performance is really poor. Accessing older data takes far too long. Running "dtruss -a find ." in a ZFS filesystem on this b77 server is extremely slow, while it is fast at our backup location, where we are still running OpenSolaris b57 and zpool version 3. Writing new data seems normal; we do not see big issues there. The real problem is running ls, rm or find in filesystems with lots of files (50000+, not all in one directory but spread over multiple subfolders).

Today I found out that not only zpool upgrade exists, but also zfs upgrade: most of our filesystems are still at version 1, while some newly created ones are already at version 3.
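For reference, these are roughly the commands I have been using to check the versions (quoting from memory, so the exact invocations may differ slightly from what I typed):

  zpool upgrade -v         # list the pool versions this build supports
  zpool upgrade            # report pools that are not at the latest version
  zfs upgrade              # list filesystems not at the current ZPL version
  zfs get -r version stor  # per-filesystem version property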
Running zdb we also see a mismatch in the version information: our storage pool is listed as version 3 while its uberblock is at version 9, yet when we run zpool upgrade it tells us that all pools are already at the latest version. Below is the zdb output:

zdb stor

stor
    version=3
    name='stor'
    state=0
    txg=6559447
    pool_guid=14464037545511218493
    hostid=341941495
    hostname='fileserver011'
    vdev_tree
        type='root'
        id=0
        guid=14464037545511218493
        children[0]
                type='raidz'
                id=0
                guid=179558698360846845
                nparity=1
                metaslab_array=13
                metaslab_shift=37
                ashift=9
                asize=20914156863488
                is_log=0
                children[0]
                        type='disk'
                        id=0
                        guid=640233961847538260
                        path='/dev/dsk/c2t3d0s0'
                        devid='id1,[EMAIL PROTECTED]/a'
                        phys_path='/iscsi/[EMAIL PROTECTED],0:a'
                        whole_disk=1
                        DTL=36
                children[1]
                        type='disk'
                        id=1
                        guid=7833573669820754721
                        path='/dev/dsk/c2t4d0s0'
                        devid='id1,[EMAIL PROTECTED]/a'
                        phys_path='/iscsi/[EMAIL PROTECTED],0:a'
                        whole_disk=1
                        DTL=22
                children[2]
                        type='disk'
                        id=2
                        guid=13685988517147825972
                        path='/dev/dsk/c2t5d0s0'
                        devid='id1,[EMAIL PROTECTED]/a'
                        phys_path='/iscsi/[EMAIL PROTECTED],0:a'
                        whole_disk=1
                        DTL=17
                children[3]
                        type='disk'
                        id=3
                        guid=13514021245008793227
                        path='/dev/dsk/c2t6d0s0'
                        devid='id1,[EMAIL PROTECTED]/a'
                        phys_path='/iscsi/[EMAIL PROTECTED],0:a'
                        whole_disk=1
                        DTL=21
                children[4]
                        type='disk'
                        id=4
                        guid=15871506866153751690
                        path='/dev/dsk/c2t9d0s0'
                        devid='id1,[EMAIL PROTECTED]/a'
                        phys_path='/iscsi/[EMAIL PROTECTED],0:a'
                        whole_disk=1
                        DTL=20
                children[5]
                        type='disk'
                        id=5
                        guid=11392907262189654902
                        path='/dev/dsk/c2t7d0s0'
                        devid='id1,[EMAIL PROTECTED]/a'
                        phys_path='/iscsi/[EMAIL PROTECTED],0:a'
                        whole_disk=1
                        DTL=19
                children[6]
                        type='disk'
                        id=6
                        guid=8472117762643335828
                        path='/dev/dsk/c2t8d0s0'
                        devid='id1,[EMAIL PROTECTED]/a'
                        phys_path='/iscsi/[EMAIL PROTECTED],0:a'
                        whole_disk=1
                        DTL=18

Uberblock
        magic = 0000000000bab10c
        version = 9
        txg = 6692849
        guid_sum = 12266969233845513474
        timestamp = 1197546530 UTC = Thu Dec 13 12:48:50 2007

If we compare with the home pool (this pool was created after installing b77):

zdb home

home
    version=9
    name='home'
    state=0
    txg=4
    pool_guid=11064283759455309967
    hostid=341941495
    hostname='fileserver011'
    vdev_tree
        type='root'
        id=0
        guid=11064283759455309967
        children[0]
                type='mirror'
                id=0
                guid=12887358012104249684
                metaslab_array=14
                metaslab_shift=31
                ashift=9
                asize=243784220672
                is_log=0
                children[0]
                        type='disk'
                        id=0
                        guid=11054487171079770402
                        path='/dev/dsk/c1t0d0s7'
                        devid='id1,[EMAIL PROTECTED]/h'
                        phys_path='/[EMAIL PROTECTED],0/pci10f1,[EMAIL PROTECTED]/[EMAIL PROTECTED],0:h'
                        whole_disk=0
                children[1]
                        type='disk'
                        id=1
                        guid=5037155585995287391
                        path='/dev/dsk/c1t1d0s7'
                        devid='id1,[EMAIL PROTECTED]/h'
                        phys_path='/[EMAIL PROTECTED],0/pci10f1,[EMAIL PROTECTED]/[EMAIL PROTECTED],0:h'
                        whole_disk=0

Uberblock
        magic = 0000000000bab10c
        version = 9
        txg = 239823
        guid_sum = 3149796381215514212
        timestamp = 1197541912 UTC = Thu Dec 13 11:31:52 2007

Dataset mos [META], ID 0, cr_txg 4, 99.0K, 18 objects
Dataset home [ZPL], ID 5, cr_txg 4, 18.0K, 4 objects

Traversing all blocks to verify checksums and verify nothing leaked ...

        No leaks (block sum matches space maps exactly)

        bp count:              30
        bp logical:        358912      avg:  11963
        bp physical:        44544      avg:   1484     compression:   8.06
        bp allocated:      124416      avg:   4147     compression:   2.88
        SPA allocated:     124416      used:  0.00%

                                capacity    operations    bandwidth   ---- errors ----
description               used  avail   read  write   read  write   read  write  cksum
home                      122K   226G     63      0  81.5K      0      0      0      0
  mirror                  122K   226G     63      0  81.5K      0      0      0      0
    /dev/dsk/c1t0d0s7                    576      0   706K      0      0      0      0
    /dev/dsk/c1t1d0s7                    561      0   679K      0      0      0      0

dtruss output:

PID/LWP    RELATIVE  ELAPSD    CPU SYSCALL(args)                        = return
23762/1:      20215  613147    103 getdents64(0x4, 0xFEDC0000, 0x2000)  = 80 0
./blabla/blabla

As you can see, getdents64 takes 613147 milliseconds here, while it takes only 10 ms at our failover location.

Any idea what is happening here? Thanks in advance for your reply!

Krdoor