After I changed the recordsize to 8k, the read/write sizes reported by
zpool iostat are still not always 8k. So does ZFS not obey the recordsize
strictly?
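
For reference, the recordsize was changed with 'zfs set' as Neil suggests
below; roughly:

    zfs set recordsize=8k phximddb03data/zuc4arch/data01
    zfs set recordsize=8k phximddb03data/zuc4arch/data02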
UC4-zuc4arch$> zfs get recordsize
NAME                            PROPERTY    VALUE  SOURCE
phximddb03data/zuc4arch/data01  recordsize  8K     local
phximddb03data/zuc4arch/data02  recordsize  8K     local
UC4-zuc4arch$> zpool iostat phximddb03data 1
                   capacity     operations    bandwidth
pool             used  avail   read  write   read  write
--------------  -----  -----  -----  -----  -----  -----
phximddb03data   487G   903G     13     62  1.26M  2.98M
phximddb03data   487G   903G    518      1  4.05M  23.8K   ===> here a write is of size 24k
phximddb03data   487G   903G    456     37  3.58M   111K
phximddb03data   487G   903G    551      0  4.34M  11.9K
phximddb03data   487G   903G    496      8  3.86M   239K
phximddb03data   487G   903G    472    229  3.68M   982K
phximddb03data   487G   903G    499      3  3.91M  3.96K
phximddb03data   487G   903G    525    138  4.12M   631K
phximddb03data   487G   903G    497      0  3.89M      0
phximddb03data   487G   903G    562      0  4.38M      0
phximddb03data   487G   903G    337      3  2.63M  47.5K
phximddb03data   487G   903G    140     35  4.55M  4.23M   ===> here a write is of size 128k.
phximddb03data   487G   903G    484    272  7.12M  5.44M
phximddb03data   487G   903G    562      0  4.49M   127K
phximddb03data   487G   903G    514      4  4.03M   301K
phximddb03data   487G   903G    505     27  3.99M  1.00M
phximddb03data   487G   903G    518     14  4.10M   692K
phximddb03data   487G   903G    518      1  4.11M  14.4K
phximddb03data   487G   903G    504      2  3.98M   151K
phximddb03data   487G   903G    531      3  4.17M   392K
phximddb03data   487G   903G    375      2  2.95M   380K
phximddb03data   487G   903G    304      5  2.40M   296K
phximddb03data   487G   903G    438      3  3.45M   277K
phximddb03data   487G   903G    376      0  3.00M      0
phximddb03data   487G   903G    239     15  2.84M  1.98M
phximddb03data   487G   903G    221    857  4.51M  16.8M   ==> here a read is of size 20k.
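
The per-operation sizes called out above are just that one-second interval's
bandwidth divided by its operation count, for example:

    23.8K write bandwidth /   1 write op  ~= 24K per write
    4.23M write bandwidth /  35 write ops ~= 124K (about 128K) per write
    4.51M read bandwidth  / 221 read ops  ~= 20K per read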
On Thu, Dec 25, 2008 at 12:25 PM, Neil Perrin <[email protected]> wrote:
> The default recordsize is 128K. So you are correct, for random reads
> performance will be bad as excess data is read. For Oracle it is
> recommended to set the recordsize to 8k. This can be done when creating
> the filesystem using 'zfs create -o recordsize=8k <fs>'. If the fs has
> already been created then you can use 'zfs set recordsize=8k <fs>',
> *however* this only takes effect for new files, so existing databases
> will retain the old block size.
>
> Hope this helps:
>
> Neil.
>
>
> qihua wu wrote:
>
>> Hi, All,
>>
>> We have an Oracle standby running on ZFS and the database recovers very,
>> very slowly. The problem is that the IO performance is very bad. I find
>> that the recordsize of the ZFS filesystem is 128K, while the Oracle block
>> size is 8K.
>>
>> My question is:
>> When Oracle tries to write an 8K block, will ZFS read in 128K and then
>> write 128K? If that's the case, then ZFS will do 16 times (128K/8K = 16)
>> as much IO as necessary.
>>
>>                 extended device statistics
>>  r/s  w/s  kr/s   kw/s  wait  actv  wsvc_t  asvc_t  %w   %b  device
>>  0.0  0.2   0.0    1.6   0.0   0.0     6.0     7.7   0    0  md4
>>  0.0  0.2   0.0    1.6   0.0   0.0     0.0     7.4   0    0  md14
>>  0.0  0.2   0.0    1.6   0.0   0.0     0.0     7.6   0    0  md24
>>  0.0  0.4   0.0    1.7   0.0   0.0     0.0     6.7   0    0  sd0
>>  0.0  0.4   0.0    1.7   0.0   0.0     0.0     6.5   0    0  sd2
>>  0.0  1.4   0.0  105.2   0.0   4.9     0.0  3503.3   0  100  ssd97
>>  0.0  3.0   0.0  384.0   0.0  10.0     0.0  3332.9   0  100  ssd99
>>  0.0  2.6   0.0  332.8   0.0  10.0     0.0  3845.7   0  100  ssd101
>>  0.0  4.4   0.0  563.3   0.0  10.0     0.0  2272.4   0  100  ssd103
>>  0.0  3.4   0.0  435.2   0.0  10.0     0.0  2940.8   0  100  ssd105
>>  0.0  3.6   0.0  460.8   0.0  10.0     0.0  2777.4   0  100  ssd107
>>  0.0  0.2   0.0   25.6   0.0   0.0     0.0    72.8   0    1  ssd112
>>
>>
>>
>>
>> UC4-zuc4arch$> zfs list -o recordsize
>> RECSIZE
>> 128K
>> 128K
>> 128K
>> 128K
>> 128K
>> 128K
>> 128K
>> 128K
>> 128K
>>
>> Thanks,
>> Daniel,
>> ------------------------------------------------------------------------
>>
>
>