Re: [zfs-discuss] Couple questions about ZFS writes and fragmentation
Wow, this forum is great and uber-fast in response. I appreciate the responses; they make sense. But what does ZFS actually do when it writes data? Let's say that you want to write x blocks somewhere: is ZFS going to find a pointer to the space map of some metaslab and then write there? Is it going to find a metaslab closest to the outside of the HDD for higher bandwidth?

And the label thing, heh, I made a mistake in what I read; you are right. Within the vdev array, though, after the storage pool location it also showed more vdev labels (vdev 1, vdev 2, boot block, storage space, vdev 3, vdev 4). Would there be more vdev labels after #4, or more storage space?

Thanks again

-- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Couple questions about ZFS writes and fragmentation
1. Is it true that because block sizes vary (in powers of 2, of course) on each write, there will be very little internal fragmentation?

2. I came upon this statement in a forum post: [i]"ZFS uses 128K data blocks by default whereas other filesystems typically use 4K or 8K blocks. This naturally reduces the potential for fragmentation by 32X over 4k blocks."[/i] How is this true? I mean, if you have a 128K default block size and you store a 4K file within that block, then you will have a ton of slack space to clear up.

3. Another statement from a post: [i]"the seek time for single-user contiguous access is essentially zero since the seeks occur while the application is already busy processing other data. When mirror vdevs are used, any device in the mirror may be used to read the data."[/i] All this is saying is that when you are reading off of one physical device, you will already be seeking for the blocks that you need on the other device, so the seek time will no longer be an issue, right?

4. In terms of where ZFS chooses to write data, is it always going to pick one metaslab and write only to free blocks within that metaslab? Or will it go all over the place?

5. When ZFS looks for a place to write data, does it look somewhere to intelligently see that there are some number of free blocks available within this particular metaslab, and if so, where is this located?

6. Could anyone clarify this post: [i]"ZFS uses a copy-on-write model. Copy-on-write tends to cause fragmentation if portions of existing files are updated. If a large portion of a file is overwritten in a short period of time, the result should be reasonably fragment-free but if parts of the file are updated over a long period of time (like a database) then the file is certain to be fragmented. This is not such a big problem as it appears to be since such files were already typically accessed using random access."[/i]

7. An aside question... I was reading a paper about ZFS and it stated that offsets are something like 8 bytes from the first vdev label. Is there any reason why the storage pool is after 2 vdev labels?

Thanks guys
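For questions 1 and 2, the arithmetic can be made concrete with a small sketch. It only models the premise stated in question 1 (a small file gets a single block rounded up to a power of two, capped at 128K); the 512-byte minimum and all function names are assumptions for illustration, and this is not how the real ZFS allocator is written.

```python
KIB = 1024

def next_power_of_two(n: int) -> int:
    """Smallest power of two that is >= n (n must be positive)."""
    return 1 << (n - 1).bit_length()

def slack_fixed(file_size: int, block_size: int) -> int:
    """Unused bytes when a file is stored in fixed-size blocks."""
    blocks = -(-file_size // block_size)  # ceiling division
    return blocks * block_size - file_size

def slack_variable(file_size: int,
                   min_block: int = 512,          # assumed smallest block
                   max_block: int = 128 * KIB) -> int:
    """Unused bytes under the question-1 premise: a file no larger than
    max_block gets one power-of-two block; larger files use max_block blocks."""
    if file_size <= max_block:
        block = max(min_block, next_power_of_two(file_size))
        return block - file_size
    return slack_fixed(file_size, max_block)

if __name__ == "__main__":
    for size in (4 * KIB, 5 * KIB, 100 * KIB):
        print(size, slack_fixed(size, 128 * KIB), slack_variable(size))
    # The "32X" in the quoted post is just the ratio of the block sizes.
    print((128 * KIB) // (4 * KIB))
```

Under that premise a 4K file wastes nothing, whereas a filesystem that always used fixed 128K blocks would waste 124K on it. The 32X figure in the quote is about how many separate pieces a large file can be split into, not about slack inside a block.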
Re: [zfs-discuss] Basic question about striping and ZFS
So then of what use is the parity? And how is the metadata used to reconstruct bad data? I understand what the metadata contains, but I don't get how ZFS traverses a file system and USES the metadata to reconstruct bad blocks.

I understand that you write everything to separate blocks. My question was this: if you initially have two stripes over two disks like this:

Disk 1: (Stripe Unit 1)
Disk 2: (Stripe Unit 2)

You then want to modify something in the first stripe unit, with modifications that are smaller than the unit, so now the Disk 1 and Disk 2 stripes look like this:

Disk 1: XXYY (the Ys indicate modified bits or bytes or whatever)
Disk 2:

So now, with a full-stripe write, you make new blocks for both stripes and just copy the data over to the new blocks. Now, tell me if I am right about what happens on a full-stripe write: you read the Disk 1 and Disk 2 stripes into the file system cache. You then apply the modifications to the Disk 1 stripe within the cache. After this, you compute the parity within the cache, and finally you write out both the Disk 1 stripe and the Disk 2 stripe to new blocks. Since the modifications to the Disk 1 stripe (the Ys) were smaller than the total stripe size, the new sector which will be written to will be of a smaller stripe size than the originals. Is this correct?
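The read, modify in cache, write to new blocks sequence described above can be sketched as a toy copy-on-write store. Everything here (the `CowStore` class, integer addresses, the `modify` helper) is hypothetical and only illustrates the idea that the old block is never overwritten in place; parity is left out to keep it short.

```python
class CowStore:
    """Toy block store: blocks are only ever added, never overwritten."""

    def __init__(self) -> None:
        self.blocks: dict[int, bytes] = {}  # address -> block contents
        self.next_addr = 0

    def write_new(self, data: bytes) -> int:
        """Write data to a freshly allocated address and return it."""
        addr = self.next_addr
        self.blocks[addr] = data
        self.next_addr += 1
        return addr


def modify(store: CowStore, pointer: int, offset: int, new_bytes: bytes) -> int:
    """Copy-on-write update: read the old block, patch it in memory,
    write the result to a new address, and return the new pointer."""
    buf = bytearray(store.blocks[pointer])          # read old block into "cache"
    buf[offset:offset + len(new_bytes)] = new_bytes  # apply the modification
    return store.write_new(bytes(buf))               # old block stays untouched


if __name__ == "__main__":
    store = CowStore()
    old_ptr = store.write_new(b"XXXXXXXX")
    new_ptr = modify(store, old_ptr, 2, b"YY")
    print(store.blocks[old_ptr], store.blocks[new_ptr])
```

In this model the new block is the same size as the old one even though only two bytes changed, because the whole block is rewritten. Whether the caller then switches to the new pointer or keeps the old one (as a snapshot would) is outside the sketch.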
Re: [zfs-discuss] Basic question about striping and ZFS
Hey, thanks for the slides, but some things are still unclear. Slide 18 shows variably sized extents but doesn't explain the process of a full-on write. What I'm looking for is one example. I still don't understand how it works with variable-sized extents. So if you have 2 stripe units on one disk and 1 stripe unit for the parity, and you modify only half of the first stripe unit, what happens when you do a full-stripe write?

I also didn't see a distinction between parity and metadata reconstruction. I still do not know the process behind metadata reconstruction for bad data, or when parity is used for bad data.
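On the parity versus metadata distinction, one way to picture it: the checksum kept with the parent's block pointer (metadata) only says that something is wrong, and the parity is what rebuilds the data. Below is a minimal sketch of that idea for single parity, assuming XOR parity and using SHA-256 as a stand-in for whatever checksum the pool is configured with. The function names and the column layout (`columns[0]` is parity) are assumptions for illustration, not ZFS code.

```python
import hashlib
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def checksum(data: bytes) -> bytes:
    """Stand-in checksum; imagine this value stored in the parent block pointer."""
    return hashlib.sha256(data).digest()


def read_block(columns: list[bytes], expected: bytes) -> bytes:
    """columns[0] is the parity column, columns[1:] are data columns.
    Return the block contents, repairing one bad data column if needed."""
    data = columns[1:]
    if checksum(b"".join(data)) == expected:
        return b"".join(data)  # common case: nothing to repair
    # Checksum mismatch: assume each data column in turn is the bad one,
    # rebuild it from parity plus the other columns, and re-check.
    for bad in range(len(data)):
        others = [c for i, c in enumerate(data) if i != bad]
        rebuilt = reduce(xor_bytes, others, columns[0])
        candidate = b"".join(data[:bad] + [rebuilt] + data[bad + 1:])
        if checksum(candidate) == expected:
            return candidate
    raise IOError("block is unrecoverable with single parity")


if __name__ == "__main__":
    good = [b"A" * 8, b"B" * 8, b"C" * 8]
    parity = reduce(xor_bytes, good)
    expected = checksum(b"".join(good))
    damaged = [parity, good[0], b"?" * 8, good[2]]  # silently corrupt column 2
    print(read_block(damaged, expected) == b"".join(good))
```

The point of the sketch is that the disk never has to report an error: the checksum mismatch alone triggers the repair attempt, and the checksum is also what confirms that the rebuilt data is correct.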
Re: [zfs-discuss] Basic question about striping and ZFS
Forgot to add: are those four stripe units (for that one file) above considered the stripe itself? Or is each of those stripe units on the separate disks considered a separate stripe?
[zfs-discuss] Basic question about striping and ZFS
I have been researching ZFS and had a question relating to RAID-Z and striping. I was glancing over Jeff's blog (http://blogs.sun.com/bonwick/entry/raid_z):

[i]"RAID-Z is a data/parity scheme like RAID-5, but it uses dynamic stripe width. Every block is its own RAID-Z stripe, regardless of blocksize. This means that every RAID-Z write is a full-stripe write. This, when combined with the copy-on-write transactional semantics of ZFS, completely eliminates the RAID write hole. RAID-Z is also faster than traditional RAID because it never has to do read-modify-write."[/i]

So firstly, is this literally referring to the blocks of a file, for example? Also, by stripe, is this referring to the stripe UNITS (within a whole stripe) or the ENTIRE stripe across disks? So, let's say that you have a file of 64 KB per sector (stripe units consisting of blocks of whatever size totaling 64K) across four disks:

Disk 0: Stripe 1
Disk 1: Stripe 2
Disk 2: Stripe 3
Disk 3: Parity

When Jeff's blog mentions that "every block is its own stripe", what exactly does he mean in the context of this example? And let's say that I am modifying/writing out bytes in the first stripe: how does this affect the other stripes/parity?
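To make "every block is its own stripe" concrete for the four-disk example, here is a simplified sketch of a full-stripe write with single XOR parity. The 512-byte sector size, the equal-length padded columns, and the `full_stripe` function are all assumptions for illustration; the real RAID-Z on-disk layout differs in detail.

```python
from functools import reduce

SECTOR = 512  # assumed sector size in bytes


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def full_stripe(block: bytes, ndisks: int) -> list[bytes]:
    """Turn ONE logical block into one stripe: [parity, data0, data1, ...].
    The stripe width depends on the block size (dynamic stripe width)."""
    if not block or ndisks < 2:
        raise ValueError("need a non-empty block and at least two disks")
    nsectors = -(-len(block) // SECTOR)       # ceiling division
    ncols = min(ndisks - 1, nsectors)         # small blocks span fewer disks
    col_len = -(-nsectors // ncols) * SECTOR  # equal-length columns (simplified)
    padded = block.ljust(ncols * col_len, b"\0")
    data = [padded[i * col_len:(i + 1) * col_len] for i in range(ncols)]
    parity = reduce(xor_bytes, data)          # parity comes from this block only
    return [parity] + data


if __name__ == "__main__":
    big = bytes(range(256)) * 256             # one 64 KiB block
    small = b"x" * 512                        # one single-sector block
    print(len(full_stripe(big, 4)), len(full_stripe(small, 4)))
```

Because the parity is computed only from the block being written, nothing already on disk has to be read back to update it, which is the "never has to do read-modify-write" part of the quote. A 64 KiB block on four disks becomes three data columns plus parity, while a one-sector block becomes one data column plus parity.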
[zfs-discuss] Motherboard for home zfs/solaris file server
Hello,

I am building a home file server and am looking for an ATX motherboard that will be well supported by OpenSolaris (onboard SATA controller, network, graphics if any, audio, etc.). I decided to go for Intel-based boards (socket LGA 775), since it seems that power management is better supported with Intel processors and power efficiency is an important factor. After reading several posts about ZFS, it looks like I want ECC memory as well.

Does anyone have any recommendations? Here are a few that I found. Any comments about those?

Supermicro C2SBX+
http://www.supermicro.com/products/motherboard/Core2Duo/X48/C2SBX+.cfm

Gigabyte GA-X48-DS4
http://www.gigabyte.com.tw/Products/Motherboard/Products_Overview.aspx?ProductID=2810

Intel S3200SHV
http://www.intel.com/Products/Server/Motherboards/Entry-S3200SH/Entry-S3200SH-overview.htm

Thanks for any help,
-Ilya