Re: [zfs-discuss] Couple questions about ZFS writes and fragmentation

2009-11-09 Thread Ilya
Wow, this forum is great and uber-fast in response, appreciate the responses, 
makes sense.

Only, what does ZFS do to write to data? Let's say that you want to write x 
blocks somewhere, is ZFS  going to find a pointer to the space map of some 
metaslab and then write there? Is it going to find a metaslab closest to the 
outside of the HDD for higher bandwidth?

And the label thing, heh, I made a mistake in what I read, you are right. 
Within the vdev array though, after the storage pool location though, it also 
showed more vdev labels coming after it (vdev 1, vdev 2, boot block, storage 
space, vdev 3, vdev4). Would there more vdev labels after #4 or more storage 
space?

Thanks again
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Couple questions about ZFS writes and fragmentation

2009-11-09 Thread Ilya
1. Is it true that because block sizes vary (in powers of 2 of course) on each 
write that there will be very little internal fragmentation?

2. I came upon this statement in a forum post:

[i]"ZFS uses 128K data blocks by default whereas other filesystems typically 
use 4K or 8K blocks. This naturally reduces the potential for fragmentation by 
32X over 4k blocks."[/i]

How is this true? I mean, if you have a 128k default block size and you store a 
4k file within that block then you will have a ton of slack space to clear up.

3. Another statement from a post:

[i]"the seek time for single-user contiguous access is essentially zero since 
the seeks occur while the application is already busy processing other data. 
When mirror vdevs are used, any device in the mirror may be used to read the 
data."[/i]

All this is saying that is when you are reading off of one physical device you 
will already be seeking for the blocks that you need from the other device so 
the seek time will no longer be an issue right?

4. In terms of where ZFS chooses to write data, is it always going to pick one 
metaslab and write to only free blocks within that metaslab? Or will it go all 
over the place?

5. When ZFS looks for a place to write data, does it look somewhere to 
intelligently see that there are some number of free blocks available within 
this particular metaslab and if so where is this located?

6. Could anyone clarify this post:

[i]"ZFS uses a copy-on-write model. Copy-on-write tends to cause fragmentation 
if portions of existing files are updated. If a large portion of a file is 
overwritten in a short period of time, the result should be reasonably 
fragment-free but if parts of the file are updated over a long period of time 
(like a database) then the file is certain to be fragmented. This is not such a 
big problem as it appears to be since such files were already typically 
accessed using random access."[/i]

7. An aside question...I was reading a paper about ZFS and it stated that 
offsets are something like 8 bytes from the first vdev label. Is there any 
reason why the storage pool is after 2 vdev labels?

Thanks guys
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Basic question about striping and ZFS

2009-11-05 Thread Ilya
So then of what use is the parity? 

And how is the metadata used to reconstruct bad data? I understand obviously 
what the metadata contains but I don't get how ZFS traverses through a file 
system and USES the metadata to construct bad blocks.

I understand that you write everything to separate blocks. My question was this:

If you have initially two stripes over two disks like this:

Disk 1:  (Stripe Unit 1)
Disk 2:  (Stripe Unit 2)

You then want to modify something in the first stripe unit with modifications 
which are smaller so now Disk 1 and Disk 2 stripes look like this:

Disk 1: XXYY (the y's indicate modified bits or bytes or whatever)
Disk 2: 

So now, with a full-stripe write, you then make new blocks for both stripes and 
just copy the data over to the new blocks. Now, tell me if I am write with what 
happens on a full-stripe write:

You read in Disk 1 and Disk 2 stripes in the file system cache. You then apply 
the modifications to the Disk 1 stripe within the cache. After this, you 
compute the parity within the cache and finally you write out both Disk 1 
Stripe and Disk 2 stripe to new blocks. Since the modifications to the disk 1 
stripe (the Ys) were smaller than the total stripe size, the new sector which 
will be written to will be of a smaller stripe size than the originals.

Is this correct?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Basic question about striping and ZFS

2009-11-05 Thread Ilya
Hey

Thanks for the slides but some things are still unclear.

Slide 18 shows variably sizes extents but doesn't explain the process of 
full-on write. What I'm looking for is one example. I still don't understand 
how it works with variable sized extents. So if you have 2 stripe units on one 
disk and 1 stripe unit for the parity and you modify half of the first stripe 
unit only, when you do a full-stripe write, what happens in terms of a 
full-stripe write?

I also didn't see a distinction between parity and metadata reconstruction. I 
still do not know the process behind the metadata reconstruction for bad data 
and when parity is used for bad data.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Basic question about striping and ZFS

2009-11-04 Thread Ilya
Forgot to add, are those four stripe units (for that one file) above considered 
the stripe itself? Or are each of those stripe units on the seperate disks 
considered as separate stripes?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Basic question about striping and ZFS

2009-11-04 Thread Ilya
Researching about ZFS and had a question leating to Raid-Z and the striping. 
So, I was glacing over Jeff's blog (http://blogs.sun.com/bonwick/entry/raid_z):

[i]"RAID-Z is a data/parity scheme like RAID-5, but it uses dynamic stripe 
width. Every block is its own RAID-Z stripe, regardless of blocksize. This 
means that every RAID-Z write is a full-stripe write. This, when combined with 
the copy-on-write transactional semantics of ZFS, completely eliminates the 
RAID write hole. RAID-Z is also faster than traditional RAID because it never 
has to do read-modify-write. "[/i]

So firstly, is this literally referring to the blocks of a file for example? 
Also by stripe, is this referring to the stripe UNITS (within a whole stripe) 
or the ENTIRE stripe across disks? 

So, let's say that you have a file of 64 kb per sector (stripe units consisting 
of blocks of whatever size totaling 64k) across four disks. 

Disk 0: Stripe 1
Disk 1: Stripe 2
Disk 2: Stripe 3
Disk 3: Parity

When Jeff's blog mentions that "every block has it's own stripe" what does he 
exactly mean in the context of this example? And let's say that I am 
modifying/write out bytes in the first stripe, how does this affect the other 
stripes/parity?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Motherboard for home zfs/solaris file server

2009-02-23 Thread Ilya Tatar

Hello,
I am building a home file server and am looking for an ATX mother board 
that will be supported well with OpenSolaris (onboard SATA controller, 
network, graphics if any, audio, etc). I decided to go for Intel based 
boards (socket LGA 775) since it seems like power management is better 
supported with Intel processors and power efficiency is an important 
factor. After reading several posts about ZFS it looks like I want ECC 
memory as well.


Does anyone have any recommendations?

Here are a few that I found. Any comments about those?

Supermicro C2SBX+
http://www.supermicro.com/products/motherboard/Core2Duo/X48/C2SBX+.cfm

Gigabyte GA-X48-DS4
gigabyte: 
http://www.gigabyte.com.tw/Products/Motherboard/Products_Overview.aspx?ProductID=2810


Intel S3200SHV
http://www.intel.com/Products/Server/Motherboards/Entry-S3200SH/Entry-S3200SH-overview.htm

Thanks for any help,
-Ilya



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss