Re: [zfs-discuss] ZFS Performance as a function of Disk Slice

2007-07-06 Thread Darren Dunham
> [...] ZFS gives me the ability to snapshot to archive (I assume it
> works across pools?).

No.  Snapshots are only within a pool.  Pools are independent storage
arenas.  
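If the goal is to get a snapshot's contents into another pool, the usual route
is send/receive.  A minimal sketch, assuming hypothetical dataset and pool names
(tank/data and archive):

   zfs snapshot tank/data@2007-07-06
   zfs send tank/data@2007-07-06 | zfs receive archive/data-2007-07-06

The snapshot itself still lives only in its own pool; send/receive just copies
the data across.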

-- 
Darren Dunham   [EMAIL PROTECTED]
Senior Technical Consultant TAOShttp://www.taos.com/
Got some Dr Pepper?   San Francisco, CA bay area
 < This line left intentionally blank to confuse you. >


[zfs-discuss] ZFS Performance as a function of Disk Slice

2007-07-06 Thread Scott Lovenberg
First Post!
Sorry, I had to get that out of the way to break the ice...

I was wondering if it makes sense to zone ZFS pools by disk slice, and whether it 
makes a difference with RAIDZ.  As I'm sure we're all aware, the end of a drive 
is half as fast as the beginning (where the zoning stipulates that the physical 
outside of the platter is the beginning, and block addresses increase as you move 
toward the spindle).

I usually short-stroke my drives so that the variable files on the operating 
system drive are at the beginning, the page (swap) area is in the center (so if 
I'm already thrashing, I'm at most half a stroke away from swap), and static files 
are toward the end.  So, applying this methodology to ZFS: I partition a drive 
into 4 equal-sized quarters, do this to 4 drives (each on a separate SATA 
channel), and then create 4 pools, each holding one 'ring' of the drives.  Will I 
then have 4 RAIDZ pools, which I can mount according to speed needs?  For 
instance, I always put (in Linux... I'm new to Solaris) '/export/archive' all 
the way on the slow tracks, since I don't read or write to it often and it is 
almost never accessed at the same time as anything else that would force long 
seeks.

Ideally, I'd like to use plain (non-redundant) ZFS on the archive ring.  I move 
data to archive in chunks, 4 GB at a time - when I roll it in I burn 2 DVDs, one 
gets cataloged locally and the other offsite, so if I lose the data, I don't care - 
but ZFS gives me the ability to snapshot to archive (I assume it works across 
pools?).  Then stripe one ring (I guess striping is ZFS's native behavior?) for 
/usr/local (or its Solaris equivalent), for performance.  Then mirror the root 
slice.  Finally, /export would be RAIDZ or RAIDZ2 on the fastest ring, holding my 
source code, large files, and things I want to stream over the LAN.
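For concreteness, a rough sketch of what that layout might look like, assuming
hypothetical device names c1t0d0 through c1t3d0, each sliced into s0 (outermost,
fastest) through s3 (innermost, slowest):

   zpool create export  raidz c1t0d0s0 c1t1d0s0 c1t2d0s0 c1t3d0s0   # fast ring, RAIDZ
   zpool create local   c1t0d0s1 c1t1d0s1 c1t2d0s1 c1t3d0s1         # plain stripe, no redundancy
   zpool create archive c1t0d0s3 c1t1d0s3 c1t2d0s3 c1t3d0s3         # slow ring, non-redundant archive
   # the remaining s2 slices would hold the mirrored root, however that ends up being done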

Does this make sense with ZFS?  Is the spindle count more of a factor than 
stroke latency?  Does ZFS balance these things out on its own via random 
scattering?

Reading back over this post, I find it sounds like the ramblings of a 
madman.  I guess I know what I want to say, but I'm not sure of the right 
questions to ask.  I think I'm asking: will my proposed setup afford me the 
flexibility to zone for performance, since I have a more intimate knowledge of 
the data going onto the drives, or will brute force by spindle count (I'm 
planning 4-6 drives - a single drive per bus) and random placement be 
sufficient if I just add each whole drive to a single pool?

I thank you all for your time and patience as I stumble through this, and I 
welcome any point of view or insights (especially those from experience!) that 
might help me decide how to configure my storage server.
 
 


Re: [zfs-discuss] ZFS raid is very slow???

2007-07-06 Thread Jeff Bonwick
A couple of questions for you:

(1) What OS are you running (Solaris, BSD, MacOS X, etc)?

(2) What's your config?  In particular, are any of the partitions
on the same disk?

(3) Are you copying a few big files or lots of small ones?

(4) Have you measured UFS-to-UFS and ZFS-to-ZFS performance on the
same platform?  That'd be useful data...
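One rough way to get those numbers (a hedged sketch; /ufs and /tank are
hypothetical mount points, and a test file much larger than RAM helps reduce
caching effects):

   ptime cp /ufs/testfile /ufs/copy1      # UFS to UFS
   ptime cp /tank/testfile /tank/copy1    # ZFS to ZFS
   ptime cp /ufs/testfile /tank/copy2     # UFS to ZFS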

Jeff

On Fri, Jul 06, 2007 at 03:49:43PM -0400, Will Murnane wrote:
> On 7/6/07, Orvar Korvar <[EMAIL PROTECTED]> wrote:
> > I have set up a ZFS raidz with 4 Samsung 500GB hard drives.
> >
> > It is extremely slow when I mount an NTFS partition and copy everything to
> > ZFS.  It's like 100 kB/sec or less.  Why is that?
> How are you mounting said NTFS partition?
> 
> > When I copy from the ZFS pool to UFS, I get like 40MB/sec - isn't that very low
> > considering I have 4 new 500GB disks in raid?  And when I copy from UFS to the
> > zpool I get like 20MB/sec.  Strange?  Or normal results?  Should I expect better
> > performance?  As of now, I am disappointed with ZFS.
> How fast is copying a file from ZFS to /dev/null?  That would
> eliminate the UFS disk from the mix.
> 
> Will


Re: [zfs-discuss] ZFS raid is very slow???

2007-07-06 Thread Will Murnane
On 7/6/07, Orvar Korvar <[EMAIL PROTECTED]> wrote:
> I have set up a ZFS raidz with 4 Samsung 500GB hard drives.
>
> It is extremely slow when I mount an NTFS partition and copy everything to
> ZFS.  It's like 100 kB/sec or less.  Why is that?
How are you mounting said NTFS partition?

> When I copy from the ZFS pool to UFS, I get like 40MB/sec - isn't that very low
> considering I have 4 new 500GB disks in raid?  And when I copy from UFS to the
> zpool I get like 20MB/sec.  Strange?  Or normal results?  Should I expect better
> performance?  As of now, I am disappointed with ZFS.
How fast is copying a file from ZFS to /dev/null?  That would
eliminate the UFS disk from the mix.
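A hedged sketch of that test (hypothetical path; use a file much larger than RAM
for an honest number):

   ptime dd if=/tank/bigfile of=/dev/null bs=1024k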

Will


[zfs-discuss] ZFS raid is very slow???

2007-07-06 Thread Orvar Korvar
I have set up a ZFS raidz with 4 Samsung 500GB hard drives.

It is extremely slow when I mount an NTFS partition and copy everything to ZFS.
It's like 100 kB/sec or less.  Why is that?

When I copy from the ZFS pool to UFS, I get like 40MB/sec - isn't that very low 
considering I have 4 new 500GB disks in raid?  And when I copy from UFS to the zpool 
I get like 20MB/sec.  Strange?  Or normal results?  Should I expect better 
performance?  As of now, I am disappointed with ZFS.




I used this card:
http://www.supermicro.com/products/accessories/addon/AoC-SAT2-MV8.cfm
Solaris Express Community build 67 detected it automatically and everything worked.  I 
inserted the card into a PCI slot, and it worked there too.
 
 


Re: [zfs-discuss] ZFS performance and memory consumption

2007-07-06 Thread johansen-osdev
> But now I have another question.
> How will 8k blocks impact performance?

When tuning recordsize for things like databases, we try to recommend
that the customer's recordsize match the I/O size of the database
record.
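For example (a hedged sketch; tank/db is a hypothetical dataset name, and 8k
assumes a database with an 8 KB page size):

   zfs create -o recordsize=8k tank/db
   # or, on an existing dataset (only newly written blocks pick up the new size):
   zfs set recordsize=8k tank/db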

I don't think that's the case in your situation.  ZFS is clever enough
that changes to recordsize only affect new blocks written to the
filesystem.  If you're seeing metaslab fragmentation problems now,
changing your recordsize to 8k is likely to increase your performance.
This is because you're out of free 128k segments, so using a smaller block
size lets you make better use of the remaining space.  It also means you
won't have to iterate through all of the metaslabs looking for a free
128k segment.

If you're asking, "How does setting the recordsize to 8k affect
performance when I'm not encountering fragmentation?", I would guess
that there would be some reduction.  However, you can wait to adjust the
recordsize until you actually encounter this problem with the default size.

-j



Re: [zfs-discuss] ZFS performance and memory consumption

2007-07-06 Thread Victor Latushkin
Łukasz writes:
> After a few hours with dtrace and source code browsing I found that in my space
> map there are no free 128K blocks left.

Actually you may have some free space segments of 128k or more, but 
alignment requirements will not allow them to be allocated. Consider the 
following example:

1. The space map starts at 0 and its size is 256KB.
2. There are two 512-byte blocks allocated from the space map - one at
the beginning, another one at the end - so the space map contains exactly one
free space segment with start 512 and size 255k.

Let's allocate a 128k block from such a space map. avl_find() will return 
this space segment, and then we will calculate the offset for the segment we are 
going to allocate:

align = size & -size = 128k & -128k = 0x20000 & 0xfffffffffffe0000 =
0x20000 = 128k

offset = P2ROUNDUP(ss->start, align) = P2ROUNDUP(512, 128k) =
-(-(512) & -(128k)) = -(0xfffffffffffffe00 & 0xfffffffffffe0000) =
-(0xfffffffffffe0000) = -(-128k) = 128k

Then we check whether offset + size is less than or equal to the space 
segment end, which is not true in this case:
offset + size = 128k + 128k = 256k > 255.5k = ss->ss_end.

So even though we have 255k free in a contiguous space segment, we cannot 
allocate a 128k block out of it due to the alignment requirements.
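The arithmetic above can be sanity-checked with ordinary 64-bit shell arithmetic
(a rough sketch in bash, using the numbers from the example):

   size=$(( 128 * 1024 ))                 # requested block size: 128k
   start=512                              # ss->ss_start of the free segment
   ss_end=$(( 256 * 1024 - 512 ))         # ss->ss_end: 255.5k
   align=$(( size & -size ))              # 0x20000 = 128k
   offset=$(( -( (-start) & (-align) ) )) # P2ROUNDUP(start, align) = 128k
   (( offset + size > ss_end )) && echo "128k block does not fit"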

What is the reason for such alignment requirements? I can see at least two:
a) it reduces the number of search locations for big blocks, and thus the number of 
iterations in the 'while' loop inside metaslab_ff_alloc()
b) since we are using cursors to keep the location where the last allocated 
block ended (for each block size), this helps ensure that 
allocations of smaller sizes will have a chance not to loop in the 
'while' loop inside metaslab_ff_alloc()

There may be other reasons also.

Bug 6495013 "Loops and recursion in metaslab_ff_alloc can kill 
performance, even on a pool with lots of free data" fixes a rather nasty 
race condition which further degrades the performance of 
metaslab_ff_alloc() on a fragmented pool:

http://src.opensolaris.org/source/diff/onnv/onnv-gate/usr/src/uts/common/fs/zfs/metaslab.c?r2=3848&r1=3713

But loops (and recursion) stay there.

> Try this on your ZFS:
>   dtrace -n fbt::metaslab_group_alloc:return'/arg1 == -1/{}'
> 
> If you get probes, then you have the same problem too.
> Allocating from a space map works like this:
> 1. metaslab_group_alloc wants to allocate a 128K block
> 2. for (all metaslabs) {
>        read the space map and check for a free 128K segment
>        if there is none, remove the METASLAB_ACTIVE_MASK flag
> }
> 3. unload the space maps for all metaslabs without METASLAB_ACTIVE_MASK
> 
> That is why spa_sync takes so much time.
> 
> Now the workaround:
>  zfs set recordsize=8K pool
Good idea, but it may have some drawbacks.

> Now the spa_sync function takes 1-2 seconds, the processor is idle, and
> only a few metaslab space maps are loaded:
>> 0600103ee500::walk metaslab | ::print struct metaslab ms_map.sm_loaded ! grep -c "0x"
> 3
> 
> But now I have another question.
> How will 8k blocks impact performance?
First of all, you will need more block pointers to address the same 
amount of data, which is not good if your files are big and mostly 
static. If files change frequently, this may increase fragmentation 
further...

Victor.


Re: [zfs-discuss] Zpools and drive duplication.

2007-07-06 Thread Richard Elling
Adam wrote:
> Just to let everyone know what I did to 'fix' the problem.  By halting the
> zones and then exporting the zpool, I was able to duplicate the drive without
> issue; I just had to import the zpool after booting and then boot the zones.
> Although my setup uses slices for the zpool (this is not supported by SUN),
> I did want to let you know it does work.

Hi Adam,
could you explain what you mean by "my setup uses slices for the zpool (this is not
supported by SUN)"?  This is not a ZFS restriction; do you mean a zones one?
  -- richard


Re: [zfs-discuss] Zpools and drive duplication.

2007-07-06 Thread Adam
Just to let everyone know what I did to 'fix' the problem.  By halting the 
zones and then exporting the zpool, I was able to duplicate the drive without 
issue; I just had to import the zpool after booting and then boot the zones.  Although 
my setup uses slices for the zpool (this is not supported by SUN), I did want 
to let you know it does work.
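For the archives, a minimal sketch of that sequence (hypothetical zone and pool
names):

   zoneadm -z myzone halt
   zpool export tank
   # ... duplicate the drive ...
   zpool import tank
   zoneadm -z myzone boot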
 
 


Re: [zfs-discuss] ZFS performance and memory consumption

2007-07-06 Thread Łukasz
If you want to know which block sizes you are failing to allocate:
  dtrace -n fbt::metaslab_group_alloc:entry'{ self->s = arg1; }' \
         -n fbt::metaslab_group_alloc:return'/arg1 != -1/{ self->s = 0 }' \
         -n fbt::metaslab_group_alloc:return'/self->s && (arg1 == -1)/{ @s = quantize(self->s); self->s = 0; }' \
         -n tick-10s'{ printa(@s); }'

and which block sizes are failing within individual metaslabs:
  dtrace -n fbt::space_map_alloc:entry'{ self->s = arg1; }' \
         -n fbt::space_map_alloc:return'/arg1 != -1/{ self->s = 0 }' \
         -n fbt::space_map_alloc:return'/self->s && (arg1 == -1)/{ @s = quantize(self->s); self->s = 0; }' \
         -n tick-10s'{ printa(@s); }'

If the metaslab_group_alloc output looks like this:
           value  ------------- Distribution ------------- count
           65536 |                                         0
          131072 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 9065
          262144 |                                         0

then you can set the ZFS recordsize to 64k.
 
 


Re: [zfs-discuss] ZFS performance and memory consumption

2007-07-06 Thread Łukasz
After a few hours with dtrace and source code browsing I found that in my space 
map there are no free 128K blocks left.
Try this on your ZFS:
  dtrace -n fbt::metaslab_group_alloc:return'/arg1 == -1/{}'

If you get probes, then you have the same problem too.
Allocating from a space map works like this:
1. metaslab_group_alloc wants to allocate a 128K block
2. for (all metaslabs) {
       read the space map and check for a free 128K segment
       if there is none, remove the METASLAB_ACTIVE_MASK flag
}
3. unload the space maps for all metaslabs without METASLAB_ACTIVE_MASK

That is why spa_sync takes so much time.

Now the workaround:
 zfs set recordsize=8K pool

Now the spa_sync function takes 1-2 seconds, the processor is idle, and
only a few metaslab space maps are loaded:
> 0600103ee500::walk metaslab | ::print struct metaslab ms_map.sm_loaded ! grep -c "0x"
3

But now I have another question.
How will 8k blocks impact performance?
 
 


Re: [zfs-discuss] Share and Remote mounting ZFS for anonyous ftp

2007-07-06 Thread Dale Erhart

All,

As a follow-up on this issue: this was not a ZFS issue after all; it was a
configuration issue, which I'm still curious about.

I had changed ownership from root:sys on a directory that was going to collect
the anonymous downloads to a user that had the same UID and GID on both hosts,
with permissions 777.  I had also changed it to ftp and a known GID, which still
didn't work.

To get the ZFS file system to accept uploads from the remote host, the UID:GID
had to be set to root:sys on both hosts.

root:sys /export
root:sys /export/ftp
root:sys /export/ftp/incoming   <--- this is my anonymous login repository for both hosts
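In other words, the fix was roughly (a hedged sketch, run on both hosts, using
the paths above):

   chown root:sys /export /export/ftp /export/ftp/incoming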


I would like to thank all who had responded.

Dale

Dale Erhart wrote:


Experts,

Sorry if this is a FAQ but I'm not on this alias.
Please reply directly to me.

I'm working on a project setting up a web portal that
will use 2 hosts for load balancing ftp's. I wanted to
use ZFS to showcase it to our customer.

What I've been trying to set up is anonymous ftp to a host that
is sharing a ZFS file system.  Anonymous ftp is configured and
does work on the 2 hosts I'm working with.  But when I try to
create a common ftp mount point between the 2 hosts (load balancing),
I get permission errors or fchown errors.

I was wondering if there is a setup/configuration issue, or whether ZFS
won't work with remote mounting and ftp.

Configuration:
SystemA sharing /export/ftp/incoming (zfs)
SystemB mounting SystemA:/export/ftp/incoming

Both hosts have the same permissions on the directories.
I've set up anonymous ftp on both systems with ftpconfig.
I went through the steps of setting up a shared ZFS file system:
zfs set sharenfs=on portal/ftp-incoming
zfs set sharenfs=rw=SystemB.domain,root=SystemB.domain portal/ftp-incoming

Mounted the shared file system on SystemB:
mount SystemA:/export/ftp/incoming /export/ftp/incoming

I've set up /etc/ftpd/ftpaccess for uploads to /export/ftp/incoming and
to change the owner and permissions to a local user:
upload   /export/ftp   /incoming   yes   webadmin   www   0440   nodirs

The problem is that I get errors when I try to upload a file.  The errors are
either permission denied or an fchown error.  I've changed ownership on
/export/ftp/incoming from root to webadmin to ftp without success.

Need suggestions fast, as this project is supposed to go live soon.

Thank you for your help,

Dale





Re: [zfs-discuss] ZFS performance and memory consumption

2007-07-06 Thread Łukasz
The field ms_smo.smo_objsize in the metaslab struct is the size of the space map data on disk.
I checked the size of the metaslabs in memory:
::walk spa | ::walk metaslab | ::print struct metaslab ms_map.sm_root.avl_numnodes
In total I got about 1GB.

But only some metaslabs are loaded:
::walk spa | ::walk metaslab | ::print struct metaslab ms_map.sm_root.avl_numnodes ! grep "0x" | wc -l
 231
out of 664 metaslabs.  And the number of loaded metaslabs is changing very fast.

Is there a way to keep all metaslabs in RAM?  Is there any limit?
I encourage other administrators to check the size of their free space maps.
 
 