Re: [zfs-discuss] Performance problem suggestions?

2011-05-12 Thread Jim Klimov
 The thing is- as far as I know the OS doesn't ask the disk to find a place
 to fit the data. Instead the OS tracks what space on the disk is free and
 then tells the disk where to write the data.

Yes and no - I did not formulate my idea clearly enough, sorry for the confusion ;)

Yes - the disks don't care about free blocks at all; to them these are just LBA 
sector numbers.

No - the OS does track which sectors correspond to the logical blocks it deems 
suitable for a write, and asks the disk to position its mechanical head over a 
specific track and access a specific sector. This is a slow operation which can 
only be done about 180-250 times per second for very random I/O (perhaps more 
with HDD/controller caching, queuing and faster spindles).

I'm afraid that seeking to very dispersed metadata blocks, such as when traversing 
the block-pointer tree during a scrub on a fragmented drive, may well qualify as 
very random I/O.
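As a rough back-of-envelope illustration (my own numbers, simply assuming ~200 
random reads per second per spindle): walking 10 million dispersed metadata blocks 
would need about 10,000,000 / 200 = 50,000 seconds, i.e. roughly 14 hours of pure 
seeking per drive.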

This reminds me of the long-stalled BP Rewrite project, which would allow live 
re-arranging of ZFS data and, in particular, some degree of defragmentation. More 
useful applications would be changing RAIDZ levels and the number of disks, and 
maybe even removal of top-level VDEVs from a sufficiently empty pool... Hopefully 
the Illumos team or some other developers will push this idea into reality ;)

There was a good tip from Jim Litchfield regarding VDEV queue sizing, though. 
The current default for zfs_vdev_max_pending is apparently 10, which is okay (or 
maybe even too much) for individual drives, but is not very much for an array of 
many disks hidden behind a smart controller with its own caching and queuing, 
be it a SAN box controller or a PCI one, which intercepts and reinterprets 
ZFS's requests.

So maybe this is indeed a bottleneck - which you would see in iostat -xn 1 as 
actv values that stay near the configured queue size.
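A minimal check, assuming the stock Solaris tools and that the tunable exists on 
your build:

# echo zfs_vdev_max_pending/D | mdb -k    (show the current per-LUN queue limit)
# iostat -xn 1                            (watch actv per device against that limit)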

//Jim


Re: [zfs-discuss] Performance problem suggestions?

2011-05-12 Thread Don
 This is a slow operation which can only be done about 180-250 times per second
 for very random I/Os (may be more with HDD/Controller caching, queuing and
 faster spindles).
 I'm afraid that seeking to very dispersed metadata blocks, such as traversing
 the tree during a scrub on a fragmented drive, may qualify as a very random I/O.
And that's the thing - I would understand if my scrub was slow because the disks 
were being hammered by IOPS, but - all joking aside - my pool is almost entirely 
idle according to iostat -xn.


Re: [zfs-discuss] Performance problem suggestions?

2011-05-11 Thread Jim Klimov
  Disks that have been in use for a longer time may have very fragmented free
  space on one hand, and not so much of it on another, but ZFS is still trying
  to push bits around evenly. And while it's waiting on some disks, others may
  be blocked as well. Something like that...
 This could explain why performance would go up after a large delete but I've
 not seen large wait times for any of my disks. The service time, percent busy,
 and every other metric continues to show nearly idle disks.

I believe that in this situation the older, fuller disks would show some activity 
while the others show zero or few IOs - because ZFS has no tasks for them. It 
sent a series of blocks to write from the queue; the newer disks wrote theirs and 
stay dormant, while the older disks seek around to fit their piece of data... When 
the old disks complete the writes, ZFS batches out a new set of tasks to everyone.
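A quick way to see this imbalance without any special tooling ("pool" below is 
just whatever your pool is called):

# zpool iostat -v pool 5

If one top-level VDEV keeps showing write operations while the others sit near 
zero, that VDEV is the straggler the whole transaction group ends up waiting on.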

 If this is the problem- it would be nice if there were a simple zfs or dtrace
 query that would show it to you.

Well, it seems that the bridge between the email and web interfaces to the 
OpenSolaris forums has been fixed, for new posts at least, so hopefully Richard 
Elling or some other experts will come up with an idea for a dtrace script for 
your situation.
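In the meantime, a minimal dtrace sketch of the kind of thing I mean (just the 
generic io provider, nothing ZFS-aware - it counts physical I/Os and bytes per 
device until you press Ctrl-C):

# dtrace -n 'io:::start { @iops[args[1]->dev_statname] = count();
    @bytes[args[1]->dev_statname] = sum(args[0]->b_bcount); }'

Devices that are busy seeking would show high I/O counts but low byte totals.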

I have a small but non-zero hope that the experts will also come to the web 
forums, review the past month's posts, and comment on my, your and others' 
questions and findings ;)

//Jim Klimov


Re: [zfs-discuss] Performance problem suggestions?

2011-05-11 Thread Jim Litchfield
Keep in mind zfs_vdev_max_pending. In the latest version of S10 this is set
to 10. ZFS will not issue more than that many requests at a time to a LUN,
so your disks may look relatively idle while ZFS has a lot of data piled up
inside, just waiting to be read or written.
I have tweaked this on the fly.

One key indicator is if your disk queues hover around 10.
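For reference, the usual way to tweak it (the value here is only an example, and 
the exact mechanics may differ between releases):

# echo zfs_vdev_max_pending/W0t4 | mdb -kw    (change it on the fly, here to 4)

To make it stick across reboots, add to /etc/system:

set zfs:zfs_vdev_max_pending = 4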

Jim


Re: [zfs-discuss] Performance problem suggestions?

2011-05-11 Thread Don
 It sent a series of blocks to write from the queue, newer disks wrote them and
 stay dormant, while older disks seek around to fit that piece of data... When
 old disks complete the writes, ZFS batches them a new set of tasks.
The thing is - as far as I know, the OS doesn't ask the disk to find a place to 
fit the data. Instead, the OS tracks what space on the disk is free and then 
tells the disk where to write the data.

Even if ZFS were waiting for the I/O to complete, I would expect to see that delay 
reflected in the disk service times. In our case we see no high service times, 
no busy disks, nothing. It seems like ZFS is just sitting there quietly, thinking 
to itself. If the processor were busy that might make sense, but even there our 
processors seem largely idle.

At the same time, even a scrub on this system is a joke right now, and that's a 
read-intensive operation. I'm seeing a scrub speed of 400K/s but almost no I/Os 
to my disks.
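For what it's worth, I'm watching it with nothing more than the standard tools 
("mypool" here is a placeholder for my real pool name):

# zpool status mypool    (scrub progress)
# iostat -xn 5           (per-disk throughput and service times while it runs)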


Re: [zfs-discuss] Performance problem suggestions?

2011-05-10 Thread Don
I've been going through my iostat, zilstat, and other outputs, all to no avail. 
None of my disks ever seem to show outrageous service times, the load on the 
box is never high, and if the darned thing is CPU-bound, I'm not even sure 
where to look.

 (traversing DDT blocks even if in memory, etc - and kernel times indeed are
 above 50%) as I'm zeroing deleted blocks inside the internal pool. This
 took several days already, but recovered lots of space in my main pool also...
When you say you are zeroing deleted blocks- how are you going about doing that?

Despite claims to the contrary, I can understand ZFS needing some tuning. What 
I can't understand are the baffling differences in performance I see. For 
example, after deleting a large volume my performance will suddenly skyrocket, 
then gradually degrade - but the question is why?

I'm not running dedup. My disks seem to be largely idle. I have 8 3GHz cores 
that also seem to be idle. I seem to have enough memory. What is ZFS doing 
during this time?

Everything I've read suggests one of two possible causes- too full, or bad 
hardware. Is there anything else that might be an issue here? Another ZFS 
factor I haven't taken into account?

Space seems to be the biggest factor in my performance differences (more free 
space = more performance), but as my fullest disks are less than 70% full and 
my emptiest disks are less than 10% full, I can't understand why space is an 
issue.

I have a few hardware errors on one of my pool disks, but we're talking about 
a very small number of errors over a long period of time. I'm considering 
replacing this disk, but the pool is so slow at times that I'm loath to slow it 
down further by doing a replace unless I can be more certain that it is going to 
fix the problem.
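If and when I do it, the plan is just the standard sequence (pool and device 
names below are placeholders, not my actual ones):

# zpool status -v mypool    (ZFS-level read/write/checksum error counters)
# iostat -En c0t3d0         (driver-level soft/hard/transport error counts)
# zpool replace mypool c0t3d0 c0t9d0    (swap the suspect disk for a spare)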


Re: [zfs-discuss] Performance problem suggestions?

2011-05-10 Thread Jim Klimov
Well, as I wrote in other threads - I have a pool named "pool" on physical 
disks, and a compressed volume in this pool which I loopback-mount over iSCSI 
to make another pool named "dcpool".
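Roughly, the layering looks like this (a from-memory sketch; the names and size 
are made up, and the COMSTAR/iSCSI plumbing in between is left out):

# zfs create -V 2T -o compression=on pool/dcvol    (compressed backing volume)
# sbdadm create-lu /dev/zvol/rdsk/pool/dcvol       (export it as a COMSTAR LU)
  ... share the LU over iSCSI and log in to it from the same host ...
# zpool create dcpool c9t600144F0XXXXd0            (pool on top of that iSCSI LUN)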

When files in dcpool are deleted, the blocks are not zeroed out by current ZFS, 
so they remain allocated in the physical pool. Now I'm doing essentially this to 
clean up the parent pool:
# dd if=/dev/zero of=/dcpool/nodedup/bigzerofile

This file is in a non-deduped dataset, so from the point of view of dcpool it is 
a huge growing file filled with zeroes - and the blocks it references overwrite 
the garbage left over from older deleted files that dcpool no longer references. 
For the parent pool, however, each such write is a compressed all-zero block that 
needs no allocation, so the pool releases a volume block and its referencing 
metadata block.

This has already released over half a terabyte in my physical pool (blocks filled 
with zeroes are a special case for ZFS compression and require no, or fewer than 
usual, referencing metadata blocks) ;)
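Spelled out as a recipe (a sketch of what I'm doing; the dataset name is mine, the 
block size is just a sensible choice, and compression is deliberately left off on 
this dataset so the zeroes are really written down to the backing zvol - they only 
get compressed away at the parent pool level):

# zfs create -o dedup=off -o compression=off dcpool/nodedup
# dd if=/dev/zero of=/dcpool/nodedup/bigzerofile bs=1024k
# rm /dcpool/nodedup/bigzerofile     (when done, return the space to dcpool)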

However, since I have millions of 4KB blocks for the volume data and its metadata, 
I guess fragmentation is quite high, maybe even interleaving them one-to-one. One 
way or another, this dcpool has never seen I/O faster than, say, 15MB/s, and 
usually lingers in the 1-5MB/s range, while I can easily get 30-50MB/s in the pool 
in other datasets (with dynamic block sizes and longer contiguous stretches of 
data).

Writes were relatively quick for the first virtual terabyte or so, but it has been 
doing the last 100GB for several days now, at several megabytes per minute 
according to the dcpool iostat. There are, however, several MB/s of I/O on the 
hardware disks backing this deletion and clean-up (as in my examples in the 
previous post)...

As for disks with different fill ratios - that is a commonly discussed performance 
problem. It seems to boil down to this: free space on all disks (actually on 
top-level VDEVs) is considered when round-robining writes across the stripes. 
Disks that have been in use for a longer time may have free space that is, on one 
hand, very fragmented and, on the other, not very plentiful, but ZFS still tries 
to push bits around evenly. And while it's waiting on some disks, others may be 
blocked as well. Something like that...

People on this forum have seen and reported that adding a 100MB file tanked 
their multi-terabyte pool's performance, and removing the file boosted it back 
up.

I don't want to mix up other writers' findings - better to search the last 5-10 
pages of forum post headings yourself. It's within the last hundred threads or 
so, I think ;)


Re: [zfs-discuss] Performance problem suggestions?

2011-05-10 Thread Hung-ShengTsao (Lao Tsao) Ph.D.

It is my understanding that for (fast) writes you should consider a faster device 
(an SSD) for the ZIL, and for reads a faster device (SSD) for the L2ARC.
There have been many discussions that for virtualization (V12N) environments 
mirrors (RAID1) are better than RAIDZ.
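Adding such devices is straightforward (a sketch only - the pool and device names 
are made up, and the SSDs have to be sized for the workload):

# zpool add mypool log c2t0d0      (dedicated SSD log device for synchronous writes)
# zpool add mypool cache c2t1d0    (SSD L2ARC to extend the read cache)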



Re: [zfs-discuss] Performance problem suggestions?

2011-05-10 Thread Don
 # dd if=/dev/zero of=/dcpool/nodedup/bigzerofile
Ahh- I misunderstood your pool layout earlier. Now I see what you were doing.

 People on this forum have seen and reported that adding a 100Mb file tanked
 their multiterabyte pool's performance, and removing the file boosted it back up.
Sadly, I think several of those posts were mine or my coworkers'.

 Disks that have been in use for a longer time may have very fragmented free
 space on one hand, and not so much of it on another, but ZFS is still trying
 to push bits around evenly. And while it's waiting on some disks, others may
 be blocked as well. Something like that...
This could explain why performance would go up after a large delete but I've 
not seen large wait times for any of my disks. The service time, percent busy, 
and every other metric continues to show nearly idle disks.

If this is the problem- it would be nice if there were a simple zfs or dtrace 
query that would show it to you.