Re: [zfs-discuss] dfratime on zfs

2007-08-15 Thread Darren Dunham
> > Prompted by a recent /. article on atime vs relatime ranting by some 
> > Linux kernel hackers (Linus included), I went back and looked at the 
> > mount_ufs(1M) man page because I was sure that OpenSolaris had more than 
> > just atime,noatime.  Yep, sure enough, UFS has dfratime.
> > 
> > So that got me wondering: does ZFS need dfratime, or is it just not a 
> > problem because ZFS works in a different way?
> 
> I believe ZFS will delay atime updates waiting for more writes to come
> in, but it will eventually write them anyway (5 seconds?).  dfratime
> postpones the write until another write comes in, so it seems legitimate
> to me for ZFS to have such an option.

But a traditional filesystem isn't going to write anything without a
request.  The way things currently work, ZFS is constantly updating the
pool/uberblock status.  So even if you defer the atime update for much
longer, that won't prevent writes from being scheduled anyway.

> > If ZFS did have dfratime, 
> > how would it impact the "always consistent on disk" requirement?  One 
> > thought was that the ZIL would need to be used to ensure that the writes 
> > got to disk eventually, but then that would mean we were still writing 
> > just to the ZIL instead of the dataset itself.
> 
> I don't think dfratime changes the disk consistency at all.  Wouldn't
> writing to the ZIL with dfratime on defeat the purpose?  If we need to
> write to the ZIL for some other write, though, then it would be ok to
> flush the atime updates out too.
> 
> All that said, I believe the primary use case for dfratime is laptops,
> and therefore it shouldn't be a high priority for the ZFS team.

I'd guess it's most useful when atime updates are a significant fraction
of scheduled writes, so that removing or deferring them makes a
difference.  I just don't think that'll be the case on ZFS at the moment.
Of course, gathering some actual numbers would be a good idea.
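
One rough way to get such numbers (the pool and dataset names below are
just placeholders) is to run the same read-only workload twice against a
test dataset and compare the write column of zpool iostat with atime on
and off:

   # zfs create tank/atimetest
   # zfs set atime=on tank/atimetest
   # zpool iostat tank 5     (while reading through the files, e.g. find | xargs cat)
   # zfs set atime=off tank/atimetest
   # zpool iostat tank 5     (same read workload again)

If the write ops barely change between the two runs, atime updates aren't
a significant fraction of the scheduled writes and a dfratime-style option
wouldn't buy much.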

-- 
Darren Dunham   [EMAIL PROTECTED]
Senior Technical Consultant TAOS            http://www.taos.com/
Got some Dr Pepper?   San Francisco, CA bay area
 < This line left intentionally blank to confuse you. >
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] scrub halts

2007-08-15 Thread Gary Gendel
Al,

That makes so much sense that I can't believe I missed it. One bay was the one 
giving me the problems. Switching drives didn't affect that. Switching cabling 
didn't affect that. Changing SATA controllers didn't affect that. However, 
reorienting the case on its side did!

I'll be putting a larger fan into the disk-stack case.

Gary

> On Tue, 14 Aug 2007, Richard Elling wrote:
> 
> > Rick Wager wrote:
> >> We see similar problems on a SuperMicro with 5 500 GB Seagate sata drives. 
> >> This is using the AHCI driver. We do not, however, see problems with the 
> >> same hardware/drivers if we use 250GB drives.
> >
> > Duh.  The error is from the disk :-)
> 
> A likely possibility is that the disk drives are simply not getting 
> enough (cool) airflow and are over-heating during periods of high 
> system activity that generates a lot of disk head movement; for 
> example, during a zpool scrub.  And the extra platters present in the 
> larger disk drives would require even more cooling capacity - which 
> would validate your observations.
> 
> Best to actually *measure* the effectiveness of the disk cooling 
> design/installation.  Recommendation: investigate the Fluke mini 
> infrared thermometers - for example - the Fluke 62 at: 
> http://www.testequipmentdepot.com/fluke/thermometers/62.htm
> 
> In some disk drive installations, it's possible for the infrared probe 
> to "see" the disk HDA (Head Disk Assembly) without disturbing the 
> drive.
> 
> PS: I use a much older Fluke 80T-IR in combination with a digital 
> multimeter with millivolt resolution (a Fluke meter of course!).
> 
> >> We sometimes see bad blocks reported (are these automatically remapped 
> >> somehow so they are not used again?) and sometimes sata port resets.
> >
> > Depending on how the errors are reported, the driver may attempt a reset
> > to clear.  The drive may also automatically spare bad blocks.
> >
> >> Here is a sample of the log output. Any help understanding and/or resolving 
> >> this issue greatly appreciated. I very much don't want to have freezes in 
> >> production.
> >>
> >> Aug 14 11:20:28 chazz1  port 2: device reset
> >> Aug 14 11:20:28 chazz1 scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],0/pci15d9,[EMAIL PROTECTED],2/[EMAIL PROTECTED],0 (sd3):
> >> Aug 14 11:20:28 chazz1  Error for Command: write   Error Level: Retryable
> >> Aug 14 11:20:28 chazz1 scsi: [ID 107833 kern.notice]Requested Block: 530   Error Block: 530
> >> Aug 14 11:20:28 chazz1 scsi: [ID 107833 kern.notice]Vendor: ATA   Serial Number:
> >> Aug 14 11:20:28 chazz1 scsi: [ID 107833 kern.notice]Sense Key: No_Additional_Sense
> >> Aug 14 11:20:28 chazz1 scsi: [ID 107833 kern.notice]ASC: 0x0 (no additional sense info), ASCQ: 0x0, FRU: 0x0
> >
> > This error was transient and retried.  If it was a fatal error (still
> > failed after retries) then you'll have another, different message
> > describing the failed condition.
> >  -- richard
> >
> 
> Regards,
> 
> Al Hopper  Logical Approach Inc, Plano, TX.  [EMAIL PROTECTED]
> Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
> OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
> http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Opensolaris ZFS version & Sol10U4 compatibility

2007-08-15 Thread David Evans
As the release date of Solaris 10 Update 4 approaches (hope, hope), I was 
wondering if someone could comment on which OpenSolaris ZFS versions will 
work seamlessly when imported on U4.
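
For what it's worth, my (possibly wrong) understanding is that the deciding
factor is the on-disk pool version: a newer release can import an older
pool, but not the other way around.  I was planning to compare the two
sides with something like (no pool names assumed here):

   # zpool upgrade -v     (run on each system; lists the pool versions that release supports)
   # zpool upgrade        (shows which existing pools are below the current version)

but confirmation would be appreciated.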

Thanks.

dce
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] dfratime on zfs

2007-08-15 Thread David Bustos
Quoth Darren J Moffat on Thu, Aug 09, 2007 at 10:32:02AM +0100:
> Prompted by a recent /. article on atime vs relatime ranting by some 
> Linux kernel hackers (Linus included), I went back and looked at the 
> mount_ufs(1M) man page because I was sure that OpenSolaris had more than 
> just atime,noatime.  Yep, sure enough, UFS has dfratime.
> 
> So that got me wondering: does ZFS need dfratime, or is it just not a 
> problem because ZFS works in a different way?

I believe ZFS will delay atime updates waiting for more writes to come
in, but it will eventually write them anyway (5 seconds?).  dfratime
postpones the write until another write comes in, so it seems legitimate
to me for ZFS to have such an option.
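
For reference, the only knob ZFS exposes today is the all-or-nothing atime
property; the dataset name here is just an example:

   # zfs get atime tank/home
   # zfs set atime=off tank/home

So a dfratime-style deferral really would be a new option rather than a
tweak to an existing one.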

> If ZFS did have dfratime, 
> how would it impact the "always consistent on disk" requirement?  One 
> thought was that the ZIL would need to be used to ensure that the writes 
> got to disk eventually, but then that would mean we were still writing 
> just to the ZIL instead of the dataset itself.

I don't think dfratime changes the disk consistency at all.  Wouldn't
writing to the ZIL with dfratime on defeat the purpose?  If we need to
write to the ZIL for some other write, though, then it would be ok to
flush the atime updates out too.

All that said, I believe the primary use case for dfratime is laptops,
and therefore it shouldn't be a high priority for the ZFS team.


David
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] restore lost pool after vtoc re-label

2007-08-15 Thread Robert Milkowski
Hello o,

Wednesday, August 15, 2007, 12:17:04 AM, you wrote:

op> hi all,

op> i've been using a SAN LUN as the sole member of a zpool with one
op> additional zfs filesystem. this is a flat SAN fabric, so this LUN
op> was available to other systems on the fabric, and one of them came
op> up with "wrong magic number" for several drives, and, as best i
op> can tell, the vtoc for my zpool LUN was over-written on that host
op> via format labeling to correct the error. 

op> all of the data should still be there, nobody else has touched
op> the LUN, but zpool doesn't see anything. 

op> is it possible to recover that zpool somehow? i know it exists
op> and the LUN has been un-touched since the labeling.


How was that LUN being used by ZFS? I mean, did you specify the entire LUN
(c0t0d0 - without specifying a slice)? If that's the case, try 'format -e'
on the proper disk, then create an EFI label with s0 covering the entire
disk (minus the reservation in EFI, which you won't be able to overwrite
anyway, I guess). Then try 'zpool import' and check if it works.
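
A rough sketch of those steps (the device name c0t0d0 and the pool name
"tank" are only placeholders, and format -e is interactive, so this is
just an outline):

   # zpool import            (first check whether the pool shows up as-is)
   # format -e c0t0d0        (select the LUN, run 'label', choose the EFI label)
   # zpool import            (list importable pools again)
   # zpool import tank       (import it if it is now found)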

If that's not the case, then you'll have to recreate whatever label was
there...



-- 
Best regards,
 Robert Milkowski   mailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Extremely long creat64 latencies on highly utilized zpools

2007-08-15 Thread johansen-osdev
You might also consider taking a look at this thread:

http://mail.opensolaris.org/pipermail/zfs-discuss/2007-July/041760.html

Although I'm not certain, this sounds a lot like the other pool
fragmentation issues.

-j

On Wed, Aug 15, 2007 at 01:11:40AM -0700, Yaniv Aknin wrote:
> Hello friends,
> 
> I've recently seen a strange phenomenon with ZFS on Solaris 10u3, and was 
> wondering if someone may have more information.
> 
> The system uses several zpools, each a bit under 10T, each containing one zfs 
> with lots and lots of small files (way too many, about 100m files and 75m 
> directories).
> 
> I have absolutely no control over the directory structure and believe me I 
> tried to change it.
> 
> Filesystem usage patterns are create and read, never delete and never rewrite.
> 
> When volumes approach 90% usage, and under medium/light load (zpool iostat 
> reports 50mb/s and 750iops reads), some creat64 system calls take over 50 
> seconds to complete (observed with 'truss -D touch'). When doing manual 
> tests, I've seen similar times on unlink() calls (truss -D rm). 
> 
> I'd like to stress this happens on /some/ of the calls, maybe every 100th 
> manual call (I scripted the test), which (along with normal system 
> operations) would probably be every 10,000th or 100,000th call.
> 
> Other system parameters (memory usage, loadavg, process number, etc) appear 
> nominal. The machine is an NFS server, though the crazy latencies were 
> observed both local and remote.
> 
> What would you suggest to further diagnose this? Has anyone seen trouble with 
> high utilization and medium load? (with or without insanely high filecount?)
> 
> Many thanks in advance,
>  - Yaniv
>  
>  
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Extremely long creat64 latencies on highly utilized zpools

2007-08-15 Thread michael schuster
Yaniv,

I'm adding dtrace-discuss to this email for reasons that will be obvious 
immediately :-) - see below

Yaniv Aknin wrote:

> When volumes approach 90% usage, and under medium/light load (zpool
> iostat reports 50mb/s and 750iops reads), some creat64 system calls take
> over 50 seconds to complete (observed with 'truss -D touch'). When doing
> manual tests, I've seen similar times on unlink() calls (truss -D rm).
> 
> I'd like to stress this happens on /some/ of the calls, maybe every
> 100th manual call (I scripted the test), which (along with normal system
> operations) would probably be every 10,000th or 100,000th call.

I'd suggest you do something like this (not tested, so syntax errors etc 
may be lurking; I'd also suggest you get the DTrace guide off of 
opensolaris.org and read the chapter about speculations):

#!/usr/sbin/dtrace -Fs

inline int limit = 1000000000;  /* one second, in nanoseconds */

syscall::creat64:entry
{
self->spec = speculation();
speculate(self->spec);
self->ts = timestamp;
self->duration = 0;
}

fbt:::entry,
fbt:::return
/self->spec/
{
speculate(self->spec);
}

syscall::creat64:return
/self->spec/
{
speculate(self->spec);
self->duration = timestamp - self->ts;
}

syscall::creat64:return
/self->duration > limit/
{
commit(self->spec);
self->spec = 0;
}

syscall::creat64:return
/self->spec/
{
discard(self->spec);
self->spec = 0;
}


you may need to use a different timestamp (walltimestamp?); and perhaps 
you'll want to somehow reduce the number of fbt probes, but that's up to 
you. I hope you can take it from here.
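
One way to cut the fbt probe count (untested, same caveats as the script
itself) is to restrict the flow tracing to the zfs kernel module, i.e.
replace the fbt:::entry/fbt:::return clause with:

fbt:zfs::entry,
fbt:zfs::return
/self->spec/
{
speculate(self->spec);
}

and, if speculations end up being dropped, run the script with more
speculation buffers, e.g. 'dtrace -x nspec=16 -Fs creat64.d' (the script
name and the value 16 are just examples).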

cheers
Michael
-- 
Michael Schuster            Sun Microsystems, Inc.
recursion, n: see 'recursion'
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] scrub halts

2007-08-15 Thread Al Hopper
On Tue, 14 Aug 2007, Richard Elling wrote:

> Rick Wager wrote:
>> We see similar problems on a SuperMicro with 5 500 GB Seagate sata drives. 
>> This is using the AHCI driver. We do not, however, see problems with the 
>> same hardware/drivers if we use 250GB drives.
>
> Duh.  The error is from the disk :-)

A likely possibility is that the disk drives are simply not getting 
enough (cool) airflow and are over-heating during periods of high 
system activity that generates a lot of disk head movement; for 
example, during a zpool scrub.  And the extra platters present in the 
larger disk drives would require even more cooling capacity - which 
would validate your observations.

Best to actually *measure* the effectiveness of the disk cooling 
design/installation.  Recommendation: investigate the Fluke mini 
infrared thermometers - for example - the Fluke 62 at: 
http://www.testequipmentdepot.com/fluke/thermometers/62.htm

In some disk drive installations, it's possible for the infrared probe 
to "see" the disk HDA (Head Disk Assembly) without disturbing the 
drive.

PS: I use a much older Fluke 80T-IR in combination with a digital 
multimeter with millivolt resolution (a Fluke meter of course!).

>> We sometimes see bad blocks reported (are these automatically remapped 
>> somehow so they are not used again?) and sometimes sata port resets.
>
> Depending on how the errors are reported, the driver may attempt a reset
> to clear.  The drive may also automatically spare bad blocks.
>
>> Here is a sample of the log output. Any help understanding and/or resolving 
>> this issue greatly appreciated. I very much don't want to have freezes in 
>> production.
>>
>> Aug 14 11:20:28 chazz1  port 2: device reset
>> Aug 14 11:20:28 chazz1 scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],0/pci15d9,[EMAIL PROTECTED],2/[EMAIL PROTECTED],0 (sd3):
>> Aug 14 11:20:28 chazz1  Error for Command: write   Error Level: Retryable
>> Aug 14 11:20:28 chazz1 scsi: [ID 107833 kern.notice]Requested Block: 530   Error Block: 530
>> Aug 14 11:20:28 chazz1 scsi: [ID 107833 kern.notice]Vendor: ATA   Serial Number:
>> Aug 14 11:20:28 chazz1 scsi: [ID 107833 kern.notice]Sense Key: No_Additional_Sense
>> Aug 14 11:20:28 chazz1 scsi: [ID 107833 kern.notice]ASC: 0x0 (no additional sense info), ASCQ: 0x0, FRU: 0x0
>
> This error was transient and retried.  If it was a fatal error (still
> failed after retries) then you'll have another, different message
> describing the failed condition.
>  -- richard
>

Regards,

Al Hopper  Logical Approach Inc, Plano, TX.  [EMAIL PROTECTED]
Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Extremely long creat64 latencies on highly utilized zpools

2007-08-15 Thread Yaniv Aknin
Hello friends,

I've recently seen a strange phenomenon with ZFS on Solaris 10u3, and was 
wondering if someone may have more information.

The system uses several zpools, each a bit under 10T, each containing one zfs 
with lots and lots of small files (way too many, about 100m files and 75m 
directories).

I have absolutely no control over the directory structure and believe me I 
tried to change it.

Filesystem usage patterns are create and read, never delete and never rewrite.

When volumes approach 90% usage, and under medium/light load (zpool iostat 
reports 50mb/s and 750iops reads), some creat64 system calls take over 50 
seconds to complete (observed with 'truss -D touch'). When doing manual tests, 
I've seen similar times on unlink() calls (truss -D rm). 

I'd like to stress this happens on /some/ of the calls, maybe every 100th 
manual call (I scripted the test), which (along with normal system operations) 
would probably be every 10,000th or 100,000th call.

Other system parameters (memory usage, loadavg, process number, etc) appear 
nominal. The machine is an NFS server, though the crazy latencies were observed 
both local and remote.

What would you suggest to further diagnose this? Has anyone seen trouble with 
high utilization and medium load? (with or without insanely high filecount?)

Many thanks in advance,
 - Yaniv
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss