[zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed

2008-07-23 Thread Ross
Has anybody here got any thoughts on how to resolve this problem:
http://www.opensolaris.org/jive/thread.jspa?messageID=261204&tstart=0

It sounds like two of us have been affected by this now, and it's a bit of a
nuisance having your entire server hang when a drive is removed; it makes you
worry about how Solaris would handle a drive failure.

Has anybody tried pulling a drive on a live Thumper?  Surely they don't hang
like this.  Although, having said that, I do remember they have a great big
warning in the manual about using cfgadm to stop the disk before removal, saying:

"Caution - You must follow these steps before removing a disk from service.  
Failure to follow the procedure can corrupt your data or render your file 
system inoperable."

Ross
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Cannot attach mirror to SPARC zfs root pool

2008-07-23 Thread Richard Elling
Richard Elling wrote:
> Rainer Orth wrote:
>   
>> Richard Elling writes:
>>
>>   
>> 
 I've found out what the problem was: I didn't specify the -F zfs option to
 installboot, so only half of the ZFS bootblock was written.  This is a
 combination of two documentation bugs and a terrible interface:
   
   
 
>>> Mainly because there is no -F option?
>>> 
>>>   
>> Huh?  From /usr/sbin/installboot:
>>   
>> 
>
> Which build do you see this in? It isn't in the online source
> browser or b93... there might be another issue lurking here...
>   

Never mind. I found it.  We'll try to get this straightened out
in a way that will not confuse people who might not realize
that the x86 and sparc versions are so different.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] can anyone help me?

2008-07-23 Thread Aaron Botsis
Hello, I've hit this same problem. 

Hernan/Victor, I sent you an email asking for the description of this solution. 
I've also got important data on my array. I went to b93 hoping there'd be a 
patch for this.

I caused the problem in a manner identical to Hernan: by removing a zvol clone.
Exact same symptoms: userspace seems to go away, the network stack is still up, no
disk activity, and the system never recovers.

If anyone has the solution to this, PLEASE help me out. Thanks a million in 
advance.

Aaron

> Well, finally managed to solve my issue, thanks to
> the invaluable help of Victor Latushkin, who I can't
> thank enough.
> 
> I'll post a more detailed step-by-step record of what
> he and I did (well, all credit to him actually) to
> solve this. Actually, the problem is still there
> (destroying a huge zvol or clone is slow and takes a
> LOT of memory, and will die when it runs out of
> memory), but now I'm able to import my zpool and all
> is there.
> 
> What Victor did was hack ZFS (libzfs) to force a
> rollback to "abort" the endless destroy, which was
> re-triggered every time the zpool was imported, as it
> was inconsistent. With this custom version of libzfs,
> setting an environment variable makes libzfs
> bypass the destroy and jump to the rollback, "undoing"
> the last destroy command.
> 
> I'll be posting the long version of the story soon.
> 
> Hernán
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] The best motherboard for a home ZFS fileserver

2008-07-23 Thread Donald Murray, P.Eng.
On Wed, Jul 23, 2008 at 7:21 PM, Miles Nordin <[EMAIL PROTECTED]> wrote:
*SNIP*

>
> Anyway, you can find more anecdotes in the archives of this list.
> IIRC someone else corroborated that he found, among non-DoA drives,
> failures are more likely in the first month than in the second month,
> but I couldn't find the post.
>
> I did find Richard Elling's posting of this paper:
>
>  
> http://www.usenix.org/event/fast08/tech/full_papers/bairavasundaram/bairavasundaram.pdf
>
> but it does not support my claim about first-month failures.  Maybe my
> experience is related to something NetApp didn't have, maybe related
> to the latest batch of consumer drives released after that study, or
> to the consumer supply chain.

*SNIP*

For another good read on drive failures, there's
also "Failure Trends in a Large Disk Drive Population":
http://labs.google.com/papers/disk_failures.html
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] x4500 performance tuning.

2008-07-23 Thread Jorgen Lundman
>> SunOS x4500-01.unix 5.11 snv_70b i86pc i386 i86pc
> That's a very old release, have you considered upgrading?
> Ian.
> 

It was the absolute latest version available when we received the x4500, 
and now it is live and supporting a large number of customers. However, 
the 2nd unit will arrive next week (it will be Sol10 5/08, as that is the
only/newest OS version the vendor will support).

So yes, in a way we will move to a newer version if we can work out a
good way to migrate from one x4500 to another x4500 :)

But in the meantime, we were hoping we could do some kernel tweaking,
reboot (3 minutes of downtime) and have it perform a little better. It
would be nice to have someone who knows more than me give their opinion
as to whether my guesses have any chance of succeeding.

For example, with Postfix delivering mail, system calls like open() and
fdsync() are currently taking upwards of 7 seconds to complete.

Lund


-- 
Jorgen Lundman   | <[EMAIL PROTECTED]>
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] x4500 performance tuning.

2008-07-23 Thread Ian Collins
Jorgen Lundman writes: 

> 
> We are having slow performance with the UFS volumes on the x4500. They
> are slow even on the local server, which makes me think it is (for once)
> not NFS related.
> 
> 
> Current settings: 
> 
> SunOS x4500-01.unix 5.11 snv_70b i86pc i386 i86pc 
> 
That's a very old release, have you considered upgrading? 

Ian. 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] x4500 performance tuning.

2008-07-23 Thread Jorgen Lundman

We are having slow performance with the UFS volumes on the x4500. They
are slow even on the local server, which makes me think it is (for once)
not NFS related.


Current settings:

SunOS x4500-01.unix 5.11 snv_70b i86pc i386 i86pc

# cat /etc/release
 Solaris Express Developer Edition 9/07 snv_70b X86
Copyright 2007 Sun Microsystems, Inc.  All Rights Reserved.
 Use is subject to license terms.
 Assembled 30 August 2007

NFSD_SERVERS=1024
LOCKD_SERVERS=128

PID USERNAME  SIZE   RSS STATE  PRI NICE  TIME  CPU PROCESS/NLWP

  12249 daemon   7204K 6748K sleep   60  -20  54:16:26  14% nfsd/731

load averages:  2.22,  2.32,  2.42 12:31:35
63 processes:  62 sleeping, 1 on cpu
CPU states: 68.7% idle,  0.0% user, 31.3% kernel,  0.0% iowait,  0.0% swap
Memory: 16G real, 1366M free, 118M swap in use, 16G swap free


/etc/system:

set ndquot=5048000


We have a setup like:

/export/zfs1
/export/zfs2
/export/zfs3
/export/zfs4
/export/zfs5
/export/zdev/vol1/ufs1
/export/zdev/vol2/ufs2
/export/zdev/vol3/ufs3

What is interesting is that if I run "df", it displays everything at
normal speed but pauses before the "vol1/ufs1" file system. truss confirms
that statvfs64() is slow (usually 5 seconds). All other ZFS and UFS
filesystems behave normally. vol1/ufs1 is the most heavily used UFS
filesystem.
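For what it's worth, a quick way to reproduce and time just that call (the mount point is taken from the output below; truss -d prefixes each trace line with a timestamp):

time df -k /export/ufs1
truss -d df -k /export/ufs1 2>&1 | grep statvfs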

Disk:
/dev/zvol/dsk/zpool1/ufs1
991G   224G   758G   23%   /export/ufs1

Inodes:
/dev/zvol/dsk/zpool1/ufs1
 37698475   25044053   60%   /export/ufs1




Possible problems:

# vmstat -s
866193018 total name lookups (cache hits 57%)

# kstat -n inode_cache
module: ufs instance: 0
name:   inode_cache class:ufs
maxsize 129797
maxsize reached 269060
thread idles    319098740
vget idles  62136


This leads me to think we should consider setting:

set ncsize=259594       (doubled... are there better values?)
set ufs_ninode=259594

in /etc/system, and reboot. But it is costly to reboot based only on my
guess. Do you have any other suggestions to explore? Will this help?
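Before rebooting on a guess, the current values can at least be checked against the live kernel; a hedged sketch using standard Solaris tools (run as root):

echo ncsize/D | mdb -k
echo ufs_ninode/D | mdb -k
kstat -n inode_cache | egrep 'maxsize|idles'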


Sincerely,

Jorgen Lundman


-- 
Jorgen Lundman   | <[EMAIL PROTECTED]>
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] The best motherboard for a home ZFS fileserver

2008-07-23 Thread Bob Friesenhahn
On Wed, 23 Jul 2008, Miles Nordin wrote:

> the problem is that it's common for a very large drive to have
> unreadable sectors.  This can happen because the drive is so big that
> its bit-error-rate matters.  But usually it happens because the drive
> is starting to go bad but you don't realize this because you haven't
> been scrubbing it weekly.  Then, when some other drive actually does
> fail hard, you notice and replace the hard-failed drive, and you're
> forced to do an implicit scrub, and THEN you discover the second
> failed drive.  too late for mirrors or raidz to help.

The computed MTTDL is better for raidz2 than for two-way mirrors but 
the chance of loss is already small enough that humans are unlikely to 
notice.  Consider that during resilvering the mirror case only has to 
read data from one disk, whereas with raidz2 it seems that the number
of disks which need to be read is the total number of disks minus
two.  This means that resilvering the mirror will be much faster, and
since it takes less time and fewer components are involved in the 
recovery, there is less opportunity for a second failure.  The concern 
over scrub is not usually an issue since a simple cron job can take 
care of it.
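For example, a weekly scrub can be driven from cron with an entry along these lines (the pool name "tank" is hypothetical):

0 3 * * 0 /usr/sbin/zpool scrub tank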

Richard's MTTDL writeup at 
http://blogs.sun.com/relling/entry/raid_recommendations_space_vs_mttdl 
is pretty interesting.  However, Richard's writeup is also `flawed' 
since it only considers the disks involved and ignores the rest of the 
system.  This is admitted early on in the statement that "MTTDL 
calculation is ONE attribute" of all the good things we are hoping 
for.

Raw disk space is cheap.  Mirrors are fast and simple and you can plan 
your hardware so that the data path to the disk is independent of the 
other disk.  When in doubt add a third mirror.  If you start out with 
just a little bit of data which grows over time, you can use three way 
mirroring and transition the extra mirror disks to become regular data 
disks later on.
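A sketch of that transition, with hypothetical pool and device names: attach a third side to an existing mirror, then later detach it and reuse the disk in a new mirror vdev:

zpool attach tank c1t0d0 c3t0d0       # third side of the existing mirror
zpool detach tank c3t0d0              # later, free the extra disk
zpool add tank mirror c3t0d0 c4t0d0   # and grow the pool with a new mirror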

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] The best motherboard for a home ZFS fileserver

2008-07-23 Thread Aaron Theodore
> >  3. burn in the raidset for at least one month before trusting the
> > disks to not all fail simultaneously. 
> > 
> Has anyone ever seen this happen for real?  I seriously doubt it will
> happen 
> with new drives. 

I have seen it happen on my own home ZFS fileserver...
I purchased two new 500GB drives (WD RE2 enterprise ones), and both started
failing within a few days.
Luckily I managed to get both replaced without losing any data in my RAID-Z
pool.
Looking at the drive serial numbers, they were part of the same batch.

Aaron
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] The best motherboard for a home ZFS fileserver

2008-07-23 Thread Miles Nordin
> "ic" == Ian Collins <[EMAIL PROTECTED]> writes:

ic> I'd use mirrors rather than raidz2.  You should see better
ic> performance 

the problem is that it's common for a very large drive to have
unreadable sectors.  This can happen because the drive is so big that
its bit-error-rate matters.  But usually it happens because the drive
is starting to go bad but you don't realize this because you haven't
been scrubbing it weekly.  Then, when some other drive actually does
fail hard, you notice and replace the hard-failed drive, and you're
forced to do an implicit scrub, and THEN you discover the second
failed drive.  too late for mirrors or raidz to help.

 http://www.opensolaris.org/jive/message.jspa?messageID=255647&tstart=0#255647

If you don't scrub, in my limited experience this situation is the
rule rather than the exception.  especially with digital video from
security cameras and backups of large DVD movie collections---where
most blocks don't get read for years unless you scrub.

ic> you really can grab two of the disks and still leave behind a
ic> working file server!

this really works with 4-disk raidz2, too.

I don't fully understand ZFS's quorum rules, but I have tried a 4-disk
raidz2 pool running on only 2 disks.  

You're right, it doesn't work quite as simply as two 2-disk mirrors.
Since I have half my disks in one tower, half in another, and each
tower connected to ZFS with iSCSI, I often want to shutdown one whole
tower without rebooting the ZFS host.  I find I can do that with
mirrors, but not with 4-disk raidz2.  I'll elaborate.

The only shitty thing is, zpool will only let you offline one of the
four disks.  When you try to offline the second, it says ``no valid
replicas.''  A pair of mirrors doesn't have that problem.

But, if you forcibly take two disks away from 4-disk raidz2, the pool
does keep working as promised.  The next problem(s) comes after you
give the two disks back.

 1. zpool shows all four disks ONLINE, and then resilvers.  There's no
indication as to which disks are being resilvered and which are
already ``current,'' though---it just shows all four as ONLINE.
so you don't know which two disks absolutely cannot be
removed---which are the target of the resilver and which are the
source.  SVM used to tell you this.  What happens when a disk
fails during the resilver?  Does something different happen
depending on whether it's an up-to-date disk or a resilveree disk?
probably worth testing, but I haven't.

Secondly, if you have many 4-disk raidz2 vdev's, there's no
indication about which vdev is being resilvered.  If I have 20
vdev's, I may very well want to proceed to another vdev, offline
one disk (or two, damnit!), maintain it, before the resilver
finishes.  not enough information in zpool status to do this.  Is
it even possible to 'zpool offline' a disk in another raidz2 vdev
during the resilver, or will it say 'no valid replicas'?  I
haven't tested, probably should, but I only have two towers so
far.

so, (a) disks which will result in 'no valid replicas' when you
attempt to offline them should not be listed as ONLINE in 'zpool
status'.  They're different and should be singled out.

and (b) the set of these disks should be as small as arrangeably
possible

 2. after resilvering says it's complete, 0 errors everywhere, zpool
still will not let you offline ANY of the four disks, not even
one.  no valid replicas.

 3. 'zpool scrub'

 4. now you can offline any one of the four disks.  You can also
online the disk, and offline a different disk, as much as you like
so long as only one disk is offline (but you're supposed to get
two!).  You do not need to scrub in between.  If you take away a
disk forcibly instead of offlining it, then you go back to step 1
and cannot offline anything without a scrub.

 5. insert a 'step 1.5, reboot' or 'step 2.5, reboot', and although I
didn't test it, I fear checksum errors.  I used to have that
problem, and 6675685 talks about it.  SVM could handle rebooting
during a resilver somewhat well.  I fear at least unwarranted
generosity, like I bet 'step 2.5 reboot' can substitute for 'step
3 scrub', letting me use zpool offline again even though whatever
failsafe was stopping me from using it before can't possibly have
resolved itself.

so, (c) the set of disks which result in 'no valid replicas' when
you attempt to offline them seems to have no valid excuse for
changing across a reboot, yet I'm pretty sure it does.

kind of annoying and confusing.

but, if your plan is to stuff two disks in your bag and catch the next
flight to Tel Aviv, my experience says raidz2 should work ok for that.
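For the record, a rough sketch of the sequence described above, with hypothetical pool and device names:

zpool offline tank c1t0d0   # first offline of the 4-disk raidz2 is accepted
zpool offline tank c1t1d0   # second one is refused with "no valid replicas"
zpool online tank c1t0d0
zpool scrub tank            # after a resilver, a scrub was needed before...
zpool offline tank c1t1d0   # ...a single offline would be accepted again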

 c> 3. burn in the raidset for at least one month before trusting
 c> the disks to not all fail simultaneously.

ic> Has anyone ever seen this happen for real?

yeah.  Among 2

Re: [zfs-discuss] evaluate ZFS ACL

2008-07-23 Thread Paul B. Henson
On Wed, 23 Jul 2008, Ian Collins wrote:

> I don't know if such a tool exists, but I'm in the process of writing one
> (as part of a larger ACL admin tool) if you are interested.

If there is no standard routine to handle this functionality, I would very
much appreciate a copy of your code...

Thanks...


-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  [EMAIL PROTECTED]
California State Polytechnic University  |  Pomona CA 91768
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] The best motherboard for a home ZFS fileserver

2008-07-23 Thread Bob Friesenhahn
On Wed, 23 Jul 2008, Brandon High wrote:
>
> With raidz2, you can grab any two disks. With mirroring, you have to
> grab the correct two.
>
> Personally, with only 4 drives I would use raidz to increase the
> available storage or mirroring for better performance rather than use
> raidz2.

If mirroring is chosen, then it is also useful to install two 
interface cards and split the mirrors across the cards so that if a 
card (or its driver) fails, the system keeps on running.  I was 
reminded of this just a few days ago when my dual-channel fiber 
channel card locked up and the system paniced since the ZFS pool was 
not accessible.  With two interface cards there would not have been a 
panic.
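As a sketch of that layout (hypothetical device names, with c1 and c2 being ports on two different interface cards):

zpool create tank mirror c1t0d0 c2t0d0 mirror c1t1d0 c2t1d0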

With raidz and raidz2 it is not easy to achieve the system robustness 
possible when using mirrors.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] The best motherboard for a home ZFS fileserver

2008-07-23 Thread Tomas Ögren
On 23 July, 2008 - Brandon High sent me these 1,3K bytes:

> On Wed, Jul 23, 2008 at 3:21 PM, Ian Collins <[EMAIL PROTECTED]> wrote:
> >>  2. get four disks and do raidz2.
> >>
> >> In addition to increasing MTTF, this is good because if you need
> >> to leave in a hurry, you can grab two of the disks and still leave
> >> behind a working file server.  I think this is important for home
> >> setups.
> >>
> > I'd use mirrors rather than raidz2.  You should see better performance and
> > you really can grab two of the disks and still leave behind a working file
> > server!
> 
> With raidz2, you can grab any two disks. With mirroring, you have to
> grab the correct two.
> 
> Personally, with only 4 drives I would use raidz to increase the
> available storage or mirroring for better performance rather than use
> raidz2.
> 
> >>  3. burn in the raidset for at least one month before trusting the
> >> disks to not all fail simultaneously.
> >>
> > Has anyone ever seen this happen for real?  I seriously doubt it will happen
> > with new drives.
> 
> My new workstation in the office had its (sole) 400GB drive die after
> about 2 months. It does happen. Production lots share failure
> characteristics.

Bit errors, failing the S.M.A.R.T. test after 27 hours.

/Tomas
-- 
Tomas Ögren, [EMAIL PROTECTED], http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] The best motherboard for a home ZFS fileserver

2008-07-23 Thread Brandon High
On Wed, Jul 23, 2008 at 3:21 PM, Ian Collins <[EMAIL PROTECTED]> wrote:
>>  2. get four disks and do raidz2.
>>
>> In addition to increasing MTTF, this is good because if you need
>> to leave in a hurry, you can grab two of the disks and still leave
>> behind a working file server.  I think this is important for home
>> setups.
>>
> I'd use mirrors rather than raidz2.  You should see better performance and
> you really can grab two of the disks and still leave behind a working file
> server!

With raidz2, you can grab any two disks. With mirroring, you have to
grab the correct two.

Personally, with only 4 drives I would use raidz to increase the
available storage or mirroring for better performance rather than use
raidz2.

>>  3. burn in the raidset for at least one month before trusting the
>> disks to not all fail simultaneously.
>>
> Has anyone ever seen this happen for real?  I seriously doubt it will happen
> with new drives.

My new workstation in the office had its (sole) 400GB drive die after
about 2 months. It does happen. Production lots share failure
characteristics.

-B

-- 
Brandon High [EMAIL PROTECTED]
"The good is the enemy of the best." - Nietzsche
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] L2ARC is Solaris 10?

2008-07-23 Thread Brendan Gregg - Sun Microsystems
On Wed, Jul 23, 2008 at 03:20:47PM -0700, Brendan Gregg - Sun Microsystems 
wrote:
> G'Day Jeff,
> 
> On Tue, Jul 22, 2008 at 02:45:13PM -0400, Jeff Taylor wrote:
> > When will L2ARC be available in Solaris 10?
> 
> There are no current plans to back port;

Sorry - I should have said that I wasn't aware whether a back port project had
begun yet. :)

Brendan

-- 
Brendan
[CA, USA]
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] The best motherboard for a home ZFS fileserver

2008-07-23 Thread Ian Collins
Miles Nordin writes: 

>> "mh" == Matt Harrison <[EMAIL PROTECTED]> writes:
> 
> mh> http://breden.org.uk/2008/03/02/home-fileserver-zfs-hardware/ 
> 
> that's very helpful.  I'll reshop for nForce 570 boards.  i think my
> untested guess was an nForce 630 or something, so it probably won't
> work. 
> 
> I would add: 
> 
>  1. do not get three disks all from the same manufacturer 
> 
>  2. get four disks and do raidz2. 
> 
> In addition to increasing MTTF, this is good because if you need
> to leave in a hurry, you can grab two of the disks and still leave
> behind a working file server.  I think this is important for home
> setups. 
> 
I'd use mirrors rather than raidz2.  You should see better performance and 
you really can grab two of the disks and still leave behind a working file 
server! 

>  3. burn in the raidset for at least one month before trusting the
> disks to not all fail simultaneously. 
> 
Has anyone ever seen this happen for real?  I seriously doubt it will happen 
with new drives. 

Ian 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] L2ARC is Solaris 10?

2008-07-23 Thread Brendan Gregg - Sun Microsystems
G'Day Jeff,

On Tue, Jul 22, 2008 at 02:45:13PM -0400, Jeff Taylor wrote:
> When will L2ARC be available in Solaris 10?

There are no current plans to back port; if we were to, I think it would be
ideal (or maybe a requirement) to sync up zpool features:

VER  DESCRIPTION
---  
 1   Initial ZFS version
 2   Ditto blocks (replicated metadata)
 3   Hot spares and double parity RAID-Z
 4   zpool history
 5   Compression using the gzip algorithm
 6   bootfs pool property
 7   Separate intent log devices
 8   Delegated administration
 9   refquota and refreservation properties
 10  Cache devices

(that's from "zpool upgrade -v" on Solaris Nevada).  L2ARC devices are
version 10; the last Solaris 10 release I saw was version 4.

However we are working on getting the L2ARC into the hands of customers
much sooner - they won't need to wait for a Solaris 10 backport.

Brendan

-- 
Brendan
[CA, USA]
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS questions

2008-07-23 Thread Richard Elling
Thommy M. wrote:
> Richard Gilmore wrote:
>   
>> Hello Zfs Community,
>>
>> I am trying to locate if zfs has a compatible tool to Veritas's 
>> vxbench?  Any ideas?  I see a tool called vdbench that looks close, but 
>> it is not a Sun tool, does Sun recommend something to customers moving 
>> from Veritas to ZFS and like vxbench and its capabilities?
>> 
>
>
> filebench
>
> http://sourceforge.net/projects/filebench/
> http://www.solarisinternals.com/wiki/index.php/FileBench
> http://blogs.sun.com/dom/entry/filebench:_a_zfs_v_vxfs
>   

Also, /usr/benchmarks/filebench for later Solaris releases.

IIRC, vdbench is in the process of becoming open source, but I do
not know the current status.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] The best motherboard for a home ZFS fileserver

2008-07-23 Thread Brandon High
On Wed, Jul 23, 2008 at 2:05 PM, Steve <[EMAIL PROTECTED]> wrote:
> bhigh:
> so the best is 780G?

I'm not sure if it's the best, but it's a good choice. A motherboard
and cpu can be had for about $150. Personally, I'm waiting for the AMD
790GX / SB750 which is due out this month. The 780G has 1 x16 PCIe
slot, the 790GX uses 2 x16 (x8 electrical) slots. I'm planning on
using an LSI 1068e based controller to add more drives, which has an
x8 physical connector.

The nForce 570 works and is well supported, but doesn't have
integrated video. The Nvidia 8200 which has video should be supported
as well. I believe both chipsets support 6 SATA ports.

My current shopping list is here:
http://secure.newegg.com/WishList/PublicWishDetail.aspx?WishListNumber=7739092

This system will act as a NAS, backup location, and media server for
our Roku and Popcorn Hour media players.

The system will boot from flash using the CF to IDE converter. The two
cards will be mirrored. The drives will be in a raidz2.

The motherboard I've chosen is a 780G board, but has 2 x16 slots. If I
decide to add more drives, I want to have the option of a second
controller.

I could use a Sil3132 based card instead of the LSI, which would give
me exactly 8 SATA ports and save about $250. I may still go this route
but given the overall cost it's not that big of a deal.

-B

-- 
Brandon High [EMAIL PROTECTED]
"The good is the enemy of the best." - Nietzsche
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs:zfs_arc_max

2008-07-23 Thread Richard Elling
W. Wayne Liauh wrote:
> Is it possible to input the value of zfs:zfs_arc_max in 10-based format or 
> other more common form (e.g., zfs:zfs_arc_max = 1GB, etc.), in addition to 
> the current hex format?
>  
>   

Parameters set in /etc/system follow the rules as described in the
system(4) man page.  As per normal, hex numbers start with 0x,
octal with 0, and decimals [1-9].
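As a hedged illustration (the 1 GB value is arbitrary and given in bytes; use only one of the two forms):

set zfs:zfs_arc_max = 1073741824
* or, equivalently, in hex:
* set zfs:zfs_arc_max = 0x40000000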
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] The best motherboard for a home ZFS fileserver

2008-07-23 Thread Miles Nordin
> "mh" == Matt Harrison <[EMAIL PROTECTED]> writes:

mh> http://breden.org.uk/2008/03/02/home-fileserver-zfs-hardware/

that's very helpful.  I'll reshop for nForce 570 boards.  i think my
untested guess was an nForce 630 or something, so it probably won't
work.

I would add:

 1. do not get three disks all from the same manufacturer

 2. get four disks and do raidz2.

In addition to increasing MTTF, this is good because if you need
to leave in a hurry, you can grab two of the disks and still leave
behind a working file server.  I think this is important for home
setups.

 3. burn in the raidset for at least one month before trusting the
disks to not all fail simultaneously.

The three steps are really necessary with the bottom-shelf drives they
are feeding us.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] The best motherboard for a home ZFS fileserver

2008-07-23 Thread Miles Nordin
> "s" == Steve  <[EMAIL PROTECTED]> writes:

 s> Apart from the other components, the main problem is to choose
 s> the motherboard. The offer is incredibly high and I'm lost.

here is cut-and-paste of my shopping so far:


2008-07-18
 via
  http://www.logicsupply.com/products/sn1eg -- 4 sata.  $251

 opteron
  1U barebones: Tyan B2935G28V4H
Supermicro H8DMU+
  amd opteron 2344he x2 $412
 bad choice.  stepping B3 needed to avoid TLB bug, xx50he or higher
  amd opteron 2352 x2   $628
  kingston kvr667d2d8p5/2g $440
  motherboard Supermicro H8DMU+ supports steppping BA
  Tyan 2915-E and other -E supports stepping BA
TYAN S3992G3NR-E $430
    also avail from https://secure.flickerdown.com/index.php?crn=290&rn=497&action=show_detail

 phenom
  phenom 9550  $175
 do not get 9600.  it has the B2 stepping TLB bug.
  crucial CT2KIT25672AA667 x2 ~$200
  ecs NFORCE6M-A(3.0)   $50
 downside: old, many reports of DoA, realtek ethernet according to newegg comment?---often they uselessly give the PHY model, no builtin video?!
  ASRock ALiveNF7G or  ABIT AN-M2HD $85
 nforce ethernet, builtin video, relatively new (2007-09) chip.  downside: slow HT bus?

This is **NOT** very helpful to you because none of it is tested with
OpenSolaris.  There are a few things to consider:

 * can you possibly buy something, and then bury it in the sand for a
   year?  or two years if you want it to work with the stable Solaris build.
   or maybe replace a Linux box with new hardware, and run
   OpenSolaris on the old hardware?

 * look on wikipedia to see the stepping of the AMD chip you're
   looking at.  some steppings of the quad-core chips are
   unfashionable.

 * may have better hardware support in SXCE, because OpenSolaris can
   only include closed-source drivers which are freely
   redistributable.  It includes a lot of closed drivers, but maybe
   you'll get some more with SXCE, particularly for SATA chips.

   Unfortunately I don't know one page where you can get a quick view
   of the freedom status of each driver.  I think it is hard even to
   RTFS because some of the drivers are in different ``gates'' than
   the main one, but I'm not sure.  I care about software freedom and 
   get burned on this repeatedly.  And there are people in here a couple 
   times asking for Marvell source to fix a lockup bug or add hotplug, 
   and they cannot get it.  

 * the only network card which works well is the Intel gigabit cards.
   All the other cards, if they work, it is highly dependent on which
   exact stepping, revision, and PHY of the chip you get whether the
   card will work at all, and whether or not it'll have serious
   performance problems.  but intel cards, copper, fiber, new, old,
   3.3V, 5V, PCI-e, have a much better shot of working than the
   broadcom 57xx, via, or realtek.  i was planning to try an nForce on
   the cheap desktop board and hope for luck, then put an intel card
   in the slow 33mhz pci slot if it doesn't work.

 * a lot of motherboards on newegg say they have a ``realtek'' gigabit
   chip, but that's just because they're idiots.  It's really an
   nForce gigabit chip, with a realtek PHY.  i don't know if this
   works well.

 * it sounds like the only SATA card that works well with Solaris is
   the LSI mpt board.  There have been reports of problems and poor
   performance with basically everything else, and in particular the
   AMD northbridge (that's why I picked less-open NVidia chips above).
   the supermicro marvell card is highly sensitive to chipset? or
   BIOS? revisions.  maybe the Sil3124 is okay, I don't know.  I have
   been buying sil3124 from newegg, though they've been through two
   chip steppings silently in the last 6months.  In any case, you
   should plan on plugging your disks into a PCI card, not the
   motherboard, so that you can try a few different cards when the
   first one starts locking up for 2s every 5min, or locking up all
   the ports when a bad disk is attached to one port, or giving really
   slow performance, or some other weird bullshit.

 * the server boards are nice for solaris because:

   + they can have 3.3V PCI slots, so you can use old boards (which
 have working drivers) on a 64-bit 100mhz bus.  The desktop boards
 will give you a fast interface only in PCIe format, not PCI.

   + they take 4x as much memory as desktop (2x as much per CPU, and 2
 CPUs), though you do have to buy ``registered/buffered'' memory instead
 of ``unregistered/unbuffered'')

   + the chipsets demanded by quad-core are older, I think, and maybe 
 more likely to work.  It is even possible to get LSI mpt onboard 
 with some of them, but maybe it is the wrong stepping of mpt or 
 something.

 * the nVidia boards with 6 sata ports have only 4 useable sata ports.
   the other two ports are behind some kind of goofyraid controller.  
   anyway, plan on running your di

Re: [zfs-discuss] evaluate ZFS ACL

2008-07-23 Thread Ian Collins
Paul B. Henson writes: 

> 
> I was curious if there was any utility or library function available to
> evaluate a ZFS ACL. The standard POSIX access(2) call is available to
> evaluate access by the current process, but I would like to evaluate an ACL
> in one process that would be able to determine whether or not some other
> user had a particular permission. Obviously, the running process would need
> to have read privileges on the ACL itself, but I'd rather not reimplement
> the complexity of actually interpreting the ACL. Something like: 
> 
>   access("/path/to/file", R_OK, 400) 
> 
> Where 400 is the UID of the user whose access should be tested. Clearly
> there is already code to do so within the filesystem layer, given that
> privileges are enforced. It's probably unlikely, but I was hoping this code
> could be reutilized from a user level process to make the same
> determination rather than having to read the entire ACL, verify what groups
> the user is in, etc. 
> 
> Thanks for any suggestions... 
> 
I don't know if such a tool exists, but I'm in the process of writing one
(as part of a larger ACL admin tool) if you are interested.

Ian
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] evaluate ZFS ACL

2008-07-23 Thread Paul B. Henson

I was curious if there was any utility or library function available to
evaluate a ZFS ACL. The standard POSIX access(2) call is available to
evaluate access by the current process, but I would like to evaluate an ACL
in one process that would be able to determine whether or not some other
user had a particular permission. Obviously, the running process would need
to have read privileges on the ACL itself, but I'd rather not reimplement
the complexity of actually interpreting the ACL. Something like:

access("/path/to/file", R_OK, 400)

Where 400 is the UID of the user whose access should be tested. Clearly
there is already code to do so within the filesystem layer, given that
privileges are enforced. It's probably unlikely, but I was hoping this code
could be reutilized from a user level process to make the same
determination rather than having to read the entire ACL, verify what groups
the user is in, etc.

Thanks for any suggestions...
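One crude interim workaround, which just lets the kernel do the evaluation rather than reimplementing it, is to run the access test as the user in question; a hedged sketch (requires root, a hypothetical username, and a valid login shell for that user):

su - jdoe -c 'test -r /path/to/file' && echo readable || echo denied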


-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  [EMAIL PROTECTED]
California State Polytechnic University  |  Pomona CA 91768
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS questions

2008-07-23 Thread Thommy M.
Richard Gilmore wrote:
> Hello Zfs Community,
> 
> I am trying to locate if zfs has a compatible tool to Veritas's 
> vxbench?  Any ideas?  I see a tool called vdbench that looks close, but 
> it is not a Sun tool, does Sun recommend something to customers moving 
> from Veritas to ZFS and like vxbench and its capabilities?


filebench

http://sourceforge.net/projects/filebench/
http://www.solarisinternals.com/wiki/index.php/FileBench
http://blogs.sun.com/dom/entry/filebench:_a_zfs_v_vxfs

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] The best motherboard for a home ZFS fileserver

2008-07-23 Thread Steve
Thank you for all the replies!
(and in the meantime I was just having dinner! :-)

To recap:

tcook:
you are right; in fact I'm thinking of having just 3 or 4 drives for now, without
anything else (no CD/DVD, no video card, nothing other than the motherboard and drives).
The case will be the second choice, but I'll try to stick to micro ATX for
space reasons.

Charles Menser:
4 is OK; so, is the "ASUS M2A-VM" good?

Matt Harrison:
The post is superb (my compliments to Simon)! In fact I was already looking at
that, but the MB is unfortunately ATX. If it is the only or the recommended
choice I will go for it, but I hope there is a smaller one.

bhigh:
so the best is 780G?
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zfs:zfs_arc_max

2008-07-23 Thread W. Wayne Liauh
Is it possible to input the value of zfs:zfs_arc_max in decimal (base-10) format or 
some other more common form (e.g., zfs:zfs_arc_max = 1GB, etc.), in addition to the 
current hex format?
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS questions

2008-07-23 Thread Richard Gilmore
Hello Zfs Community,

I am trying to locate if zfs has a compatible tool to Veritas's 
vxbench?  Any ideas?  I see a tool called vdbench that looks close, but 
it is not a Sun tool, does Sun recommend something to customers moving 
from Veritas to ZFS and like vxbench and its capabilities?

Thanks,
Richard


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] The best motherboard for a home ZFS fileserver

2008-07-23 Thread Brandon High
On Wed, Jul 23, 2008 at 12:37 PM, Steve <[EMAIL PROTECTED]> wrote:
> Minimum requisites should be:
> - working well with Open Solaris ;-)
> - micro ATX (I would put in a little case)
> - low power consumption but more important reliable (!)
> - with Gigabit ethernet
> - 4+ (even better 6+) sata 3gb controller

I'm pretty sure the AMD 780G/SB700 works with Opensolaris in AHCI
mode. There may be a few 780G/SB600 boards, so make sure you check.
I'm not sure how well the integrated video works. The chipset combined
with a 45W CPU should have low power draw. The SB700 can handle up to
6 SATA ports.

Be wary of the SB600 - There's a DMA issue with the controller when
using more than 2GB memory.

There are a lot of 780G boards available in all sorts of form factors
from almost every manufacturer.

> Also: what type of RAM to select toghether? (I would chose if good ECC, but 
> the rest?)

2GB or more of ECC should do it. I believe all the AMD CPUs support
ECC, but you should verify this before buying.

-B

-- 
Brandon High [EMAIL PROTECTED]
"The good is the enemy of the best." - Nietzsche
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] The best motherboard for a home ZFS fileserver

2008-07-23 Thread Matt Harrison
Steve wrote:
| I'm a fan of ZFS since I've read about it last year.
|
| Now I'm on the way to build a home fileserver and I'm thinking to go
with Opensolaris and eventually ZFS!!
|
| Apart from the other components, the main problem is to choose the
motherboard. The offer is incredibly high and I'm lost.
|
| Minimum requisites should be:
| - working well with Open Solaris ;-)
| - micro ATX (I would put in a little case)
| - low power consumption but more important reliable (!)
| - with Gigabit ethernet
| - 4+ (even better 6+) sata 3gb controller
|
| Also: what type of RAM to select toghether? (I would chose if good
ECC, but the rest?)
|
| Does it make sense? What are the possibilities?
|

I have just setup a home fileserver with ZFS on opensolaris, I used some
posts from a blog to choose my hardware and eventually went with exactly
the same as the author. I can confirm that after 3 months of running
there hasn't even been a hint of a problem with the hardware choice.

You can see the hardware post here

http://breden.org.uk/2008/03/02/home-fileserver-zfs-hardware/

Hope this helps you decide a bit more easily.

Matt


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [zfs-code] Peak every 4-5 second

2008-07-23 Thread Brandon High
On Tue, Jul 22, 2008 at 10:35 PM, Tharindu Rukshan Bamunuarachchi
<[EMAIL PROTECTED]> wrote:
>
> Dear Mark/All,
>
> Our trading system is writing to local and/or array volume at 10k
> messages per second.
> Each message is about 700bytes in size.
>
> Before ZFS, we used UFS.
> Even with UFS, there was a peak every 5 seconds due to fsflush invocation.
>
> However each peak is about ~5ms.
> Our application can not recover from such higher latency.

Is the pool using raidz, raidz2, or mirroring? How many drives are you using?

-B

-- 
Brandon High [EMAIL PROTECTED]
"The good is the enemy of the best." - Nietzsche
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] The best motherboard for a home ZFS fileserver

2008-07-23 Thread Charles Menser
I am wondering how many SATA controllers most motherboards have for
their built-in SATA ports.

Mine, an ASUS M2A-VM, has four ports, but OpenSolaris reports them as
belonging to two controllers.

I have seen motherboards with 6+ SATA ports, and would love to know if
any of them have more controller density or if two-to-one is the norm.

Charles

On Wed, Jul 23, 2008 at 3:37 PM, Steve <[EMAIL PROTECTED]> wrote:
> I'm a fan of ZFS since I've read about it last year.
>
> Now I'm on the way to build a home fileserver and I'm thinking to go with 
> Opensolaris and eventually ZFS!!
>
> Apart from the other components, the main problem is to choose the 
> motherboard. The offer is incredibly high and I'm lost.
>
> Minimum requisites should be:
> - working well with Open Solaris ;-)
> - micro ATX (I would put in a little case)
> - low power consumption but more important reliable (!)
> - with Gigabit ethernet
> - 4+ (even better 6+) sata 3gb controller
>
> Also: what type of RAM to select toghether? (I would chose if good ECC, but 
> the rest?)
>
> Does it make sense? What are the possibilities?
>
>
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] The best motherboard for a home ZFS fileserver

2008-07-23 Thread Tim
On Wed, Jul 23, 2008 at 2:37 PM, Steve <[EMAIL PROTECTED]> wrote:

> I'm a fan of ZFS since I've read about it last year.
>
> Now I'm on the way to build a home fileserver and I'm thinking to go with
> Opensolaris and eventually ZFS!!
>
> Apart from the other components, the main problem is to choose the
> motherboard. The offer is incredibly high and I'm lost.
>
> Minimum requisites should be:
> - working well with Open Solaris ;-)
> - micro ATX (I would put in a little case)
> - low power consumption but more important reliable (!)
> - with Gigabit ethernet
> - 4+ (even better 6+) sata 3gb controller
>
> Also: what type of RAM to select toghether? (I would chose if good ECC, but
> the rest?)
>
> Does it make sense? What are the possibilities?
>
>
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>


Just wondering what case you're going to put a micro-atx motherboard in
that's going to support 6+ drives without overheating.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] The best motherboard for a home ZFS fileserver

2008-07-23 Thread Steve
I've been a fan of ZFS since I read about it last year.

Now I'm about to build a home fileserver and I'm thinking of going with 
OpenSolaris and eventually ZFS!!

Apart from the other components, the main problem is choosing the motherboard. 
The number of options is incredibly high and I'm lost.

Minimum requirements should be:
- working well with OpenSolaris ;-)
- micro ATX (I would put it in a little case)
- low power consumption, but more importantly, reliable (!)
- with Gigabit ethernet
- 4+ (even better 6+) SATA 3Gb/s ports

Also: what type of RAM should I choose to go with it? (ECC if possible, but what about the 
rest?)

Does it make sense? What are the possibilities?
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Cannot attach mirror to SPARC zfs root pool

2008-07-23 Thread Richard Elling
Rainer Orth wrote:
> Richard Elling writes:
>
>   
>>> I've found out what the problem was: I didn't specify the -F zfs option to
>>> installboot, so only half of the ZFS bootblock was written.  This is a
>>> combination of two documentation bugs and a terrible interface:
>>>   
>>>   
>> Mainly because there is no -F option?
>> 
>
> Huh?  From /usr/sbin/installboot:
>   

Which build do you see this in? It isn't in the online source
browser or b93... there might be another issue lurking here...

> COUNT=15
>
> while getopts F: a; do
> case $a in
> F) case $OPTARG in
>ufs) COUNT=15;;
>hsfs) COUNT=15;;
>zfs) COUNT=31;;
>*) away 1 "$OPTARG: Unknown fstype";;
>esac;;
>
> Without -F zfs, only part of the zfs bootblock would be copied.
>
>   
>> I think that it should be very unusual that installboot would be run
>> interactively.  That is really no excuse for making it only slightly
>> 
>
> Indeed: it should mostly be run behind the scenes e.g. by live upgrade, but
> obviously there are scenarios where it is necessary (like this one).
>
>   
>> smarter than dd, but it might be hard to justify changes unless some
>> kind person were to submit a bug with an improved implementation
>> (would make a good short project for someone :-)
>> 
>
> The problem here might be that an improved implementation would probably
> mean an incompatible change (like doing away with the explicit bootblk
> argument).
>
>   

Yes, though there are good reasons to use other bootblks.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [zfs-code] Peak every 4-5 second

2008-07-23 Thread Richard Elling
[EMAIL PROTECTED] wrote:
> On Wed, 23 Jul 2008, Tharindu Rukshan Bamunuarachchi wrote:
>
>   
>> 10,000 x 700 = 7MB per second ..
>>
>> We have this rate for whole day 
>>
>> 10,000 orders per second is minimum requirments of modern day stock 
>> exchanges ...
>>
>> Cache still help us for ~1 hours, but after that who will help us ...
>>
>> We are using 2540 for current testing ...
>> I have tried same with 6140, but no significant improvement ... only one or 
>> two hours ...
>> 
>
> It might not be exactly what you have in mind, but this "how do I get 
> latency down at all costs" thing reminded me of this old paper:
>
>   http://www.sun.com/blueprints/1000/layout.pdf
>
> I'm not a storage architect; would someone with more experience in the area care
> to comment on this?  With huge disks as we have these days, the "wide
> thin" idea has gone under a bit - but how do you replace such setups with
> modern arrays, if the workload is such that caches eventually must get
> blown and you're down to spindle speed?
>   

Bob Larson wrote that article, and I would love to ask him for an
update.  Unfortunately, he passed away a few years ago :-(
http://blogs.sun.com/relling/entry/bob_larson_my_friend

I think the model still holds true, the per-disk performance hasn't
significantly changed since it was written.

This particular problem screams for a queuing model.  You don't
really need to have a huge cache as long as you can de-stage
efficiently.  However, the original poster hasn't shared the read
workload details... if you never read, it is a trivial problem to
solve with a WOM.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Cannot attach mirror to SPARC zfs root pool

2008-07-23 Thread Mark J Musante
On Wed, 23 Jul 2008, [EMAIL PROTECTED] wrote:

> Rainer,
>
> Sorry for your trouble.
>
> I'm updating the installboot example in the ZFS Admin Guide with the
> -F zfs syntax now. We'll fix the installboot man page as well.
>
> Mark, I don't have an x86 system to test right now, can you send me the 
> correct installgrub syntax for booting a ZFS file system?

It's just installboot that had to change.  The installgrub CLI remains the 
same.



Regards,
markm
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] checksum errors on root pool after upgrade to snv_94

2008-07-23 Thread Jürgen Keil
I wrote:
> Bill Sommerfeld wrote:
> > On Fri, 2008-07-18 at 10:28 -0700, Jürgen Keil wrote:
> > > > I ran a scrub on a root pool after upgrading to snv_94, and got 
> > > > checksum errors:
> > > 
> > > Hmm, after reading this, I started a zpool scrub on my mirrored pool, 
> > > on a system that is running post snv_94 bits:  It also found checksum 
> > > errors
> > > 
> > once is accident.  twice is coincidence.  three times is enemy action :-)
> > 
> > I'll file a bug as soon as I can 
> 
> I filed 6727872, for the problem with zpool scrub checksum errors
> on unmounted zfs filesystems with an unplayed ZIL.

6727872 has already been fixed, in what will become snv_96.

For my zpool, zpool scrub doesn't report checksum errors any more.

But something is still a bit strange with the data reported by zpool status.
The error counts displayed by zpool status are all 0 (during the scrub, and when
the scrub has completed), yet when the scrub completes it tells me
"scrub completed after 0h58m with 6 errors" without listing the errors.

# zpool status -v files
  pool: files
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
pool will no longer be accessible on older software versions.
 scrub: scrub in progress for 0h57m, 99.39% done, 0h0m to go
config:

NAME  STATE READ WRITE CKSUM
files ONLINE   0 0 0
  mirror  ONLINE   0 0 0
c8t0d0s6  ONLINE   0 0 0
c9t0d0s6  ONLINE   0 0 0

errors: No known data errors


# zpool status -v files
  pool: files
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
pool will no longer be accessible on older software versions.
 scrub: scrub completed after 0h58m with 6 errors on Wed Jul 23 18:23:00 2008
config:

NAME  STATE READ WRITE CKSUM
files ONLINE   0 0 0
  mirror  ONLINE   0 0 0
c8t0d0s6  ONLINE   0 0 0
c9t0d0s6  ONLINE   0 0 0

errors: No known data errors
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Cannot attach mirror to SPARC zfs root pool

2008-07-23 Thread Rainer Orth
Cindy,

> Sorry for your trouble.

no problem.

> I'm updating the installboot example in the ZFS Admin Guide with the
> -F zfs syntax now. We'll fix the installboot man page as well.

Great, thanks.

Rainer

-
Rainer Orth, Faculty of Technology, Bielefeld University
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Cannot attach mirror to SPARC zfs root pool

2008-07-23 Thread Cindy . Swearingen
Rainer,

Sorry for your trouble.

I'm updating the installboot example in the ZFS Admin Guide with the
-F zfs syntax now. We'll fix the installboot man page as well.

Mark, I don't have an x86 system to test right now, can you send
me the correct installgrub syntax for booting a ZFS file system?

Thanks,

Cindy

Rainer Orth wrote:
> Rainer Orth <[EMAIL PROTECTED]> writes:
> 
> 
>>>installboot on the new disk and see if that fixes it.
>>
>>Unfortunately, it didn't.  Reconsidering now, I see that I ran installboot
>>against slice 0 (reduced by 1 sector as required by CR 6680633) instead of
>>slice 2 (whole disk).  Doing so doesn't fix the problem either, though.
> 
> 
> I've found out what the problem was: I didn't specify the -F zfs option to
> installboot, so only half of the ZFS bootblock was written.  This is a
> combination of two documentation bugs and a terrible interface:
> 
> * With the introduction of zfs boot, installboot got a new -F 
>   option.  Unfortunately, this is documented neither on installboot(1M)
>   (which wasn't updated at all, it seems) nor in the ZFS Admin Guide
>   (p.80, workaround for CR 6668666).
> 
> * Apart from that, I've never understood why it is necessary to specify the
>   full path to the bootblock to installboot like this
> 
> installboot /usr/platform/`uname -i`/lib/fs//bootblk 
> /dev/rdsk/c0t0d0s0
> 
>   It would be far easier to just specify the fstype (or even let
>   installboot figure that out by itself using fstyp) than having to give
>   the full pathname.  In that case, installboot could just dd the whole
>   bootblk file instead of hardcoding the block counts for the different
>   filesystem types (probably to avoid corrupting the filesystem if the user
>   gives a file that is not a bootblock).
> 
> Overall, a terrible mess ;-(
> 
> Regards.
>   Rainer
> 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Cannot attach mirror to SPARC zfs root pool

2008-07-23 Thread Rainer Orth
Richard Elling writes:

> > I've found out what the problem was: I didn't specify the -F zfs option to
> > installboot, so only half of the ZFS bootblock was written.  This is a
> > combination of two documentation bugs and a terrible interface:
> >   
> 
> Mainly because there is no -F option?

Huh?  From /usr/sbin/installboot:

COUNT=15

while getopts F: a; do
case $a in
F) case $OPTARG in
   ufs) COUNT=15;;
   hsfs) COUNT=15;;
   zfs) COUNT=31;;
   *) away 1 "$OPTARG: Unknown fstype";;
   esac;;

Without -F zfs, only part of the zfs bootblock would be copied.
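For reference, the full working invocation looks like this (the device path is just an example):

installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk /dev/rdsk/c0t0d0s0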

> I think that it should be very unusual that installboot would be run
> interactively.  That is really no excuse for making it only slightly

Indeed: it should mostly be run behind the scenes e.g. by live upgrade, but
obviously there are scenarios where it is necessary (like this one).

> smarter than dd, but it might be hard to justify changes unless some
> kind person were to submit a bug with an improved implementation
> (would make a good short project for someone :-)

The problem here might be that an improved implementation would probably
mean an incompatible change (like doing away with the explicit bootblk
argument).
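
For illustration only, a minimal sketch of what a friendlier front end could
look like; this is hypothetical, not the existing installboot, and it assumes
fstyp can identify the filesystem on the target slice:

  #!/bin/sh
  # hypothetical installboot front end: derive fstype and bootblock path
  dev=$1                                    # e.g. /dev/rdsk/c0t0d0s0
  fstype=`fstyp "$dev"` || exit 1           # ufs, zfs, hsfs, ...
  bootblk=/usr/platform/`uname -i`/lib/fs/$fstype/bootblk
  exec installboot -F "$fstype" "$bootblk" "$dev"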

Unfortunately, I've too many other issues on my plate right now to attack
this one.

Rainer

-
Rainer Orth, Faculty of Technology, Bielefeld University
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [zfs-code] Peak every 4-5 second

2008-07-23 Thread Ellis, Mike
Would adding a dedicated ZIL/SLOG (what is the difference between those 2 
exactly? Is there one?) help meet your requirement?

The idea would be to use some sort of relatively large SSD drive of some 
variety to absorb the initial write-hit. After hours when things quiet down 
(or perhaps during "slow periods" in the day) data is transparently destaged 
into the main disk-pool, providing a rudimentary, transparent form of HSM. 
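
For what it's worth, attaching a separate intent log device is a single
command once you have the SSD (pool and device names below are placeholders):

  # zpool add tank log c4t0d0

A mirrored variant, "zpool add tank log mirror c4t0d0 c5t0d0", avoids losing
the slog to a single device failure.  Strictly speaking the ZIL only holds
synchronous writes briefly, until the next transaction group commits to the
main pool, so it is not a long-term staging tier in the HSM sense.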

Have a look at Adam Leventhal's blog and ACM article for some interesting 
perspectives on this stuff... (Specifically the potential "return of the 3600 
rpm drive" ;-)

Thanks -- mikee


- Original Message -
From: [EMAIL PROTECTED] <[EMAIL PROTECTED]>
To: Tharindu Rukshan Bamunuarachchi <[EMAIL PROTECTED]>
Cc: zfs-discuss@opensolaris.org 
Sent: Wed Jul 23 11:22:51 2008
Subject: Re: [zfs-discuss] [zfs-code] Peak every 4-5 second

On Wed, 23 Jul 2008, Tharindu Rukshan Bamunuarachchi wrote:

> 10,000 x 700 = 7MB per second ..
> 
> We have this rate for the whole day 
> 
> 10,000 orders per second is a minimum requirement of modern day stock exchanges 
> ...
> 
> Cache still helps us for ~1 hour, but after that who will help us ...
> 
> We are using a 2540 for current testing ...
> I have tried the same with a 6140, but no significant improvement ... only one or 
> two hours ...

Does your application request synchronous file writes or use fsync()? 
While normally fsync() slows performance I think that it will also 
serve to even the write response since ZFS will not be buffering lots 
of unwritten data.  However, there may be buffered writes from other 
applications which get written periodically and which may delay the 
writes from your critical application.  In this case reducing the ARC 
size may help so that the ZFS sync takes less time.

You could also run a script which executes 'sync' every second or two 
in order to convince ZFS to cache less unwritten data. This will cause 
a bit of a performance hit for the whole system though.

Your 7MB per second is a very tiny write load, so it is worthwhile 
investigating to see if there are other factors which are causing your 
storage system to not perform correctly.  The 2540 is capable of 
supporting writes at hundreds of MB per second.

As an example of "another factor", let's say that you used the 2540 to 
create 6 small LUNs and then put them into a ZFS raidz.  However, in 
this case the 2540 allocated all of the LUNs from the same disk (which 
it is happy to do by default) so now that disk is being severely 
thrashed since it is one disk rather than six.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Cannot attach mirror to SPARC zfs root pool

2008-07-23 Thread Richard Elling
Rainer Orth wrote:
> Rainer Orth <[EMAIL PROTECTED]> writes:
>
>   
>>> instlalboot on the new disk and see if that fixes it.
>>>   
>> Unfortunately, it didn't.  Reconsidering now, I see that I ran installboot
>> against slice 0 (reduced by 1 sector as required by CR 6680633) instead of
>> slice 2 (whole disk).  Doing so doesn't fix the problem either, though.
>> 
>
> I've found out what the problem was: I didn't specify the -F zfs option to
> installboot, so only half of the ZFS bootblock was written.  This is a
> combination of two documentation bugs and a terrible interface:
>   

Mainly because there is no -F option?

> * With the introduction of zfs boot, installboot got a new -F <fstype>
>   option.  Unfortunately, this is documented neither in installboot(1M)
>   (which wasn't updated at all, it seems) nor in the ZFS Admin Guide
>   (p.80, workaround for CR 6668666).
>
> * Apart from that, I've never understood why it is necessary to specify the
>   full path to the bootblock to installboot like this
>
> installboot /usr/platform/`uname -i`/lib/fs/<fstype>/bootblk /dev/rdsk/c0t0d0s0
>   

That is because installboot is simply a wrapper for dd.
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/psm/stand/bootblks/ufs/i386/installboot.sh

The first argument is copied to the second argument using dd.
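
In effect it does something like the following (a paraphrase of the idea, not
the verbatim script; the oseek value is an assumption here, and the count is
the per-fstype block count quoted elsewhere in this thread):

  # roughly: copy the bootblock into the reserved sectors of the slice
  dd if=/usr/platform/`uname -i`/lib/fs/zfs/bootblk \
      of=/dev/rdsk/c0t0d0s0 bs=1b oseek=1 count=31 conv=sync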

>   It would be far easier to just specify the fstype (or even let
>   installboot figure that out by itself using fstyp) than having to give
>   the full pathname.  In that case, installboot could just dd the whole
>   bootblk file instead of hardcoding the block counts for the different
>   filesystem types (probably to avoid corrupting the filesystem if the user
>   gives a file that is not a bootblock).
>
> Overall, a terrible mess ;-(
>
>   

I think that it should be very unusual that installboot would be run
interactively.  That is really no excuse for making it only slightly
smarter than dd, but it might be hard to justify changes unless some
kind person were to submit a bug with an improved implementation
(would make a good short project for someone :-)
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Moving ZFS root pool to different system breaks boot

2008-07-23 Thread Rainer Orth
Jürgen Keil <[EMAIL PROTECTED]> writes:

> > Recently, I needed to move the boot disks containing a ZFS root pool in an
> > Ultra 1/170E running snv_93 to a different system (same hardware) because
> > the original system was broken/unreliable.
> > 
> > To my dismay, unlike with UFS, the new machine wouldn't boot:
> > 
> > WARNING: pool 'root' could not be loaded as it was
> > last accessed by another system (host:  hostid:
> > 0x808f7fd8).  See: http://www.sun.com/msg/ZFS-8000-EY
> > 
> > panic[cpu0]/thread=180e000: BAD TRAP: type=31 rp=180acc0 addr=0 mmu_fsr=0 
> > occurred in module "unix" due to a NULL pointer dereference
> ...
> > suffering from the absence of SPARC failsafe archives after liveupgrade
> > (recently mentioned on install-discuss), I'd have been completely stuck.
[...]
> I guess that on SPARC you could boot from the installation optical media
> (or from a network server), and zpool import -f the root pool; that should
> put the correct hostid into the root pool's label.

That's what I did with the snv_93 UFS BE I had still around, with the
exception that I used zpool import -f -R /mnt to avoid pathname clashes
between the miniroot and the imported pool.  I think I even exported the
pool afterwards, but I'm no longer certain about this: I seem to remember
problems with exported root pools being no longer bootable.

Rainer

-- 
-
Rainer Orth, Faculty of Technology, Bielefeld University
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [zfs-code] Peak every 4-5 second

2008-07-23 Thread Bob Friesenhahn
On Wed, 23 Jul 2008, Tharindu Rukshan Bamunuarachchi wrote:

> 10,000 x 700 = 7MB per second ..
> 
> We have this rate for the whole day 
> 
> 10,000 orders per second is a minimum requirement of modern day stock exchanges 
> ...
> 
> Cache still helps us for ~1 hour, but after that who will help us ...
> 
> We are using a 2540 for current testing ...
> I have tried the same with a 6140, but no significant improvement ... only one or 
> two hours ...

Does your application request synchronous file writes or use fsync()? 
While normally fsync() slows performance I think that it will also 
serve to even the write response since ZFS will not be buffering lots 
of unwritten data.  However, there may be buffered writes from other 
applications which get written periodically and which may delay the 
writes from your critical application.  In this case reducing the ARC 
size may help so that the ZFS sync takes less time.
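
One way to cap the ARC is an /etc/system entry followed by a reboot; the value
below (roughly 1 GB) is only a placeholder:

  set zfs:zfs_arc_max = 0x40000000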

You could also run a script which executes 'sync' every second or two 
in order to convince ZFS to cache less unwritten data. This will cause 
a bit of a performance hit for the whole system though.
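
A minimal sketch of such a script:

  #!/bin/sh
  # flush dirty data frequently so less accumulates per txg
  while :; do
          sync
          sleep 2
  done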

Your 7MB per second is a very tiny write load, so it is worthwhile 
investigating to see if there are other factors which are causing your 
storage system to not perform correctly.  The 2540 is capable of 
supporting writes at hundreds of MB per second.

As an example of "another factor", let's say that you used the 2540 to 
create 6 small LUNs and then put them into a ZFS raidz.  However, in 
this case the 2540 allocated all of the LUNs from the same disk (which 
it is happy to do by default) so now that disk is being severely 
thrashed since it is one disk rather than six.
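
A quick, generic way to check how the load is actually spread across the
vdevs (pool name is a placeholder) is to watch per-vdev I/O:

  # zpool iostat -v tank 5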

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Moving ZFS root pool to different system breaks boot

2008-07-23 Thread Jürgen Keil
> Recently, I needed to move the boot disks containing a ZFS root pool in an
> Ultra 1/170E running snv_93 to a different system (same hardware) because
> the original system was broken/unreliable.
> 
> To my dismay, unlike with UFS, the new machine wouldn't boot:
> 
> WARNING: pool 'root' could not be loaded as it was
> last accessed by another system (host:  hostid:
> 0x808f7fd8).  See: http://www.sun.com/msg/ZFS-8000-EY
> 
> panic[cpu0]/thread=180e000: BAD TRAP: type=31 rp=180acc0 addr=0 mmu_fsr=0 
> occurred in module "unix" due to a NULL pointer dereference
...
> suffering from the absence of SPARC failsafe archives after liveupgrade
> (recently mentioned on install-discuss), I'd have been completely stuck.

Yes, on x86 you can boot into failsafe and let it mount the root pool
under /a and then reboot.  This removes the hostid from the configuration
information in the zpool's label.

I guess that on SPARC you could boot from the installation optical media
(or from a network server), and zpool import -f the root pool; that should
put the correct hostid into the root pool's label.
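
On SPARC that would look roughly like this (a sketch; 'root' is the pool name
from the panic message quoted above, and -R keeps its filesystems from
mounting over the miniroot):

  ok boot cdrom -s        (or: boot net -s)
  # zpool import -f -R /a root
  # init 6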
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] OT: Formatting Problem of ZFS Adm Guide (pdf)

2008-07-23 Thread David Collier-Brown
  One can carve furniture with an axe, especially if it's razor-sharp,
but that doesn't make it a spokeshave, plane and saw.

  I love star office, and use it every day, but my publisher uses
Frame, so that's what I use for books.

--dave

W. Wayne Liauh wrote:
>>I doubt so. Star/OpenOffice are word processors...
>>and like Word they are not suitable for typesetting
>>documents.
>>
>>SGML, FrameMaker & TeX/LateX are the only ones
>>capable of doing that.
> 
> 
> This was pretty much true about a year ago.  However, after version 2.3, 
> which adds the kerning feature, OpenOffice.org can produce very 
> professional-looking documents.
> 
> All of the OOo User Guides, which are every bit as complex as, if not more so 
> than, our own user guides, are now "self-generated".  Solveig Haugland, a 
> highly respected OpenOffice.org consultant, published her book 
> "OpenOffice.org 2 Guidebook" (a 527-page book complete with drawings, table 
> of contents, multi-column index, etc.) entirely on OOo.
> 
> Another key consideration, in addition to perhaps a desire to support our 
> sister product, is that the documents so generated are guaranteed to be 
> displayable on the OS they are intended to serve.  This is a pretty important 
> consideration IMO.  :-)
>  
>  
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
> 

-- 
David Collier-Brown| Always do right. This will gratify
Sun Microsystems, Toronto  | some people and astonish the rest
[EMAIL PROTECTED] |  -- Mark Twain
(905) 943-1983, cell: (647) 833-9377, (800) 555-9786 x56583
bridge: (877) 385-4099 code: 506 9191#
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Cannot attach mirror to SPARC zfs root pool

2008-07-23 Thread Rainer Orth
Rainer Orth <[EMAIL PROTECTED]> writes:

> > installboot on the new disk and see if that fixes it.
> 
> Unfortunately, it didn't.  Reconsidering now, I see that I ran installboot
> against slice 0 (reduced by 1 sector as required by CR 6680633) instead of
> slice 2 (whole disk).  Doing so doesn't fix the problem either, though.

I've found out what the problem was: I didn't specify the -F zfs option to
installboot, so only half of the ZFS bootblock was written.  This is a
combination of two documentation bugs and a terrible interface:

* With the introduction of zfs boot, installboot got a new -F <fstype>
  option.  Unfortunately, this is documented neither in installboot(1M)
  (which wasn't updated at all, it seems) nor in the ZFS Admin Guide
  (p.80, workaround for CR 6668666).

* Apart from that, I've never understood why it is necessary to specify the
  full path to the bootblock to installboot like this

installboot /usr/platform/`uname -i`/lib/fs/<fstype>/bootblk /dev/rdsk/c0t0d0s0

  It would be far easier to just specify the fstype (or even let
  installboot figure that out by itself using fstyp) than having to give
  the full pathname.  In that case, installboot could just dd the whole
  bootblk file instead of hardcoding the block counts for the different
  filesystem types (probably to avoid corrupting the filesystem if the user
  gives a file that is not a bootblock).

Overall, a terrible mess ;-(

Regards.
Rainer

-- 
-
Rainer Orth, Faculty of Technology, Bielefeld University
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS deduplication

2008-07-23 Thread Mike Gerdts
On Tue, Jul 22, 2008 at 10:44 PM, Erik Trimble <[EMAIL PROTECTED]> wrote:
> More than anything, Bob's reply is my major feeling on this.  Dedup may
> indeed turn out to be quite useful, but honestly, there's no broad data
> which says that it is a Big Win (tm) _right_now_, compared to finishing
> other features.  I'd really want a Engineering Study about the
> real-world use (i.e. what percentage of the userbase _could_ use such a
> feature, and what percentage _would_ use it, and exactly how useful
> would each segment find it...) before bumping it up in the priority
> queue of work to be done on ZFS.

I get this.  However, for most of my uses of clones dedup is
considered finishing the job.  Without it, I run the risk of having
way more writable data than I can restore.  Another solution to this
is to consider the output of "zfs send" to be a stable format and get
integration with enterprise backup software that can perform restores
in a way that maintains space efficiency.
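
As a sketch of that idea (dataset and pool names are made up): if a clone's
origin snapshot has already been stored, an incremental stream from that
origin keeps both the backup and the eventual restore space-efficient:

  # zfs send tank/gold@base | zfs receive backup/gold
  # zfs snapshot tank/clone@now
  # zfs send -i tank/gold@base tank/clone@now | zfs receive backup/clone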

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [zfs-code] Peak every 4-5 second

2008-07-23 Thread Frank . Hofmann
On Wed, 23 Jul 2008, Tharindu Rukshan Bamunuarachchi wrote:

> 10,000 x 700 = 7MB per second ..
> 
> We have this rate for the whole day 
> 
> 10,000 orders per second is a minimum requirement of modern day stock exchanges 
> ...
> 
> Cache still helps us for ~1 hour, but after that who will help us ...
> 
> We are using a 2540 for current testing ...
> I have tried the same with a 6140, but no significant improvement ... only one or 
> two hours ...

It might not be exactly what you have in mind, but this "how do I get 
latency down at all costs" thing reminded me of this old paper:

http://www.sun.com/blueprints/1000/layout.pdf

I'm not a storage architect; would someone with more experience in the area 
care to comment on this?  With huge disks as we have these days, the "wide 
thin" idea has gone under a bit, but how do you replace such setups with 
modern arrays when the workload is such that the caches eventually must get 
blown and you're down to spindle speed?

FrankH.

> 
> Robert Milkowski wrote:
>
>  Hello Tharindu,
> 
> Wednesday, July 23, 2008, 6:35:33 AM, you wrote:
> 
> TRB> Dear Mark/All,
> 
> TRB> Our trading system is writing to local and/or array volume at 10k 
> TRB> messages per second.
> TRB> Each message is about 700bytes in size.
> 
> TRB> Before ZFS, we used UFS.
TRB> Even with UFS, there was a peak every 5 seconds due to fsflush invocation.
> 
> TRB> However each peak is about ~5ms.
> TRB> Our application can not recover from such higher latency.
> 
> TRB> So we used several tuning parameters (tune_r_* and autoup) to decrease
> TRB> the flush interval.
> TRB> As a result peaks came down to ~1.5ms. But it is still too high for our
> TRB> application.
> 
> TRB> I believe, if we could reduce ZFS sync interval down to ~1s, peaks will
> TRB> be reduced to ~1ms or less.
> TRB> We like <1ms peaks per second than 5ms peak per 5 second :-)
> 
> TRB> Are there any tunable, so i can reduce ZFS sync interval.
> TRB> If there is no any tunable, can not I use "mdb" for the job ...?
> 
> TRB> This is not general and we are ok with increased I/O rate.
> TRB> Please advice/help.
> 
> txt_time/D
> 
> btw:
>  10,000 * 700 = ~7MB
> 
> What's your storage subsystem? Any, even small, raid device with write
> cache should help.
> 
>
> 
> 
> 
>

--
No good can come from selling your freedom, not for all the gold in the world,
for the value of this heavenly gift far exceeds that of any fortune on earth.
--
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [zfs-code] Peak every 4-5 second

2008-07-23 Thread Tharindu Rukshan Bamunuarachchi




> txt_time/D
mdb: failed to dereference symbol: unknown symbol name
> txg_time/D
mdb: failed to dereference symbol: unknown symbol name


Am I doing something wrong 

Robert Milkowski wrote:

  Hello Tharindu,

Wednesday, July 23, 2008, 6:35:33 AM, you wrote:

TRB> Dear Mark/All,

TRB> Our trading system is writing to local and/or array volume at 10k 
TRB> messages per second.
TRB> Each message is about 700bytes in size.

TRB> Before ZFS, we used UFS.
TRB> Even with UFS, there was a peak every 5 seconds due to fsflush invocation.

TRB> However each peak is about ~5ms.
TRB> Our application can not recover from such higher latency.

TRB> So we used several tuning parameters (tune_r_* and autoup) to decrease
TRB> the flush interval.
TRB> As a result peaks came down to ~1.5ms. But it is still too high for our
TRB> application.

TRB> I believe, if we could reduce ZFS sync interval down to ~1s, peaks will
TRB> be reduced to ~1ms or less.
TRB> We like <1ms peaks per second than 5ms peak per 5 second :-)

TRB> Are there any tunable, so i can reduce ZFS sync interval.
TRB> If there is no any tunable, can not I use "mdb" for the job ...?

TRB> This is not general and we are ok with increased I/O rate.
TRB> Please advice/help.

txt_time/D

btw:
 10,000 * 700 = ~7MB

What's your storage subsystem? Any, even small, raid device with write
cache should help.


  


***

"The information contained in this email including in any attachment is 
confidential and is meant to be read only by the person to whom it is 
addressed. If you are not the intended recipient(s), you are prohibited from 
printing, forwarding, saving or copying this email. If you have received this 
e-mail in error, please immediately notify the sender and delete this e-mail 
and its attachments from your computer."

***
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [zfs-code] Peak every 4-5 second

2008-07-23 Thread Tharindu Rukshan Bamunuarachchi




10,000 x 700 = 7MB per second ..

We have this rate for the whole day 

10,000 orders per second is a minimum requirement of modern day stock
exchanges ...

Cache still helps us for ~1 hour, but after that who will help us ...

We are using a 2540 for current testing ...
I have tried the same with a 6140, but no significant improvement ... only
one or two hours ...

Robert Milkowski wrote:

  Hello Tharindu,

Wednesday, July 23, 2008, 6:35:33 AM, you wrote:

TRB> Dear Mark/All,

TRB> Our trading system is writing to local and/or array volume at 10k 
TRB> messages per second.
TRB> Each message is about 700bytes in size.

TRB> Before ZFS, we used UFS.
TRB> Even with UFS, there was a peak every 5 seconds due to fsflush invocation.

TRB> However each peak is about ~5ms.
TRB> Our application can not recover from such higher latency.

TRB> So we used several tuning parameters (tune_r_* and autoup) to decrease
TRB> the flush interval.
TRB> As a result peaks came down to ~1.5ms. But it is still too high for our
TRB> application.

TRB> I believe, if we could reduce ZFS sync interval down to ~1s, peaks will
TRB> be reduced to ~1ms or less.
TRB> We like <1ms peaks per second than 5ms peak per 5 second :-)

TRB> Are there any tunable, so i can reduce ZFS sync interval.
TRB> If there is no any tunable, can not I use "mdb" for the job ...?

TRB> This is not general and we are ok with increased I/O rate.
TRB> Please advice/help.

txt_time/D

btw:
 10,000 * 700 = ~7MB

What's your storage subsystem? Any, even small, raid device with write
cache should help.


  


***

"The information contained in this email including in any attachment is 
confidential and is meant to be read only by the person to whom it is 
addressed. If you are not the intended recipient(s), you are prohibited from 
printing, forwarding, saving or copying this email. If you have received this 
e-mail in error, please immediately notify the sender and delete this e-mail 
and its attachments from your computer."

***
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [zfs-code] Peak every 4-5 second

2008-07-23 Thread Robert Milkowski
Hello Tharindu,

Wednesday, July 23, 2008, 6:35:33 AM, you wrote:

TRB> Dear Mark/All,

TRB> Our trading system is writing to local and/or array volume at 10k 
TRB> messages per second.
TRB> Each message is about 700bytes in size.

TRB> Before ZFS, we used UFS.
TRB> Even with UFS, there was a peak every 5 seconds due to fsflush invocation.

TRB> However each peak is about ~5ms.
TRB> Our application can not recover from such higher latency.

TRB> So we used several tuning parameters (tune_r_* and autoup) to decrease
TRB> the flush interval.
TRB> As a result peaks came down to ~1.5ms. But it is still too high for our
TRB> application.

TRB> I believe, if we could reduce ZFS sync interval down to ~1s, peaks will
TRB> be reduced to ~1ms or less.
TRB> We like <1ms peaks per second than 5ms peak per 5 second :-)

TRB> Are there any tunable, so i can reduce ZFS sync interval.
TRB> If there is no any tunable, can not I use "mdb" for the job ...?

TRB> This is not general and we are ok with increased I/O rate.
TRB> Please advice/help.

txt_time/D
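
(Presumably that is meant to be run with mdb attached to the live kernel, and
the symbol name is an assumption; it may be txg_time rather than txt_time, or
something else entirely depending on the build.  Roughly:

  # mdb -kw
  > txg_time/D
  > txg_time/W 0t1
  > $q

/D prints the current value in decimal and /W 0t1 writes decimal 1; changing
the sync behaviour of a live system this way should be done with great care.)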

btw:
 10,000 * 700 = ~7MB

What's your storage subsystem? Any, even small, raid device with write
cache should help.


-- 
Best regards,
 Robert Milkowskimailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] where was zpool status information keeping.

2008-07-23 Thread wan_jm
The OS's / is on a mirror of /dev/dsk/c1t0d0s0 and /dev/dsk/c1t1d0s0, and 
home_pool was then created as a mirror; here is the pool information.
  pool: omp_pool
 state: ONLINE
 scrub: none requested
config:

NAME  STATE READ WRITE CKSUM
omp_pool  ONLINE   0 0 0
  mirror  ONLINE   0 0 0
c1t3d0s0  ONLINE   0 0 0
c1t2d0s0  ONLINE   0 0 0

I then changed the root to the raw /dev/dsk/c1t1d0s0 and rebooted the system from 
it. ZFS was still fine, since nothing had changed. I then ran the zpool detach 
command followed by zpool attach, and ZFS was still fine in that root environment. But 
when I boot the system from /dev/dsk/c1t0d0s0 (disk0), home_pool is UNAVAIL now:
pool: home_pool
 state: UNAVAIL
status: One or more devices could not be used because the label is missing 
or invalid.  There are insufficient replicas for the pool to continue
functioning.
action: Destroy and re-create the pool from a backup source.
   see: http://www.sun.com/msg/ZFS-8000-5E
 scrub: none requested
config:

NAME  STATE READ WRITE CKSUM
home_pool UNAVAIL  0 0 0  insufficient replicas
  mirror  UNAVAIL  0 0 0  insufficient replicas
c1t1d0s7  FAULTED  0 0 0  corrupted data
c1t0d0s7  FAULTED  0 0 0  corrupted data

What is the reason? In my opinion, there must be some information kept in /, so 
what is it and where is it?
Thanks.
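
For reference (a hedged pointer, not a verified diagnosis): the pool
configuration the system reads at boot is cached in /etc/zfs/zpool.cache in
the root filesystem, so each root you boot from carries its own, possibly
stale, copy.  Forcing a fresh scan of the on-disk labels, for example

  # zpool export home_pool
  # zpool import -d /dev/dsk home_pool

may or may not recover the pool, depending on whether the labels on the s7
slices are actually intact.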
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS deduplication

2008-07-23 Thread Justin Stringfellow

> with other Word files.  You will thus end up seeking all over the disk 
> to read _most_ Word files.  Which really sucks.  



> very limited, constrained usage. Disk is just so cheap, that you 
> _really_ have to have an enormous amount of dup before the performance 
> penalties of dedup are countered.

Neither of these hold true for SSDs though, do they? Seeks are essentially 
free, and the devices are not cheap.

cheers,
--justin
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss