[zfs-discuss] Sudden and Dramatic Performance Drop-off

2012-10-04 Thread Knipe, Charles
Hey guys,

I've run into another ZFS performance disaster that I was hoping someone might 
be able to give me some pointers on resolving.  Without any significant change 
in workload, write performance has dropped off dramatically.  Based on previous 
experience we tried deleting some files to free space, even though we're not 
yet near 60% full.  Deleting files seemed to help for a little while, but now 
we're back in the weeds.

We already have our metaslab_min_alloc_size set to 0x500, so I'm reluctant to 
go lower than that.  One thing we noticed, which is new to us, is that 
zio_state shows a large number of threads in CHECKSUM_VERIFY.  I'm wondering if 
that's generally indicative of anything in particular.  I've got no errors on 
any disks, either in zpool status or iostat -e.  Any ideas as to where else I 
might want to dig in to figure out where my performance has gone?
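
For reference, here's roughly what we've been looking at so far (the mdb lines are typed from memory, so treat them as a sketch):

echo "metaslab_min_alloc_size/K" | mdb -k   # confirm the tunable really is at 0x500
echo ::zio_state | mdb -k                   # where we see the threads in CHECKSUM_VERIFY
iostat -xen 1 5                             # per-device error and latency counters
zpool status -v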

Thanks

-Charles



[zfs-discuss] ZFS resilvering loop from hell

2011-07-26 Thread Charles Stephens
I'm on S11E 150.0.1.9.  I replaced one of the drives, and the pool seems to be 
stuck in a resilvering loop.  I performed a 'zpool clear' and 'zpool scrub', and 
it just complains that the drives I didn't replace are degraded because of too 
many errors.  Oddly, the replaced drive is reported as being fine.  The CKSUM 
counts get up to about 108 or so by the time the resilver completes.

I'm now trying to evacuate the pool onto another pool; however, the zfs 
send/receive dies about 380GB into sending the first dataset.
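
(The evacuation itself is being done one dataset at a time, roughly like this; the dataset and target pool names below are made up:)

zfs snapshot -r dpool/data@evac
zfs send dpool/data@evac | zfs receive -u newpool/data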

Here is some output.  Any help or insights would be appreciated.  Thanks

cfs

  pool: dpool
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scan: resilver in progress since Tue Jul 26 15:03:32 2011
63.4G scanned out of 5.02T at 6.81M/s, 212h12m to go
15.1G resilvered, 1.23% done
config:

NAME          STATE     READ WRITE CKSUM
dpool         DEGRADED     0     0     6
  raidz1-0    DEGRADED     0     0    12
    c9t0d0    DEGRADED     0     0     0  too many errors
    c9t1d0    DEGRADED     0     0     0  too many errors
    c9t3d0    DEGRADED     0     0     0  too many errors
    c9t2d0    ONLINE       0     0     0  (resilvering)

errors: Permanent errors have been detected in the following files:

:<0x0>
[redacted list of 20 files, mostly in the same directory]




Re: [zfs-discuss] Intermittent ZFS hang

2010-09-23 Thread Charles J. Knipe
So, I'm still having problems with intermittent hangs on write with my ZFS 
pool.  Details from my original post are below.  Since posting that, I've gone 
back and forth with a number of you, and gotten a lot of useful advice, but I'm 
still trying to get to the root of the problem so I can correct it.  Since the 
original post I have:

-Gathered a great deal of information in the form of kernel thread dumps, 
zio_state dumps, and live crash dumps while the problem is happening.
-Been advised that my ruling out of dedupe was probably premature, as I still 
likely have a good deal of deduplicated data on-disk.
-Checked just about every log and counter that might indicate a hardware error, 
without finding one.

I was wondering at this point if someone could give me some pointers on the 
following:
1. Given the dumps and diagnostic data I've gathered so far, is there a way I 
can determine for certain where in the ZFS driver I'm spending so much time 
hanging?  At the very least I'd like to determine whether it is, in fact, a 
deduplication issue.
2. If it is, in fact, a deduplication issue, would my only recourse be a new 
pool and a send/receive operation?  The data we're storing is VMFS volumes for 
ESX.  We're tossing around the idea of creating new volumes in the same pool 
(now that dedupe is off) and migrating VMs over in small batches.  The theory 
is that we would be writing non-deduped data this way, and when we were done we 
could remove the deduplicated volumes.  Is this sound?
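
(Concretely, the plan would be something like the following; the size and volume name are made up:)

zfs create -s -V 2T pool0/vmfs-new   # new sparse zvol, inherits dedup=off now that it's disabled
zfs get dedup pool0/vmfs-new         # sanity-check before migrating VMs onto it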

Thanks again for all the help!

-Charles

> Howdy,
> 
> We're having a ZFS performance issue over here that I
> was hoping you guys could help me troubleshoot.  We
> have a ZFS pool made up of 24 disks, arranged into 7
> raid-z devices of 4 disks each.  We're using it as an
> iSCSI back-end for VMWare and some Oracle RAC
> clusters.
> 
> Under normal circumstances performance is very good
> both in benchmarks and under real-world use.  Every
> couple days, however, I/O seems to hang for anywhere
> between several seconds and several minutes.  The
> hang seems to be a complete stop of all write I/O.
>  The following zpool iostat illustrates:
> 
> pool0   2.47T  5.13T    120      0   293K      0
> pool0   2.47T  5.13T    127      0   308K      0
> pool0   2.47T  5.13T    131      0   322K      0
> pool0   2.47T  5.13T    144      0   347K      0
> pool0   2.47T  5.13T    135      0   331K      0
> pool0   2.47T  5.13T    122      0   295K      0
> pool0   2.47T  5.13T    135      0   330K      0
> 
> While this is going on our VMs all hang, as do any
> "zfs create" commands or attempts to touch/create
> files in the zfs pool from the local system.  After
> several minutes the system "un-hangs" and we see very
> high write rates before things return to normal
> across the board.
> 
> Some more information about our configuration:  We're
> running OpenSolaris svn-134.  ZFS is at version 22.
> Our disks are 15kRPM 300gb Seagate Cheetahs, mounted
> in Promise J610S Dual enclosures, hanging off a Dell
> SAS 5/e controller.  We'd tried out most of this
> configuration previously on OpenSolaris 2009.06
> without running into this problem.  The only thing
> that's new, aside from the newer OpenSolaris/ZFS is
>  a set of four SSDs configured as log disks.
> 
> At first we blamed de-dupe, but we've disabled that.
> Next we suspected the SSD log disks, but we've seen
>  the problem with those removed, as well.
> 
> Has anyone seen anything like this before?  Are there
> any tools we can use to gather information during the
> hang which might be useful in determining what's
> going wrong?
> 
> Thanks for any insights you may have.
> 
> -Charles


Re: [zfs-discuss] Intermittent ZFS hang

2010-09-13 Thread Charles J. Knipe
> > At first we blamed de-dupe, but we've disabled that. Next we
> suspected
> > the SSD log disks, but we've seen the problem with those removed, as
> > well.
> 
> Did you have dedup enabled and then disabled it? If so, data can (or
> will) be deduplicated on the drives. Currently the only way of de-
> deduping them is to recopy them after disabling dedup.

That's a good point.  There is deduplicated data still present on disk.  Do you 
think the issue we're seeing may be related to the existing deduped data?  I'm 
not against copying the contents of the pool over to a new pool, but 
considering the effort/disruption I'd want to make sure it's not just a shot in 
the dark.
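
(For what it's worth, I assume the amount of deduped data still out there can be gauged with something like:)

zdb -DD pool0                # DDT summary: how many unique vs. duplicate blocks remain
zpool get dedupratio pool0   # anything above 1.00x means deduped blocks are still on disk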

If I don't have a good theory in another week, that's when I start shooting in 
the dark...

-Charles


Re: [zfs-discuss] Intermittent ZFS hang

2010-09-13 Thread Charles J. Knipe
> 
> 
> Charles,
> 
> Just like UNIX, there are several ways to drill down on the problem.  I
> would probably start with a live crash dump (savecore -L) when you see
> the problem.  Another method would be to grab multiple "stats" commands
> during the problem to see where you can drill down later.  I would
> probably use this method if the problem lasts for a while, and drill
> down with dtrace based on what I saw.  But each method is going to
> depend on your skill when looking at the problem.
> 
> Dave
> 

Dave,

After running clean since my last post, the problem occurred again today.  This 
time I was able to gather some data while it was going on.  The only thing that 
jumps out at me so far is the output of echo ::zio_state | mdb -k.

Under normal operations this usually looks like this:

ADDRESS  TYPE  STAGEWAITER

ff090eb69328 NULL  OPEN -
ff090eb69c88 NULL  OPEN -

Here are a couple samples while the issue was happening:

ADDRESS  TYPE  STAGEWAITER

ff0bfe8c59b0 NULL  CHECKSUM_VERIFY  ff003e2f2c60
ff090eb69328 NULL  OPEN -
ff090eb69c88 NULL  OPEN -

ADDRESS  TYPE  STAGEWAITER

ff09bb12a040 NULL  CHECKSUM_VERIFY  ff003d6acc60
ff0bfe8c59b0 NULL  CHECKSUM_VERIFY  ff003e2f2c60
ff090eb69328 NULL  OPEN -
ff090eb69c88 NULL  OPEN -

Operating under the assumption that the waiter column is referencing kernel 
threads, I went looking for those addresses in the thread list.  Here are the 
threadlist entries for ff003d6acc60 and ff003e2f2c60 from the example 
directly above taken at about the same time as that output:

ff003d6acc60 ff0930d8c700 ff09172f9de0   2   0 ff09bb12a348
  PC: _resume_from_idle+0xf1    CMD: zpool-pool0
  stack pointer for thread ff003d6acc60: ff003d6ac360
  [ ff003d6ac360 _resume_from_idle+0xf1() ]
swtch+0x145()
cv_wait+0x61()
zio_wait+0x5d()
dbuf_read+0x1e8()
dmu_buf_hold+0x93()
zap_get_leaf_byblk+0x56()
zap_deref_leaf+0x78()
fzap_length+0x42()
zap_length_uint64+0x84()
ddt_zap_lookup+0x4b()
ddt_object_lookup+0x6d()
ddt_lookup+0x115()
zio_ddt_free+0x42()
zio_execute+0x8d()
taskq_thread+0x248()
thread_start+8()

ff003e2f2c60 fbc2dbb00   0  60 ff0bfe8c5cb8
  PC: _resume_from_idle+0xf1    THREAD: txg_sync_thread()
  stack pointer for thread ff003e2f2c60: ff003e2f2a40
  [ ff003e2f2a40 _resume_from_idle+0xf1() ]
swtch+0x145()
cv_wait+0x61()
zio_wait+0x5d()
spa_sync+0x40c()
txg_sync_thread+0x24a()
thread_start+8()

Not sure if any of that sheds any light on the problem.  I also have a live 
dump from the period when the problem was happening, a bunch of iostats, 
mpstats, and ::arc, ::spa, ::zio_state, and ::threadlist -v from mdb -k at 
several points during the issue.
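
(For anyone reproducing this: the stacks above came out of ::threadlist -v, and I believe the same thing can be pulled per thread with ::findstack, e.g.:)

echo ::zio_state | mdb -k
echo "ff003e2f2c60::findstack -v" | mdb -k   # waiter address taken from the ::zio_state output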

If you have any advice on how to proceed from here in debugging this issue, I'd 
greatly appreciate it.  So you know, I'm generally very comfortable with Unix, 
but DTrace and the Solaris kernel are unfamiliar territory.

In any event, thanks again for all the help thus far.

-Charles


Re: [zfs-discuss] Intermittent ZFS hang

2010-08-30 Thread Charles J. Knipe
David,

Thanks for your reply.  Answers to your questions are below.

> Is it just ZFS hanging (or what it appears to be is
> slowing down or
> blocking) or does the whole system hang?  

Only the ZFS storage is affected.  Any attempt to write to it blocks until the 
issue passes.  Other than that, the system behaves normally.  I have not, as far 
as I remember, tried writing to the root pool while this is going on; I'll have 
to check that next time.  I suspect the problem is limited to a single pool.

> What does iostat show during the time period of the
> slowdown?
> What does mpstat show during the time of the
> slowdown?
> 
> You can look at the metadata statistics by running
> the following.
> echo ::arc | mdb -k
> When looking at a ZFS problem, I usually like to
> gather
> echo ::spa | mdb -k
> echo ::zio_state | mdb -k

I will plan to dump information from all of these sources next time I can catch 
it in the act.  Any other diag commands you think might be useful?
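
(Beyond those, the only additions on my own list so far are the obvious ones; shout if any of these are a waste of time:)

zpool iostat -v pool0 1
iostat -xzn 1
mpstat 1
prstat -mLc 1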

> I suspect you could drill down more with dtrace or
> lockstat to see
> where the slowdown is happening.

I'm brand new to DTrace.  I'm doing some reading now toward being in a position 
to ask intelligent questions.
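
(As a first experiment, cobbled together from examples and therefore very much a sketch, I was going to try timing each txg sync:)

dtrace -n 'fbt::spa_sync:entry { self->t = timestamp; } fbt::spa_sync:return /self->t/ { @["spa_sync ns"] = quantize(timestamp - self->t); self->t = 0; }'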

-Charles


[zfs-discuss] Intermittent ZFS hang

2010-08-30 Thread Charles J. Knipe
Howdy,

We're having a ZFS performance issue over here that I was hoping you guys could 
help me troubleshoot.  We have a ZFS pool made up of 24 disks, arranged into 7 
raid-z devices of 4 disks each.  We're using it as an iSCSI back-end for VMWare 
and some Oracle RAC clusters.

Under normal circumstances performance is very good both in benchmarks and 
under real-world use.  Every couple days, however, I/O seems to hang for 
anywhere between several seconds and several minutes.  The hang seems to be a 
complete stop of all write I/O.  The following zpool iostat illustrates:

pool0   2.47T  5.13T    120      0   293K      0
pool0   2.47T  5.13T    127      0   308K      0
pool0   2.47T  5.13T    131      0   322K      0
pool0   2.47T  5.13T    144      0   347K      0
pool0   2.47T  5.13T    135      0   331K      0
pool0   2.47T  5.13T    122      0   295K      0
pool0   2.47T  5.13T    135      0   330K      0

While this is going on our VMs all hang, as do any "zfs create" commands or 
attempts to touch/create files in the zfs pool from the local system.  After 
several minutes the system "un-hangs" and we see very high write rates before 
things return to normal across the board.

Some more information about our configuration:  We're running OpenSolaris 
snv_134.  ZFS is at version 22.  Our disks are 15k RPM 300GB Seagate Cheetahs, 
mounted in Promise J610S Dual enclosures, hanging off a Dell SAS 5/E 
controller.  We'd tried out most of this configuration previously on 
OpenSolaris 2009.06 without running into this problem.  The only thing that's 
new, aside from the newer OpenSolaris/ZFS, is a set of four SSDs configured as 
log disks.

At first we blamed de-dupe, but we've disabled that.  Next we suspected the SSD 
log disks, but we've seen the problem with those removed, as well.

Has anyone seen anything like this before?  Are there any tools we can use to 
gather information during the hang which might be useful in determining what's 
going wrong?

Thanks for any insights you may have.

-Charles


Re: [zfs-discuss] Is the J4200 SAS array suitable for Sun Cluster?

2010-05-16 Thread Charles Hedrick
We use this configuration. It works fine. However I don't know enough about the 
details to answer all of your questions.

The disks are accessible from both systems at the same time. Of course with ZFS 
you had better not actually use them from both systems.

Actually, let me be clear about what we do. We have two J4200s and one J4400. 
One J4200 uses SAS disks, the others SATA. The two with SATA disks are used in 
Sun Cluster configurations as NFS servers. They fail over just fine, losing no 
state. The one with SAS is not used with Sun Cluster. Rather, it's a MySQL 
server with two systems, one of them a hot spare. (It also acts as a MySQL 
slave server, but it uses different storage for that.) That means that our 
actual failover experience is with the SATA configuration. I will say from 
experience that in the SAS configuration both systems see the disks at the same 
time. I even managed to get ZFS to mount the same pool from both systems, which 
shouldn't be possible. Behavior was very strange until we realized what was 
going on.

I get the impression that they have special hardware in the SATA version that 
simulates SAS dual interface drives. That's what lets you use SATA drives in a 
two-node configuration. There's also some additional software setup for that 
configuration.

Note however that they do not support SSDs in the J4000 series. That means that 
a Sun Cluster configuration is going to have slow write performance in any 
application that uses synchronous writes (e.g. an NFS server). The recommended 
approach is to put the ZIL on an SSD. But in Sun Cluster it would have to be an 
SSD that's shared between the two systems, or you'd lose the contents of the ZIL 
when you do a failover. Since you can't put an SSD in the J4200, it's not clear 
how you'd set that up.
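
(If shared SSDs were supported, the setup itself would be trivial, something along these lines, with made-up device names for slots in the shared enclosure:)

zpool add tank log mirror c4t20d0 c4t21d0   # mirrored slog visible to both cluster nodes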

Personally I consider this a very serious disadvantage to the J4000 series. I 
kind of wish we had gotten a higher end storage system with some non-volatile 
cache. Of course when we got the hardware, Sun claimed they were going to 
support SSD in it.


Re: [zfs-discuss] Problems (bug?) with slow bulk ZFS filesystem creation

2010-05-10 Thread charles
Yes, I have recently tried the userquota option (one ZFS filesystem with 
60,000 quotas and 60,000 ordinary 'mkdir' home directories within), and this 
works fine, but you end up with less granularity of snapshots.
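
(For reference, the userquota variant is roughly the following; the user and dataset names are invented:)

zfs create tank/home
zfs set userquota@user1=10G tank/home   # per-user quota inside a single filesystem
mkdir /tank/home/user1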

It does seem odd that there is a slowdown after only 1,000 ZFS filesystems. It 
sounds like a bug rather than a limitation. I know that the slow boot problem 
you mentioned did get resolved in more recent versions of Solaris, although I 
cannot test it with 60,000 filesystems, as life is too short to wait for them 
to be created!


[zfs-discuss] Problems (bug?) with slow bulk ZFS filesystem creation

2010-05-10 Thread charles
Hi,

This thread refers to Solaris 10, but it was suggested that I post it here as 
ZFS developers may well be more likely to respond.

http://forums.sun.com/thread.jspa?threadID=5438393&messageID=10986502#10986502

Basically, after about 1,000 ZFS filesystem creations, the creation time slows 
down to around 4 seconds per filesystem, and gets progressively worse.

This is not the case for a normal mkdir, which creates thousands of directories 
very quickly.

I wanted users' home directories (60,000 of them) all to be individual ZFS 
filesystems, but there seems to be a bug/limitation due to the prohibitive 
creation time.
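
(A rough version of the timing test, in case anyone wants to reproduce it; the pool and path are made up:)

#!/bin/bash
i=1
while [ $i -le 2000 ]; do
    ptime zfs create tank/home/user$i   # watch the real time climb as the count grows
    i=`expr $i + 1`
done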


[zfs-discuss] CIFS in production and my experience so far, advice needed

2010-04-12 Thread charles
I am looking at opensolaris with ZFS and CIFS shares as an option for large 
scale production use with Active Directory.

I have successfully joined the OpenSolaris CIFS server to our Windows AD test 
domain and created an SMB share that the Windows Server 2003 machine can see. I 
have also created test users on the Windows server, specifying the profile and 
home directory path to the SMB share on the OpenSolaris server. This creates a 
home directory with the relevant ACLs on ZFS on the fly. All appears to work fine.
I could then in theory apply quotas on OpenSolaris if needed with the rich, 
easy-to-use ZFS commands, and even tweak ACLs with /usr/bin/chmod (rough 
commands below).
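
(For the curious, the test setup was roughly the following; the domain and dataset names are invented:)

zfs create -o casesensitivity=mixed -o sharesmb=name=homes rpool/export/homes
smbadm join -u Administrator testdomain.example.com
zfs set quota=500G rpool/export/homes   # dataset-level cap; per-user limits would use userquota@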

In theory I could batch user creation on the Windows server, all home 
directories would be on ZFS, and the world would be a better place.

This all uses the ephemeral user model on OpenSolaris, so only the SIDs are 
stored permanently on OpenSolaris; the UIDs and usernames are 'just passing 
through in memory', with the ACLs importantly remaining associated with the 
SIDs. (That is my experience with ACLs, but I am unsure how stable the 
permanence of ACLs is, considering they appear to be associated with temporary 
UIDs on OpenSolaris if you interrogate them with /usr/bin/ls -V.)

The scale is around 60,000 users all with home directories and a total of 40 
million files, soon to be increasing. Around 600 concurrent users.

This is where I stop to think: is OpenSolaris with CIFS designed to be robust 
enough for use like this in production?

There are several OpenSolaris bug reports I have noted with CIFS, for example 
that you cannot unmount a share because of 'device busy' (bug id 6819639).

This makes me worry about how robust it really is.

I would dearly love to have a ZFS fileserver for my Windows users, especially 
as NTFS is not as scalable or flexible, and frankly not as nice.

I have been put off Solaris (10) and Samba because the config looks complex and 
you appear to need local identities on the Solaris server, which would be a 
major headache with 60,000 users. OpenSolaris and CIFS solve this issue 
immediately with idmap and ephemeral users.

What are people's thoughts on the viability of OpenSolaris in production?


Re: [zfs-discuss] can't destroy snapshot

2010-03-31 Thread Charles Hedrick
So that eliminates one of my concerns. However, the other one is still an 
issue. Presumably Solaris Cluster shouldn't import a pool that's still active on 
the other system. We'll be looking more carefully into that.


Re: [zfs-discuss] can't destroy snapshot

2010-03-31 Thread Charles Hedrick
Ah, I hadn't thought about that. That may be what was happening. Thanks.


Re: [zfs-discuss] can't destroy snapshot

2010-03-31 Thread Charles Hedrick
So we tried recreating the pool and sending the data again.

1) Compression wasn't set on the copy, even though I did send -R, which is 
supposed to send all properties.
2) I tried killing the send | receive pipe. Receive couldn't be killed; it hung.
3) This is Solaris Cluster. We tried forcing a failover. The pool mounted on 
the other server without dismounting on the first. zpool list showed it mounted 
on both machines, and zpool iostat showed I/O actually occurring on both systems.

Altogether this does not give me a good feeling about ZFS. I'm hoping the 
problem is just with receive and Cluster, and that it works properly on a single 
system, because I'm running a critical database on ZFS on another system.


Re: [zfs-discuss] can't destroy snapshot

2010-03-31 Thread Charles Hedrick
# zfs destroy -r OIRT_BAK/backup_bad
cannot destroy 'OIRT_BAK/backup_bad@annex-2010-03-23-07:04:04-bad': dataset 
already exists


No, there are no clones.


Re: [zfs-discuss] can't destroy snapshot

2010-03-31 Thread Charles Hedrick
Incidentally, this is on Solaris 10, but I've seen identical reports from 
Opensolaris.


[zfs-discuss] can't destroy snapshot

2010-03-31 Thread Charles Hedrick
We're getting the notorious "cannot destroy ... dataset already exists". I've 
seen a number of reports of this, but none of the reports seem to get any 
response. Fortunately this is a backup system, so I can recreate the pool, but 
it's going to take me several days to get all the data back. Is there any known 
workaround?
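
(For anyone hitting the same message: the way I checked for clones, roughly, was the following, with the dataset names from my other posts:)

zfs list -r -t snapshot OIRT_BAK/backup_bad   # which snapshots are actually there
zfs get -r origin OIRT_BAK                    # a clone's origin property points back at its snapshot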


Re: [zfs-discuss] shrinking a zpool - roadmap

2010-02-22 Thread Charles Hedrick
I talked with our enterprise systems people recently. I don't believe they'd 
consider ZFS until it's more flexible. Shrink is a big one, as is removing a 
slog. We also need to be able to expand a raidz, possibly by striping it with a 
second one and then rebalancing the sizes.


Re: [zfs-discuss] performance problem with Mysql

2010-02-20 Thread Charles Hedrick
I hadn't considered stress testing the disks. Obviously that's a good idea. 
We'll look at doing something in May, when we have the next opportunity to take 
down the database. I doubt that doing testing during production is a good 
idea...


Re: [zfs-discuss] performance problem with Mysql

2010-02-20 Thread Charles Hedrick
We had been using the same pool for a backup Mysql server for 6 months before 
using it for the primary server. Neither zpool status -v nor fmdump shows any 
signs of problems.


[zfs-discuss] performance problem with Mysql

2010-02-20 Thread Charles Hedrick
We recently moved a Mysql database from NFS (Netapp) to a local disk array 
(J4200 with SAS disks). Shortly after moving production, the system effectively 
hung. CPU was at 100%, and one disk drive was at 100%.

I had tried to follow the tuning recommendations for MySQL, mostly (rough 
commands shown after the list):
* recordsize set to 16K
* primarycache=metadata
* zfs_prefetch_disable=1
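
(That is, something along these lines; the dataset name is invented:)

zfs set recordsize=16K tank/mysql
zfs set primarycache=metadata tank/mysql
echo "set zfs:zfs_prefetch_disable = 1" >> /etc/system   # takes effect after a reboot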

The theory behind primarycache=metadata is that MySQL will do a better job of 
caching internally than ZFS does. However, the continuous reads from one disk 
suggested to me that perhaps MySQL was reading something repeatedly. Thus (after 
restarting everything) I put primarycache back to the default. I haven't seen 
the problem again, but there's no way to know whether I actually fixed it or 
whether it was just a fluke.

At this point load on the storage is low enough that further tuning doesn't 
seem worth it. We average less than 1 MB / sec read and write.


Re: [zfs-discuss] available space

2010-02-15 Thread Charles Hedrick
Thanks. That makes sense. This is raidz2.


Re: [zfs-discuss] Pool import with failed ZIL device now possible ?

2010-02-13 Thread Charles Hedrick
I have a similar situation. I have a system that is used for backup copies of 
logs and other non-critical things, where the primary copy is on a Netapp. Data 
gets written in batches a few times a day. We use this system because storage 
on it is a lot less expensive than on the Netapp. Only non-critical data is 
sent to it via NFS. Critical data is sent to this server either by zfs send | 
receive, or by an rsync running on the server that reads from the Netapp over 
NFS. Thus the important data shouldn't go through the ZIL.

I am seriously considering turning off the ZIL, because NFS write performance 
is so lousy.

I'd use SSD, except that I can't find a reasonable way of doing so. I have a 
pair of servers with Sun Cluster, sharing a J4200 JBOD. If there's a failure, 
operations move to the other server. Thus a local SSD is no better than ZIL 
disabled. I'd love to put an SSD in the J4200, but the claim that this was 
going to be supported seems to have vanished.

Someone once asked why I bother with redundant systems if I don't care about 
the data. The answer is that if the NFS mounts hang, my production services 
hang. Also, I do care about some of the data; it just happens not to go through 
the ZIL.


[zfs-discuss] available space

2010-02-13 Thread Charles Hedrick
I have the following pool:

NAME   SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
OIRT  6.31T  3.72T  2.59T    58%  ONLINE  /

"zfs list" shows the following for a typical file system:

NAME                    USED  AVAIL  REFER  MOUNTPOINT
OIRT/sakai/production  1.40T  1.77T  1.40T  /OIRT/sakai/production

Why is available lower when shown by zfs than zpool?


Re: [zfs-discuss] New ZFS Intent Log (ZIL) device available - Beta program now open!

2010-01-18 Thread Charles Hedrick
From the web page it looks like this is a card that goes into the computer 
system. That's not very useful for enterprise applications, as they are going 
to want to use an external array that can be used by a redundant pair of 
servers.

I'm very interested in a cost-effective device that will interface to two 
systems.


Re: [zfs-discuss] ZFS + OpenSolaris for home NAS?

2010-01-15 Thread Charles Edge
To have Mac OS X connect via iSCSI:
http://krypted.com/mac-os-x/how-to-use-iscsi-on-mac-os-x/


Re: [zfs-discuss] zfs fast mirror resync?

2010-01-15 Thread Charles Menser
Perhaps an iSCSI mirror for a laptop? Online it when you are back
"home" to keep your backup current.

Charles

On Thu, Jan 14, 2010 at 7:04 PM, A Darren Dunham  wrote:
> On Thu, Jan 14, 2010 at 06:11:10PM -0500, Miles Nordin wrote:
>> zpool offline / zpool online of a mirror component will indeed
>> fast-resync, and I do it all the time.  zpool detach / attach will
>> not.
>
> Yes, but the offline device is still part of the pool.  What are you
> doing with the device when you take it offline?  (What's the reason
> you're offlining it?)
>
> --
> Darren


Re: [zfs-discuss] getting decent NFS performance

2009-12-22 Thread Charles Hedrick
Is iSCSI reliable enough for this?


Re: [zfs-discuss] getting decent NFS performance

2009-12-22 Thread Charles Hedrick
It turns out that our storage is currently being used for

* backups of various kinds, run daily by cron jobs
* saving old log files from our production application
* saving old versions of java files from our production application

Most of the usage is write-only, and a fair amount of it involves copying huge 
directories. There's no actual current user data.

I think zil_disable may actually make sense.
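
(If I do go that route, I assume on this release it's still the old global tunable, i.e. something like:)

echo "set zfs:zil_disable = 1" >> /etc/system   # applies to filesystems mounted after the next reboot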


Re: [zfs-discuss] getting decent NFS performance

2009-12-22 Thread Charles Hedrick
Thanks. That's what I was looking for.

Yikes! I hadn't realized how expensive the Zeus is.

We're using Solaris cluster, so if the system goes down, the other one takes 
over. That means that if the ZIL is on a local disk, we lose it in a crash. 
Might as well just set zil_disable (something I'm considering doing anyway).


[zfs-discuss] getting decent NFS performance

2009-12-22 Thread Charles Hedrick
We have a server using Solaris 10. It's a pair of systems with a shared J4200, 
with Solaris cluster. It works very nicely. Solaris cluster switches over 
transparently.

However, as an NFS server it is dog-slow. This is the usual synchronous write 
problem. Setting zil_disable fixes the problem; otherwise it can take more than 
an hour to copy files that take me 2 minutes with our Netapp.

The obvious solution is to use a flash disk for the ZIL. However, I'm clueless 
about what hardware to use. Can anyone suggest either a flash drive that will 
work in the J4200 (SATA), or some way to connect a drive to two machines so that 
Solaris Cluster will work? Sun used to claim that they were going to support a 
flash drive in the J4200. Now that statement seems to have disappeared, and 
their SATA flash drive seems to be vapor, despite appearing real on the Sun web 
site. (I tried to order one.)


Re: [zfs-discuss] [indiana-discuss] Boot failure with snv_122 and snv_123

2009-09-23 Thread Charles Menser
Cross-posted to ZFS-Discuss per Vikram's suggestion.

Summary: I upgraded to snv_123 and the system hangs on boot. snv_121, and 
earlier are working fine.

Booting with -kv, the system still hung, but after a few minutes, the system 
continued, spit out more text (referring to disks, but I could not capture the 
text). Here is what was left on the screen once the debugger kicked in:

PCI Express-device: i...@0, ata0
ata0 is /p...@0,0/pci-...@14,1/i...@0
PCI Express-device: pci1002,5...@4, pcieb1
pcieb1 is /p...@0,0/pci1002,5...@4
PCI Express-device: pci1002,5...@7
pcieb3 is /p...@0,0/pci1002,5...@7
UltraDMA mode 4 selected
sd4 at ata0: target 0 lun 0
sd4 is /p...@0,0/pci-...@14,1/i...@0/s...@0,0
NOTICE: Can not read the pool label from '/p...@0,0/pci1043,8...@12/d...@0,0:a'
NOTICE: spa_import_rootpool: error 5
Cannot mount root on /p...@0,0/pci1043,8...@12/d...@0,0:a fstype zfs

panic[cpu0]/thread=fbc2efe0: vfs_mountroot: cannot mount root

fbc50ce0 genunix:vfs_mountroot+350 ()
fbc50d10 genunix:main+e7 ()
fbc50d20 unix:_locore_start+92 ()

panic: entering debugger (do dump device, continue to reboot)

Then I ran ::stack and ::status:

[0]> ::stack
kmdb_enter+0xb()
debug_enter+0x38(fb934340)
panicsys+0x41c(fbb89070, fbc50c70, fbc58e80, 1)
vpanic+0x15c()
panic+0x94()
vfs_mountroot+0x350()
main+0xe7()
_locore_start+0x92()

[0]> ::status
debugging live kernel (64-bit) on (not set)
operating system: 5.11 snv_123 (i86pc)
CPU-specific support: AMD
DTrace state: inactive
stopped on: debugger entry trap


To clarify, when build 122 was announced, I tried upgrading. The new BE would 
not boot, hanging in the same way that snv_123 does. I later deleted the 
snv_122 BE.

Also, I checked my grub config, and nothing seems out of line there (though I 
have edited the boot entries to remove the splashimage, foreground, background, 
and console=graphics).
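
(If it would help, I can also boot the live CD and dump the labels on the device the kernel complains about, along the lines of the following; the device name is from memory, so possibly wrong:)

zdb -l /dev/rdsk/c0t0d0s0   # prints the four vdev labels, or errors if they are unreadable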

Thanks,
Charles


> Hi,
> 
> A problem with your root pool - something went wrong
> when you upgraded 
> which explains why snv_122 no longer works as well.
> One of the ZFS 
> experts on this list could help you - I suspect
> others may have run into 
> similar issues before.
> 
> Vikram
> 
> Charles Menser wrote:
> > Vikram,
> >
> > Thank you for the prompt reply!
> >
> > I have made no BIOS changes. The last time I
> changed the BIOS was before reinstalling OpenSolaris
> 2009.06 after changing my SATA controller to AHCI
> mode. This was some time ago, and I have been using
> the /dev repo and installed several development
> builds since then (the latest that worked was
> snv_121).
> >
> > I switched to a USB keyboard and mdb was happy. I
> am curious why a PS/AUX keyboard works with the
> system normally, but not MDB.
> >
> > Here is what I have from MDB so far:
> >
> > I rebooted with -kv, and after a few minutes, the
> system continued, spit out more text (referring to
> disks, but I could not capture the text). Here is
> what was left on the screen once the debugger kicked
> in:
> >
> > PCI Express-device: i...@0, ata0
> > ata0 is /p...@0,0/pci-...@14,1/i...@0
> > PCI Express-device: pci1002,5...@4, pcieb1
> > pcieb1 is /p...@0,0/pci1002,5...@4
> > PCI Express-device: pci1002,5...@7
> > pcieb3 is /p...@0,0/pci1002,5...@7
> > UltraDMA mode 4 selected
> > sd4 at ata0: target 0 lun 0
> > sd4 is /p...@0,0/pci-...@14,1/i...@0/s...@0,0
> > NOTICE: Can not read the pool label from
> '/p...@0,0/pci1043,8...@12/d...@0,0:a'
> > NOTICE: spa_import_rootpool: error 5
> > Cannot mount root on
> /p...@0,0/pci1043,8...@12/d...@0,0:a fstype zfs
> >
> > panic[cpu0]/thread=fbc2efe0: vfs_mountroot:
> cannot mount root
> >
> > fbc50ce0 genunix:vfs_mountroot+350 ()
> > fbc50d10 genunix:main+e7 ()
> > fbc50d20 unix:_locore_start+92 ()
> >
> > panic: entering debugger (do dump device, continue
> to reboot)
> >
> > [again, the above is hand transcribed, and may
> contain typos]
> >
> > Then I ran ::stack and ::status:
> >
> > [0]> ::stack
> > kmdb_enter+0xb()
> > debug_enter+0x38(fb934340)
> > panicsys+0x41c(fbb89070, fbc50c70,
> fbc58e80, 1)
> > vpanic+0x15c()
> > panic+0x94()
> > vfs_mountroot+0x350()
> > main+0xe7()
> > _locore_start+0x92()
> >
> > [0]> ::status
> > debugging live kernel (64-bit) on (not set)
> > operating system: 5.11 snv_123 (i86pc)
> > CPU-specific support: AMD
> > DTrace state: inactive
> > stopped on: debugger entry trap
> >
> > The motherboard is an ASUS M3A32-MVP, wit

[zfs-discuss] Question about mirror vdev performance considerations

2009-08-12 Thread Charles Menser
With four drives A, A, B, B, where the A drives have fast access and/or high 
throughput, and the B drives are slower to seek and/or have a slower transfer 
speed, what are the implications for mirrored ZFS pools?

In particular I am wondering how the IO performance will compare between:

zpool create mypool mirror A A mirror B B

and

zpool create mypool mirror A B mirror A B

and

zpool create mypool mirror A B mirror B A

Thanks,
Charles Menser


Re: [zfs-discuss] add-view for the zfs snapshot

2009-08-07 Thread Charles Baker
> I first create a LUN with "stmfadm create-lu ", and add a view, so the
> initiator can see the created LUN.
> 
> Now I use "zfs snapshot" to create a snapshot of the created LUN.
> 
> What can I do to make the snapshot accessible to the initiator? Thanks.

Hi,

This is a good question and something that I have not tried.
Please see Chapter 7 of the zfs manual linked below.

http://dlc.sun.com/pdf/819-5461/819-5461.pdf
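
A sketch of what I would try first (untested, names invented): clone the snapshot so there is a writable zvol to export, then register that as a new LU.

zfs clone tank/vol1@snap1 tank/vol1-clone
stmfadm create-lu /dev/zvol/rdsk/tank/vol1-clone
stmfadm add-view <GUID printed by create-lu>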

Cross-posting with zfs-discuss.

regards
Chuck


Re: [zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906

2009-08-04 Thread Charles Baker
> My testing has shown some serious problems with the
> iSCSI implementation for OpenSolaris.
> 
> I setup a VMware vSphere 4 box with RAID 10
> direct-attached storage and 3 virtual machines:
> - OpenSolaris 2009.06 (snv_111b) running 64-bit
> - CentOS 5.3 x64 (ran yum update)
> - Ubuntu Server 9.04 x64 (ran apt-get upgrade)
> 
> I gave each virtual 2 GB of RAM, a 32 GB drive and
> setup a 16 GB iSCSI target on each (the two Linux vms
> used iSCSI Enterprise Target 0.4.16 with blockio).
> VMware Tools was installed on each. No tuning was
> done on any of the operating systems.
> 
> I ran two tests for write performance - one one the
> server itself and one from my Mac connected via
> Gigabit (mtu of 1500) iSCSI connection using
> globalSAN’s latest initiator.
> 
> Here’s what I used on the servers:
> time dd if=/dev/zero of=/root/testfile bs=1048576k
> count=4
> and the Mac OS with the iSCSI connected drive
> (formatted with GPT / Mac OS Extended journaled):
> time dd if=/dev/zero of=/Volumes/test/testfile
> bs=1048576k count=4
> 
> The results were very interesting (all calculations
> using 1 MB = 1,048,576 bytes).
> 
> For OpenSolaris, the local write performance averaged
> 86 MB/s. I turned on lzjb compression for rpool (zfs
> set compression=lzjb rpool) and it went up to 414
> MB/s (since I’m writing zeros). The average
> performance via iSCSI was an abysmal 16 MB/s (even
> with compression turned on - with it off, 13 MB/s).
> 
> For CentOS (ext3), local write performance averaged
> 141 MB/s. iSCSI performance was 78 MB/s (almost as
> fast as local ZFS performance on the OpenSolaris
> server when compression was turned off).
> 
> Ubuntu Server (ext4) had 150 MB/s for the local
> write. iSCSI performance averaged 80 MB/s.
> 
> One of the main differences between the three virtual
> machines was that the iSCSI target on the Linux
> machines used partitions with no file system. On
> OpenSolaris, the iSCSI target created sits on top of
> ZFS. That creates a lot of overhead (although you do
> get some great features).
> 
> Since all the virtual machines were connected to the
> same switch (with the same MTU), had the same amount
> of RAM, used default configurations for the operating
> systems, and sat on the same RAID 10 storage, I’d say
> it was a pretty level playing field. 
> 
> While jumbo frames will help iSCSI performance, it
> won’t overcome inherit limitations of the iSCSI
> target’s implementation.

Cross-posting with zfs-discuss.


Re: [zfs-discuss] core dump on zfs receive

2009-06-27 Thread Charles Hedrick
I'd like to maintain a backup of the main pool on an external drive. Can you 
suggest a way to do that? I was hoping to do zfs send | zfs receive and then do 
that with incrementals. It seems that this isn't going to work. How do people 
actually back up ZFS-based systems?


[zfs-discuss] core dump on zfs receive

2009-06-22 Thread Charles Hedrick
I'm trying to do a simple backup. I did

zfs snapshot -r rpool@snapshot
zfs send -R rpool@snapshot | zfs receive -Fud external/rpool

zfs snapshot -r rpool@snapshot2
zfs send -RI rpool@snapshot1 rpool@snapshot2 | zfs receive -d external/rpool

The receive core dumps. $c on the core shows:
libc_hwcap1.so.1`strcmp+0xec(809ba50, 0, 8044938, 1020)
libzfs.so.1`recv_incremental_replication+0xb57(8088648, 8047430, 2, 8084ca8, 
80841e8, 40)
libzfs.so.1`zfs_receive_package+0x436(8088648, 0, 8047e2b, 2, 8047580, 80476c0)
libzfs.so.1`zfs_receive_impl+0x689(8088648, 8047e2b, 2, 0, 0, 8047c6c)
libzfs.so.1`zfs_receive+0x35(8088648, 8047e2b, 2, 0, 0, 8047d40)
zfs_do_receive+0x172(3, 8047d40, 8047d3c, 807187c)
main+0x2af(4, 8047d3c, 8047d50, feffb7b4)
_start+0x7d(4, 8047e1c, 8047e20, 8047e28, 8047e2b, 0)

I've tried a number of variants of the arguments to send and receive, but it 
always core dumps.


[zfs-discuss] how to do backup

2009-06-20 Thread Charles Hedrick
I have a USB disk to which I want to do a backup. I've used send | receive. It 
works fine until I try to reboot. At that point the system fails to come up, 
because the backup copy is set to be mounted at the original location, so the 
system tries to mount two different things in the same place. I guess I can have 
the script set mountpoint=none, but I'd think there would be a better approach.
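
(What I have in mind as the workaround, roughly, with made-up names:)

zfs send -R rpool@backup | zfs receive -Fdu external/rpool   # -u: don't mount after receiving
zfs set canmount=noauto external/rpool                       # keep the backup copy from mounting at boot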


[zfs-discuss] two pools on boot disk?

2009-06-20 Thread Charles Hedrick
I have a small system that is going to be a file server. It has two disks. I'd 
like just one pool for data. Is it possible to create two pools on the boot 
disk, and then add the second disk to the second pool? The result would be a 
single small pool for root, and a second pool containing the rest of that disk 
plus the second disk.
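
(In zpool terms, what I'm after is roughly the following; slice and device names are invented:)

zpool create data c3t0d0s7 c3t1d0   # leftover slice on the boot disk plus the whole second disk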

The installer seems to want to use the whole disk for the root pool. Is there a 
way to change that?


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-10 Thread Charles Binford
DE - could you please post the output of your 'zpool umount usbhdd1'
command?  I believe the output will prove useful to the point being
discussed below.

Charles

D. Eckert wrote:
> (...)
> You don't move a pool with 'zfs umount', that only unmounts a single zfs
> filesystem within a pool, but the pool is still active.. 'zpool export'
> releases the pool from the OS, then 'zpool import' on the other machine.
> (...)
>
> with all respect: I never read such a non logic ridiculous .
>
> I have a single zpool set up over the entire available disk space on an 
> external USB drive without any other filesystems inside this particular pool.
>
> so how on earth should I be sure, that the pool is still a live pool inside 
> the operating system if the output of 'mount' cmd tells me, the pool is no 
> longer attached to the root FS
>
> this doesn't make sense at all and it is a vulnerability of ZFS.
>
> so if the output of the mount cmd tells you the FS / ZPOOL is not mounted I 
> can't face any reason why the filesystem should be still up and running, 
> because I just unmounted the only one available ZPOOL.
>
> And by the way: After performing: 'zpool umount usbhdd1' I can NOT access any 
> single file inside /usbhdd1.
>
> What else should be released from the OS FS than a single zpool containing no 
> other sub Filesystems?
>
> Why? The answer is quite simple: The pool is unmounted and no longer hooked 
> up to the system's filesystem. so what should me prevent from unplugging the 
> usb wire?
>
> Regards,
> DE
>   


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-10 Thread Charles Binford
Jeff, what do you mean by "disks that simply blow off write ordering"?
My experience is that most enterprise disks are some flavor of SCSI, and
host SCSI drivers almost ALWAYS use simple queue tags, implying the
target is free to re-order the commands for performance.  Are you talking
about something else, or does ZFS request ordered queue tags on certain
commands?

Charles

Jeff Bonwick wrote:
>> There is no substitute for cord-yank tests - many and often. The  
>> weird part is, the ZFS design team simulated millions of them.
>> So the full explanation remains to be uncovered?
>> 
>
> We simulated power failure; we did not simulate disks that simply
> blow off write ordering.  Any disk that you'd ever deploy in an
> enterprise or storage appliance context gets this right.
>
> The good news is that ZFS is getting popular enough on consumer-grade
> hardware.  The bad news is that said hardware has a different set of
> failure modes, so it takes a bit of work to become resilient to them.
> This is pretty high on my short list.
>
> Jeff


Re: [zfs-discuss] OpenSolaris better Than Solaris10u6 with requards to ARECA Raid Card

2009-01-16 Thread Charles Wright
I tested with zfs_vdev_max_pending=8
I had hoped this would make the error messages 
   arcmsr0: too many outstanding commands (257 > 256)
go away, but it did not.

With zfs_vdev_max_pending=8, I would think only 128 commands total should be 
outstanding (16 drives * 8 = 128).

However, I haven't been able to corrupt ZFS with it set to 8 (yet...)
So it seems to have helped.
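
(In case anyone wants to try the same setting without a reboot, I believe the live equivalent of the /etc/system line is:)

echo "zfs_vdev_max_pending/W0t8" | mdb -kw   # pokes decimal 8 into the running kernel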

I took a log of iostat -x 1 while I was doing a lot of I/O and posted it here
http://wrights.webhop.org/areca/solaris-info/zfs_vdev_max_pending-tests/8/iostat-8.txt

You can see the number of errors and other info here
http://wrights.webhop.org/areca/solaris-info/zfs_vdev_max_pending-tests/8/

Information about my system can also be found here
http://wrights.webhop.org/areca/

Thanks for the suggestion.   I'm working with James and Erich and hopefully 
they will find something in the driver code.


Re: [zfs-discuss] OpenSolaris better Than Solaris10u6 with requards to ARECA Raid Card

2009-01-15 Thread Charles Wright
I've tried putting this in /etc/system and rebooting
set zfs:zfs_vdev_max_pending = 16

Are we sure that number equates to a SCSI command?
Perhaps I should set it to 8 and see what happens.
(I have 256 SCSI commands I can queue across 16 drives.)

I still got these error messages in the log.

Jan 15 15:29:40 yoda arcmsr: [ID 659062 kern.notice] arcmsr0: too many 
outstanding commands (257 > 256)
Jan 15 15:29:40 yoda arcmsr: [ID 659062 kern.notice] arcmsr0: too many 
outstanding commands (256 > 256)
Jan 15 15:29:43 yoda last message repeated 73 times

I watched iostat -x a good bit and usually it is 0.0 or 0.1

root@yoda:~# iostat -x
 extended device statistics 
devicer/sw/s   kr/s   kw/s wait actv  svc_t  %w  %b 
sd0   0.00.00.00.0  0.0  0.00.0   0   0 
sd1   0.42.0   22.3   13.5  0.1  0.0   39.3   1   2 
sd2   0.52.0   25.6   13.5  0.1  0.0   40.4   2   2 
sd3   0.3   21.5   18.7  334.4  0.7  0.1   40.1  13  15 
sd4   0.3   21.6   18.9  334.4  0.7  0.1   40.6  13  15 
sd5   0.3   21.5   19.2  334.4  0.7  0.1   39.7  12  15 
sd6   0.3   21.6   18.6  334.4  0.7  0.2   40.4  13  15 
sd7   0.3   21.6   18.7  334.4  0.7  0.1   40.3  12  15 
sd8   0.3   21.6   18.7  334.4  0.7  0.2   40.1  13  15 
sd9   0.3   21.5   18.5  334.5  0.7  0.1   40.0  12  14 
sd10  0.3   21.4   18.9  333.6  0.7  0.1   40.2  12  14 
sd11  0.3   21.4   18.9  333.6  0.7  0.1   39.3  12  15 
sd12  0.3   21.4   19.4  333.6  0.7  0.2   40.0  13  15 
sd13  0.3   21.4   18.9  333.6  0.7  0.1   40.3  13  15 
sd14  0.3   21.4   19.0  333.6  0.7  0.1   38.8  12  14 
sd15  0.3   21.4   19.1  333.6  0.7  0.1   39.6  12  14 
sd16  0.3   21.4   18.7  333.6  0.7  0.1   39.3  12  14


Re: [zfs-discuss] OpenSolaris better Than Solaris10u6 with requards to ARECA Raid Card

2009-01-14 Thread Charles Wright
Here's an update:

I thought that the error message
   arcmsr0: too many outstanding commands
might be due to a SCSI queue being overrun.

The areca driver has
#define ARCMSR_MAX_OUTSTANDING_CMD 256

What might be happening is that each raid set results in a new instance of the 
areca driver getting loaded, so perhaps the SCSI queue on the card is just 
getting overrun, as each drive is getting a queue depth of 256. As such, I 
tested with sd_max_throttle=16.

(16 drives * 16 queues = 256)

I verified sd_max_throttle got set via:
root@yoda:~/solaris-install-stuff# echo "sd_max_throttle/D" |mdb -k
sd_max_throttle:
sd_max_throttle:16  



I see that if I run this script to create a bunch of small files, I can make a 
lot of drives jump to DEGRADED in a hurry.

#!/bin/bash

dir=/backup/homebackup/junk
mkdir -p $dir

cd $dir

date
printf "Creating 1 1k files in $dir \n"
i=1
while [ $i -ge 0 ]
do
   j=`expr $i - 1`
   dd if=/dev/zero of=$i count=1 bs=1k &> /dev/null
   i=$j
done

date

i=1
printf "Deleting 1 1k files in $dir \n"
while [ $i -ge 0 ]
do
   j=`expr $i - 1`
   rm $i
   i=$j
done
date


Before running the script:
root@yoda:~# zpool status
 pool: backup
state: ONLINE
scrub: none requested
config:

   NAME STATE READ WRITE CKSUM
   backup   ONLINE   0 0 0
 raidz1 ONLINE   0 0 2
   c4t2d0   ONLINE   0 0 0
   c4t3d0   ONLINE   0 0 0
   c4t4d0   ONLINE   0 0 0
   c4t5d0   ONLINE   0 0 0
   c4t6d0   ONLINE   0 0 0
   c4t7d0   ONLINE   0 0 0
   c4t8d0   ONLINE   0 0 0
 raidz1 ONLINE   0 0 2
   c4t9d0   ONLINE   0 0 0
   c4t10d0  ONLINE   0 0 0
   c4t11d0  ONLINE   0 0 0
   c4t12d0  ONLINE   0 0 0
   c4t13d0  ONLINE   0 0 0
   c4t14d0  ONLINE   0 0 0
   c4t15d0  ONLINE   0 0 0

errors: No known data errors

 pool: rpool
state: ONLINE
scrub: none requested
config:

   NAME  STATE READ WRITE CKSUM
   rpool ONLINE   0 0 0
 mirror  ONLINE   0 0 0
   c4t0d0s0  ONLINE   0 0 0
   c4t1d0s0  ONLINE   0 0 0

errors: No known data errors


   AFTER running the script:

root@yoda:~/solaris-install-stuff# zpool status -v
 pool: backup
state: DEGRADED
status: One or more devices has experienced an error resulting in data
   corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
   entire pool from backup.
  see: http://www.sun.com/msg/ZFS-8000-8A
scrub: none requested
config:

   NAME STATE READ WRITE CKSUM
   backup   DEGRADED 0 0 5
  raidz1 DEGRADED 0 0 14
   c4t2d0   DEGRADED 0 0 0  too many errors
   c4t3d0   ONLINE   0 0 0
   c4t4d0   ONLINE   0 0 0
   c4t5d0   DEGRADED 0 0 0  too many errors
   c4t6d0   ONLINE   0 0 0
   c4t7d0   DEGRADED 0 0 0  too many errors
   c4t8d0   DEGRADED 0 0 0  too many errors
  raidz1 DEGRADED 0 0 12
   c4t9d0   DEGRADED 0 0 0  too many errors
   c4t10d0  DEGRADED 0 0 0  too many errors
   c4t11d0  DEGRADED 0 0 0  too many errors
   c4t12d0  DEGRADED 0 0 0  too many errors
   c4t13d0  DEGRADED 0 0 0  too many errors
   c4t14d0  DEGRADED 0 0 0  too many errors
   c4t15d0  DEGRADED 0 0 1  too many errors

errors: Permanent errors have been detected in the following files:

   backup/homebackup:<0x0>

 pool: rpool
state: ONLINE
scrub: none requested
config:

   NAME  STATE READ WRITE CKSUM
   rpool ONLINE   0 0 0
 mirror  ONLINE   0 0 0
   c4t0d0s0  ONLINE   0 0 0
   c4t1d0s0  ONLINE   0 0 0

errors: No known data errors

BTW, I called Seagate to check the drive firmware. They confirm that firmware 
version 3.AEK is the latest for the drives I have. This is the version running 
on all 16 of my drives.

I'm about out of ideas to try.


Re: [zfs-discuss] OpenSolaris better Than Solaris10u6 with requards to ARECA Raid Card

2009-01-14 Thread Charles Wright
Thanks for the info.  I'm running the latest firmware for my card: V1.46, with 
boot ROM version V1.45.

Could you tell me how you have your card configured?  Are you using JBOD, RAID, 
or pass-through?  What is your Max SATA mode set to?  How many drives do you 
have attached?

What is your ZFS config like?

Thanks.


Re: [zfs-discuss] OpenSolaris better Than Solaris10u6 with requards to ARECA Raid Card

2009-01-13 Thread Charles Wright
Thanks for the reply. I've also had issues with consumer-class drives and other 
RAID cards.

The drives I have here (all 16 of them) are Seagate® Barracuda® ES enterprise 
hard drives, model number ST3500630NS.

If the problem were with the drives, I would expect the same behavior in both 
Solaris and OpenSolaris.


[zfs-discuss] OpenSolaris better Than Solaris10u6 with requards to ARECA Raid Card

2009-01-13 Thread Charles Wright
Under Solaris 10 u6, no matter how I configured my ARECA 1261ML RAID card, 
I got errors on all drives resulting from SCSI timeouts.

yoda:~ # tail -f /var/adm/messages
Jan  9 11:03:47 yoda.asc.edu scsi: [ID 107833 kern.notice]  
Requested Block: 239683776 Error Block: 239683776
Jan  9 11:03:47 yoda.asc.edu scsi: [ID 107833 kern.notice]  Vendor: Seagate    Serial Number:
Jan  9 11:03:47 yoda.asc.edu scsi: [ID 107833 kern.notice]  Sense 
Key: Not Ready
Jan  9 11:03:47 yoda.asc.edu scsi: [ID 107833 kern.notice]  ASC: 0x4 
(LUN is becoming ready), ASCQ: 0x1, FRU: 0x0
Jan  9 11:03:47 yoda.asc.edu scsi: [ID 107833 kern.warning] WARNING: 
/p...@0,0/pci10de,3...@17/pci17d3,1...@0/s...@c,0 (sd14):
Jan  9 11:03:47 yoda.asc.edu    Error for Command: write(10)    Error Level: Retryable
Jan  9 11:03:47 yoda.asc.edu scsi: [ID 107833 kern.notice]  
Requested Block: 239683776 Error Block: 239683776
Jan  9 11:03:47 yoda.asc.edu scsi: [ID 107833 kern.notice]  Vendor: Seagate    Serial Number:
Jan  9 11:03:47 yoda.asc.edu scsi: [ID 107833 kern.notice]  Sense 
Key: Not Ready
Jan  9 11:03:47 yoda.asc.edu scsi: [ID 107833 kern.notice]  ASC: 0x4 
(LUN is becoming ready), ASCQ: 0x1, FRU: 0x0

ZFS eventually would degrade the drives due to the errors. I'm positive that 
there is nothing wrong with my hardware.

Here is the driver I used under Solaris 10 u6.
ftp://ftp.areca.com.tw/RaidCards/AP_Drivers/Solaris/DRIVER/1.20.00.16-80731/readme.txt

I got these errors whether I used JBOD or configured the drives as
pass-through.

I turned off NCQ and Tagged Queuing and still got errors.

yoda:~/bin # zpool status
 pool: backup
state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
   attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
   using 'zpool clear' or replace the device with 'zpool replace'.
  see: http://www.sun.com/msg/ZFS-8000-9P
scrub: resilver completed after 1h7m with 0 errors on Fri Jan  9 
09:57:46 2009
config:

   NAME STATE READ WRITE CKSUM
   backup   ONLINE   0 0 0
 raidz1 ONLINE   0 0 0
   c2t2d0   ONLINE   0 5 0
   c2t3d0   ONLINE   0 1 0
   c2t4d0   ONLINE   0 1 0
   c2t5d0   ONLINE   0 2 0
   c2t6d0   ONLINE   0 2 0
   c2t7d0   ONLINE   0 2 0
   c2t8d0   ONLINE   0 3 0
 raidz1 ONLINE   0 0 0
   c2t9d0   ONLINE   0 2 0
   c2t10d0  ONLINE   0 2 0
   c2t11d0  ONLINE   0 3 0
   c2t12d0  ONLINE   0 3 0
   c2t13d0  ONLINE   0 3 0
   c2t14d0  ONLINE   0 2 0
    c2t15d0  ONLINE   0 51 0

errors: No known data errors

 pool: rpool
state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
   attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
   using 'zpool clear' or replace the device with 'zpool replace'.
  see: http://www.sun.com/msg/ZFS-8000-9P
scrub: none requested
config:

   NAME  STATE READ WRITE CKSUM
   rpool ONLINE   0 0 0
 mirror  ONLINE   0 0 0
   c2t0d0s0  ONLINE   0 5 0
   c2t1d0s0  ONLINE   3 2 0

errors: No known data errors

Under OpenSolaris, I don't get the SCSI timeout errors, but I do get error 
messages like this:

Jan 13 09:30:39 yoda last message repeated 5745 times
Jan 13 09:30:39 yoda arcmsr: [ID 659062 kern.notice] arcmsr0: too many 
outstanding commands (255 > 256)
Jan 13 09:30:39 yoda arcmsr: [ID 659062 kern.notice] arcmsr0: too many 
outstanding commands (256 > 256)
Jan 13 09:30:49 yoda last message repeated 2938 times
Jan 13 09:30:49 yoda arcmsr: [ID 659062 kern.notice] arcmsr0: too many 
outstanding commands (254 > 256)
Jan 13 09:30:49 yoda arcmsr: [ID 659062 kern.notice] arcmsr0: too many 
outstanding commands (256 > 256)
Jan 13 09:30:53 yoda last message repeated 231 times
Jan 13 09:30:53 yoda arcmsr: [ID 659062 kern.notice] arcmsr0: too many 
outstanding commands (257 > 256)
Jan 13 09:30:53 yoda last message repeated 2 times
Jan 13 09:30:53 yoda arcmsr: [ID 659062 kern.notice] arcmsr0: too many 
outstanding commands (256 > 256)
Jan 13 09:31:11 yoda last message repeated 1191 times
Jan 13 09:31:11 yoda arcmsr: [ID 659062 kern.notice] arcmsr0: too many 
outstanding commands (255 > 256)
Jan 13 09:31:11 yoda arcmsr: [ID 659062 kern.notice] arcmsr0: too many 
outstanding commands (256 > 256)

Fortunately it looks like zpoo

Re: [zfs-discuss] Problem with time-slider

2008-12-30 Thread Charles
Yeah


Thanks a lot to timf and mgerdts, it's working now !
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Problem with time-slider

2008-12-29 Thread Charles
Thank you all for your help.


First, I tried the command and this is what I get:

svcs -v zfs/auto-snapshot

STATE          NSTATE        STIME     CTID   FMRI
online - 14:48:56  - 
svc:/system/filesystem/zfs/auto-snapshot:weekly
online - 14:48:57  - 
svc:/system/filesystem/zfs/auto-snapshot:monthly
maintenance- 14:48:59  - 
svc:/system/filesystem/zfs/auto-snapshot:daily
maintenance- 14:49:01  - 
svc:/system/filesystem/zfs/auto-snapshot:hourly
maintenance- 14:49:01  - 
svc:/system/filesystem/zfs/auto-snapshot:frequent



And for timf, I have read your pages, but I don't understand what to do. In the 
bug report, the workaround is to unmount and mount the filesystem, but I don't 
know how to do this with ZFS.

When I enter "zfs list" I get this:

NAME                        USED  AVAIL  REFER  MOUNTPOINT
rpool  32,8G   424G  75,5K  /rpool
rpool/ROOT 5,15G   424G18K  legacy
rpool/ROOT/opensolaris 8,96M   424G  2,47G  /
rpool/ROOT/opensolaris-1   5,14G   424G  4,78G  /
rpool/dump 4,00G   424G  4,00G  -
rpool/export   19,7G   424G19K  /export
rpool/export/home  19,7G   424G19K  /export/home
rpool/export/home/charles  19,7G   424G  16,2G  /export/home/charles

I don't know what to unmount here.
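
If it is the home filesystem that has to be remounted, is something like this
what is meant? (This is only my guess from the man pages, so please correct me
if it is wrong.)

pfexec zfs umount rpool/export/home/charles   # unmount the home filesystem
pfexec zfs mount rpool/export/home/charles    # mount it again
# then tell SMF to retry the snapshot services that went into maintenance
pfexec svcadm clear svc:/system/filesystem/zfs/auto-snapshot:daily
pfexec svcadm clear svc:/system/filesystem/zfs/auto-snapshot:hourly
pfexec svcadm clear svc:/system/filesystem/zfs/auto-snapshot:frequent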


thanks again for your help :)
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Problem with time-slider

2008-12-29 Thread Charles
Hi

I'm a new user of OpenSolaris 2008.11. I switched from Linux to try the 
time-slider, but now when I run the time-slider I get this message:

http://img115.imageshack.us/my.php?image=capturefentresansnomfx9.png


Thank you, and happy new year ^^
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Peculiar disk loading on raidz2

2008-11-21 Thread Charles Menser
The drives are all connected to the motherboard's (Intel S3210SHLX) SATA ports.

I've scrubbed the pool several times in the last two days, no errors:

[EMAIL PROTECTED]:~# zpool status -v
  pool: main_pool
 state: ONLINE
 scrub: none requested
config:

NAME        STATE READ WRITE CKSUM
main_pool   ONLINE   0 0 0
  raidz2ONLINE   0 0 0
c5t5d0  ONLINE   0 0 0
c5t3d0  ONLINE   0 0 0
c5t4d0  ONLINE   0 0 0
c5t2d0  ONLINE   0 0 0
c5t1d0  ONLINE   0 0 0

errors: No known data errors

I appreciate your feedback, I had not thought to aggregate the stats
and check the aggregate.

Thanks,
Charles

On Fri, Nov 21, 2008 at 3:24 PM, Will Murnane <[EMAIL PROTECTED]> wrote:
> On Fri, Nov 21, 2008 at 14:35, Charles Menser <[EMAIL PROTECTED]> wrote:
>> I have a 5 drive raidz2 pool which I have a iscsi share on. While
>> backing up a MacOS drive to it I noticed some very strange access
>> patterns, and wanted to know if what I am seeing is normal, or not.
>>
>> There are times when all five drives are accessed equally, and there
>> are times when only three of them are seeing any load.
> What does "zpool status" say?  How are the drives connected?  To what
> controller(s)?
>
> This  could just be some degree of asynchronicity showing up.  Take a
> look at these two:
>  capacity operationsbandwidth
> pool used  avail   read  write   read  write
> --  -  -  -  -  -  -
> main_pool852G  3.70T361  1.30K  2.78M  10.1M
>  raidz2 852G  3.70T361  1.30K  2.78M  10.1M
>   c5t5d0  -  -180502  1.25M  3.57M
>   c5t3d0  -  -205330  1.30M  2.73M
>   c5t4d0  -  -239489  1.43M  2.81M
>   c5t2d0  -  -205 17  1.25M  26.1K
>   c5t1d0  -  -248 13  1.41M  25.1K
> --  -  -  -  -  -  -
>
>  capacity operationsbandwidth
> pool used  avail   read  write   read  write
> --  -  -  -  -  -  -
> main_pool852G  3.70T 10  2.02K  77.7K  15.8M
>  raidz2 852G  3.70T 10  2.02K  77.7K  15.8M
>   c5t5d0  -  -  2921   109K  6.52M
>   c5t3d0  -  -  9691   108K  5.63M
>   c5t4d0  -  -  9962   105K  5.97M
>   c5t2d0  -  -  9  1.30K   167K  8.50M
>   c5t1d0  -  -  2  1.23K   150K  8.54M
> --  -  -  -  -  -  -
>
> For c5t5d0, a total of 3.57+6.52 MB of IO happen: 10.09 MB;
> For c5t3d0, a total of 2.73+5.63 MB of IO happen: 8.36 MB;
> For c5t4d0, a total of 2.81+5.97 MB of IO happen: 8.78 MB;
> For c5t2d0, a total of (~0)+8.50 MB of IO happen: 8.50 MB;
> and for c5t1d0, a total of (~0) + 8.54 MB of IO happen: 8.54 MB.
>
> So over time, the amount written to each drive is approximately the
> same.  This being the case, I don't think I'd worry about it too
> much... but a scrub is a fairly cheap way to get peace of mind.
>
> Will
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Peculiar disk loading on raidz2

2008-11-21 Thread Charles Menser
 2719.7 12.2  6.0   23.8   11.8  56  60 c5t3d0
  141.4  371.8  705.4 2720.0 12.1  6.3   23.6   12.4  57  61 c5t4d0
  140.6  372.4  702.8 2719.7 12.1  6.5   23.5   12.6  57  61 c5t5d0
5.00.9  355.53.7  0.1  0.0   16.01.8   1   1 c3t0d0
2.32.6  176.4  110.4  0.1  0.0   18.41.8   1   1 c3t1d0
1.82.2  140.0  109.9  0.1  0.0   31.71.9   1   1 c3t2d0
1.41.9  112.4  109.4  0.2  0.0   47.72.0   1   1 c3t3d0
extended device statistics
r/sw/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
0.00.00.00.0  0.0  0.00.00.0   0   0 c4t0d0
0.00.00.00.0  0.0  0.00.00.0   0   0 c5t0d0
1.0 1071.5   43.9 8101.0 14.4  0.4   13.40.4  43  43 c5t1d0
1.3 1219.6   44.6 8144.5 15.6  0.5   12.80.4  47  47 c5t2d0
0.0  962.50.0 6174.6 34.0  1.0   35.31.0 100 100 c5t3d0
0.0  591.40.0 3460.8 34.0  1.0   57.51.7 100 100 c5t4d0
0.0  846.80.0 5818.8 32.0  3.0   37.83.5 100 100 c5t5d0
0.00.00.00.0  0.0  0.00.00.0   0   0 c3t0d0
0.00.00.00.0  0.0  0.00.00.0   0   0 c3t1d0
0.00.00.00.0  0.0  0.00.00.0   0   0 c3t2d0
0.00.00.00.0  0.0  0.00.00.0   0   0 c3t3d0
extended device statistics
r/sw/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
0.00.00.00.0  0.0  0.00.00.0   0   0 c4t0d0
0.00.00.00.0  0.0  0.00.00.0   0   0 c5t0d0
   13.00.0   39.00.0  0.0  0.00.00.8   0   1 c5t1d0
0.30.00.80.0  0.0  0.00.00.1   0   0 c5t2d0
   15.3  300.7  100.3 2311.1 10.7  0.4   33.91.1  34  35 c5t3d0
0.0  514.40.0 3572.0 34.0  1.0   66.11.9 100 100 c5t4d0
   14.0  360.4   76.0 2705.1 16.8  1.6   44.94.4  53  55 c5t5d0
0.00.00.00.0  0.0  0.00.00.0   0   0 c3t0d0
0.00.00.00.0  0.0  0.00.00.0   0   0 c3t1d0
0.00.00.00.0  0.0  0.00.00.0   0   0 c3t2d0
0.00.00.00.0  0.0  0.00.00.0   0   0 c3t3d0
extended device statistics
r/sw/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
0.00.00.00.0  0.0  0.00.00.0   0   0 c4t0d0
0.00.00.00.0  0.0  0.00.00.0   0   0 c5t0d0
  205.2   20.2 1295.9   56.7  0.4  0.11.70.7  14  15 c5t1d0
  161.9   20.9 1186.6   58.1  0.2  0.21.11.3   9  13 c5t2d0
  159.3   18.3 1080.3   45.8  0.2  0.11.10.8   9  15 c5t3d0
  161.5  301.9 1167.0 1477.7 17.4  0.7   37.51.4  64  65 c5t4d0
  201.4   18.9 1245.0   46.2  0.1  0.20.60.9   8  17 c5t5d0
0.00.00.00.0  0.0  0.00.00.0   0   0 c3t0d0
0.00.00.00.0  0.0  0.00.00.0   0   0 c3t1d0
0.00.00.00.0  0.0  0.00.00.0   0   0 c3t2d0
0.00.00.00.0  0.0  0.00.00.0   0   0 c3t3d0
extended device statistics
r/sw/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
0.00.00.00.0  0.0  0.00.00.0   0   0 c4t0d0
0.00.00.00.0  0.0  0.00.00.0   0   0 c5t0d0
0.7 1300.0   23.0 8263.1 16.7  0.5   12.90.4  50  50 c5t1d0
1.0 1124.4   45.2 8353.2 14.6  0.4   13.00.4  44  44 c5t2d0
0.0 1021.10.0 6676.0 33.9  1.0   33.21.0 100 100 c5t3d0
0.0 1017.30.0 6769.4 33.9  1.0   33.41.0 100 100 c5t4d0
0.0  769.60.0 5308.2 33.9  1.0   44.11.3 100 100 c5t5d0
0.00.00.00.0  0.0  0.00.00.0   0   0 c3t0d0
0.00.00.00.0  0.0  0.00.00.0   0   0 c3t1d0
0.00.00.00.0  0.0  0.00.00.0   0   0 c3t2d0
0.00.00.00.0  0.0  0.00.00.0   0   0 c3t3d0

Thanks,
Charles
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Dribbling checksums

2008-10-30 Thread Charles Menser
I'll do that today.

Thank you!

Charles

On Thu, Oct 30, 2008 at 2:12 AM, Marc Bevand <[EMAIL PROTECTED]> wrote:
> Charles Menser <[EMAIL PROTECTED]> writes:
>>
>> Nearly every time I scrub a pool I get small numbers of checksum
>> errors on random drives on either controller.
>
> These are the typical symptoms of bad RAM/CPU/Mobo. Run memtest for 24h+.
>
> -marc
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Dribbling checksums

2008-10-28 Thread Charles Menser
My home server is giving me fits.

I have seven disks, comprising three pools, on two multi-port SATA
controllers (one onboard the Asus M2A-VM motherboard, and one
Supermicro AOC-SAT2-MV8).

The disks range from many months to many days old.

Two pools are mirrors, one is a raidz.

The machine is running OpenSolaris snv_99.

Nearly every time I scrub a pool I get small numbers of checksum
errors on random drives on either controller.

I have replaced the power supply, suspecting bad power, to no avail.

I removed the AOC-SAT2-MV8 and all the drives, save the root mirror,
(to try ruling out some weird interaction with the AOC-SAT2-MV8) and
still take errors.

Has anyone had a similar problem?

Any ideas what may be happening?

Is there more data I can provide?

Many thanks,
Charles
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zpool status my_pool, shows a pulled disk c1t6d0 as ONLINE ???

2008-07-28 Thread Charles Emery
New server build with Solaris 10 u5 (5/08) 
on a SunFire T5220, and this is our first rollout of ZFS and zpools.

Have 8 disks, boot disk is hardware mirrored (c1t0d0 + c1t1d0)

Created Zpool my_pool as RaidZ using 5 disks + 1 spare: 
c1t2d0, c1t3d0, c1t4d0, c1t5d0, c1t6d0, and spare c1t7d0

I am working on alerting & recovery plans for disk failures in the zpool.
As a test, I have pulled disk c1t6d0 to see what a disk failure will look like.
"zpool status -v mypool" still reports disk c1t6d0 as ONLINE.
"iostat -En" also does not yet realize that the disk is pulled.

By contrast, format realizes the disk is missing, 
and the disk pull did generate errors in /var/adm/messages.

Do I need to hit the device bus with some command to get a more accurate 
status, or something like that? 
I would appreciate any recommendations for zpool disk failure monitoring.
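
In case it helps frame the question, this is the rough checklist I have been
running by hand so far (commands as I understand them from the docs; the pool
name is ours, the rest of the approach may well be off):

cfgadm -al          # does the controller still show the slot as connected/configured?
fmadm faulty        # has FMA diagnosed the missing disk yet?
zpool scrub my_pool # force I/O to every device; errors then surface in zpool status
zpool status -x     # summarize only the pools that are not healthy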

See the attachment for output from iostat -En, format, and the tail of 
/var/adm/messages:

Here is the output from "zpool status -v":

newserver:/# zpool status -v
pool: my_pool
state: ONLINE
scrub: none requested
config:
NAME STATE READ WRITE CKSUM
my_pool ONLINE 0 0 0
raidz1 ONLINE 0 0 0
c1t2d0 ONLINE 0 0 0
c1t3d0 ONLINE 0 0 0
c1t4d0 ONLINE 0 0 0
c1t5d0 ONLINE 0 0 0
c1t6d0 ONLINE 0 0 0
spares
c1t7d0 AVAIL 
errors: No known data errors
=

Message was edited by: 
cemery
 
 
This message posted from opensolaris.org

zpool status my_pool, shows a pulled disk c1t6d0 as ONLINE ???


New server build on a SunFire T5220,
and this is our first rollout of ZFS and zpools.

Have 8 disks, boot disk is hardware mirrored (c1t0d0 + c1t1d0)

Created my_pool as a RaidZ (RAID-5-like) zpool using 5 disks + 1 spare disk: 
c1t2d0, c1t3d0, c1t4d0, c1t5d0, c1t6d0, and spare c1t7d0

I am working on alerting & recovery plans for disk failures in the zpool.
As a test, I have pulled disk c1t6d0 to see what a disk failure will look like.
"zpool status -v mypool" still reports disk c1t6d0 as ONLINE.
"iostat -En" also does not yet realize that the disk is pulled.

By contrast, format realizes the disk is missing, 
and the disk pull did generate errors in /var/adm/messages.

Do I need to hit the device bus with some command to get a more accurate 
status? I would appreciate any recommendations for zpool disk failure monitoring.

Below are the output from "zpool status -v", "iostat -En", "format", and the 
tail of /var/adm/messages:

newserver:/# zpool status -v
  pool: my_pool
 state: ONLINE
 scrub: none requested
config:
NAME        STATE READ WRITE CKSUM
my_pool     ONLINE   0 0 0
  raidz1    ONLINE   0 0 0
c1t2d0  ONLINE   0 0 0
c1t3d0  ONLINE   0 0 0
c1t4d0  ONLINE   0 0 0
c1t5d0  ONLINE   0 0 0
c1t6d0  ONLINE   0 0 0
spares
  c1t7d0    AVAIL   
errors: No known data errors
=

newserver:/# iostat -En
c1t0d0   Soft Errors: 2 Hard Errors: 0 Transport Errors: 0 
Vendor: LSILOGIC Product: Logical Volume   Revision: 3000 Serial No:  
Size: 146.56GB <146561286144 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 
Illegal Request: 2 Predictive Failure Analysis: 0 
c1t2d0   Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 
Vendor: SEAGATE  Product: ST914602SSUN146G Revision: 0603 Serial No: 0811953XZG 
Size: 146.81GB <146810536448 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 
Illegal Request: 0 Predictive Failure Analysis: 0 
c0t0d0   Soft Errors: 4 Hard Errors: 3 Transport Errors: 0 
Vendor: TSSTcorp Product: CD/DVDW TS-T632A Revision: SR03 Serial No:  
Size: 0.00GB <0 bytes>
Media Error: 0 Device Not Ready: 3 No Device: 0 Recoverable: 0 
Illegal Request: 4 Predictive Failure Analysis: 0 
c1t3d0   Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 
Vendor: SEAGATE  Product: ST914602SSUN146G Revision: 0603 Serial No: 08139591NN 
Size: 146.81GB <146810536448 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 
Illegal Request: 0 Predictive Failure Analysis: 0 
c1t4d0   Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 
Vendor: SEAGATE  Product: ST914602SSUN146G Revision: 0603 Serial No: 0813957V3R 
Size: 146.81GB <146810536448 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 
Illegal Request: 0 Predictive Failure Analysis: 0 
c1t5d0   Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 
Vendor: SEAGATE  Product: ST914602SSUN146G Revision: 0603 Serial No: 0813957V2J 
Size: 146.81GB <146810536448 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 
Illegal Request: 0 Predictive Failure Analysis: 0 
c1t6d0   Soft Errors: 0 Hard Er

Re: [zfs-discuss] Cannot attach mirror to SPARC zfs root pool

2008-07-24 Thread Charles Meeks
Hoping this is not too off topic. Can anyone confirm that you can break a 
mirrored ZFS root pool once it is formed? I basically want to clone a boot drive, 
take it to another piece of identical hardware, and have two machines (or more). 
I am running Indiana b93 on x86 hardware. I have read that there are 
various bugs with mirrored ZFS root that prevent what I want to do.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] The best motherboard for a home ZFS fileserver

2008-07-24 Thread Charles Menser
I installed it with snv_86 in IDE controller mode, and have since
upgraded ending up at snv_93.

Do you know what implications there are for using AHCI vs IDE modes?

Thanks,
Charles

On Thu, Jul 24, 2008 at 9:26 AM, Florin Iucha <[EMAIL PROTECTED]> wrote:
> On Thu, Jul 24, 2008 at 08:22:16AM -0400, Charles Menser wrote:
>> Yes, I am very happy with the M2A-VM.
>
> You will need at least SNV_93 to use it in AHCI mode.
>
> The northbridge gets quite hot, but that does not seem to be impairing
> its performance.  I have the M2A-VM with an AMD 64 BE-2400 (45W) and
> a Scythe Ninja Mini heat sink and the only fans that I have in the case
> are the two side fans (the case is Antec NSK-2440).  Quiet as a mouse.
>
> florin
>
> --
> Bruce Schneier expects the Spanish Inquisition.
>  http://geekz.co.uk/schneierfacts/fact/163
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] The best motherboard for a home ZFS fileserver

2008-07-24 Thread Charles Menser
Yes, I am very happy with the M2A-VM.

Charles

On Wed, Jul 23, 2008 at 5:05 PM, Steve <[EMAIL PROTECTED]> wrote:
> Thank you for all the replays!
> (and in the meantime I was just having a dinner! :-)
>
> To recap:
>
> tcook:
> you are right, in fact I'm thinking to have just 3/4 for now, without 
> anything else (no cd/dvd, no videocard, nothing else than mb and drives)
> the case will be the second choice, but I'll try to stick to micro ATX for 
> space reason
>
> Charles Menser:
> 4 is ok, so is the "ASUS M2A-VM" good?
>
> Matt Harrison:
> The post is superb (very compliment to Simon)! And in fact I was already on 
> that, but the MB is unfortunatly ATX. If it will be the only or the suggested 
> choice I would go for it, but I hope there will be a littler one
>
> bhigh:
> so the best is 780G?
>
>
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] The best motherboard for a home ZFS fileserver

2008-07-23 Thread Charles Menser
I am wondering how many SATA controllers most motherboards have for
their built-in SATA ports.

Mine, an ASUS M2A-VM, has four ports, but OpenSolaris reports them as
belonging to two controllers.

I have seen motherboards with 6+ SATA ports, and would love to know if
any of them have more controller density or if two-to-one is the norm.
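
For what it's worth, one way to see how ports map to controllers; this assumes
the board is running in AHCI mode under the sata framework (in legacy pci-ide
mode the layout shows up differently), so treat it only as a rough sketch:

# each distinct cN prefix in the disk device names is a separate controller instance
ls /dev/dsk/*s2 | sed 's,.*/\(c[0-9]*\)[td].*,\1,' | sort -u
# with the sata framework, cfgadm shows it even more directly:
cfgadm -al | grep sata    # attachment points look like sata0/0, sata0/1, sata1/0, ...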

Charles

On Wed, Jul 23, 2008 at 3:37 PM, Steve <[EMAIL PROTECTED]> wrote:
> I'm a fan of ZFS since I've read about it last year.
>
> Now I'm on the way to build a home fileserver and I'm thinking to go with 
> Opensolaris and eventually ZFS!!
>
> Apart from the other components, the main problem is to choose the 
> motherboard. The offer is incredibly high and I'm lost.
>
> Minimum requisites should be:
> - working well with Open Solaris ;-)
> - micro ATX (I would put in a little case)
> - low power consumption but more important reliable (!)
> - with Gigabit ethernet
> - 4+ (even better 6+) sata 3gb controller
>
> Also: what type of RAM to select toghether? (I would chose if good ECC, but 
> the rest?)
>
> Does it make sense? What are the possibilities?
>
>
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS deduplication

2008-07-22 Thread Charles Soto
On 7/22/08 11:48 AM, "Erik Trimble" <[EMAIL PROTECTED]> wrote:

> I'm still not convinced that dedup is really worth it for anything but
> very limited, constrained usage. Disk is just so cheap, that you
> _really_ have to have an enormous amount of dup before the performance
> penalties of dedup are countered.

Again, I will argue that the spinning rust itself isn't expensive, but data
management is.  If I am looking to protect multiple PB (through remote data
replication and backup), I need more than just the rust to store that.  I
need to copy this data, which takes time and effort.  If the system can say
"these 500K blocks are the same as these 500K, don't bother copying them to
the DR site AGAIN," then I have a less daunting data management task.
De-duplication makes a lot of sense at some layer(s) within the data
management scheme.

Charles

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] How do you grow a ZVOL?

2008-07-17 Thread Charles Menser
I've looked for anything I can find on the topic, but there does not
appear to be anything documented.

Can a ZVOL be expanded?

In particular, can a ZVOL shared via iSCSI be expanded?
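
The only lead I have found so far is the volsize property, but I don't know
whether this is the supported way to do it or what the iSCSI side makes of it
(names below are made up):

zfs get volsize tank/iscsivol       # current size of the volume
zfs set volsize=40g tank/iscsivol   # grow it (shrinking is the dangerous direction)
# presumably the iSCSI initiator then has to rescan the LUN, and the filesystem
# on top of it has to be grown with its own tools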

Thanks,
Charles
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS deduplication

2008-07-07 Thread Charles Soto
Oh, I agree.  Much of the duplication described is clearly the result of
"bad design" in many of our systems.  After all, most of an OS can be served
off the network (diskless systems etc.).  But much of the dupe I'm talking
about is less about not using the most efficient system administration
tricks.  Rather, it's about the fact that software (e.g. Samba) is used by
people, and people don't always do things efficiently.

Case in point:  students in one of our courses were hitting their quota by
growing around 8GB per day.  Rather than simply agree that "these kids need
more space," we had a look at the files.  Turns out just about every student
copied a 600MB file into their own directories, as it was created by another
student to be used as a "template" for many of their projects.  Nobody
understood that they could use the file right where it sat.  Nope. 7GB of
dupe data.  And these students are even familiar with our practice of
putting "class media" on a read-only share (these files serve as similar
"templates" for their own projects - you can create a full video project
with just a few MB in your "project file" this way).
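
(Incidentally, spotting that kind of duplication by hand is not hard on Solaris;
this is roughly what we ran, with the path and size threshold made up for the
example:)

# checksum every large file in the class share, then look for repeated hashes
find /export/home -type f -size +500000000c -exec digest -v -a sha256 {} \; > /tmp/digests
# any hash printed here appears more than once, i.e. a candidate duplicate
awk '{print $NF}' /tmp/digests | sort | uniq -d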

So, while much of the situation is caused by "bad data management," there
aren't always systems we can employ that prevent it.  Done right, dedup can
certainly be "worth it" for my operations.  Yes, teaching the user the
"right thing" is useful, but that user isn't there to know how to "manage
data" for my benefit.  They're there to learn how to be filmmakers,
journalists, speech pathologists, etc.

Charles


On 7/7/08 9:24 PM, "Bob Friesenhahn" <[EMAIL PROTECTED]> wrote:

> On Mon, 7 Jul 2008, Mike Gerdts wrote:
>> 
>> As I have considered deduplication for application data I see several
>> things happen in various areas.
> 
> You have provided an excellent description of gross inefficiencies in
> the way systems and software are deployed today, resulting in massive
> duplication.  Massive duplication is used to ease service deployment
> and management.  Most of this massive duplication is not technically
> necessary.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS deduplication

2008-07-07 Thread Charles Soto
Good points.  I see the archival process as a good candidate for adding
dedup because it is essentially doing what a stage/release archiving system
already does - "faking" the existence of data via metadata.  Those blocks
aren't actually there, but they're still "accessible" because they're
*somewhere* the system knows about (i.e. the "other twin").

Currently in SAMFS, if I store two identical files on the archiving
filesystem and my policy generates 4 copies, I will have created 8 copies of
the file (albeit with different metadata).  Dedup would help immensely here.
And as archiving (data management) is inherently a "costly" operation, it's
used where potentially slower access to data is acceptable.

Another system that comes to mind that utilizes dedup is Xythos WebFS.  As
Bob points out, keeping track of dupes is a chore.  IIRC, WebFS uses a
relational database to track this (among much of its other metadata).

Charles

On 7/7/08 7:40 PM, "Bob Friesenhahn" <[EMAIL PROTECTED]> wrote:

> On Tue, 8 Jul 2008, Nathan Kroenert wrote:
> 
>> Even better would be using the ZFS block checksums (assuming we are only
>> summing the data, not it's position or time :)...
>> 
>> Then we could have two files that have 90% the same blocks, and still
>> get some dedup value... ;)
> 
> It seems that the hard problem is not if ZFS has the structure to
> support it (the implementation seems pretty obvious), but rather that
> ZFS is supposed to be able to scale to extremely large sizes.  If you
> have a petabyte of storage in the pool, then the data structure to
> keep track of block similarity could grow exceedingly large.  The
> block checksums are designed to be as random as possible so their
> value does not suggest anything regarding the similarity of the data
> unless the values are identical.  The checksums have enough bits and
> randomness that binary trees would not scale.
> 
> Except for the special case of backups or cloned server footprints,
> it does not seem that data deduplication is going to save the 90% (or
> more) space that Quantum claims at
> http://www.quantum.com/Solutions/datadeduplication/Index.aspx.
> 
> ZFS clones already provide a form of data deduplication.
> 
> The actual benefit of data deduplication to an enterprise seems
> negligible unless the backup system directly supports it.  In the
> enterprise the cost of storage has more to do with backing up the data
> than the amount of storage media consumed.
> 
> Bob


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS deduplication

2008-07-07 Thread Charles Soto
A really smart nexus for dedup is right when archiving takes place.  For
systems like EMC Centera, dedup is basically a byproduct of checksumming.
Two files with similar metadata that have the same hash?  They're identical.

Charles


On 7/7/08 4:25 PM, "Neil Perrin" <[EMAIL PROTECTED]> wrote:

> Mertol,
> 
> Yes, dedup is certainly on our list and has been actively
> discussed recently, so there's hope and some forward progress.
> It would be interesting to see where it fits into our customers
> priorities for ZFS. We have a long laundry list of projects.
> In addition there's bug fixes & performance changes that customers
> are demanding.
> 
> Neil.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs mount failed at boot stops network services.

2008-06-28 Thread Charles Soto
On 6/27/08 8:55 AM, "Mark J Musante" <[EMAIL PROTECTED]> wrote:

> On Fri, 27 Jun 2008, wan_jm wrote:
> 
>> the procedure is follows:
>> 1. mkdir /tank
>> 2. touch /tank/a
>> 3. zpool create tank c0d0p3
>> this command give the following error message:
>> cannot mount '/tank': directory is not empty;
>> 4. reboot.
>> then the os can only be login in from console. does it a bug?
> 
> No, I would not consider that a bug.

Why?

Charles
(to paraphrase PBS - "be more helpful" ; conversely, "be less pithy")

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] raid card vs zfs

2008-06-25 Thread Charles Soto
On 6/25/08 2:50 PM, "Tim" <[EMAIL PROTECTED]> wrote:

> The issue is cost.  It's still cheaper for someone to buy two quad-port
> gig-e cards and trunk all the interfaces than it is for them to buy a single
> 10Gb card.

At the moment, this is quite true.  Costs per port are going down (even
10Gig) but you get quite good performance with a 4-link aggregate on the
X4500.  You could go 8-way if you add another 4-port PCI-X card.  IIRC,
Solaris 10 supports up to 16-way at this speed (but at some point you're
probably hitting a plateau).
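
For anyone setting that up, the aggregate itself is only a couple of commands
on Solaris 10 (interface names are from our X4500 and the address is just an
example; treat this as a sketch):

dladm create-aggr -d e1000g0 -d e1000g1 -d e1000g2 -d e1000g3 1   # bundle four ports as aggr key 1
dladm modify-aggr -P L3,L4 1                                      # hash on IP + port so clients spread across links
ifconfig aggr1 plumb 192.168.1.10/24 up                           # address it like any other interface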

Charles

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] raid card vs zfs

2008-06-25 Thread Charles Soto
On 6/25/08 12:57 PM, "Tim" <[EMAIL PROTECTED]> wrote:

> Uhhh... 64bit/133mhz is 17Gbit/sec.  I *HIGHLY* doubt that bus will be a
> limit.  Without some serious offloading, you aren't pushing that amount of
> bandwidth out the card.  Most systems I've seen top out around 6bit/sec with
> current drivers.

Wow, 6bps!  You need a new acoustic coupler ;)

I think the X4500 designers appreciate the bandwidth ceiling, as the 10Gig
card we put in ours is single port, while the cards we have for our X6250s
are dual port (PCIe).

Charles

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS root finally here in SNV90

2008-06-24 Thread Charles Soto
On 6/23/08 7:45 PM, "Richard Elling" <[EMAIL PROTECTED]> wrote:

> I think the ability to have different policies for file systems
> is pure goodness -- though you pay for it on the backup/
> restore side.

And another reason why Automated Data Migration is the way to go.  "Backup"
and "replication" schemes are problematic for a number of reasons, not the
least of which is "remembering what filesystems to back up."  If the
filesystem itself is taking care of things, you're probably more likely to
"get it right."  This is especially true if, like ZFS object properties,
ADM properties are inherited.  E.g., set / to back up, and everything else
is automatically protected.  Go "custom" wherever appropriate.
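
ZFS user properties already give a feel for how that could look. A sketch (the
property name is made up, and the backup tool that would honor it is the part
that doesn't exist yet; user properties need a reasonably recent release, I
believe S10 8/07 or later):

zfs set edu.utexas:backup=yes tank          # tag the pool; children inherit it
zfs set edu.utexas:backup=no tank/scratch   # opt a scratch area out
zfs get -r -o name,value,source edu.utexas:backup tank   # what a backup script would walk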

Having already been saved by a Time Machine backup that occurred just 20
minutes before my laptop hard drive died, I'm convinced Apple got it right
in making the default a "back up everything" approach.  While their solution
isn't integrated into the filesystem, as I expect ADM will be with ZFS,
there's something to be said for the "whole system approach."


> A side question though, my friends who run Windows,
> Linux, or OSX don't seem to have this bias towards isolating
> /var.  Is this a purely Solaris phenomenon?  If so, how do we
> fix it?
>  -- richard

Well, having spent a lot of time on the IRIX side, yeah, it's just a
"Solaris thing."  And S10U5 at least now only defaults to TWO partitions
(bigger / than before, and /export/home).  Baby steps, I suppose :)

Charles
 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] memory hog

2008-06-23 Thread Charles Soto



On 6/23/08 11:59 AM, "Tim" <[EMAIL PROTECTED]> wrote:

> On Mon, Jun 23, 2008 at 11:18 AM, Edward <[EMAIL PROTECTED]> wrote:
> 
>> But the sad thing is Windows XP / Vista is still 32Bit. It doesn't
>> recognize more then 3.x GB of Ram. 64Bit version is still premature and
>> hardly OEM are adopting it. Hardware makers have yet to full jump on broad
>> for 64 bit drivers.
> 
> 
> false, both of them recognize well in excess of 4GB of ram.  What they CAN'T
> do is address it for *ONE* process.  That's why applications like oracle
> were quick to hop on the 64bit bandwagon, they actually need it.  I don't
> know of too many consumer level apps besides maybe photoshop (and firefox ;)
> ) that come anywhere near 4GB ram usage.


While Edward is technically incorrect, the ceiling is still 4GB total
physical memory:

http://msdn.microsoft.com/en-us/library/aa366778.aspx

Note that even though

A 25% higher RAM ceiling is one thing, but it's a far cry from the 64-128GB
the "enterprise target" Windows versions can use (yes, some of them are
32-bit but if you pay the extra $, you are allowed to use more RAM).  The
3GB per-process limit is the real factor.  But then again, who runs Oracle
on Windows? :)

Charles
(ok, I have, but only for testing)

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] memory hog

2008-06-23 Thread Charles Soto
On 6/23/08 6:24 AM, "Mertol Ozyoney" <[EMAIL PROTECTED]> wrote:

> No, ZFS loves memory and unlike most other FS's around it can make good use
> of memory. But ZFS will free memory if it recognizes that other apps require
> memory or you can limit the cache ARC will be using.

This is an important distinction.  There are many examples of software which
does not utilize the resources we make available.  I'm happy with code that
takes advantage of these additional resources to improve performance.
Otherwise, it becomes difficult to make cost/benefit decisions.  "I need
more performance.  It's worth $x to get that."
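
(For anyone who does want the hard cap Mertol mentions: on Solaris 10 it is an
/etc/system tunable and takes effect after a reboot; the value below is only an
example, capping the ARC at 1 GB.)

* /etc/system
set zfs:zfs_arc_max = 0x40000000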


> To my experiance ZFS still performs nicely on 1 GB boxes.

This is probably fine for the "typical consumer usage pattern."

> PS: How much 4 GB Ram costs for a desktop ?

I just bought 2GB DIMMs for $40.  IIRC, they were Kingston, so not a no-name
brand.

Charles

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] raid card vs zfs

2008-06-23 Thread Charles Soto
On 6/23/08 6:22 AM, "Mertol Ozyoney" <[EMAIL PROTECTED]> wrote:

> A few days a ago a customer tested a Sunfire X4500 connected to a network
> with 4 x 1 Gbit ethernets. X4500 have modest CPU power and do not use any
> Raid card. The unit easly performaed 400 MB/sec on write from LAN tests
> which clearly limited by the ethernet ports.
> 
> Mertol 

This is what we are seeing with our X4500.  Clearly, the four Ethernet
channels are our limiting factor.  We put 10Gbps Ethernet on the unit, but
as this is currently the only 10-gig host on our network (waiting for VMware
drivers to support the X6250 cards we bought), I can't really test that
fully.  We're using this as an NFS/Samba server, so JBOD with ZFS is "fast
enough."

I'm waiting for COMSTAR and ADM to really take advantage of the Thumper
platform.  The "complete storage stack" that Sun and the OpenSolaris project
have envisioned will make such "commodity" hardware useful pieces of our
solution.  I love our EMC/Brocade/HP SAN gear, but it's just too expensive
to scale (particularly when it comes to total data management).

Charles

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Filesystem for each home dir - 10,000 users?

2008-06-13 Thread Charles Soto
On 6/12/08 1:46 PM, "Chris Siebenmann" <[EMAIL PROTECTED]> wrote:

> | Every time I've come across a usage scenario where the submitter asks
> | for per user quotas, its usually a university type scenario where
> | univeristies are notorious for providing lots of CPU horsepower (many,
> | many servers) attached to a simply dismal amount of back-end storage.
> 
>  Speaking as one of those pesky university people (although we don't use
> quotas): one of the reasons this happens is that servers are a lot less
> expensive than disk space. With disk space you have to factor in the
> cost of backups and ongoing maintenance, wheras another server is just N
> thousand dollars in one time costs and some rack space.
> 
> (This assumes that you are not rack space, heat, or power constrained,
> which I think most university environments generally are not.)
> 
>  Or to put it another way: disk space is a permanent commitment,
> servers are not.
> 
> - cks

Well, servers have a "running cost," as keeping them up (e.g. running and
still under your control!) requires a certain commitment of resources.  But,
I think the resource emphasis on storage is quite appropriate.  The DATA are
the valuable things, not the servers or applications.  Appropriately,
servers reached commodity status before storage.  But storage hardware will
go that way, and the focus will be on data (storage) management, where it
rightfully belongs.

Charles

-

Charles Soto[EMAIL PROTECTED]
Director, Information Technology         TEL: 512-740-1888
The University of Texas at Austin        FAX: 512-475-9711
College of Communication, CMA 5.150G
1 University Station A0900, Austin, TX 78712
http://communication.utexas.edu/technology/



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Filesystem for each home dir - 10,000 users?

2008-06-13 Thread Charles Soto
On 6/13/08 12:25 AM, "Keith Bierman" <[EMAIL PROTECTED]> wrote:

> I could easily imagine providing two tiers of storage for a
> university environment ... one which wasn't backed up, and doesn't
> come with any serious promises ... which could be pretty inexpensive
> and the second tier which has the kind of commitments you suggest are
> required.
> 
> Tier 2 should be better than storing things in /tmp, but could
> approach consumer pricing ... and still be "good enough" for a lot of
> uses.

We have provided multiple "tiers" of storage for years.  However, this
usually didn't involve different "tiers" of hardware.  Rather, it
represented how we treated the files.  We have everything from "staging
pools" where everything is transient (no backups, no real SLA, wild west
rules) to snapshots, disaster recovery replication and backup.

What's really going to change everything is SAMFS.  We're able to take
advantage of $.60/GB disk on X4500, $5/GB disk on SAN and hundreds of TB of
tape "backing store" that also provides real-time backup (our traditional
backup windows are untenable).  Most importantly, we're not tied to a
specific vendor's solutions (though I'm very happy with our closed SAN's
capabilities).

"ILM" is essentially a necessity.  You can't manage storage beyond the "home
server" without it.  I hope that all storage technologies take a holistic
view of the storage management picture.  While ZFS goes a long way to
eliminating distinctions between volume and filesystem management, it is
still a niche player.  As much hype as ZFS snapshots get, that's barely
tiptoeing into the managed storage envelope.  However, I do appreciate the
focus on data integrity.  Without that at every tier, ILM cannot properly do
its job.

Charles

-

Charles Soto[EMAIL PROTECTED]
Director, Information Technology         TEL: 512-740-1888
The University of Texas at Austin        FAX: 512-475-9711
College of Communication, CMA 5.150G
1 University Station A0900, Austin, TX 78712
http://communication.utexas.edu/technology/



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [caiman-discuss] disk names?

2008-06-09 Thread Charles Soto
I agree 100%.  If we went by "this is how we always did it," then we would
not have ZFS :)

Charles
(not to mention X64, CMT, or iPhones!;)


On 6/4/08 10:55 AM, "Bob Friesenhahn" <[EMAIL PROTECTED]> wrote:

> On Tue, 3 Jun 2008, Dave Miner wrote:
>> 
>> Putting into the zpool command would feel odd to me, but I agree that
>> there may be a useful utility here.
> 
> There is value to putting this functionality in zpool for the same
> reason that it was useful to put 'iostat' and other "duplicate"
> functionality in zpool.  For example, zpool can skip disks which are
> already currently in use, or it can recommend whole disks (rather than
> partitions) if none of the logical disk partitions are currently in
> use.
> 
> The zfs commands are currently at least an order of magnitude easier
> to comprehend and use than the legacy commands related to storage
> devices.  It would be nice if the zfs commands will continue to
> simplify what is now quite obtuse.
> 
> Bob
> ==
> Bob Friesenhahn
> [EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
> GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

-

Charles Soto[EMAIL PROTECTED]
Director, Information Technology         TEL: 512-740-1888
The University of Texas at Austin        FAX: 512-475-9711
College of Communication, CMA 5.150G
1 University Station A0900, Austin, TX 78712
http://communication.utexas.edu/technology/



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Per-user home filesystems and OS-X Leopard anomaly

2008-06-08 Thread Charles Soto
On 5/21/08 12:43 PM, "Bob Friesenhahn" <[EMAIL PROTECTED]> wrote:

> I can't speak from a Mac-centric view, but for my purposes NFS in
> Leopard works well.  The automounter in Leopard is a perfect clone of
> the Solaris automounter, and may be based on OpenSolaris code.

I had heard it was, and I have to concur.  Leopard is the first OS X
automounter that actually works as expected.  There was zero fiddling with
our Solaris 10U5 NFS server (a Thumper).
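
For reference, the relevant map entry is just the stock auto_home wildcard
(server name changed, and this is from memory):

# auto_home map (NIS/LDAP, or /etc/auto_home on the clients)
*   thumper:/export/home/&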

Charles

-

Charles Soto[EMAIL PROTECTED]
Director, Information Technology         TEL: 512-740-1888
The University of Texas at Austin        FAX: 512-475-9711
College of Communication, CMA 5.150G
1 University Station A0900, Austin, TX 78712
http://communication.utexas.edu/technology/



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 3510 JBOD with multipath

2008-05-23 Thread Charles Soto
The Solaris SAN Configuration and Multipathing Guide proved very helpful for me:

http://docs.sun.com/app/docs/doc/820-1931/

I, too, was surprised to see MPIO enabled by default on x86 (we're using Dell/EMC
CX3-40 with our X4500 & X6250 systems).
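
On S10 the documented way to flip multipathing on or off (rather than editing
scsi_vhci.conf by hand) is stmsboot, and mpathadm shows whether both paths sit
behind each LUN; roughly:

stmsboot -e        # enable MPxIO (it prompts for the reboot it needs)
stmsboot -L        # map the old c#t#d# names to the new multipathed names
mpathadm list lu   # one logical unit per LUN, with its path count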

Charles

Quoting Krutibas Biswal <[EMAIL PROTECTED]>:

> Robert Milkowski wrote:
> > Hello Krutibas,
> >
> > Wednesday, May 21, 2008, 10:43:03 AM, you wrote:
> >
> > KB> On x64 Solaris 10, the default setting of mpxio was :
> >
> > KB> mpxio-disable="no";
> >
> > KB> I changed it to
> >
> > KB> mpxio-disable="yes";
> >
> > KB> and rebooted the machine and it detected 24 drives.
> >
> > Originally you wanted to get it multipathed which was the case by
> > default. Now you have disabled it (well, you still have to paths but
> > no automatic failover).
> >
> Thanks. Can somebody point me to some documentation  on this ?
> I wanted to see 24 drives so that I can use load sharing between
> two controllers (C1Disk1, C2Disk2, C1Disk3, C2Disk4...) for
> performance.
>
> If I enable multipathing, would the drive do automatic load balancing
> (sharing) between the two controllers ?
>
> Thanks,
> Krutibas

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Solaris SAMBA questions

2008-05-15 Thread Charles Soto
Mertol, take a look at this article:

http://www.sun.com/bigadmin/features/articles/kerberos_s10.jsp

This is the track we are taking.  However, note that I am having trouble
with S10 U5 that I wasn't having with S10 U4 (see my note in the Wiki comments).

I sit down with our Windows AD folks next week to look through the logs as
this fails, so I may have a solution for it soon enough.

Charles


On 5/15/08 2:51 PM, "Mertol Ozyoney" <[EMAIL PROTECTED]> wrote:

> Hi All ;
> 
>  
> 
> Need help for figuring out a solution for customer requirements.
> 
>  
> 
> We will most probably be using Solaris 10u5 or OpenSolaris 10.5 . So I will
> be very please if you can state your opinion for Solaris 10 + SAMBA and
> OpenSolaris integrated Cifs serving capabilities.
> 
>  
> 
> System will accessed by several windows systems. First requirement system is
> to have auditing capabilities. Customer wants to be able to see, who have
> done what at what time on files.
> 
> Second requirement is about administering the file permissions. Here are
> some questions. (sorry for the question I have absolutely no knowledge about
> samba)
> 
>  
> 
> 1)   Can SAMBA get the user lists from active directory ? (I quess this
> is basic functionality and could be done)
> 
> 2)  Ones a owner ship for a directory assigned can the owner set the
> permissions from a windows workstation ?
> 
>  
> 
> Thanks for the answers
> 
>  
> 
> Best regards
> 
>  
> 
>  
> 
>  
> 
> 
> 
> Mertol Ozyoney 
> Storage Practice - Sales Manager
> 
> Sun Microsystems, TR
> Istanbul TR
> Phone +902123352200
> Mobile +905339310752
> Fax +90212335
> Email  <mailto:[EMAIL PROTECTED]> [EMAIL PROTECTED]
> 
>  
> 
>  
> 
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

-

Charles Soto[EMAIL PROTECTED]
Director, Information Technology         TEL: 512-740-1888
The University of Texas at Austin        FAX: 512-475-9711
College of Communication, CMA 5.150G
1 University Station A0900, Austin, TX 78712
http://communication.utexas.edu/technology/



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [storage-discuss] server-reboot

2007-10-11 Thread Charles Baker

Hi Claus,

Were you able to collect the core file?  If so, please provide us with the
core file so we can take a look.  I can provide specific upload instructions
offline.

thanks
Charles


eric kustarz wrote:

This looks like a bug in the sd driver (SCSI).

Does this look familiar to anyone from the sd group?

eric

On Oct 10, 2007, at 10:30 AM, Claus Guttesen wrote:

  

Hi.

Just migrated to zfs on opensolaris. I copied data to the server using
rsync and got this message:

Oct 10 17:24:04 zetta ^Mpanic[cpu1]/thread=ff0007f1bc80:
Oct 10 17:24:04 zetta genunix: [ID 683410 kern.notice] BAD TRAP:
type=e (#pf Page fault) rp=ff0007f1b640 addr=fffecd873000
Oct 10 17:24:04 zetta unix: [ID 10 kern.notice]
Oct 10 17:24:04 zetta unix: [ID 839527 kern.notice] sched:
Oct 10 17:24:04 zetta unix: [ID 753105 kern.notice] #pf Page fault
Oct 10 17:24:04 zetta unix: [ID 532287 kern.notice] Bad kernel fault
at addr=0xfffecd873000
Oct 10 17:24:04 zetta unix: [ID 243837 kern.notice] pid=0,
pc=0xfbbc1a9f, sp=0xff0007f1b730, eflags=0x10286
Oct 10 17:24:04 zetta unix: [ID 211416 kern.notice] cr0:
8005003b cr4: 6b8
Oct 10 17:24:04 zetta unix: [ID 354241 kern.notice] cr2:
fffecd873000 cr3: 300 cr8: c
Oct 10 17:24:04 zetta unix: [ID 592667 kern.notice] rdi:
fffecd872f80 rsi:a rdx: ff0007f1bc80
Oct 10 17:24:04 zetta unix: [ID 592667 kern.notice] rcx:
21  r8:  927454bc6fa  r9:  927445906ba
Oct 10 17:24:04 zetta unix: [ID 592667 kern.notice] rax:
20 rbx: fffefef2ea40 rbp: ff0007f1b770
Oct 10 17:24:04 zetta unix: [ID 592667 kern.notice] r10:
 79602 r11: fffecd872e18 r12: fffecd872f80
Oct 10 17:24:04 zetta unix: [ID 592667 kern.notice] r13:
fffecd872f88 r14: 04209380 r15: fb84ce30
Oct 10 17:24:04 zetta unix: [ID 592667 kern.notice] fsb:
 0 gsb: fffec1c31500  ds:   4b
Oct 10 17:24:04 zetta unix: [ID 592667 kern.notice]  es:
4b  fs:0  gs:  1c3
Oct 10 17:24:04 zetta unix: [ID 592667 kern.notice] trp:
 e err:0 rip: fbbc1a9f
Oct 10 17:24:04 zetta unix: [ID 592667 kern.notice]  cs:
30 rfl:10286 rsp: ff0007f1b730
Oct 10 17:24:04 zetta unix: [ID 266532 kern.notice]   
ss:   38

Oct 10 17:24:04 zetta unix: [ID 10 kern.notice]
Oct 10 17:24:04 zetta genunix: [ID 655072 kern.notice]
ff0007f1b520 unix:die+ea ()
Oct 10 17:24:04 zetta genunix: [ID 655072 kern.notice]
ff0007f1b630 unix:trap+135b ()
Oct 10 17:24:04 zetta genunix: [ID 655072 kern.notice]
ff0007f1b640 unix:_cmntrap+e9 ()
Oct 10 17:24:04 zetta genunix: [ID 655072 kern.notice]
ff0007f1b770 scsi:scsi_transport+1f ()
Oct 10 17:24:04 zetta genunix: [ID 655072 kern.notice]
ff0007f1b7f0 sd:sd_start_cmds+2f4 ()
Oct 10 17:24:04 zetta genunix: [ID 655072 kern.notice]
ff0007f1b840 sd:sd_core_iostart+17b ()
Oct 10 17:24:04 zetta genunix: [ID 655072 kern.notice]
ff0007f1b8a0 sd:sd_mapblockaddr_iostart+185 ()
Oct 10 17:24:04 zetta genunix: [ID 655072 kern.notice]
ff0007f1b8f0 sd:sd_xbuf_strategy+50 ()
Oct 10 17:24:04 zetta genunix: [ID 655072 kern.notice]
ff0007f1b930 sd:xbuf_iostart+103 ()
Oct 10 17:24:04 zetta genunix: [ID 655072 kern.notice]
ff0007f1b960 sd:ddi_xbuf_qstrategy+60 ()
Oct 10 17:24:04 zetta genunix: [ID 655072 kern.notice]
ff0007f1b9a0 sd:sdstrategy+ec ()
Oct 10 17:24:04 zetta genunix: [ID 655072 kern.notice]
ff0007f1b9d0 genunix:bdev_strategy+77 ()
Oct 10 17:24:04 zetta genunix: [ID 655072 kern.notice]
ff0007f1ba00 genunix:ldi_strategy+54 ()
Oct 10 17:24:04 zetta genunix: [ID 655072 kern.notice]
ff0007f1ba50 zfs:vdev_disk_io_start+219 ()
Oct 10 17:24:04 zetta genunix: [ID 655072 kern.notice]
ff0007f1ba70 zfs:vdev_io_start+1d ()
Oct 10 17:24:04 zetta genunix: [ID 655072 kern.notice]
ff0007f1bab0 zfs:zio_vdev_io_start+123 ()
Oct 10 17:24:04 zetta genunix: [ID 655072 kern.notice]
ff0007f1bad0 zfs:zio_next_stage_async+bb ()
Oct 10 17:24:04 zetta genunix: [ID 655072 kern.notice]
ff0007f1baf0 zfs:zio_nowait+11 ()
Oct 10 17:24:04 zetta genunix: [ID 655072 kern.notice]
ff0007f1bb50 zfs:vdev_queue_io_done+a5 ()
Oct 10 17:24:04 zetta genunix: [ID 655072 kern.notice]
ff0007f1bb90 zfs:vdev_disk_io_done+29 ()
Oct 10 17:24:04 zetta genunix: [ID 655072 kern.notice]
ff0007f1bbb0 zfs:vdev_io_done+1d ()
Oct 10 17:24:04 zetta genunix: [ID 655072 kern.notice]
ff0007f1bbd0 zfs:zio_vdev_io_done+1b ()
Oct 10 17:24:04 zetta genunix: [ID 655072 kern.notice]
ff0007f1bc60 genunix:taskq_thread+1a7 ()
Oct 10 17:24:04 zetta genunix: [ID 655072 kern.notice]
ff0007f1bc70 unix:thread_start+8 ()
Oct 10 17:24:04 zetta unix: [ID 10 kern.notice]
Oct 10 17:24:04 zetta genunix: [ID 672855 kern.notice] syncing file  
systems...

Oct 10 17:24:04 zetta genunix: [ID 733762 kern.notice]  26
Oct 10 17:24:05 zetta genunix: [ID 733762 kern.notice]  3
Oct 10 17:24

Re: [zfs-discuss] Single SAN Lun presented to 4 Hosts

2007-08-28 Thread Charles DeBardeleben
Are you sure that UFS writes atime on read-only filesystems? I do not think
that it is supposed to. If it does, I think that this is a bug. I have 
mounted
read-only media before, and not gotten any write errors.

-Charles

David Olsen wrote:
>> On 27/08/2007, at 12:36 AM, Rainer J.H. Brandt wrote:
>> 
>>> Sorry, this is a bit off-topic, but anyway:
>>>
>>> Ronald Kuehn writes:
>>>   
>>>> No. You can neither access ZFS nor UFS in that
>>>> 
>> way. Only one
>> 
>>>> host can mount the file system at the same time
>>>> 
>> (read/write or
>> 
>>>> read-only doesn't matter here).
>>>> 
>>> I can see why you wouldn't recommend trying this
>>>   
>> with UFS
>> 
>>> (only one host knows which data has been committed
>>>   
>> to the disk),
>> 
>>> but is it really impossible?
>>>
>>> I don't see why multiple UFS mounts wouldn't work,
>>>   
>> if only one
>> 
>>> of them has write access.  Can you elaborate?
>>>   
>> Even with a single writer you would need to be
>> concerned with read  
>> cache invalidation on the read-only hosts and
>> (probably harder)  
>> ensuring that read hosts don't rely on half-written
>> updates (since  
>> UFS doesn't do atomic on-disk updates).
>>
>> Even without explicit caching on the read-only hosts
>> there is some  
>> "implicit caching" when, for example, a read host
>> reads a directory  
>> entry and then uses that information to access a
>> file. The file may  
>> have been unlinked in the meantime. This means that
>> you need atomic  
>> reads, as well as writes.
>>
>> Boyd
>> ___
>> zfs-discuss mailing list
>> zfs-discuss@opensolaris.org
>> http://mail.opensolaris.org/mailman/listinfo/zfs-discu
>> ss
>> 
>
> It's worse than this.  Consider the read-only clients.  When you access a 
> filesystem object (file, directory, etc.), UFS will write metadata to update 
> atime.  I believe that there is a noatime option to mount, but I am unsure as 
> to whether this is sufficient.
>
> my 2c.
> --Dave
>  
>  
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] AVS replication vs ZFS send/receive for odd-sized volume pairs

2007-05-22 Thread Charles DeBardeleben

For the moment, Solaris Cluster 3.2 does not support using AVS replication
within a cluster for failover of storage. We do support using storage-based
replication for failover data with high-end Hitachi-based storage.
Also, at this point Solaris Cluster does not ship with support for zfs send.
You could probably write your own agent for zfs send using our
agent builder tool. However, integrating this with the HANFS agent
that ships with Solaris Cluster will require that you are familiar with
all of the failures that you may hit and what recovery action you want to
take.

-Charles

a habman wrote:

Hello all,  I am interested in setting up an HA NFS server with zfs as
the storage filesystem on Solaris 10 + Sun Cluster 3.2. This is an HPC
environment with a 70 node cluster attached. File sizes are 1-200meg
or so, with an average around 10meg.

I have two servers, and due to changing specs through time I have
ended up with heterogeneus storage.  They are physically close to each
other, so no offsite replication needs.

Server A has an areca 12 port raid card attached to 12x400 gig drives.
Server B has an onboard raid with 6 available slots which I plan on
populating with either 750 gig or 1tb drives.

With AVS 4.0 (which I have running on a test volume pair) I am able to
mirror the zpools at the block level, but I am forced to have an equal
number of LUNs for it to work on( AVS mirrors block devices that zfs
works on top of).  If I carve up each raid set into 4 volumes, AVS
those(plus bitmap volumes) and then ZFS stripe over that,
theoretically I am in business, although this has a couple of
downsides.

If I want to maximize my performance first, while keeping a margin of
safety in this replicated environment, how can I best use my storage?

Option one:

  AVS + hardware RAID 5 on each side.  Make 4 LUNs and ZFS stripe on
top.  Hardware RAID takes care of drive failure. AVS ensures that the
whole storage pool is replicated at all times to Server B. This method
does not take advantage of the disk caching ZFS can do, nor the
additional performance scheduling ZFS would like to manage at the drive
level. Also unknown is how the SC3.2 HA ZFS module will work on an
AVS-backed ZFS filesystem, as I believe it was designed for a Fibre
Channel shared set of disks. On the plus side, with this method we have
block-level replication, so close to instantaneous sync between filesystems.

Option two:
  Full ZFS pools on both sides, using zfs send + zfs receive for the
replication.  This has benefits because my pools can be differently
sized and grow, and that's OK. The pool could also be mounted on server B
(most of the time).  The downside is I have to hack together a zfs send +
receive script and cron job, which is likely not as bombproof as the
tried and tested AVS?
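
For what it's worth, the cron job option two implies can be quite small.
A bare-bones sketch (the dataset names, the ssh transport and the
snapshot-bookkeeping file are placeholders, a real script would need
locking and error handling, and the very first run has to be a full,
non-incremental send):

  #!/bin/sh
  # take a new snapshot and ship the delta since the previous one
  DS=tank/nfs
  NEW=repl-`date +%Y%m%d%H%M`
  OLD=`cat /var/run/last-repl-snap`
  zfs snapshot $DS@$NEW || exit 1
  zfs send -i $DS@$OLD $DS@$NEW | ssh serverB zfs receive backup/nfs || exit 1
  echo $NEW > /var/run/last-repl-snap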

So... basically, how are you all doing replication between two
different disk topologies using ZFS?

I am a Solaris newbie, attracted by the smell of ZFS, so please
pardon my lack of in-depth knowledge of these issues.

Thank you in advance.

Ahab
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Q: recreate pool?

2007-05-02 Thread Charles Debardeleben

Note that you do not have to use the emcpower name to get
the path redundancy of PowerPath. By default, PowerPath
layers itself into the cb_ops of the driver so that the emcpower and
all c#t#d# device entries for a LUN go through PowerPath in
the same way. I think the EMC documentation refers to this as
native device support, but I could be wrong about the name.

-Charles

Gonzalo Siero wrote:

Hi there,

  because of a problem with EMC PowerPath we need to change the
configuration of a ZFS pool, replacing the "emcpower?g" devices (the
PowerPath-created devices) with the underlying "c#t#d#" devices (the
Solaris paths to those LUNs). The problem is that the pool has a
non-redundant config:


"amebdo:/>zpool status orapool
conjunto: orapool
estado: ONLINE
limpiar: no se ha solicitado ninguna
config:

        NAME          STATE     READ WRITE CKSUM
        orapool       ONLINE       0     0     0
          emcpower1g  ONLINE       0     0     0
          emcpower2g  ONLINE       0     0     0
          emcpower3g  ONLINE       0     0     0
          emcpower4g  ONLINE       0     0     0
          emcpower5g  ONLINE       0     0     0
          emcpower6g  ONLINE       0     0     0
          emcpower7g  ONLINE       0     0     0
          emcpower8g  ONLINE       0     0     0
          emcpower9g  ONLINE       0     0     0

I'm thinking about creating a mirror, then replacing the "emcpower?g"
devices with "c#t#d#", then recreating the mirror, then breaking the
mirror and leaving the original disks, but this has two problems:

1.- The pool is huge, so a lot of I/O will be done during the sync,
affecting performance.
2.- A lot of spare LUNs are needed to create the mirror.

Is there any way, apart from the above (or taking a full backup and
then recreating the pool from scratch), to do this?


Remember that we only want to change the config; the LUNs are physically
the same.
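
A rough sketch of the mirror dance described above, per top-level device
(the spare LUN c4t0d0 and the path c2t1d0 are made up, and zpool attach
may need -f where a device still carries an old label):

  zpool attach orapool emcpower1g c4t0d0  # mirror onto a spare LUN,
                                          # wait for the resilver
  zpool detach orapool emcpower1g         # drop the emcpower name
  zpool attach orapool c4t0d0 c2t1d0      # re-attach the same LUN via
                                          # its c#t#d# path, resilver again
  zpool detach orapool c4t0d0             # drop the spare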


Many thanks,
Gonzalo.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: ZFS and Sun Cluster....

2006-05-30 Thread Charles Debardeleben
Re: Quorum and ZFS.
PGR is a property of the SCSI devices in the zpool,
not a property of ZFS or the zpool. The same is true for
the SCSI-2 reserve/release protocol. The PGRE protocol requires
a reserved part of the disk. However, this reserved part
of the disk is reserved at the "label" level. While it is
not part of the label itself, the Solaris label reserves a few
tracks of the disk for use by PGRE. Therefore, it should be possible
to use a disk that is in the zpool as a quorum device in both PGR
and PGRE situations.

Re: Why doesn't GFS (PxFS) just work on ZFS.
Given that PxFS works at the VFS layer, I would have
thought that this would have "just worked". However, it turns out
that PxFS has too much knowledge of the interaction of the
filesystem and the virtual memory system for this to work. I think
the biggest problem was the variable block size feature of ZFS. It
is possible that PxFS will be made to work with ZFS, but I doubt it.
The current direction is more towards making ZFS itself
a "cluster filesystem" so that all of the advantages of ZFS can be
utilized without being hampered by a layered filesystem.

If you need more details about why PxFS did not work with ZFS, contact
me, and I will try to get more details.

-Charles

>Date: Tue, 30 May 2006 10:30:14 -0700 (PDT)
>From: Tatjana S Heuser <[EMAIL PROTECTED]>
>Subject: [zfs-discuss] Re:  ZFS and Sun Cluster
>To: zfs-discuss@opensolaris.org
>
>> SunCluster will support ZFS in our 3.2 release of SunCluster,
>> via the HAStoragePlus resource type. This support will be for
>> failover use only, not scaleable or active-active applications.
>
>What about quorum reservation in ZFS storage pools?
>AFAIK ZFS does not support SCSI-3 persistent group reservation (PGR).
>Will that be emulated, as is also done on SCSI-2 with PGRE?
>Why is there no storage communication via the ORB as we had it in
>SunCluster 3.1 with SDS or Veritas, since global storage actually is a
>virtual layer above the RAID software layer? Why doesn't that work with ZFS?
>Any information/clues? The limitations mentioned do sound strange to me.
>Is it planned to have the cluster FS or proxy FS layer between the ZFS layer
>and the storage pool layer?
>
>This sounds exciting and I'm eager to learn more about this. FEED ME :) 
>
>Sorry if these questions sound a bit too detailed; my knowledge
>base is SunCluster 3.1, and I'm currently proof-reading parts of Rolf's book
>on SunCluster, which is scheduled to be released Q1/2007.
>
>Tatjana

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and Sun Cluster....

2006-05-30 Thread Charles Debardeleben
SunCluster will support ZFS in our 3.2 release of SunCluster,
via the HAStoragePlus resource type. This support will be for
failover use only, not scalable or active-active applications.
It will use the pool import/export mechanism to do its work. It required
code modifications to HAStoragePlus due to ZFS's non-conventional
use of vfstab and the "device" argument of the mount command.
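
A minimal sketch of what that could look like with the 3.2 command set
(the group, resource and pool names are made up, and the syntax is from
memory, so check clresource(1CL) and the SUNW.HAStoragePlus man page):

  clresourcetype register SUNW.HAStoragePlus
  clresourcegroup create nfs-rg
  clresource create -g nfs-rg -t SUNW.HAStoragePlus -p Zpools=tank hasp-rs
  clresourcegroup online -M nfs-rg   # the pool is imported on whichever
                                     # node currently hosts the group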

HAStoragePlus will also support zones in SunCluster 3.2. However, in
zones, HAStoragePlus will first mount the filesystem in the global
zone, then export it to the local zone via lofs.
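
In manual terms that hand-off is roughly the following, run on whichever
node hosts the resource group (pool, dataset and zone names are
hypothetical; HAStoragePlus does the equivalent for you):

  zpool import tank
  mount -F lofs /tank/data /zones/appzone/root/data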

SunCluster 3.2 will support Sol 10u2, but not Nevada or OpenSolaris.

-Charles

>Date: Fri, 26 May 2006 12:08:38 -0700
>From: Erik Trimble <[EMAIL PROTECTED]>
>Subject: [zfs-discuss] ZFS and Sun Cluster
>To: ZFS Discussions 
>
>I'm seriously looking at using the SunCluster software in combination
>with ZFS (either in Sol 10u2 or Nevada).  I'm really looking at doing a
>dual-machine HA setup, probably active-active.
>
>How well does ZFS play in a SunCluster?  I've looked at the "zpool
>[import|export]" stuff, and I'm a little confused as to how it might
>work in an HA environment for complete hot failover. Especially if I do
>something like throw zones into the mix (e.g. run 2 machines, each with
>2 zones on them, and cluster the zones in 2 clusters, both machines
>dual-attached to some JBOD).
>
>
>
>-- 
>Erik Trimble
>Java System Support
>Mailstop:  usca14-102
>Phone:  x17195
>Santa Clara, CA
>Timezone: US/Pacific (GMT-0800)
>
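
Stripped of the cluster framework, the hot-failover question above comes
down to the pool changing hands: an export on one node followed by an
import on the other (pool name hypothetical):

  node-a# zpool export tank
  node-b# zpool import tank     # "zpool import -f tank" after a node
                                # failure, since the pool was never
                                # cleanly exported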

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss