[zfs-discuss] zvol wrapped in a vmdk by Virtual Box and double writes?
Hi folks, (Long time no post...) Only starting to get into this one, so apologies if I'm light on detail, but... I have a shiny SSD I'm using to help make some VirtualBox stuff I'm doing go fast. I have a 240GB Intel 520 series jobbie. Nice. I chopped it into a few slices - p0 (partition table), p1 128GB, p2 60GB. As part of my work, I have used it both as a RAW device (cxtxdxp1) and wrapped partition 1 with a VirtualBox-created VMDK linkage, and it works like a champ. :) Very happy with that. I then tried creating a new zpool using partition 2 of the disk (zpool create c2d0p2) and then carved a zvol out of that (30GB), and wrapped *that* in a vmdk. Still works OK and speed is good(ish) - but there are a couple of things in particular that disturb me: - Sync writes are pretty slow - only about 1/10th of what I thought I might get (about 15MB/s). Async writes are fast - up to 150MB/s or more. - More worryingly, it seems that writes are amplified by 2X, in that if I write 100MB at the guest level, the underlying bare-metal ZFS writes 200MB, as observed by iostat. This doesn't happen on the VMs that are using RAW slices. Anyone have any thoughts on what might be happening here? I can appreciate that if everything comes through as a sync write, it goes to the ZIL first, then to its final resting place - but it seems a little over the top that it really is double. I have also had a play with sync=, primarycache settings and a few other things, but it doesn't seem to change the behaviour. Again - I'm looking for thoughts here - as I have only really just started looking into this. Should I happen across anything interesting, I'll follow up on this post. Cheers, Nathan. :) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
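One way to narrow down where the extra writes come from is to watch the pool and the underlying disk while pushing a known amount of data from the guest. A minimal sketch, assuming (hypothetically) the pool is called ssd and the zvol is ssd/vbox - adjust the names to suit:

# On the host, watch pool-level and disk-level write bandwidth side by side:
zpool iostat -v ssd 5              # per-vdev view of the pool
iostat -xnz 5                      # bare-metal view of the disk underneath (c2d0)

# In the guest, write a fixed, easily recognised amount (e.g. 100MB with dd).

# If the on-disk total roughly halves when the ZIL is taken out of the picture,
# the doubling is the ZIL copy; if it doesn't, suspect volblocksize
# read-modify-write instead:
zfs get volblocksize,sync ssd/vbox
zfs set sync=disabled ssd/vbox     # for the test only - put it back afterwards

If sync=disabled makes both the amplification and the 15MB/s ceiling disappear, that points squarely at every guest write being treated as synchronous and logged before being committed.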
Re: [zfs-discuss] Advanced Format HDD's - are we there yet? (or - how to buy a drive that won't be teh sux0rs on zfs)
Hi John, Actually, last time I tried the whole AF (4k) thing, its performance was worse than woeful. But admittedly, that was a little while ago. The drives were the Seagate Barracuda Green IIRC, and performance for just about everything was 20MB/s per spindle or worse, when it should have been closer to 100MB/s when streaming. Things were worse still when doing random... I'm actually looking to put in something larger than the 3*2TB drives (triple mirror for read perf) this pool has in it - preferably 3 * 4TB drives. (I don't want to put in more spindles - just replace the current ones...) I might just have to bite the bullet and try something with current SW. :) Nathan. On 05/29/12 08:54 PM, John Martin wrote: On 05/28/12 08:48, Nathan Kroenert wrote: Looking to get some larger drives for one of my boxes. It runs exclusively ZFS and has been using Seagate 2TB units up until now (which are 512 byte sector). Anyone offer up suggestions of either 3 or preferably 4TB drives that actually work well with ZFS out of the box? (And not perform like rubbish)... I'm using Oracle Solaris 11, and would prefer not to have to use a hacked up zpool to create something with ashift=12. Are you replacing a failed drive or creating a new pool? I had a drive in a mirrored pool recently fail. Both drives were 1TB Seagate ST310005N1A1AS-RK with 512 byte sectors. All the 1TB Seagate boxed drives I could find with the same part number on the box (with factory seals in place) were really ST1000DM003-9YN1 with 512e/4096p. Just being cautious, I ended up migrating the pools over to a pair of the new drives. The pools were created with ashift=12 automatically:
$ zdb -C | grep ashift
ashift: 12
ashift: 12
ashift: 12
Resilvering the three pools concurrently went fairly quickly:
$ zpool status
scan: resilvered 223G in 2h14m with 0 errors on Tue May 22 21:02:32 2012
scan: resilvered 145G in 4h13m with 0 errors on Tue May 22 23:02:38 2012
scan: resilvered 153G in 3h44m with 0 errors on Tue May 22 22:30:51 2012
What performance problem do you expect? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Advanced Format HDD's - are we there yet? (or - how to buy a drive that won't be teh sux0rs on zfs)
Hi folks, Looking to get some larger drives for one of my boxes. It runs exclusively ZFS and has been using Seagate 2TB units up until now (which are 512 byte sector). Can anyone offer up suggestions of either 3 or preferably 4TB drives that actually work well with ZFS out of the box? (And not perform like rubbish)... I'm using Oracle Solaris 11, and would prefer not to have to use a hacked up zpool to create something with ashift=12. Thoughts on the best drives - or is Solaris 11 actually ready to go with whatever I throw at it? :) And - am I doomed to have to use these so called 'advanced format' drives (which as far as I can tell are in no way actually advanced, and only benefit HDD makers and not the end user). Cheers! Nathan. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
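For anyone in the same boat, a quick way to see what you actually ended up with - a rough sketch only, with made-up pool and device names:

# What sector size does the OS think the drive has?
prtvtoc /dev/rdsk/c0t1d0s2 | head    # look for the "bytes/sector" line
iostat -En                           # vendor/model/firmware per device

# What alignment did the pool get when it was created?
zdb -C tank | grep ashift            # ashift: 9 = 512B allocations, ashift: 12 = 4KB

If the drive is a 512e/4Kn unit but the pool came up with ashift: 9, every sub-4KB write risks a read-modify-write inside the drive, which is where the 'worse than woeful' numbers tend to come from.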
Re: [zfs-discuss] Convert pool from ashift=12 to ashift=9
Jim Klimov wrote: It is hard enough already to justify to an average wife that...snip That made my night. Thanks, Jim. :) On 03/20/12 10:29 PM, Jim Klimov wrote: 2012-03-18 23:47, Richard Elling wrote: ... Yes, it is wrong to think that. Ok, thanks, we won't try that :) copy out, copy in. Whether this is easy or not depends on how well you plan your storage use ... Home users and personal budgets do tend to have a problem with planning. Any mistake is to be paid for personally, and many are left as is. It is hard enough already to justify to an average wife that a storage box with large X-Tb disks needs raidz3 or mirroring and thus becomes larger and noisier, not to mention almost a thousand bucks more expensive just for the redundancy disks, but it will become two times cheaper in a year. Yup, it is not very easy to find another 10+Tb backup storage (with ZFS reliability) in a typical home I know of. Planning is not easy... But that's a rant... Hoping that in-place BP Rewrite would arrive and magically solve many problems =) Questions are: 1) How bad would a performance hit be with 512b blocks used on a 4kb drive with such efficient emulation? Depends almost exclusively on the workload and hardware. In my experience, most folks who bite the 4KB bullet have low-cost HDDs where one cannot reasonably expect high performance. Is it possible to model/emulate the situation somehow in advance to see if it's worth that change at all? It will be far more cost effective to just make the change. Meaning altogether? That with consumer disks which will suck from a performance standpoint anyway, it was not a good idea to use ashift=12 and it was more cost effective to remain at ashift=9 to begin with? What about real people's tests which seemed to show that there were substantial performance hits with misaligned large-block writes (spanning several 4k sectors at wrong boundaries)? I had an RFE posted sometime last year about making an optimisation for both worlds: use formal ashift=9 and allow writing of small blocks, but align larger blocks at set boundaries (i.e. offset divisible by 4096 for blocks sized 4096+). Perhaps writing of 512b blocks near each other should only be reserved for metadata which is dittoed anyway, so that a whole-sector (4kb) corruption won't be irreversible for some data. In effect, minblocksize for userdata would be enforced (by config) at the same 4kb in such case. This is a zfs-write only change (and some custom pool or dataset attributes), so the on-disk format and compatibility should not suffer with this solution. But I had little feedback whether the idea was at all reasonable. 2) Is it possible to easily estimate the amount of wasted disk space in slack areas of the currently active ZFS allocation (unused portions of 4kb blocks that might become available if the disks were reused with ashift=9)? Detailed space use is available from the zfs_blkstats mdb macro as previously described in such threads. 3) How many parts of a ZFS pool are actually affected by the ashift setting? Everything is impacted. But that isn't a useful answer. From what I gather, it is applied at the top-level vdev level (I read that one can mix ashift=9 and ashift=12 TLVDEVs in one pool spanning several TLVDEVs). Is that a correct impression? Yes If yes, how does ashift size influence the amount of slots in the uberblock ring (128 vs. 32 entries) which is applied at the leaf vdev level (right?) but should be consistent across the pool? It should be consistent across the top-level vdev. 
There is 128KB of space available for the uberblock list. The minimum size of an uberblock entry is 1KB. Obviously, a 4KB disk can't write only 1KB, so for 4KB sectors, there are 32 entries in the uberblock list. So if I have ashift=12 and ashift=9 top-level devices mixed in the pool, is it okay that some of them would remember 4x more of the pool's TXG history than others? As far as I see in the ZFS on-disk format, all sizes and offsets are in either bytes or 512b blocks, and the ashift'ed block size is not actually used anywhere except to set the minimal block size and its implicit alignment during writes. The on-disk format doc is somewhat dated and unclear here. UTSL. Are there any updates, or is the 2006 pdf the latest available? For example, is there an effort in illumos/nexenta/openindiana to publish their version of the current on-disk format? ;) Thanks for all the answers, //Jim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
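The arithmetic behind the 128-vs-32 entry difference is simple enough to sanity-check. A throwaway shell sketch of the calculation described above (just the sums, not authoritative code):

# 128KB of label space holds the uberblock ring; each entry occupies
# max(1KB, 1 << ashift) bytes.
for ashift in 9 12; do
  entry=$(( 1 << ashift ))
  [ "$entry" -lt 1024 ] && entry=1024
  echo "ashift=$ashift -> $(( 131072 / entry )) uberblock entries"
done
# ashift=9  -> 128 entries
# ashift=12 -> 32 entries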
Re: [zfs-discuss] Bad performance (Seagate drive related?)
Hey there, Few things: - Using /dev/zero is not necessarily a great test. I typically use /dev/urandom to create an initial block-o-stuff - something like a gig or so worth, in /tmp, then use dd to push that to my zpool. (/dev/zero will return dramatically different results depending on pool/dataset settings for compression etc.) - Indeed - getting a total aggregate of 180MB/s seems pretty low on the face of it for the setup you have. What's the controller you are using? Any details on the driver, backplane, expander, array or other you might be using? - Have you tried your dd on individual spindles? You might find that they behave differently. - Does your controller have DRAM on it? Can you put it in passthrough mode rather than cache? - I have done some testing trying to find odd behaviour like this before, and found on different occasions a number of different things: - Drives: Things like the WD 'green' drives getting in my way - Alignment for non-EFI labeled disks (hm - maybe even on EFI... that one was a while ago) (particularly for 4K 'advanced format' (ha!) disks) - The controller was unable to keep up. (In one case, I ended up tossing an HP P400 (IIRC) and using the on-motherboard chipset as it was considerably faster when running four disks.) - Disks with wildly different performance characteristics were also bad (eg: Enterprise SATA mixed with 5400 RPM disks. ;) I'd suggest that you spend a little time validating the basic assumptions around: - speed of individual disks, - speed of individual buses - Whether you are being limited by CPU (ie: If you have compression or dedupe turned on) (view with mpstat and friends) - I'll also note that you are looking close to the number of IOPS I'd expect a consumer disk to supply assuming a somewhat random distribution of IOPS. - Consider that your 180MB/s is actually 360 (well - not quite - but it's a lot more than 180). Remember - in a mirror, you literally need to write the data twice. 8.0 3857.8 64.0 337868.8 0.0 64.5 0.0 16.7 0 704 c5 (Note above is your c5 controller - running at around 337 MB/s) Incidentally - this seems awfully close to 3Gb/s... How did you say all of your external drives were attached? If I didn't know better, I'd be asking serious questions about how many lanes of a SAS connection SATA attached drives were able to use... Actually - I don't know better, so I'd ask anyway... ;) I think this will likely go a long way to helping understand where the holdup is. There is also a heap of great stuff on solarisinternals.com which I'd highly recommend taking a look at after you have validated the basics... Were this one of my systems, (and especially if it's new, and you don't love your data and can re-create the pool) I'd be tempted to do something like a very destructive...
for i in <your disk list>; do
  dd if=/tmp/randomdata.file.I.created.earlier of=/dev/rdsk/${i}
done
and see how much you can stuff down the pipe. (A non-destructive, read-only variant is sketched after the quoted message below.) Remember - this will kill whatever is on the disks, do think twice before you do it. ;) If you can't get at least 80-100MB/s on the outside of the platter, I'd suggest you should be looking at layers below ZFS. If you *can*, then you start looking further up the stack. Hope this helps somewhat. Let us know how you go. Cheers! Nathan. On 02/ 1/12 04:52 AM, Mohammed Naser wrote: Hi list! I have seen less-than-stellar ZFS performance on a setup of one main head connected to a JBOD (using SAS, but drives are SATA). 
There are 16 drives (8 mirrors) in this pool but I'm getting 180ish MB sequential writes (using dd, I know it's not precise, but those numbers should be higher). With some help on IRC, it seems that part of the reason I'm slowing down is some drives seem to be slower than the others. Initially, I had some drives running at 1.5 mode instead of 3.0 -- They are all running at 3.0 now. While running the following dd command, the output of iostat reflects a much higher %b which seems to say that those drives are slower (but could they really be slowing down everything else that much? --- Or am I looking at the wrong spot here?) -- The pool configuration is also included below
dd if=/dev/zero of=4g bs=1M count=4000
extended device statistics
r/s   w/s     kr/s   kw/s      wait  actv  wsvc_t  asvc_t  %w  %b   device
1.0   0.0     8.0    0.0       0.0   0.0   0.0     0.2     0   0    c1
1.0   0.0     8.0    0.0       0.0   0.0   0.0     0.2     0   0    c1t2d0
8.0   3857.8  64.0   337868.8  0.0   64.5  0.0     16.7    0   704  c5
0.0   259.0   0.0    26386.2   0.0   3.6   0.0     14.0    0   37   c5t50014EE0ACE4AEEFd0
1.0   266.0   8.0    27139.2   0.0   3.6   0.0     13.5    0   37   c5t50014EE056EB0356d0
2.0   276.0   16.0   19315.1   0.0   3.7   0.0     13.3    0   40   c5t50014EE00239C976d0
0.0   279.0   0.0    19699.0   0.0   3.6   0.0     13.0    0   37   c5t50014EE0577C459Cd0
1.0   232.0   8.0    23061.9   0.0   3.6   0.0     15.4    0
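The per-spindle check mentioned above, in a non-destructive form - the device names here are simply the ones from the iostat output, and s0 may need to be p0 or the whole-disk device depending on how the disks are labeled:

# Sequential read from the start of each disk -- read-only, safe to run.
for d in c5t50014EE0ACE4AEEFd0 c5t50014EE056EB0356d0 c5t50014EE00239C976d0 c5t50014EE0577C459Cd0; do
  echo "== $d =="
  dd if=/dev/rdsk/${d}s0 of=/dev/null bs=1024k count=2048
done

Anything much under 80-100MB/s at the outer edge of a modern 7200 RPM drive points at the layers below ZFS (controller, expander, cabling, negotiated link speed) rather than at ZFS itself.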
Re: [zfs-discuss] Can I create a mirror for a root rpool?
Do note, that though Frank is correct, you have to be a little careful around what might happen should you drop your original disk, and only the large mirror half is left... ;) On 12/16/11 07:09 PM, Frank Cusack wrote: You can just do fdisk to create a single large partition. The attached mirror doesn't have to be the same size as the first component. On Thu, Dec 15, 2011 at 11:27 PM, Gregg Wonderly gregg...@gmail.com mailto:gregg...@gmail.com wrote: Cindy, will it ever be possible to just have attach mirror the surfaces, including the partition tables? I spent an hour today trying to get a new mirror on my root pool. There was a 250GB disk that failed. I only had a 1.5TB handy as a replacement. prtvtoc ... | fmthard does not work in this case and so you have to do the partitioning by hand, which is just silly to fight with anyway. Gregg Sent from my iPhone On Dec 15, 2011, at 6:13 PM, Tim Cook t...@cook.ms mailto:t...@cook.ms wrote: Do you still need to do the grub install? On Dec 15, 2011 5:40 PM, Cindy Swearingen cindy.swearin...@oracle.com mailto:cindy.swearin...@oracle.com wrote: Hi Anon, The disk that you attach to the root pool will need an SMI label and a slice 0. The syntax to attach a disk to create a mirrored root pool is like this, for example: # zpool attach rpool c1t0d0s0 c1t1d0s0 Thanks, Cindy On 12/15/11 16:20, Anonymous Remailer (austria) wrote: On Solaris 10 If I install using ZFS root on only one drive is there a way to add another drive as a mirror later? Sorry if this was discussed already. I searched the archives and couldn't find the answer. Thank you. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org mailto:zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org mailto:zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org mailto:zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org mailto:zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
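Pulling the thread together, the usual sequence for adding a root mirror looks roughly like this - a hedged sketch with example device names; the installgrub step is the answer to Tim's question on x86 (SPARC uses installboot instead):

# 1. Give the new disk an SMI label and a slice 0 covering the space you want
#    (format -e -> label -> SMI, then partition). prtvtoc | fmthard only helps
#    when both disks have the same geometry.
# 2. Attach the new slice to the existing root slice:
zpool attach rpool c1t0d0s0 c1t1d0s0
# 3. On x86, make the second disk bootable as well:
installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1t1d0s0
# 4. Let the resilver finish before trusting (or pulling) either disk:
zpool status rpool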
Re: [zfs-discuss] Very poor pool performance - no zfs/controller errors?!
I know some others may already have pointed this out - but I can't see it and not say something... Do you realise that losing a single disk in that pool could pretty much render the whole thing busted? At least for me - the rate at which _I_ seem to lose disks, it would be worth considering something different ;) Cheers! Nathan. On 12/19/11 09:05 AM, Jan-Aage Frydenbø-Bruvoll wrote: Hi, On Sun, Dec 18, 2011 at 22:00, Fajar A. Nugraha w...@fajar.net wrote: From http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide (or at least Google's cache of it, since it seems to be inaccessible now): Keep pool space under 80% utilization to maintain pool performance. Currently, pool performance can degrade when a pool is very full and file systems are updated frequently, such as on a busy mail server. Full pools might cause a performance penalty, but no other issues. If the primary workload is immutable files (write once, never remove), then you can keep a pool in the 95-96% utilization range. Keep in mind that even with mostly static content in the 95-96% range, write, read, and resilvering performance might suffer. I'm guessing that your nearly-full disk, combined with your usage pattern, is the cause of the slowdown. Try freeing up some space (e.g. make it about 75% full), just to be sure. I'm aware of the guidelines you refer to, and I have had slowdowns before due to the pool being too full, but that was in the 9x% range and the slowdown was in the order of a few percent. At the moment I am slightly above the recommended limit, and the performance is currently between 1/1000 and 1/2000 of what the other pools achieve - i.e. a few hundred kB/s versus 2GB/s on the other pools - surely allocation above 80% cannot carry such extreme penalties?! For the record - the read/write load on the pool is almost exclusively WORM. Best regards Jan ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
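For anyone wanting to check where they stand relative to that guideline, the stock reporting commands are enough (the pool name here is a placeholder):

zpool list                                  # CAP column = percent of the pool allocated
zfs list -o name,used,available -r tank     # per-dataset breakdown

On a pool that is both very full and very busy, it can also be worth watching these numbers over time - the slowdown usually tracks how hard the allocator has to hunt for free space, not the utilisation figure by itself.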
Re: [zfs-discuss] Improving L1ARC cache efficiency with dedup
On 12/11/11 01:05 AM, Pawel Jakub Dawidek wrote: On Wed, Dec 07, 2011 at 10:48:43PM +0200, Mertol Ozyoney wrote: Unfortunately the answer is no. Neither l1 nor l2 cache is dedup aware. The only vendor I know that can do this is Netapp And you really work at Oracle?:) The answer is definitely yes. ARC caches on-disk blocks and dedup just references those blocks. When you read, dedup code is not involved at all. Let me show it to you with a simple test: Create a file (dedup is on): # dd if=/dev/random of=/foo/a bs=1m count=1024 Copy this file so that it is deduped: # dd if=/foo/a of=/foo/b bs=1m Export the pool so all cache is removed and reimport it: # zpool export foo # zpool import foo Now let's read one file: # dd if=/foo/a of=/dev/null bs=1m 1073741824 bytes transferred in 10.855750 secs (98909962 bytes/sec) We read file 'a' and all its blocks are in cache now. The 'b' file shares all the same blocks, so if ARC caches blocks only once, reading 'b' should be much faster: # dd if=/foo/b of=/dev/null bs=1m 1073741824 bytes transferred in 0.870501 secs (1233475634 bytes/sec) Now look at it, 'b' was read 12.5 times faster than 'a' with no disk activity. Magic?:) Hey all, That reminds me of something I have been wondering about... Why only 12x faster? If we are effectively reading from memory - as compared to a disk reading at approximately 100MB/s (which is about an average PC HDD reading sequentially), I'd have thought it should be a lot faster than 12x. Can we really only pull stuff from cache at only a little over one gigabyte per second if it's dedup data? Cheers! Nathan. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zpool import hangs - pls help
Hi Max, Unhelpful questions about your CPU aside, what else is your box doing? Can you run up a second or third shell (ssh or whatever) and watch whether the disks / system are doing any work? Were it Solaris, I'd run:
iostat -x
prstat -a
vmstat
mpstat (though as discussed, you have only a single core CPU)
echo ::memstat | mdb -k
(No idea how you might do that in BSD.) Some other things to think about: - Have you tried removing the extra memory? I have indeed seen crappy PC hardware where more than 3GB caused some really bad behaviour in Solaris. - Have you tried booting into a current Solaris (from CD) and seeing if it can import the pool? (Don't upgrade - just import) ;) I'm aware that there were some long import issues discussed on the list recently - someone had an import take some 12 hours or more - would be worth looking over the last few weeks' posts. Also - getting a truss or pstack (if FreeBSD has that?) of the process trying to initiate the import might help some of the more serious folks on the list to see where it's getting stuck. (Or if indeed, it's actually getting stuck, and not simply catastrophically slow.) (A rough sketch of both is below the quoted message.) Hope this helps at least a little. Cheers, Nathan. On 06/14/11 03:20 PM, Maximilian Sarte wrote: Hi, I am posting here in a tad of desperation. FYI, I am running FreeNAS 8.0. Anyhow, I created a raidz1 (tank1) with 4 x 2Tb WD EARS hdds. All was doing OK until I decided to up the RAM to 4 Gb since it is what was recommended. As soon as I re-started data migration, ZFS issued messages indicating that the pool was unavailable and froze the system. After reboot (FN is based on FreeBSD) and re-installing FN (it did not want to complete booting - probably a corruption on the USB stick it was running from), tank1 was unavailable. Status indicates that there are no pools, as does List. Import indicates that tank1 is OK and all 4 hdds are ONLINE and their status seems OK. When I try either: zpool import tank1, zpool import -f tank1, or zpool import -fF tank1, the commands simply hang forever (FreeNAS seems OK). Any suggestions would be immensely appreciated. Tx! ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
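The 'where is it stuck' check, spelled out - I'm more confident of the Solaris half than the FreeBSD half, so treat the latter as a hint rather than a recipe:

# FreeBSD / FreeNAS: userland state and kernel stack of the hung zpool process
ps axl | grep zpool                  # note the WCHAN (wait channel) column
procstat -kk $(pgrep zpool)          # kernel stack, where procstat is available

# Solaris (e.g. booted from a LiveCD to attempt the import):
pstack $(pgrep zpool)
echo "::pgrep zpool | ::walk thread | ::findstack -v" | mdb -k

A stack that keeps moving means the import is grinding through something (possibly a very long replay); a stack that never changes is the 'genuinely stuck' case worth posting to the list.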
Re: [zfs-discuss] zpool scrub on b123
Hi Karl, Is there any chance at all that some other system is writing to the drives in this pool? You say other things are writing to the same JBOD... Given that the amount flagged as corrupt is so small, I'd imagine not, but thought I'd ask the question anyways. Cheers! Nathan. On 04/16/11 04:52 AM, Karl Rossing wrote: Hi, One of our zfs volumes seems to be having some errors. So I ran zpool scrub and it's currently showing the following.
-bash-3.2$ pfexec /usr/sbin/zpool status -x
pool: vdipool
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'.
see: http://www.sun.com/msg/ZFS-8000-9P
scrub: scrub in progress for 3h10m, 13.53% done, 20h16m to go
config:
NAME        STATE  READ WRITE CKSUM
vdipool     ONLINE    0     0     0
  raidz1    ONLINE    0     0     0
    c9t14d0 ONLINE    0     0    12   6K repaired
    c9t15d0 ONLINE    0     0    13   167K repaired
    c9t16d0 ONLINE    0     0    11   5.50K repaired
    c9t17d0 ONLINE    0     0    20   10K repaired
    c9t18d0 ONLINE    0     0    15   7.50K repaired
spares
    c9t19d0 AVAIL
errors: No known data errors
I have another server connected to the same jbod using drives c8t1d0 to c8t13d0 and it doesn't seem to have any errors. I'm wondering how it could have gotten so screwed up? Karl ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
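Were it mine, I'd also pull the per-device error counters and the raw ereports before pointing fingers - they usually tell you whether the damage came in over the transport or was found at rest (standard commands; only the pool name is specific here):

iostat -En                          # soft / hard / transport error counts per drive
fmdump -eV | grep -i class          # raw ereports: checksum vs. io vs. probe failures
zpool status -v vdipool             # lists any files touched by unrecoverable errors

Checksum errors spread fairly evenly across every disk in one raidz, while the other server on the same JBOD stays clean, tends to smell like a shared path (cable, expander port, controller) rather than five drives going bad at once.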
Re: [zfs-discuss] Slices and reservations Was: Re: How long should an empty destroy take? snv_134
Ed - Simple test. Get onto a system where you *can* disable the disk cache, disable it, and watch the carnage. Until you do that, you can pose as many interesting theories as you like. Bottom line is that 75 IOPS per spindle won't impress many people, and that's the sort of rate you get when you disable the disk cache. Nathan. On 8/03/2011 11:53 PM, Edward Ned Harvey wrote: From: Jim Dunham [mailto:james.dun...@oracle.com] ZFS only uses system RAM for read caching, If your email address didn't say oracle, I'd just simply come out and say you're crazy, but I'm trying to keep an open mind here... Correct me where the following statement is wrong: ZFS uses system RAM to buffer async writes. Sync writes must hit the ZIL first, and then the sync writes are put into the write buffer along with all the async writes to be written to the main pool storage. So after sync writes hit the ZIL and the device write cache is flushed, they too are buffered in system RAM. as all writes must be written to some form of stable storage before being acknowledged. If a vdev represents a whole disk, ZFS will attempt to enable write caching. If a device does not support write caching, the attempt to set wce fails silently. Here is an easy analogy to remember basically what you said: format -e can control the cache settings for c0t0d0, but cannot control the cache settings for c0t0d0s0 because s0 is not actually a device. I contend: Suppose you have a disk with on-disk write cache enabled. Suppose a sync write comes along, so ZFS first performs a sync write to some ZIL sectors. Then ZFS will issue the cache flush command and wait for it to complete before acknowledging the sync write; hence the disk write cache does not benefit sync writes. So then we start thinking about async writes, and conclude: The async writes were acknowledged long ago, when the async writes were buffered in ZFS system ram, so there is once again, no benefit from the disk write cache in either situation. That's my argument, unless somebody can tell me where my logic is wrong. Disk write cache offers zero benefit. And disk read cache only offers benefit in unusual cases that I would call esoteric. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
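For anyone who actually wants to run Nathan's experiment, the cache toggles live under format's expert mode on many Solaris setups - a sketch only, since whether the menu appears at all depends on the disk and the driver:

format -e
#   -> select the disk (the whole device, e.g. c0t0d0, not a slice)
#   -> cache -> write_cache -> display   (show the current state)
#   -> cache -> write_cache -> disable   (turn it off for the test)
#   -> cache -> write_cache -> enable    (put it back afterwards)

Then rerun whatever sync-heavy workload you were measuring and compare the per-spindle IOPS with and without the cache.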
Re: [zfs-discuss] How long should an empty destroy take? snv_134
Why wouldn't they try a reboot -d? That would at least get some data in the form of a crash dump if at all possible... A power cycle seems a little medieval to me... At least in the first instance. The other thing I have noted is that sometimes things do get wedged, and if you can find where (mdb -k and take a poke at the stack of some of the zfs/zpool commands that are hung to see what they were operating on), trying a zpool clear on that zpool can help. Not that I'm recommending that you should *need* to, but that has got me unwedged on occasion. (Though usually when I have done something administratively silly... ;) Nathan. On 7/03/2011 12:14 PM, Edward Ned Harvey wrote: From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Yaverot We're heading into the 3rd hour of the zpool destroy on others. The system isn't locked up, as it responds to local keyboard input, and I bet you, you're in a semi-crashed state right now, which will degrade into a full system crash. You'll have no choice but to power cycle. Prove me wrong, please. ;-) I bet, as soon as you type in any zpool or zfs command ... even list or status they will also hang indefinitely. Is your pool still 100% full? That's probably the cause. I suggest if possible, immediately deleting something and destroying an old snapshot to free up a little bit of space. And then you can move onward... While this destroy is running all other zpool/zfs commands appear to be hung. Oh, sorry, didn't see this before I wrote what I wrote above. This just further confirms what I said above. zpool destroy on an empty pool should be on the order of seconds, right? zpool destroy is instant, regardless of how much data there is in a pool. zfs destroy is instant for an empty volume, but zfs destroy takes a long time for a lot of data. But as mentioned above, that's irrelevant to your situation. Because your system is crashed, and even if you try init 0 or init 6... They will fail. You have no choice but to power cycle. For the heck of it, I suggest init 0 first. Then wait half an hour, and power cycle. Just to try and make the crash as graceful as possible. As soon as it comes back up, free up a little bit of space, so you can avoid a repeat. Yes, I've triple checked, I'm not destroying tank. While writing the email, I attempted a new ssh connection, it got to the Last login: line, but hasn't made it to the prompt. Oh, sorry, yet again this is confirming what I said above. semi-crashed and degrading into a full crash. Right now, you cannot open any new command prompts. Soon it will stop responding to ping. (Maybe 2-12 hours.) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
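The 'mdb -k and take a poke' step, spelled out a little - a sketch of the sort of thing I mean rather than a recipe:

# Find the hung zpool/zfs commands and look at where their threads are parked:
echo "::pgrep zpool | ::walk thread | ::findstack -v" | mdb -k
echo "::pgrep zfs   | ::walk thread | ::findstack -v" | mdb -k

# Stacks sitting in txg_wait_synced()/zio_wait() generally mean the pool itself
# is wedged or crawling, not the command. If you really do have to take the box
# down, at least capture a dump on the way:
reboot -d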
Re: [zfs-discuss] sorry everyone was: Re: External SATA drive enclosures + ZFS?
Actually, I find that tremendously encouraging. Lots of internal Oracle folks still subscribed to the list! Much better than none... ;) Nathan. On 02/26/11 03:29 PM, Yaverot wrote: Sorry all, didn't realize that half of Oracle would auto-reply to a public mailing list since they're out of the office 9:30 Friday nights. I'll try to make my initial post each month during daylight hours in the future. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SIL3114 and sparc solaris 10
I can confirm that on *at least* 4 different cards - from different board OEMs - I have seen single bit ZFS checksum errors that went away immediately after removing the 3114 based card. I stepped up to the 3124 (pci-x up to 133mhz) and 3132 (pci-e) and have never looked back. I now throw any 3114 card I find into the bin at the first available opportunity as they are a pile of doom waiting to insert an exploding garden gnome into the unsuspecting chest cavity of your data. I'd also add that I have never made an effort to determine if it was actually the Solaris driver that was at fault - but being that the other two cards I have mentioned are available for about $20 a pop, it's not worth my time. I don't recall if Solaris 10 (Sparc or X86) actually has the si3124 driver, but if it does, for a cheap thrill, they are worth a bash. I have no problems pushing 4 disks pretty much flat out on a PCI-X 133 3124 based card. (note that there was a pci and a pci-x version of the 3124, so watch out.) Cheers! Nathan. On 02/24/11 02:10 AM, Andrew Gabriel wrote: Krunal Desai wrote: On Wed, Feb 23, 2011 at 8:38 AM, Mauricio Tavares raubvo...@gmail.com wrote: I see what you mean; in http://mail.opensolaris.org/pipermail/opensolaris-discuss/2008-September/043024.html they claim it is supported by the uata driver. What would you suggest instead? Also, since I have the card already, how about if I try it out? My experience with SPARC is limited, but perhaps the Option ROM/BIOS for that card is intended for x86, and not SPARC? I might thinking of another controller, but this could be the case. You could always try to boot with the card; the worst that'll probably happen is boot hangs before the OS even comes into play. SPARC won't try to run the BIOS on the card anyway (it will only run OpenFirmware BIOS), but you will have to make sure the card has the non-RAID BIOS so that the PCI class doesn't claim it to be a RAID controller, which will prevent Solaris going anywhere near the card at all. These cards could be bought with either RAID or non-RAID BIOS, but RAID was more common. You can (or could some time back) download the RAID and non-RAID BIOS from Silicon Image and re-flash which also updates the PCI class, and I think you'll need a Windows system to actually flash the BIOS. You might want to do a google search on 3114 data corruption too, although it never hit me back when I used the cards. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] External SATA drive enclosures + ZFS?
I'm with the gang on this one as far as USB being the spawn of the devil for mass storage you want to depend on. I'd rather scoop my eyes out with a red hot spoon than depend on permanently attached USB storage... And - don't even start me on SPARC and USB storage... It's like watching pitch flow... (see http://en.wikipedia.org/wiki/Pitch_drop_experiment). I never spent too much time working out why - but I never seem to get better than about 10MB/s with SPARC+USB... When it comes to cheap... I use cheap external SATA/USB combo enclosures (single drive ones) as I like the flexibility of being able to use them in eSATA mode nice and fast (and reliable considering the $$) or in USB mode should I need to split a mirror off and read it on my laptop, which has no esata port... Also - using the single drive enclosures is by far the cheapest (at least here in Oz), and you get redundant power supplies, as they use their own mini brick AC/DC units. I'm currently very happy using 2TB disks in the external eSATA+USB thingies. I had been using ASTONE external eSATA/USB units - though it seems my local shop has stopped carrying them... I liked them as they had perforated side panels, which allow the disk to stay much cooler than some of my other enclosures... (And have a better 'vertical' stand if you want the disks to stand up, rather than lie on their side.) If your box has PCI-e slots, grab one or two $20 Silicon Image 3132 controllers with eSATA ports and you should be golden... You will then be able to run between 2 and 4 disks - easily pushing them to their maximum platter speed - which for most of the 2TB disks is near enough to 100MB/s at the outer edges. You will also get considerably higher IOPS - particularly when they are sequential - using eSATA. Note: All of this is with the 'cheap' view... You can most certainly buy much better hardware... But bang for buck - I have been happy with the above. Cheers! Nathan. On 02/26/11 01:58 PM, Brandon High wrote: On Fri, Feb 25, 2011 at 4:34 PM, Rich Teer rich.t...@rite-group.com wrote: Space is starting to get a bit tight here, so I'm looking at adding a couple of TB to my home server. I'm considering external USB or FireWire attached drive enclosures. Cost is a real issue, but I also I would avoid USB, since it can be less reliable than other connection methods. That's the impression I get from older posts made by Sun devs, at least. I'm not sure how well Firewire 400 is supported, let alone Firewire 800. You might want to consider eSATA. Port multipliers are supported in recent builds (128+ I think), and will give better performance than USB. I'm not sure if PMP are supported on Sparc though, since it requires support in both the controller and PMP. Consider enclosures from other manufacturers as well. I've heard good things about Sans Digital, but I've never used them. The 2-drive enclosure has the same components as the item you linked but 1/2 the cost via Newegg. The intent would be to put two 1TB or 2TB drives in the enclosure and use ZFS to create a mirrored pool out of them. Assuming this enclosure is set to JBOD mode, would I be able to use this with ZFS? The enclosure Yes, but I think the enclosure has a SiI5744 inside it, so you'll still have one connection from the computer to the enclosure. If that goes, you'll lose both drives. If you're just using two drives, two separate enclosures on separate buses may be better. Look at http://www.sansdigital.com/towerstor/ts1ut.html for instance. There are also larger enclosures with up to 8 drives. 
I can't think of a reason why it wouldn't work, but I also have exactly zero experience with this kind of set up! Like I mentioned, USB is prone to some flakiness. Assuming this would work, given that I can't see to find a 4-drive version of it, would I be correct in thinking that I could buy two of You might be better off using separate enclosures for reliability. Make sure to split the mirrors across the two devices. Use separate USB controllers if possible, so a bus reset doesn't affect both sides. Assuming my proposed enclosure would work, and assuming the use of reasonable quality 7200 RPM disks, how would you expect the performance to compare with the differential UltraSCSI set up I'm currently using? I think the DWIS is rated at either 20MB/sec or 40MB/sec, so on the surface, the USB attached drives would seem to be MUCH faster... USB 2.0 is about 30-40MB/s under ideal conditions, but doesn't support any of the command queuing that SCSI does. I'd expect performance to be slightly lower, and to use slightly more CPU. Most USB controllers don't support DMA, so all I/O requires CPU time. What about an inexpensive SAS card (eg: Supermicro AOC-USAS-L4i) and external SAS enclosure (eg: Sans Digital TowerRAID TR4X). It would cost about $350 for the setup. -B ___ zfs-discuss mailing list zfs-discuss@opensolaris.org
Re: [zfs-discuss] ZFS read/write fairness algorithm for single pool
Thanks for all the thoughts, Richard. One thing that still sticks in my craw is that I'm not wanting to write intermittently. I'm wanting to write flat out, and those writes are being held up... Seems to me that zfs should know and do something about that without me needing to tune zfs_vdev_max_pending... Nonetheless, I'm now at a far more balanced point than when I started, so that's a good thing. :) Cheers, Nathan. On 15/02/2011 6:44 AM, Richard Elling wrote: Hi Nathan, comments below... On Feb 13, 2011, at 8:28 PM, Nathan Kroenert wrote: On 14/02/2011 4:31 AM, Richard Elling wrote: On Feb 13, 2011, at 12:56 AM, Nathan Kroenertnat...@tuneunix.com wrote: Hi all, Exec summary: I have a situation where I'm seeing lots of large reads starving writes from being able to get through to disk. snip What is the average service time of each disk? Multiply that by the average active queue depth. If that number is greater than, say, 100ms, then the ZFS I/O scheduler is not able to be very effective because the disks are too slow. Reducing the active queue depth can help, see zfs_vdev_max_pending in the ZFS Evil Tuning Guide. Faster disks helps, too. NexentaStor fans, note that you can do this easily, on the fly, via the Settings - Preferences - System web GUI. -- richard Hi Richard, Long time no speak! Anyhoo - See below. I'm unconvinced that faster disks would help. I think faster disks, at least in what I'm observing, would make it suck just as bad, just reading faster... ;) Maybe I'm missing something. Faster disks always help :-) Queue depth is around 10 (default and unchanged since install), and average service time is about 25ms... Below are 1 second samples with iostat - while I have included only about 10 seconds, it's representative of what I'm seeing all the time. extended device statistics devicer/sw/s kr/s kw/s wait actv svc_t %w %b sd6 360.9 13.0 46190.5 351.4 0.0 10.0 26.7 1 100 sd7 342.9 12.0 43887.3 329.9 0.0 10.0 28.1 1 100 ok, we'll take sd6 as an example (the math is easy :-) ... actv = 10 svc_t = 26.7 actv * svc_t = 267 milliseconds This is the queue at the disk. ZFS manages its own queue for the disk, but once it leaves ZFS, there is no way for ZFS to manage it. In the case of the active queue, the I/Os have left the OS, so even the OS is unable to change what is in the queue or directly influence when the I/Os will be finished. In ZFS, the queue has a priority scheduler and does place a higher priority on async writes than async reads (since b130 or so). But what you can see is that the intermittent nature of the async writes get stuck behind the 267 milliseconds as the queue drains the reads. [no, I'm not sure if that makes sense, try again...] If it sends reads continuously and writes occasionally, it will appear that reads have much more domination. In older releases, when the reads and writes had the same priority, this looks even worse. 
extended device statistics devicer/sw/s kr/s kw/s wait actv svc_t %w %b sd6 422.10.0 54025.00.0 0.0 10.0 23.6 1 100 sd7 422.10.0 54025.00.0 0.0 10.0 23.6 1 100 extended device statistics devicer/sw/s kr/s kw/s wait actv svc_t %w %b sd6 370.0 11.0 47360.4 342.0 0.0 10.0 26.2 1 100 sd7 327.0 16.0 41856.4 632.0 0.0 9.6 28.0 1 100 extended device statistics devicer/sw/s kr/s kw/s wait actv svc_t %w %b sd6 388.07.0 49406.4 290.0 0.0 9.8 24.8 1 100 sd7 409.01.0 52350.32.0 0.0 9.5 23.2 1 99 extended device statistics devicer/sw/s kr/s kw/s wait actv svc_t %w %b sd6 423.00.0 54148.60.0 0.0 10.0 23.6 1 100 sd7 413.00.0 52868.50.0 0.0 10.0 24.2 1 100 extended device statistics devicer/sw/s kr/s kw/s wait actv svc_t %w %b sd6 400.02.0 51081.22.0 0.0 10.0 24.8 1 100 sd7 384.04.0 49153.24.0 0.0 10.0 25.7 1 100 extended device statistics devicer/sw/s kr/s kw/s wait actv svc_t %w %b sd6 401.91.0 51448.98.0 0.0 10.0 24.8 1 100 sd7 424.90.0 54392.40.0 0.0 10.0 23.5 1 100 extended device statistics devicer/sw/s kr/s kw/s wait actv svc_t %w %b sd6 215.1 208.1 26751.9 25433.5 0.0 9.3 22.1 1 100 sd7 189.1 216.1 24199.1 26833.9 0.0 8.9 22.1 1 91 extended device statistics devicer/sw/s kr/s kw/s wait actv svc_t %w %b sd6 295.0 162.0 37756.8 20610.2 0.0 10.0 21.8 1 100 sd7 307.0 150.0 39292.6 19198.4 0.0 10.0 21.8 1 100 extended device statistics devicer/sw/s kr/s kw/s wait actv svc_t %w %b sd6 405.02.0 51843.86.0
[zfs-discuss] ZFS read/write fairness algorithm for single pool
Hi all, Exec summary: I have a situation where I'm seeing lots of large reads starving writes from being able to get through to disk. Some detail: I have a newly constructed box (was an old box, but blew the mobo - different story - sigh). Anyhoo - It's a Gigabyte 890GPA-UD3H - with lots of onboard SATA - and an HP P400 Raid controller (PCI-E, 512MB, Battery Backed, presenting 2 spindles, as single member stripes, so, yeah, the nearest thing to JBOD that this controller gets to) pci bus 0x0002 cardnum 0x00 function 0x00: vendor 0x103c device 0x3230 Hewlett-Packard Company Smart Array Controller And it's off this HP controller I'm hanging my data zpool. config:
NAME        STATE  READ WRITE CKSUM
data        ONLINE    0     0     0
  mirror-0  ONLINE    0     0     0
    c0t0d0  ONLINE    0     0     0
    c0t1d0  ONLINE    0     0     0
CPU is an AMD Phenom II, 6 core 1075T, for what it's worth. I guess my problem is more one that the ZFS folks should be aware of rather than something directly impacting me, as the workload I have created is not something I typically see - but it is something I see easily impacting customers - and in a nasty way should they encounter it. It *is* also a case I'll create from time to time - when I'm moving DVD images backwards and forwards... I was stress testing the box, giving the new kit's legs a stretch, and kicked off the following: - create a test file to use as source for my 'full speed streaming write' (lazy way) - dd if=/dev/urandom of=/tmp/1 (and let that run for a few seconds, creating about 100MB of random junk.) - start some jobs - while :; do cat /tmp/1 >> /data/delete.me/2; done (The write workload, which is fine and dandy by itself) - while :; do dd if=/data/delete.me/2 of=/dev/null bs=65536; done Before I kicked off the read workload, everything looked as expected. I was getting between 40 and 60MB/s to each of the disks and all was good. BUT - As soon as I introduced the read workload, my write throughput dropped to virtually zero, and remained there until the read workload was killed. The starvation is immediate. I can 100% reproducibly go from many MB/s of write throughput with no read workload to virtually 0MB/s write throughput, simply through kicking off that reading dd. Write performance picks up again as soon as I kill the read workload. It also behaves the same way if the file I'm reading is NOT the same one I'm writing to. (eg: cat writing to file 3 and the dd reading file 2) Other things to know about the system: - Disks are Seagate 2TB, 512 byte sector SATA disks - OS is Solaris 11 Express (build 151a) - zpool version is old. I'm still hedging my bets on having to go back to Nevada (sxce, build 124 or so, which is what I was at before installing s11express) Cached configuration: version: 19 - Plenty of space remains in the pool -
bash-4.0$ zpool list
NAME   SIZE  ALLOC  FREE   CAP  DEDUP  HEALTH  ALTROOT
data  1.81T  1.34T  480G   74%  1.00x  ONLINE  -
- The box has 8GB of memory - and ZFS is getting a fair whack at it. 
::memstat
Page Summary       Pages     MB    %Tot
Kernel             211843    827    11%
ZFS File Data      1426054   5570   73%
Anon               106814    417     5%
Exec and libs      9364      36      0%
Page cache         47192     184     2%
Free (cachelist)   31448     122     2%
Free (freelist)    130431    509     7%
Total              1963146   7668
Physical           1963145   7668
- Rest of the zfs dataset properties:
# zfs get all data
NAME  PROPERTY       VALUE                  SOURCE
data  type           filesystem             -
data  creation       Mon May 24 10:46 2010  -
data  used           1.34T                  -
data  available      451G                   -
data  referenced     500G                   -
data  compressratio  1.02x                  -
data  mounted        yes                    -
data  quota          none                   default
data  reservation    none                   default
data  recordsize     128K                   default
data  mountpoint     /data                  default
data  sharenfs       ro,anon=0              local
data  checksum       on                     default
data  compression    off                    local
data  atime          off                    local
data  devices        on                     default
data  exec
Re: [zfs-discuss] ZFS read/write fairness algorithm for single pool
Hi Steve, Thanks for the thoughts - I think that everything you asked about is in the original email - but for reference again, it's 151a (s11 express). Are you really suggesting, for a single user system, I need 16GB of memory just to get ZFS to be able to write when it's reading? (and even then, that would be contingent on you getting repeat, cached hits on the ARC). That's hardly sensible, and anything but enterprise. I know I'm only talking about my little baby box at the moment, but extend that to a large database application, and I'm seeing badness all round. Worse - If I'm reading a 45GB contiguous file (say, HD video), the only way an ARC will help me is if I have 64GB, and have read it in the past... especially if I'm reading it sequentially. That's inconceivable!! (cue reference to the Princess Bride :). I'd also add that for the most part, 8GB is plenty for ZFS, and there are a lot of Sun/Oracle customers using it now in LDOM environments where 8GB is just great in the control/IO domain. I don't think trying to blame the system in this case is the right answer. ZFS schedules the read/write activities, and to me it seems that it's just not doing that fairly. I was suspicious of the impact the HP Raid controller is having - and how it might be reacting to what's being pushed at it, so I re-created exactly this problem again on a different system with native non-cached SATA controllers. The issue is identical. (Though I have since determined that my HP raid controller is actually *slowing* my reads and writes to disk! ;) Cheers! Nathan. On 14/02/2011 4:08 AM, gon...@comcast.net wrote: Hi Nathan, Maybe it is buried somewhere in your email, but I did not see what zfs version you are using. This is rather important, because the 145+ kernels work a lot better in many ways than the early ones (say 134-ish). So whenever you are reporting various ZFS issues, something like `uname -a` to report the kernel rev is most useful. Writes starved by reads has been a complaint in early ZFS; I certainly do not see any evidence of this in the 145+ kernels. There is a fair amount of tuning and configuration that can be done (adding SSDs to your pool, zil vs no zil, how caching is configured, i.e. what to cache...). 8 Gig is not a lot of memory for ZFS; I would recommend double of that. If all goes well, most reads would be satisfied from ARC, and not interfere with writes. Steve ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS read/write fairness algorithm for single pool
On 14/02/2011 4:31 AM, Richard Elling wrote: On Feb 13, 2011, at 12:56 AM, Nathan Kroenertnat...@tuneunix.com wrote: Hi all, Exec summary: I have a situation where I'm seeing lots of large reads starving writes from being able to get through to disk. snip What is the average service time of each disk? Multiply that by the average active queue depth. If that number is greater than, say, 100ms, then the ZFS I/O scheduler is not able to be very effective because the disks are too slow. Reducing the active queue depth can help, see zfs_vdev_max_pending in the ZFS Evil Tuning Guide. Faster disks helps, too. NexentaStor fans, note that you can do this easily, on the fly, via the Settings - Preferences - System web GUI. -- richard Hi Richard, Long time no speak! Anyhoo - See below. I'm unconvinced that faster disks would help. I think faster disks, at least in what I'm observing, would make it suck just as bad, just reading faster... ;) Maybe I'm missing something. Queue depth is around 10 (default and unchanged since install), and average service time is about 25ms... Below are 1 second samples with iostat - while I have included only about 10 seconds, it's representative of what I'm seeing all the time. extended device statistics devicer/sw/s kr/s kw/s wait actv svc_t %w %b sd6 360.9 13.0 46190.5 351.4 0.0 10.0 26.7 1 100 sd7 342.9 12.0 43887.3 329.9 0.0 10.0 28.1 1 100 extended device statistics devicer/sw/s kr/s kw/s wait actv svc_t %w %b sd6 422.10.0 54025.00.0 0.0 10.0 23.6 1 100 sd7 422.10.0 54025.00.0 0.0 10.0 23.6 1 100 extended device statistics devicer/sw/s kr/s kw/s wait actv svc_t %w %b sd6 370.0 11.0 47360.4 342.0 0.0 10.0 26.2 1 100 sd7 327.0 16.0 41856.4 632.0 0.0 9.6 28.0 1 100 extended device statistics devicer/sw/s kr/s kw/s wait actv svc_t %w %b sd6 388.07.0 49406.4 290.0 0.0 9.8 24.8 1 100 sd7 409.01.0 52350.32.0 0.0 9.5 23.2 1 99 extended device statistics devicer/sw/s kr/s kw/s wait actv svc_t %w %b sd6 423.00.0 54148.60.0 0.0 10.0 23.6 1 100 sd7 413.00.0 52868.50.0 0.0 10.0 24.2 1 100 extended device statistics devicer/sw/s kr/s kw/s wait actv svc_t %w %b sd6 400.02.0 51081.22.0 0.0 10.0 24.8 1 100 sd7 384.04.0 49153.24.0 0.0 10.0 25.7 1 100 extended device statistics devicer/sw/s kr/s kw/s wait actv svc_t %w %b sd6 401.91.0 51448.98.0 0.0 10.0 24.8 1 100 sd7 424.90.0 54392.40.0 0.0 10.0 23.5 1 100 extended device statistics devicer/sw/s kr/s kw/s wait actv svc_t %w %b sd6 215.1 208.1 26751.9 25433.5 0.0 9.3 22.1 1 100 sd7 189.1 216.1 24199.1 26833.9 0.0 8.9 22.1 1 91 extended device statistics devicer/sw/s kr/s kw/s wait actv svc_t %w %b sd6 295.0 162.0 37756.8 20610.2 0.0 10.0 21.8 1 100 sd7 307.0 150.0 39292.6 19198.4 0.0 10.0 21.8 1 100 extended device statistics devicer/sw/s kr/s kw/s wait actv svc_t %w %b sd6 405.02.0 51843.86.0 0.0 10.0 24.5 1 100 sd7 408.03.0 52227.8 10.0 0.0 10.0 24.3 1 100 Bottom line is that ZFS does not seem to be caring about getting my writes to disk when there is a heavy read workload. I have also confirmed that it's not the RAID controller either - behaviour is identical with direct attach SATA. But - to your excellent theory: Setting zfs_vdev_max_pending to 1 causes things to swing dramatically! - At 1, writes proceed much more than reads - 20mb/s read per spindle:35mb/s write per spindle - At 2, writes still outstrip reads - 15mb/s read per spindle:44mb/s - At 3, it's starting to lean more heavily to reads again, but writes at least get a whack - 35mb/s per spindle read:15-20mb/s write. 
- At 4, we are closer to 35-40mb/s read, 15mb/s write By the time we get back to the default of 0xa, writes drop off almost completely. The crossover (on the box with no RAID controller) seems to be 5. Anything more than that, and writes get shouldered out the way almost completely. So - aside from the obvious - manually setting zfs_vdev_max_pending - do you have any thoughts on ZFS being able to make this sort of determination by itself? It would be somewhat of a shame to bust out such 'whacky knobs' for plain old direct attach SATA disks to get balance... Also - can I set this property per-vdev? (just in case I have sata and, say, a USP-V connected to the same box)? Thanks again, and good to see you are still playing close by! Cheers! Nathan. pci bus 0x0002 cardnum 0x00 function 0x00: vendor
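For reference, the knob under discussion can be poked live or persisted - this is the standard Evil-Tuning-Guide-style tuning, and as far as I know it is global rather than per-vdev, which is exactly the limitation being complained about here:

# Live, on a running system (affects I/O queued from then on):
echo zfs_vdev_max_pending/W0t4 | mdb -kw

# Check the current value:
echo zfs_vdev_max_pending/D | mdb -k

# Persist across reboots by adding to /etc/system:
set zfs:zfs_vdev_max_pending = 4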
Re: [zfs-discuss] ZFS Honesty after a power failure
Hey, Dennis - I can't help but wonder if the failure is a result of zfs itself finding some problems post restart... Is there anything in your FMA logs? fmstat for a summary and fmdump for a summary of the related errors, eg:
drteeth:/tmp # fmdump
TIME                 UUID                                 SUNW-MSG-ID
Nov 03 13:57:29.4190 e28210d7-b7aa-42e0-a3e8-9ba21332d1c7 ZFS-8000-D3
Nov 03 13:57:29.9921 916ce3e2-0c5c-e335-d317-ba1e8a93742e ZFS-8000-D3
Nov 03 14:04:58.8973 ff2f60f8-2906-676a-bfb7-ccbd9c7f957d ZFS-8000-CS
Mar 05 18:04:40.7116 ff2f60f8-2906-676a-bfb7-ccbd9c7f957d FMD-8000-4M Repaired
Mar 05 18:04:40.7875 ff2f60f8-2906-676a-bfb7-ccbd9c7f957d FMD-8000-6U Resolved
Mar 05 18:04:41.0052 e28210d7-b7aa-42e0-a3e8-9ba21332d1c7 FMD-8000-4M Repaired
Mar 05 18:04:41.0760 e28210d7-b7aa-42e0-a3e8-9ba21332d1c7 FMD-8000-6U Resolved
then for example, fmdump -vu e28210d7-b7aa-42e0-a3e8-9ba21332d1c7 and fmdump -Vvu e28210d7-b7aa-42e0-a3e8-9ba21332d1c7 will show more and more information about the error. Note that some of it might seem like rubbish. The important bits should be obvious though - things like the SUNW message ID (like ZFS-8000-D3), which can be pumped into sun.com/msg to see what exactly it's going on about. Note also that there should be something interesting in the /var/adm/messages log to match any 'faulted' devices. You might also find an fmdump -e and fmdump -eV to be interesting - This is the *error* log as opposed to the *fault* log. (Every 'thing that goes wrong' is an error; only those that are diagnosed are considered a fault.) Note that in all of these fm[dump|stat] commands, you are really only looking at the two sets of data. The errors - that is, the telemetry incoming to FMA - and the faults. If you include a -e, you view the errors; otherwise, you are looking at the faults. By the way - sun.com/msg has a great PDF on it about the predictive self healing technologies in Solaris 10 and will offer more interesting information. Would be interesting to see *why* ZFS / FMA is feeling the need to fault your devices. I was interested to see on one of my boxes that I have actually had a *lot* of errors, which I'm now going to have to investigate... Looks like I have a dud rocket in my system... :) Oh - And I saw this: Nov 03 14:04:31.2783 ereport.fs.zfs.checksum Score one more for ZFS! This box has a measly 300GB mirrored, and I have already seen dud data. (heh... It's also got non-ECC memory... ;) Cheers! Nathan. Dennis Clarke wrote: On Tue, 24 Mar 2009, Dennis Clarke wrote: You would think so eh? But a transient problem that only occurs after a power failure? Transient problems are most common after a power failure or during initialization. Well the issue here is that power was on for ten minutes before I tried to do a boot from the ok prompt. Regardless, the point is that the ZPool shows no faults at boot time and then shows phantom faults *after* I go to init 3. That does seem odd. Dennis ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- // // Nathan Kroenert nathan.kroen...@sun.com // // Systems Engineer Phone: +61 3 9869-6255 // // Sun Microsystems Fax: +61 3 9869-6288 // // Level 7, 476 St. Kilda Road Mobile: 0419 305 456 // // Melbourne 3004 Victoria Australia // // ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] reboot when copying large amounts of data
definitely time to bust out some mdb -k and see what it's moaning about. I did not see the screenshot earlier... sorry about that. Nathan. Blake wrote: I start the cp, and then, with prstat -a, watch the cpu load for the cp process climb to 25% on a 4-core machine. Load, measured for example with 'uptime', climbs steadily until the reboot. Note that the machine does not dump properly, panic or hang - rather, it reboots. I attached a screenshot earlier in this thread of the little bit of error message I could see on the console. The machine is trying to dump to the dump zvol, but fails to do so. Only sometimes do I see an error on the machine's local console - mos times, it simply reboots. On Thu, Mar 12, 2009 at 1:55 AM, Nathan Kroenert nathan.kroen...@sun.com wrote: Hm - Crashes, or hangs? Moreover - how do you know a CPU is pegged? Seems like we could do a little more discovery on what the actual problem here is, as I can read it about 4 different ways. By this last piece of information, I'm guessing the system does not crash, but goes really really slow?? Crash == panic == we see stack dump on console and try to take a dump hang == nothing works == no response - might be worth looking at mdb -K or booting with a -k on the boot line. So - are we crashing, hanging, or something different? It might simply be that you are eating up all your memory, and your physical backing storage is taking a while to catch up? Nathan. Blake wrote: My dump device is already on a different controller - the motherboards built-in nVidia SATA controller. The raidz2 vdev is the one I'm having trouble with (copying the same files to the mirrored rpool on the nVidia controller work nicely). I do notice that, when using cp to copy the files to the raidz2 pool, load on the machine climbs steadily until the crash, and one proc core pegs at 100%. Frustrating, yes. On Thu, Mar 12, 2009 at 12:31 AM, Maidak Alexander J maidakalexand...@johndeere.com wrote: If you're having issues with a disk contoller or disk IO driver its highly likely that a savecore to disk after the panic will fail. I'm not sure how to work around this, maybe a dedicated dump device not on a controller that uses a different driver then the one that you're having issues with? -Original Message- From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Blake Sent: Wednesday, March 11, 2009 4:45 PM To: Richard Elling Cc: Marc Bevand; zfs-discuss@opensolaris.org Subject: Re: [zfs-discuss] reboot when copying large amounts of data I guess I didn't make it clear that I had already tried using savecore to retrieve the core from the dump device. I added a larger zvol for dump, to make sure that I wasn't running out of space on the dump device: r...@host:~# dumpadm Dump content: kernel pages Dump device: /dev/zvol/dsk/rpool/bigdump (dedicated) Savecore directory: /var/crash/host Savecore enabled: yes I was using the -L option only to try to get some idea of why the system load was climbing to 1 during a simple file copy. On Wed, Mar 11, 2009 at 4:58 PM, Richard Elling richard.ell...@gmail.com wrote: Blake wrote: I'm attaching a screenshot of the console just before reboot. The dump doesn't seem to be working, or savecore isn't working. On Wed, Mar 11, 2009 at 11:33 AM, Blake blake.ir...@gmail.com wrote: I'm working on testing this some more by doing a savecore -L right after I start the copy. savecore -L is not what you want. By default, for OpenSolaris, savecore on boot is disabled. 
But the core will have been dumped into the dump slice, which is not used for swap. So you should be able to run savecore at a later time to collect the core from the last dump. -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- // // Nathan Kroenert nathan.kroen...@sun.com // // Systems Engineer Phone: +61 3 9869-6255 // // Sun Microsystems Fax:+61 3 9869-6288 // // Level 7, 476 St. Kilda Road Mobile: 0419 305 456// // Melbourne 3004 VictoriaAustralia // // -- // // Nathan Kroenert nathan.kroen...@sun.com // // Systems Engineer Phone: +61 3 9869-6255 // // Sun Microsystems Fax:+61 3 9869-6288 // // Level 7, 476 St. Kilda Road Mobile: 0419 305 456// // Melbourne 3004 VictoriaAustralia
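As a footnote for anyone else stuck at the same point, pulling a dump off the dedicated dump zvol after the next boot is roughly this (directory is an example):

# confirm the dump device and savecore directory
dumpadm

# extract the most recent dump from the dump device by hand
mkdir -p /var/crash/host
savecore -v /var/crash/host

# then poke at the result with mdb
mdb /var/crash/host/unix.0 /var/crash/host/vmcore.0

Of course, none of that helps if the panic never makes it onto the dump device in the first place, which looks like the real problem in this thread.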
Re: [zfs-discuss] reboot when copying large amounts of data
For what it's worth, I have been running Nevada (so, same kernel as opensolaris) for ages (at least 18 months) on a Gigabyte board with the MCP55 chipset and it's been flawless. I liked it so much, I bought it's newer brother, based on the nvidia 750SLI chipset... M750SLI-DS4 Cheers! Nathan. On 13/03/09 09:21 AM, Dave wrote: Tim wrote: On Thu, Mar 12, 2009 at 2:22 PM, Blake blake.ir...@gmail.com mailto:blake.ir...@gmail.com wrote: I've managed to get the data transfer to work by rearranging my disks so that all of them sit on the integrated SATA controller. So, I feel pretty certain that this is either an issue with the Supermicro aoc-sat2-mv8 card, or with PCI-X on the motherboard (though I would think that the integrated SATA would also be using the PCI bus?). The motherboard, for those interested, is an HD8ME-2 (not, I now find after buying this box from Silicon Mechanics, a board that's on the Solaris HCL...) http://www.supermicro.com/Aplus/motherboard/Opteron2000/MCP55/h8dme-2.cfm So I'm not considering one of LSI's HBA's - what do list members think about this device: http://www.provantage.com/lsi-logic-lsi00117~7LSIG03X.htm http://www.provantage.com/lsi-logic-lsi00117%7E7LSIG03X.htm I believe the MCP55's SATA controllers are actually PCI-E based. I use Tyan 2927 motherboards. They have on-board nVidia MCP55 chipsets, which is the same chipset at the X4500 (IIRC). I wouldn't trust the MCP55 chipset in OpenSolaris. I had random disk hangs even while the machine was mostly idle. In Feb 2008 I bought AOC-SAT2-MV8 cards and moved all my drives to these add-in cards. I haven't had any issues with drive hanging since. There does not seem to be any problems with the SAT2-MV8 under heavy load in my servers from what I've seen. When the SuperMicro AOC-USAS-L8i came out later last year, I started using them instead. They work better than the SAT2-MV8s. This card needs a 3U or bigger case: http://www.supermicro.com/products/accessories/addon/AOC-USAS-L8i.cfm This is the low profile card that will fit in a 2U: http://www.supermicro.com/products/accessories/addon/AOC-USASLP-L8i.cfm They both work in normal PCI-E slots on my Tyan 2927 mobos. Finding good non-Sun hardware that works very well under OpenSolaris is frustrating to say the least. Good luck. -- Dave ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- // // Nathan Kroenert nathan.kroen...@sun.com // // Senior Systems Engineer Phone: +61 3 9869 6255 // // Global Systems Engineering Fax:+61 3 9869 6288 // // Level 7, 476 St. Kilda Road // // Melbourne 3004 VictoriaAustralia // // ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] reboot when copying large amounts of data
Hm - Crashes, or hangs? Moreover - how do you know a CPU is pegged? Seems like we could do a little more discovery on what the actual problem here is, as I can read it about 4 different ways. By this last piece of information, I'm guessing the system does not crash, but goes really really slow?? Crash == panic == we see stack dump on console and try to take a dump hang == nothing works == no response - might be worth looking at mdb -K or booting with a -k on the boot line. So - are we crashing, hanging, or something different? It might simply be that you are eating up all your memory, and your physical backing storage is taking a while to catch up? Nathan. Blake wrote: My dump device is already on a different controller - the motherboards built-in nVidia SATA controller. The raidz2 vdev is the one I'm having trouble with (copying the same files to the mirrored rpool on the nVidia controller work nicely). I do notice that, when using cp to copy the files to the raidz2 pool, load on the machine climbs steadily until the crash, and one proc core pegs at 100%. Frustrating, yes. On Thu, Mar 12, 2009 at 12:31 AM, Maidak Alexander J maidakalexand...@johndeere.com wrote: If you're having issues with a disk contoller or disk IO driver its highly likely that a savecore to disk after the panic will fail. I'm not sure how to work around this, maybe a dedicated dump device not on a controller that uses a different driver then the one that you're having issues with? -Original Message- From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Blake Sent: Wednesday, March 11, 2009 4:45 PM To: Richard Elling Cc: Marc Bevand; zfs-discuss@opensolaris.org Subject: Re: [zfs-discuss] reboot when copying large amounts of data I guess I didn't make it clear that I had already tried using savecore to retrieve the core from the dump device. I added a larger zvol for dump, to make sure that I wasn't running out of space on the dump device: r...@host:~# dumpadm Dump content: kernel pages Dump device: /dev/zvol/dsk/rpool/bigdump (dedicated) Savecore directory: /var/crash/host Savecore enabled: yes I was using the -L option only to try to get some idea of why the system load was climbing to 1 during a simple file copy. On Wed, Mar 11, 2009 at 4:58 PM, Richard Elling richard.ell...@gmail.com wrote: Blake wrote: I'm attaching a screenshot of the console just before reboot. The dump doesn't seem to be working, or savecore isn't working. On Wed, Mar 11, 2009 at 11:33 AM, Blake blake.ir...@gmail.com wrote: I'm working on testing this some more by doing a savecore -L right after I start the copy. savecore -L is not what you want. By default, for OpenSolaris, savecore on boot is disabled. But the core will have been dumped into the dump slice, which is not used for swap. So you should be able to run savecore at a later time to collect the core from the last dump. -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- // // Nathan Kroenert nathan.kroen...@sun.com // // Systems Engineer Phone: +61 3 9869-6255 // // Sun Microsystems Fax:+61 3 9869-6288 // // Level 7, 476 St. Kilda Road Mobile: 0419 305 456// // Melbourne 3004 VictoriaAustralia // // ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] schedulers [was: zfs related google summer of code ideas - your vote]
Hm - a ZilArc?? Or, slarc? Or L2ArZi? I've tried something sort of similar to this when fooling around, adding different *slices* for ZIL / L2ARC but as I'm too poor to afford good SSD's my results were poor at best... ;) Having ZFS manage some 'arbitrary fast stuff' and sorting out its own ZIL and L2ARC would be interesting, though, given the propensity for SSD's to be either fast read or fast write at the moment, you may well require some whacky knobs to get it to do what you actually want it to... hm. Nathan. Bill Sommerfeld wrote: On Wed, 2009-03-04 at 12:49 -0800, Richard Elling wrote: But I'm curious as to why you would want to put both the slog and L2ARC on the same SSD? Reducing part count in a small system. For instance: adding L2ARC+slog to a laptop. I might only have one slot free to allocate to ssd. IMHO the right administrative interface for this is for zpool to allow you to add the same device to a pool as both cache and ssd, and let zfs figure out how to not step on itself when allocating blocks. - Bill ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- /// // Nathan Kroenert nathan.kroen...@sun.com // // Senior Systems Engineer Phone:+61 3 9869 6255// // Global Systems Engineering Fax:+61 3 9869 6288 // // Level 7, 476 St. Kilda Road // // Melbourne 3004 VictoriaAustralia// /// ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] destroy means destroy, right?
For years, we resisted stopping rm -r / because people should know better, until *finally* someone said - you know what - that's just dumb. Then, just like that, it was fixed. Yes - This is Unix. Yes - Provide the gun and allow the user to point it. Just don't let it go off in their groin or when pointed at their foot, or provide at least some protection when they do. Having even limited amount of restore capability will provide the user with steel capped boots and a codpiece. It won't protect them from herpes or fungus but it might deflect the bullet. On 01/30/09 08:19, Jacob Ritorto wrote: I like that, although it's a bit of an intelligence insulter. Reminds me of the old pdp11 install ( http://charles.the-haleys.org/papers/setting_up_unix_V7.pdf ) -- This step makes an empty file system. 6.The next thing to do is to restore the data onto the new empty file system. To do this you respond to the ':' printed in the last step with (bring in the program restor) : tm(0,4) ('ht(0,4)' for TU16/TE16) tape? tm(0,5) (use 'ht(0,5)' for TU16/TE16) disk? rp(0,0)(use 'hp(0,0)' for RP04/5/6) Last chance before scribbling on disk. (you type return) (the tape moves, perhaps 5-10 minutes pass) end of tape Boot : You now have a UNIX root file system. On Thu, Jan 29, 2009 at 3:42 PM, Orvar Korvar knatte_fnatte_tja...@yahoo.com wrote: Maybe add a timer or something? When doing a destroy, ZFS will keep everything for 1 minute or so, before overwriting. This way the disk won't get as fragmented. And if you had fat fingers and typed wrong, you have up to one minute to undo. That will catch 80% of the mistakes? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- // // Nathan Kroenert nathan.kroen...@sun.com // // Senior Systems Engineer Phone: +61 3 9869 6255 // // Global Systems Engineering Fax:+61 3 9869 6288 // // Level 7, 476 St. Kilda Road // // Melbourne 3004 VictoriaAustralia // // ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] New RAM disk from ACARD might be interesting
As it presents as standard SATA, there should be no reason for this not to work... It has battery backup, and CF for backup / restore from DDR2 in the event of power loss... Pretty cool. (Would have preferred a super-cap, but oh, well... ;) Should make an excellent ZIL *and* L2ARC style device... Seems a little pricey for what it is though. It's going onto my list of what I'd buy if I had the money... ;) Nathan. On 01/30/09 12:10, Janåke Rönnblom wrote: ACARD have launched a new RAM disk which can take up to 64 GB of ECC RAM while still looking like a standard SATA drive. If anyone remember the Gigabyte I-RAM this might be a new development in this area. Its called ACARD ANS-9010 and up... http://www.acard.com.tw/english/fb01-product.jsp?idno_no=270prod_no=ANS-9010type1_title=%20Solid%20State%20Drivetype1_idno=13 This might be interesting to use as a cheap log instead of SSD cards... This test compares it with both Intel SSD (consumer and pro): http://www.techreport.com/articles.x/16255/1 However the test is more from a homeuser point of view... Anyone got the money and time to test it ;) -J -- // // Nathan Kroenert nathan.kroen...@sun.com // // Senior Systems Engineer Phone: +61 3 9869 6255 // // Global Systems Engineering Fax:+61 3 9869 6288 // // Level 7, 476 St. Kilda Road // // Melbourne 3004 VictoriaAustralia // // ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] New RAM disk from ACARD might be interesting
You could be the first... Man up! ;) Nathan. Will Murnane wrote: On Thu, Jan 29, 2009 at 21:11, Nathan Kroenert nathan.kroen...@sun.com wrote: Seems a little pricey for what it is though. For what it's worth, there's also a 9010B model that has only one sata port and room for six dimms instead of eight at $250 instead of $400. That might fit in your budget a little easier... I'm considering one for a log device. I wish someone else could test it first and report problems, but someone's gotta take the jump first. It looks like this device (the 9010, that is) is also being marketed as the HyperDrive V at the same price point. Will ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] destroy means destroy, right?
I'm no authority, but I believe it's gone. Some of the others on the list might have some funky thoughts, but I would suggest that if you have already done any other I/Os to the disk, you have likely rolled past the point of no return. Anyone else care to comment? As a side note, I had a look for anything that looked like a CR for zfs destroy / undestroy and could not find one. Anyone interested in me submitting an RFE to have something like a zfs undestroy pool/fs capability? Clearly, there would be limitations in how long you would have to get the command to work, but it would have its merits... Cheers! Nathan. Jacob Ritorto wrote: Hi, I just said zfs destroy pool/fs, but meant to say zfs destroy pool/junk. Is 'fs' really gone? thx jake ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- // // Nathan Kroenert nathan.kroen...@sun.com // // Systems Engineer Phone: +61 3 9869-6255 // // Sun Microsystems Fax:+61 3 9869-6288 // // Level 7, 476 St. Kilda Road Mobile: 0419 305 456// // Melbourne 3004 VictoriaAustralia // // ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Is Disabling ARC on SolarisU4 possible?
Also - My experience with a very small ARC is that your performance will stink. ZFS is an advanced filesystem that IMO makes some assumptions about capability and capacity of current hardware. If you don't give what it's expecting, your results may be equally unexpected. If you are keen to test the *actual* disk performance, you should just use the underlying disk device like /dev/rdsk/c0t0d0s0 Beware, however, that any writes to these devices will indeed result in the loss of the data on those devices, zpools or other. Cheers. Nathan. Richard Elling wrote: Rob Brown wrote: Afternoon, In order to test my storage I want to stop the cacheing effect of the ARC on a ZFS filesystem. I can do similar on UFS by mounting it with the directio flag. No, not really the same concept, which is why Roch wrote http://blogs.sun.com/roch/entry/zfs_and_directio I saw the following two options on a nevada box which presumably control it: primarycache secondarycache Yes, to some degree this offers some capability. But I don't believe they are in any release of Solaris 10. -- richard But I’m running Solaris 10U4 which doesn’t have them -can I disable it? Many thanks Rob *|* *Robert Brown - **ioko *Professional Services *| | **Mobile:* +44 (0)7769 711 885 *| * ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- // // Nathan Kroenert nathan.kroen...@sun.com // // Systems Engineer Phone: +61 3 9869-6255 // // Sun Microsystems Fax:+61 3 9869-6288 // // Level 7, 476 St. Kilda Road Mobile: 0419 305 456// // Melbourne 3004 VictoriaAustralia // // ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
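On builds that do have those properties, limiting the cache for a test is just a matter of (dataset name is an example):

zfs set primarycache=metadata tank/test
zfs set primarycache=none tank/test

and for hitting the bare disk, something like:

# read 1GB straight off the raw device, bypassing the filesystem
dd if=/dev/rdsk/c0t0d0s0 of=/dev/null bs=1024k count=1024

Just don't point of= at a raw device you care about - as above, writes there will happily flatten whatever is on it.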
Re: [zfs-discuss] destroy means destroy, right?
He's not trying to recover a pool - Just a filesystem... :) bdebel...@intelesyscorp.com wrote: Recovering Destroyed ZFS Storage Pools. You can use the zpool import -D command to recover a storage pool that has been destroyed. http://docs.sun.com/app/docs/doc/819-5461/gcfhw?a=view -- // // Nathan Kroenert nathan.kroen...@sun.com // // Systems Engineer Phone: +61 3 9869-6255 // // Sun Microsystems Fax:+61 3 9869-6288 // // Level 7, 476 St. Kilda Road Mobile: 0419 305 456// // Melbourne 3004 VictoriaAustralia // // ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] cifs perfomance
Are you able to qualify that a little? I'm using a realtek interface with OpenSolaris and am yet to experience any issues. Nathan. Brandon High wrote: On Wed, Jan 21, 2009 at 5:40 PM, Bob Friesenhahn bfrie...@simple.dallas.tx.us wrote: Several people reported this same problem. They changed their ethernet adaptor to an Intel ethernet interface and the performance problem went away. It was not ZFS's fault. It may not be a ZFS problem, but it is a OpenSolaris problem. The drivers for hardware Realtek and other NICs are ... not so great. -B -- // // Nathan Kroenert nathan.kroen...@sun.com // // Systems Engineer Phone: +61 3 9869-6255 // // Sun Microsystems Fax:+61 3 9869-6288 // // Level 7, 476 St. Kilda Road Mobile: 0419 305 456// // Melbourne 3004 VictoriaAustralia // // ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] cifs perfomance
Interesting. I'll have a poke... Thanks! Nathan. Brandon High wrote: On Thu, Jan 22, 2009 at 1:29 PM, Nathan Kroenert nathan.kroen...@sun.com wrote: Are you able to qualify that a little? I'm using a realtek interface with OpenSolaris and am yet to experience any issues. There's a lot of anecdotal evidence that replacing the rge driver with the gani driver can fix poor NFS and CIFS performance. Another option is to use an Intel NIC in place of the Realtek. Search the archives for gani or slow CIFS and you'll find several people who resolved poor performance by getting rid of the rge driver. While it's not hard evidence, it seems to indicate that there are problems with the driver (and most likely the hardware). -B -- // // Nathan Kroenert nathan.kroen...@sun.com // // Systems Engineer Phone: +61 3 9869-6255 // // Sun Microsystems Fax:+61 3 9869-6288 // // Level 7, 476 St. Kilda Road Mobile: 0419 305 456// // Melbourne 3004 VictoriaAustralia // // ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] hot spare not so hot ??
An interesting interpretation of using hot spares. Could it be that the hot-spare code only fires if the disk goes down whilst the pool is active? hm. Nathan. Scot Ballard wrote: I have configured a test system with a mirrored rpool and one hot spare. I powered the systems off, pulled one of the disks from rpool to simulate a hardware failure. The hot spare is not activating automatically. Is there something more i should have done to make this work ? pool: rpool state: DEGRADED status: One or more devices could not be opened. Sufficient replicas exist for the pool to continue functioning in a degraded state. action: Attach the missing device and online it using 'zpool online'. see: http://www.sun.com/msg/ZFS-8000-2Q scrub: none requested config: NAMESTATE READ WRITE CKSUM rpool DEGRADED 0 0 0 mirrorDEGRADED 0 0 0 c0d0s0 ONLINE 0 0 0 c0d1s0 UNAVAIL 0 0 0 cannot open spares c1d1s0AVAIL errors: No known data errors Thanks -Scot ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- // // Nathan Kroenert nathan.kroen...@sun.com // // Systems Engineer Phone: +61 3 9869-6255 // // Sun Microsystems Fax:+61 3 9869-6288 // // Level 7, 476 St. Kilda Road Mobile: 0419 305 456// // Melbourne 3004 VictoriaAustralia // // ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
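In the meantime, the spare can be pressed into service by hand - using the device names from the zpool status above, something along these lines:

# manually replace the missing half of the mirror with the configured spare
zpool replace rpool c0d1s0 c1d1s0
zpool status rpool

# once happy, detach the dead device to make the spare's promotion permanent
zpool detach rpool c0d1s0

That at least gets the redundancy back while the auto-activation question gets answered.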
Re: [zfs-discuss] ZFS tale of woe and fail
Hey, Tom - Correct me if I'm wrong here, but it seems you are not allowing ZFS any sort of redundancy to manage. I'm not sure how you can class it a ZFS fail when the Disk subsystem has failed... Or - did I miss something? :) Nathan. Tom Bird wrote: Morning, For those of you who remember last time, this is a different Solaris, different disk box and different host, but the epic nature of the fail is similar. The RAID box that is the 63T LUN has a hardware fault and has been crashing, up to now the box and host got restarted and both came up fine. However, just now as I have got replacement hardware in position and was ready to start copying, it went bang and my data has all gone. Ideas? r...@cs4:~# zpool list NAME SIZE USED AVAILCAP HEALTH ALTROOT content 62.5T 59.9T 2.63T95% ONLINE - r...@cs4:~# zpool status -v pool: content state: ONLINE status: One or more devices has experienced an error resulting in data corruption. Applications may be affected. action: Restore the file in question if possible. Otherwise restore the entire pool from backup. see: http://www.sun.com/msg/ZFS-8000-8A scrub: none requested config: NAMESTATE READ WRITE CKSUM content ONLINE 0 032 c2t8d0ONLINE 0 032 errors: Permanent errors have been detected in the following files: content:0x0 content:0x2c898 r...@cs4:~# find /content /content r...@cs4:~# (yes that really is it) r...@cs4:~# uname -a SunOS cs4.kw 5.11 snv_99 sun4v sparc SUNW,Sun-Fire-T200 from format: 2. c2t8d0 IFT-S12S-G1033-363H-62.76TB /p...@7c0/p...@0/p...@8/LSILogic,s...@0/s...@8,0 Also, content does not show in df output. thanks -- /// // Nathan Kroenert nathan.kroen...@sun.com // // Senior Systems Engineer Phone:+61 3 9869 6255// // Global Systems Engineering Fax:+61 3 9869 6288 // // Level 7, 476 St. Kilda Road // // Melbourne 3004 VictoriaAustralia// /// ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Odd network performance with ZFS/CIFS
2C from Oz: Windows (at least XP - I have thus far been lucky enough to avoid running vista on metal) has packet schedulers, quality of service settings and other crap that can severely impact windows performance on the network. I have found that setting the following made a difference to me: - Disable Jumbo Frames (as I have only a very cheap crappy gig-switch and if I try to drive it hard with jumbo's enabled, it falls in a heap) - Lose the 'deterministic network enhancer' under windows - Lose the QoS packet scheduler - Check the interface properties and go looking for something that sounds like 'optimize for CPU / optimize for speed' and set it to speed - Depending on workload and packet sizes, it might also be worth looking at disabling nagle algorithm on the Solaris box. See http://www.sun.com/servers/coolthreads/tnb/lighttpd.jsp for a quick explanation... It would be interesting to see if you see the same issues using a Solaris or other OS client. Hope this helps somewhat. Let us know how it goes. Nathan. fredrick phol wrote: I'm currently experiencing exactly the same problem and it's been driving me nuts. Tried open soalris and am currently running the latest version of SXCE both with exactly the same results. This issue occurs with both CIFS which shows the speed degrade and ISCSI which just starts off at the lowest speed but exhibits the same peaks and troughs I have 4x500GB drives in RAIDz1 config on an AMD 780G mobo. speed tests using DD have shown read rates of ~140MB/s and write rates of `120MB/s (humourously slightly faster than one of my friends arrays on linux and intel hardware) Currently the transfer will sit at about 18% gige network utilisation for 10 seconds then dip to 0 and come straight back up to 18% this happens at regular predictable intervals, there is no randomness. I've tried two different switches, one a consumer grade switch from linksys and one a low end distribution switch from 3com both exhibit exactly the same behaviour. The only computer accessing the solaris box is w windows vista 64 sp1 machine. Currently I'm guessing that the transfer issues have somethign to do with the onboard realtek network card in the solaris box. Possibly a driver issue? I've got a dual port intel server nic on order to replace it and test with. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
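On the Nagle point, a quick way to try it on the Solaris side (the default for this one is 4095, so note the old value down before changing anything):

# check, then effectively disable Nagle for new TCP connections
ndd -get /dev/tcp tcp_naglim_def
ndd -set /dev/tcp tcp_naglim_def 1

Setting it back to the old value undoes it, and it won't survive a reboot unless you script it.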
Re: [zfs-discuss] Can the new consumer NAS devices run OpenSolaris?
Meh - I doubt you hurt anyone. Most people have kill files for that sort of stuff. heh. ;) On the 'which if these should work' sort of question, if you do happen to try any of those systems, and they work, remember to submit the details to the HCL. :) I'm keen to give it a whack on a small box myself, but have not had the time or the funds. The Atom stuff should work pretty well, and even with 2GB of memory, if it's just acting as a NAS server, it should have plenty of poke. (assuming you are only using it for NAS... ;) Oh - and assuming you don't enable stuff like gzip-9 compression, which might, on the slower Atom style chips, get in the way. Looking forward to any reports. Nathan. On 13/01/09 01:47 PM, JZ wrote: ok, was I too harsh on the list? sorry folks, as I said, I have the biggest ego. no one can hurt that by trying to fight me, but yes, it can be hurt if I have to hurt the friends I love in protecting my ego or my other friends' ego. but no one can get hurt if we don't claim what we have or what we know is the best of all. a contribution to help the problem today can be better than 100% strategically correct in the long run. we use what we have today, but if that usage will impact the life or death of a promising technology branch, as a living thing, maybe we don't want to use the best of today. everyone has their own need and want, and there is no better/worse, right/wrong in the choice of technology. but some technologies can work together in a constructive fashion, and some in a destructive fashion. please, be constructive. and you will hear much less from me. best, z ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- // // Nathan Kroenert nathan.kroen...@sun.com // // Senior Systems Engineer Phone: +61 3 9869 6255 // // Global Systems Engineering Fax:+61 3 9869 6288 // // Level 7, 476 St. Kilda Road // // Melbourne 3004 VictoriaAustralia // // ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Is a manual zfs scrub neccessary?
The big win for me in doing a periodic scrub is that in normal operation, ZFS only checks data as it's read back from the disks. If you don't periodically scrub, errors that happen over time won't be caught until you next read that actual data, which might be inconvenient if it's a long time since the initial data was written. As I have a lot of data that is pretty much only read once or twice after it's originally written, I could have stuff going bad over time that I don't know about. Scrubbing makes sure there is a limit on the amount of time between each 'surprise!'. :) I scrub once every month or so, depending on the system. So, in direct answer to your question, No - You don't *need* to scrub. But - It's better if you do. ;) My 2c. Nathan. On 10/11/08 11:38 AM, Douglas Walker wrote: Hi, I'm running a 3Tb RAIDZ2 array and was wondering about the zfs scrub function. This server runs as my backup server and receives an rsync every night. I was wondering if I _need_ to explicitly run a zfs scrub on my zpool periodically. There's a lot of info on google about running a scrub but not whether it's actually needed or under what circumstances you might run one - so I thought I'd ask the list it's opinions on this. If zfs does a background scrub continually anyways - is there any need to manually run a scrub? I'd imagine a scrub of a 3Tb array would take quite a while (its 7200rpm SATA disks) and if I ran a scrub this would likely overlap with my nightly rsyncs causing yet more I/O. Wouldn't this stress the disks more? If it is necessary - how often are people running a manually scrub? Once a week? month? regards D -- // // Nathan Kroenert [EMAIL PROTECTED] // // Senior Systems Engineer Phone: +61 3 9869 6255 // // Global Systems Engineering Fax:+61 3 9869 6288 // // Level 7, 476 St. Kilda Road // // Melbourne 3004 VictoriaAustralia // // ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
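If remembering is the hard part, a cron entry does it for you - for example, 2am on the first of every month (pool name is an example):

0 2 1 * * /usr/sbin/zpool scrub tank

and then an occasional:

zpool status -v tank

shows when the last scrub ran, how long it took, and anything it found.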
Re: [zfs-discuss] boot -L
A quick google shows that it's not so much about the mirror, but the BE... http://opensolaris.org/os/community/zfs/boot/zfsbootFAQ/ Might help? Nathan. On 7/11/08 02:39 PM, Krzys wrote: What am I doing wrong? I have sparc V210 and I am having difficulty with boot -L, I was under the impression that boot -L will give me options to which zfs mirror I could boot my root disk? Anyway but even not that, I am seeing some strange behavior anyway... After trying boot -L I am unabl eto boot my system unless I do reset-all, is that normal? I have Solaris 10 U6 that I just upgraded my box to and I wanted to try all the cool things about zfs root disk mirroring and so on, but so far its quite strange experience with this whole thing... [22:21:25] @adas: /root init 0 [22:21:51] @adas: /root stopping NetWorker daemons: nsr_shutdown -q svc.startd: The system is coming down. Please wait. svc.startd: 90 system services are now being stopped. svc.startd: The system is down. syncing file systems... done Program terminated {0} ok boot -L SC Alert: Host System has Reset Probing system devices Probing memory Probing I/O buses Sun Fire V210, No Keyboard Copyright 2007 Sun Microsystems, Inc. All rights reserved. OpenBoot 4.22.33, 4096 MB memory installed, Serial #64938415. Ethernet address 0:3:ba:de:e1:af, Host ID: 83dee1af. Rebooting with command: boot -L Boot device: /[EMAIL PROTECTED],60/[EMAIL PROTECTED]/[EMAIL PROTECTED],0:a File and args: -L Can't open bootlst Evaluating: The file just loaded does not appear to be executable. {1} ok boot disk0 Boot device: /[EMAIL PROTECTED],60/[EMAIL PROTECTED]/[EMAIL PROTECTED],0 File and args: ERROR: /[EMAIL PROTECTED],60: Last Trap: Fast Data Access MMU Miss {1} ok boot disk1 Boot device: /[EMAIL PROTECTED],60/[EMAIL PROTECTED]/[EMAIL PROTECTED],0 File and args: ERROR: /[EMAIL PROTECTED],60: Last Trap: Fast Data Access MMU Miss {1} ok boot ERROR: /[EMAIL PROTECTED],60: Last Trap: Fast Data Access MMU Miss {1} ok reset-all Probing system devices Probing memory Probing I/O buses Sun Fire V210, No Keyboard Copyright 2007 Sun Microsystems, Inc. All rights reserved. OpenBoot 4.22.33, 4096 MB memory installed, Serial #64938415. Ethernet address 0:3:ba:de:e1:af, Host ID: 83dee1af. Boot device: /[EMAIL PROTECTED],60/[EMAIL PROTECTED]/[EMAIL PROTECTED],0:a File and args: SunOS Release 5.10 Version Generic_137137-09 64-bit Copyright 1983-2008 Sun Microsystems, Inc. All rights reserved. Use is subject to license terms. Hardware watchdog enabled Hostname: adas Reading ZFS config: done. Mounting ZFS filesystems: (3/3) adas console login: Nov 6 22:27:13 squid[361]: Squid Parent: child process 363 started Nov 6 22:27:18 adas ufs: NOTICE: mount: not a UFS magic number (0x0) starting NetWorker daemons: nsrexecd console login: Does anyone have any idea why is that happening? what am I doing wrong? Thanks for help. Regards, Chris ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- // // Nathan Kroenert [EMAIL PROTECTED] // // Senior Systems Engineer Phone: +61 3 9869 6255 // // Global Systems Engineering Fax:+61 3 9869 6288 // // Level 7, 476 St. Kilda Road // // Melbourne 3004 VictoriaAustralia // // ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
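For the record, the sequence the FAQ describes is to give -L against an explicit ZFS boot device, pick the BE from the menu, then boot it with -Z - something like the following (the device path below is made up; use your own devalias or full path):

ok boot /pci@1c,600000/scsi@2/disk@0,0:a -L
  ... choose the BE, and note the 'boot -Z rpool/ROOT/...' line it prints ...
ok boot /pci@1c,600000/scsi@2/disk@0,0:a -Z rpool/ROOT/s10u6

It's also worth making sure both halves of the root mirror actually have a ZFS boot block on them, e.g.:

installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk /dev/rdsk/c1t1d0s0

otherwise the second disk has nothing to boot from, whichever flags you use.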
Re: [zfs-discuss] FYI - proposing storage pm project
Not wanting to hijack this thread, but... I'm a simple man with simple needs. I'd like to be able to manually spin down my disks whenever I want to... Anyone come up with a way to do this? ;) Nathan. Jens Elkner wrote: On Mon, Nov 03, 2008 at 02:54:10PM -0800, Yuan Chu wrote: Hi, a disk may take seconds or even tens of seconds to come on line if it needs to be powered up and spin up. Yes - I really hate this on my U40 and tried to disable PM for HDD[s] completely. However, haven't found a way to do this (thought /etc/power.conf is the right place, but either it doesn't work as explained or is not the right place). HDD[s] are HITACHI HDS7225S Revision: A9CA Any hints, how to switch off PM for this HDD? Regards, jel. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
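The only knob I know of for this lives in /etc/power.conf - it's threshold based rather than a 'spin it down right now' button, but it covers both wishes above. A sketch (the device paths are examples - get the real physical path from ls -l /dev/dsk/c1t1d0s0):

# spin this disk down after 5 minutes of idle
device-thresholds /pci@0,0/pci-ide@4/ide@0/cmdk@0,0 5m

# or keep power management away from a disk entirely
device-thresholds /pci@0,0/pci-ide@4/ide@1/cmdk@0,0 always-on

then push the new settings into the kernel with:

pmconfig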
Re: [zfs-discuss] add autocomplete feature for zpool, zfs command
Hm - This caused me to ask the question: Who keeps the capabilities in sync? Is there a programmatic way we can have bash (or other shells) interrogate zpool and zfs to find out what it's capabilities are? I'm thinking something like having bash spawn a zfs command to see what options are available in that current zfs / zpool version... That way, you would never need to do anything to bash/zfs once it was done the first time... do it once, and as ZFS changes, the prompts change automatically... Or - is this old hat, and how we do it already? :) Nathan. On 10/10/08 05:06 PM, Boyd Adamson wrote: Alex Peng [EMAIL PROTECTED] writes: Is it fun to have autocomplete in zpool or zfs command? For instance - zfs cr 'Tab key' will become zfs create zfs clone 'Tab key' will show me the available snapshots zfs set 'Tab key' will show me the available properties, then zfs set com 'Tab key' will become zfs set compression=, another 'Tab key' here would show me on/off/lzjb/gzip/gzip-[1-9] .. Looks like a good RFE. This would be entirely under the control of your shell. The zfs and zpool commands have no control until after you press enter on the command line. Both bash and zsh have programmable completion that could be used to add this (and I'd like to see it for these and other solaris specific commands). I'm sure ksh93 has something similar. Boyd ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- // // Nathan Kroenert [EMAIL PROTECTED] // // Senior Systems Engineer Phone: +61 3 9869 6255 // // Global Systems Engineering Fax:+61 3 9869 6288 // // Level 7, 476 St. Kilda Road // // Melbourne 3004 VictoriaAustralia // // ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
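As a proof of concept for the 'ask the binary' idea, a rough bash sketch - it just scrapes the usage message that zfs prints when run with no arguments (the usage lines are tab-indented on my build, hence the pattern), so it only keeps the subcommand names in sync, not their options:

_zfs_complete() {
    local cur=${COMP_WORDS[COMP_CWORD]}
    if [ "$COMP_CWORD" -eq 1 ]; then
        # pull the subcommand names out of the usage summary
        local subs=$(zfs 2>&1 | awk '/^\t[a-z]/ {print $1}' | sort -u)
        COMPREPLY=( $(compgen -W "$subs" -- "$cur") )
    fi
}
complete -F _zfs_complete zfs

The same trick works for zpool, and property names could presumably be scraped out of the 'zfs get' usage in much the same way.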
Re: [zfs-discuss] ZSF Solaris
Actually, the one that'll hurt most is ironically the most closely related to bad database schema design... With a zillion files in the one directory, if someone does an 'ls' in that directory, it'll not only take ages, but steal a whole heap of memory and compute power... Provided the only things that'll be doing *anything* in that directory are using indexed methods, there is no real problem from a ZFS perspective, but if something decides to list (or worse, list and sort) that directory, it won't be that pleasant. Oh - That's of course assuming you have sufficient memory in the system to cache all that metadata somewhere... If you don't then that's another zillion I/O's you need to deal with each time you list the entire directory. an ls -1rt on a directory with about 1.2 million files with names like afile1202899 takes minutes to complete on my box, and we see 'ls' get to in excess of 700MB rss... (and that's not including the memory zfs is using to cache whatever it can.) My box has the ARC limited to about 1GB, so it's obviously undersized for such a workload, but still gives you an indication... I generally look to keep directories to a size that allows the utilities that work on and in it to perform at a reasonable rate... which for the most part is around the 100K files or less... Perhaps you are using larger hardware than I am for some of this stuff? :) Nathan. On 1/10/08 07:29 AM, Toby Thain wrote: On 30-Sep-08, at 7:50 AM, Ram Sharma wrote: Hi, can anyone please tell me what is the maximum number of files that can be there in 1 folder in Solaris with ZSF file system. I am working on an application in which I have to support 1mn users. In my application I am using MySql MyISAM and in MyISAM there is 3 files created for 1 table. I am having application architechture in which each user will be having separate table, so the expected number of files in database folder is 3mn. That sounds like a disastrous schema design. Apart from that, you're going to run into problems on several levels, including O/S resources (file descriptors) and filesystem scalability. --Toby I have read somewhere that there is a limit of each OS to create files in a folder. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- // // Nathan Kroenert [EMAIL PROTECTED] // // Senior Systems Engineer Phone: +61 3 9869 6255 // // Global Systems Engineering Fax:+61 3 9869 6288 // // Level 7, 476 St. Kilda Road // // Melbourne 3004 VictoriaAustralia // // ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
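If the schema really can't change, the usual dodge is to fan the files out over a few hundred subdirectories so no single directory ever gets huge. A throwaway sketch of the idea in shell (the bucket scheme and paths are made up for illustration):

# drop each file into a 2-hex-digit bucket derived from its name
f=afile1202899
bucket=$(printf '%s' "$f" | cksum | awk '{printf "%02x", $1 % 256}')
mkdir -p /tank/db/$bucket
mv "$f" /tank/db/$bucket/

Lookups use the same hash, so nothing ever needs to list the big directory at all.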
Re: [zfs-discuss] CF to SATA adapters for boot device
I second that question, and also ask what brand folks like for performance and compatibility? Ebay is killing me with vast choice and no detail... ;) Nathan. Al Hopper wrote: On Wed, Aug 20, 2008 at 12:57 PM, Neal Pollack [EMAIL PROTECTED] wrote: Ian Collins wrote: Brian Hechinger wrote: On Wed, Aug 20, 2008 at 05:17:45PM +1200, Ian Collins wrote: Has anyone here had any luck using a CF to SATA adapter? I've just tried an Addonics ADSACFW CF to SATA adaptor with an 8GB card that I wanted to use for a boot pool and even though the BIOS reports the disk, Solaris B95 (or the installer) doesn't see it. I tried this a while back with an IDE to CF adapter. Real nice looking one too. It would constantly cause OpenBSD to panic. I would recommend against using this, unless you get real lucky. If you want flash to boot from, buy one of the ones that is specifically made for it (not CF, but industrial grade flash meant to be a HDD). Those things work a LOT better. I can look up the details of the ones my friend uses if you'd like. I was looking to run some tests with a CF boot drive before we get an X4540, which has a CF slot. The installer did see the attached USB sticks... My team does some of the testing inside Sun for the CF boot devices. We've used a number of IDE attaced CF adapters, such as; http://www.addonics.com/products/flash_memory_reader/ad44midecf.asp and also some random models from www.frys.com. We also test the CF boot feature on various Sun rack servers and blades that use a CF socket. I have not tested the SATA adapters but would not expect issues. I'd like to know if you find issues. The IDE attached devices use the legacy ATA/IDE device driver software, which had some bugs fixed for DMA and misc CF specific issues. It would be interesting to see if a SATA adapter for CF, set in bios to use AHCI instead of Legacy/IDE mode, would have any issues with the AHCI device driver software. I've had no reason to test this yet, since the Sun HW models build the CF socket right onto the motherboard/bus. I can't find a reason to worry about hot-plug, since removing the boot drive while Solaris is running would be, um, somewhat interesting :-) True, the enterprise grade devices are higher quality and will last longer. But do not underestimate the current (2008) device wear leveling firmware that controls the CF memory usage, and hence life span. Our in house destructive life span testing shows that the commercial grade CF device will last longer than the motherboard will. The consumer grade devices Interesting thread - thanks to all the contributors. I've seen, on several different forums, that many CF users lean towards Sandisk for reliability and longevity. Does anyone else see consensus in terms of CF brands? that you find in the store or on mail order, may or may not be current generation, so your device lifespan will vary. It should still be rather good for a boot device, because Solaris does very little writing to the boot disk. You can review configuration ideas to maximize the life of your CF device in this Solaris white paper for non-volatile memory; http://www.sun.com/bigadmin/features/articles/nvm_boot.jsp I hope this helps. Cheers, Neal Pollack Any further information welcome. Ian Regards, -- // // Nathan Kroenert [EMAIL PROTECTED] // // Systems Engineer Phone: +61 3 9869-6255 // // Sun Microsystems Fax:+61 3 9869-6288 // // Level 7, 476 St. 
Kilda Road Mobile: 0419 305 456// // Melbourne 3004 VictoriaAustralia // // ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] help me....
It starts with Z, which makes it one of the last to be considered if it's listed alphabetically? Nathan. Rahul wrote: hi can you give some disadvantages of the ZFS file system?? plzz its urgent... help me. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Supermicro AOC-USAS-L8i
And I can certainly vouch for that series of chipsets... I have a 750a-sli chipset (the one below the 790) and the SATA ports (in AHCI mode) Just Work(tm) under nevada / opensolaris. I'm yet to give it a while on S10, mostly as I pretty much run nevada everywhere... As S10 does indeed have an AHCI driver, I'd expect it would work just fine there too. Oh - and the ports go like stink!* For what it's worth, even with Nevada, you will need the newest NVidia Xorg drivers from nvidia's website to get the video working properly, and will need to add in it's PCI ID's in /etc/driver_aliases (And, as yet, I'm unable to run compiz in a stable way - Tends to hard lock up the machine after about 5 minutes use...), a very new hdaudio driver (I needed a bodgied up one from the Beijing team to make it work) and last I checked, the nvidia ethernet did not work properly without assigning it a valid ethernet address... (The driver misreads the ethernet address and either delivers it backwards, or byte-swaps... I don't remember exactly...) Oh - And just in case you forget, most boards I have seen use IDE mode for the controllers by default, which reeks. Expect less than 15 MB/s if reading and writing at the same time if you forget to change the controller mode to AHCI! For what it's worth, the board I'm using is a giga-byte.. Manufacturer: Gigabyte Technology Co., Ltd. Product: M750SLI-DS4 Which also has the 6 X AHCI ports. It might seem like it'll be a lot of hassle getting it working, but in the ZFS space, it works great pretty much out of the box (plus ethernet address change if the nvidia driver is still busted... ;) Cheers! Nathan. *Going like stink means going like a hairy goat - like lightning - like s*it off a shovel - like a zyrtec - fast. :) Brandon High wrote: On Mon, Aug 4, 2008 at 6:49 AM, Tim [EMAIL PROTECTED] wrote: really had the motivation or the cash to do so yet. I've been keeping my eye out for a board that supports the opteron 165 and the wider lane dual pci-E slots that isn't stricly a *gaming* board. I'm starting to think the combination doesn't exist. The AMD 790GX boards are starting to show up: http://www.newegg.com/Product/Product.aspx?Item=N82E16813128352 Dual 8x PCIe slots, integrated video and 6 AHCI SATA ports. -B ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
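On the driver_aliases front, rather than editing the file (or the miniroot) by hand, update_drv can add the binding - roughly like this, noting the pci ids below are made up, so pull the real ones from prtconf -pv first:

# bind the nvidia Xorg kernel driver to an extra device id
update_drv -a -i '"pci10de,123"' nvidia

# same idea for nv_sata, if/when it grows support for the newer chipset
update_drv -a -i '"pci10de,456"' nv_sata

A reconfigure boot (or a run of devfsadm) afterwards doesn't hurt.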
Re: [zfs-discuss] How to delete hundreds of emtpy snapshots
In one of my prior experiments, I included the names of the snapshots I created in a plain text file. I used this file, and not the zfs list output to determine which snapshots I was going to remove when it came time. I don't even remember *why* I did that in the first place, but it certainly made things easier when it came time to clean up a whole bunch of stuff... (And was not impacted by zfs list being non-snappy...) The snapshot naming scheme meant that it was dead easy to work out which to remove / keep... Right now, I don't have a system (that box was killed in a dreadful xen experiment :) so I'll be watching this thread with renewed interest to see who else is doing what... Nathan. Bob Friesenhahn wrote: On Thu, 17 Jul 2008, Ben Rockwood wrote: zfs list is mighty slow on systems with a large number of objects, but there is no foreseeable plan that I'm aware of to solve that problem. Never the less, you need to do a zfs list, therefore, do it once and work from that. If the snapshots were done from a script then their names are easily predictable and similar logic can be used to re-create the existing names. This avoids the need to do a 'zfs list'. Bob ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
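For completeness, the cleanup end of that file-driven approach is about as dull as it gets - a sketch, assuming one full snapshot name per line was appended to the file at creation time:

# snaps-created.txt contains lines like tank/home@nightly-20080701
while read snap; do
    zfs destroy "$snap" && echo "destroyed $snap"
done < /var/tmp/snaps-created.txt

No zfs list needed at any point, which is the whole attraction.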
Re: [zfs-discuss] ZFS deduplication
Even better would be using the ZFS block checksums (assuming we are only summing the data, not its position or time :)... Then we could have two files that have 90% the same blocks, and still get some dedup value... ;) Nathan. Charles Soto wrote: A really smart nexus for dedup is right when archiving takes place. For systems like EMC Centera, dedup is basically a byproduct of checksumming. Two files with similar metadata that have the same hash? They're identical. Charles On 7/7/08 4:25 PM, Neil Perrin [EMAIL PROTECTED] wrote: Mertol, Yes, dedup is certainly on our list and has been actively discussed recently, so there's hope and some forward progress. It would be interesting to see where it fits into our customers priorities for ZFS. We have a long laundry list of projects. In addition there's bug fixes and performance changes that customers are demanding. Neil. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS write / read speed and traps for beginners
Further followup to this thread... After being beaten sufficiently with a clue-bat, it was determined that the nforce 750a could do ahci mode for it's SATA stuff. I set it to ahci, and redid the devlinks etc and cranked it up as AHCI. I'm now regularly peaking at 100MB/s, though spending most of the time around 70MB/s. *much better* The lesson here is: when in ahci mode in the bios, *don't* match that PCI-ID with the nv-sata driver. It's not what you want. heh. *blush*. Once I removed the extra nv_sata entries I had added to the driver_aliases in my miniroot, all was good. On the NGE front, it turns out that solaris does not seem to like the ethernet address of the card. Trying to set it's OWN ethernet address using ifconfig yielded this: # ifconfig nge0 ether 63:d0:b:7d:1d:0 ifconfig: dlpi_set_physaddr failed nge0: DLSAP address in improper format or invalid ifconfig: failed setting mac address on nge0 using ifconfig nge0 ether 0:e:c:5b:54:45 worked just fine, and the interface now passes traffic and sees responses just fine. So, the workaround here is adding ether a working ether address in the hostname.nge0 I guess I'll log a bug on that on Monday... Awesome. Now to work on audio... heh. Nathan. Nathan Kroenert wrote: Hey all - Just spent quite some time trying to work out why my 2 disk mirrored ZFS pool was running so slow, and found an interesting answer... System: new Gigabyte M750sli-DS4, AMD 9550, 4GB memory and 2 X Seagate 500GB SATA-II 32mb cache disks. The SATA ports on the nfoce 750asli chipset don't yet seem to be supported by the nv_sata driver (I'm only running nv_89 at the mo, though I'm not aware of new support going in just yet). I *can* get the driver to attach, but not to see any disks. interesting, but I digress... Anyhoo, - I'm stuck in IDE compatability mode for the moment. So - using plain dd to the zfs filesystem on said disk dd if=/dev/zero of=delete.me bs=65536 I could achieve only about 35-40MB/s write speed, whereas, if I dd to the slice directly, I can get around 90-95MB/s I tried using whole disks versus a slice and it made no appreciable difference. It turns out that when you are in IDE compatability mode, having two disks on the same 'controller' (c# in solaris) behaves just like real IDE... Crap! Moving the second disk onto from c1 to c2 got be back to at least 50MB/s with higher peaks, up to 60/70MB/s. Also of note, on the gigabyte board (and I guess other nforce 750asli based chipsets) only 4 of the 6 SATA ports work when in IDE mode. Other thoughts on the Nforce 750a: - nge plumbs up OK and can send and 'see' packets, but does not seem to know itself... In promiscuous mode, you can see returning icmp echo requests, but they don't make it to the top of the stack. I had to use an e1000g in a PCI slot to get my networking working properly... - Onboard Video works, including compiz, but you need to create an xorg.conf and update the nvidia driver with the latest from the nvidia website Seems snappy enough. With 4 cores @ 2.2Ghz (phenom 9550) it's looking like it'll do what I wanted quite nicely. Later... Nathan. -- // // Nathan Kroenert [EMAIL PROTECTED] // // Systems Engineer Phone: +61 3 9869-6255 // // Sun Microsystems Fax:+61 3 9869-6288 // // Level 7, 476 St. Kilda Road Mobile: 0419 305 456// // Melbourne 3004 VictoriaAustralia // // ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
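To spell out that workaround a little: the contents of /etc/hostname.nge0 get handed to ifconfig at boot, so sticking the ether override in there alongside the address does the trick on this box - something like the following single line (hostname and MAC are examples, and this is only a sketch of what worked here, not gospel):

myhost ether 0:e:c:5b:54:45

with myhost resolving to the address you want on nge0.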
[zfs-discuss] ZFS write / read speed and traps for beginners
Hey all - Just spent quite some time trying to work out why my 2 disk mirrored ZFS pool was running so slow, and found an interesting answer... System: new Gigabyte M750sli-DS4, AMD 9550, 4GB memory and 2 x Seagate 500GB SATA-II 32MB cache disks. The SATA ports on the nforce 750a SLI chipset don't yet seem to be supported by the nv_sata driver (I'm only running nv_89 at the mo, though I'm not aware of new support going in just yet). I *can* get the driver to attach, but not to see any disks. Interesting, but I digress... Anyhoo - I'm stuck in IDE compatibility mode for the moment. So - using plain dd to the ZFS filesystem on said disk (dd if=/dev/zero of=delete.me bs=65536) I could achieve only about 35-40MB/s write speed, whereas if I dd to the slice directly, I can get around 90-95MB/s. I tried using whole disks versus a slice and it made no appreciable difference. It turns out that when you are in IDE compatibility mode, having two disks on the same 'controller' (c# in Solaris) behaves just like real IDE... Crap! Moving the second disk from c1 to c2 got me back to at least 50MB/s with higher peaks, up to 60/70MB/s. Also of note, on the Gigabyte board (and I guess other nforce 750a SLI based chipsets) only 4 of the 6 SATA ports work when in IDE mode. Other thoughts on the Nforce 750a: - nge plumbs up OK and can send and 'see' packets, but does not seem to know itself... In promiscuous mode, you can see returning icmp echo requests, but they don't make it to the top of the stack. I had to use an e1000g in a PCI slot to get my networking working properly... - Onboard video works, including compiz, but you need to create an xorg.conf and update the nvidia driver with the latest from the nvidia website. Seems snappy enough. With 4 cores @ 2.2GHz (Phenom 9550) it's looking like it'll do what I wanted quite nicely. Later... Nathan. -- // // Nathan Kroenert [EMAIL PROTECTED] // // Systems Engineer Phone: +61 3 9869-6255 // // Sun Microsystems Fax: +61 3 9869-6288 // // Level 7, 476 St. Kilda Road Mobile: 0419 305 456 // // Melbourne 3004 Victoria Australia // // ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
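For reference, the sort of back-to-back comparison being described looks roughly like this. File and device names here are invented, and the second command writes straight over a slice, so only point it at one you don't care about:

  (write through the ZFS filesystem)
  # dd if=/dev/zero of=/tank/test/delete.me bs=65536 count=16384
  (write to the raw slice directly - destroys whatever is on it)
  # dd if=/dev/zero of=/dev/rdsk/c2d0s0 bs=65536 count=16384

Watching zpool iostat 1 or iostat -xnz 1 while each runs shows whether the gap is in the filesystem or the plumbing underneath it.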
Re: [zfs-discuss] SATA controller suggestion
Tim wrote: **pci or pci-x. Yes, you might see *SOME* loss in speed from a pci interface, but let's be honest, there aren't a whole lot of users on this list that have the infrastructure to use greater than 100MB/sec who are asking this sort of question. A PCI bus should have no issues pushing that. Hm. If it's a system with only 1 PCI bus, there are still a few things to consider here. If it's plain old 33MHz, 32-bit PCI, your 100MB/s(ish) of usable bandwidth is actually total bandwidth. That's 50MB/s in and 50MB/s out, if you are copying disk to disk... I am about to update my home server for exactly the issue of saturating my PCI bus... It's even worse for me, as I'm mirroring, so that works out to closer to 33MB/s read, 33MB/s write + 33MB/s write to the mirror. All in all, it blows. I'm looking into one of the new Gigabyte NVIDIA-based systems with the 750a SLI chipsets. I'm *hoping* the Solaris nv_sata drivers will work with the new chipset (or that we are on the way to updating them...). My other box that's using the Nforce 570 works like a champ, and I'm hoping to recapture that magic. (I actually wanted to buy some more 570-based MBs but cannot get 'em in Australia any more... :) Cheers! Nathan. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
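To put some rough numbers on that (assuming a single shared 32-bit/33MHz bus with no other traffic on it):

  33MHz x 4 bytes     = ~133MB/s theoretical, call it ~100MB/s in practice
  disk-to-disk copy   = ~50MB/s read + ~50MB/s write
  copy onto a mirror  = ~33MB/s read + ~33MB/s write + ~33MB/s mirror write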
Re: [zfs-discuss] More USB Storage Issues
For what it's worth, I started playing with USB + flash + ZFS and was most unhappy for quite a while. I was suffering with things hanging, going slow or just going away and breaking, and thought I was witnessing something zfs was doing as I was trying to do mirror recovery and all that sort of stuff. On a hunch, I tried doing UFS and RAW instead and saw the same issues. It's starting to look like my USB hubs. Once they are under any reasonable read/write load, they just make bunches of things go offline. Yep - They are powered and plugged in. So, at this stage, I'll be grabbing a couple of 'better' USB hubs (Mine are pretty much the cheapest I could buy) and see how that goes. For gags, take ZFS out of the equation and validate that your hardware is actually providing a stable platform for ZFS... Mine wasn't... Nathan. Evan Geller wrote: So, I've been stuck in kind of an ugly pattern. I zpool create and nothing goes wrong for a while, and then eventually I'll zpool status, which doesn't respond to ^C or kill -9s or anything. Also, setting NOINUSE_CHECK=1 doesn't appear to make a difference. I'll try and truss it next time I get a chance if that helps. Anywho, other problem is I get a huge storm of these around the same time zpool hangs. Jun 4 23:17:59 cake scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],0/pci8086,[EMAIL PROTECTED],7/[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd5): Jun 4 23:17:59 cakeoffline or reservation conflict Jun 4 23:18:00 cake scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],0/pci8086,[EMAIL PROTECTED],7/[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd5): Jun 4 23:18:00 cakeoffline or reservation conflict Jun 4 23:18:01 cake scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],0/pci8086,[EMAIL PROTECTED],7/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd6): Jun 4 23:18:01 cakeoffline or reservation conflict Jun 4 23:18:02 cake scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],0/pci8086,[EMAIL PROTECTED],7/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd6): Jun 4 23:18:02 cakeoffline or reservation conflict Jun 4 23:18:03 cake scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],0/pci8086,[EMAIL PROTECTED],7/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd6): Jun 4 23:18:03 cakeoffline or reservation conflict Jun 4 23:18:04 cake scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],0/pci8086,[EMAIL PROTECTED],7/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd6): Jun 4 23:18:04 cakeoffline or reservation conflict Jun 4 23:18:04 cake zfs: [ID 664491 kern.warning] WARNING: Pool 'tank' has encountered an uncorrectable I/O error. Manual intervention is required. Sorry if this isn't enough information, but if there's anything else I can provide that'll help please let me know. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- // // Nathan Kroenert [EMAIL PROTECTED] // // Technical Support Engineer Phone: +61 3 9869-6255 // // Sun Services Fax:+61 3 9869-6288 // // Level 3, 476 St. Kilda Road // // Melbourne 3004 VictoriaAustralia // // ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
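A rough way to do that sanity check with no filesystem in the picture at all - device names here are examples only, and the read is non-destructive:

  (stream reads through the hub and watch for resets or devices going offline)
  # dd if=/dev/rdsk/c5t0d0p0 of=/dev/null bs=1024k &
  # iostat -xnz 5
  # tail -f /var/adm/messages

If the device drops off or throws reservation conflicts under nothing more than a raw sequential read, the hub or enclosure is the problem, not ZFS.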
Re: [zfs-discuss] Get your SXCE on ZFS here!
format -e is your window to cache settings. As for the auto-enabling, I'm not sure, as IIRC, we do different things based on disk technology. eg: IDE + SATA - Always enabled SCSI - Disabled by default, unless you give ZFS the whole disk. I think. On a couple of my systems, this seems to ring true. Not at all sure about SAS. If I'm wrong here, hopefully someone else will provide the complete set of logic for determining cache enabling semantics. :) Nathan. Brian Hechinger wrote: On Wed, Jun 04, 2008 at 09:17:05PM -0400, Ellis, Mike wrote: The FAQ document ( http://opensolaris.org/os/community/zfs/boot/zfsbootFAQ/ ) has a jumpstart profile example: Speaking of the FAQ and mentioning the need to use slices, how does that affect the ability of Solaris/ZFS to automatically enable the disk's cache? Does it need to be manually over-ridden (unlike giving ZFS the whole disk where it automatically turns the disk cache on)? Also, how can you check if the disk's cache has been enabled or not? Thanks, -brian -- // // Nathan Kroenert [EMAIL PROTECTED] // // Technical Support Engineer Phone: +61 3 9869-6255 // // Sun Services Fax:+61 3 9869-6288 // // Level 3, 476 St. Kilda Road // // Melbourne 3004 VictoriaAustralia // // ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
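Something like the following is what is meant by format -e being your window to the cache settings. The exact menus (and whether the cache entry shows up at all) depend on the disk and driver, so this is only a sketch:

  # format -e
  (select the disk)
  format> cache
  cache> write_cache
  write_cache> display
  Write Cache is enabled
  write_cache> quit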
Re: [zfs-discuss] ZFS root finally here in SNV90
I'd expect it's the old standard. If /var/tmp is filled, and that's part of /, then bad things happen. There are often other places in /var that are writable by more than root, and always the possibility that something barfs heavily into syslog. Since the advent of reasonably sized disks, I know many don't consider this an issue these days, but I'd still be inclined to keep /var (and especially /var/tmp) separated from /. In ZFS, this is, of course, just two filesystems in the same pool, with differing quotas... :) Nathan. Rich Teer wrote: On Wed, 4 Jun 2008, Bob Friesenhahn wrote: Did you actually choose to keep / and /var combined? That's what I'd do... Is there any reason to do that with a ZFS root since both are sharing the same pool and so there is no longer any disk space advantage? If / and /var are not combined can they have different assigned quotas without one inheriting limits from the other? Why would one do that? Just keep an eye on the root pool and all is good. -- // // Nathan Kroenert [EMAIL PROTECTED] // // Technical Support Engineer Phone: +61 3 9869-6255 // // Sun Services Fax: +61 3 9869-6288 // // Level 3, 476 St. Kilda Road // // Melbourne 3004 Victoria Australia // // ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
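In ZFS-root terms that is just something along these lines - the dataset names are invented for illustration and the quota values are arbitrary, so adjust to taste:

  # zfs create -o quota=8g rpool/ROOT/snv_90/var
  # zfs create -o quota=2g rpool/ROOT/snv_90/var/tmp
  # zfs get -r quota rpool/ROOT/snv_90

A runaway syslog or a full /var/tmp then hits its own quota rather than filling /.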
Re: [zfs-discuss] x4500 Thumper panic
Dumping to /dev/dsk/c6t0d0s1 certainly looks like a non-mirrored dump dev... You might try a manual savecore telling it to ignore the dump valid header and see what you get... savecore -d and perhaps try telling it to look directly at the dump device... savecore -f device You should also, when you get the chance, deliberately panic the box to make sure you can actually capture a dump... dumpadm is your friend as far as checking where you are going to dump to, and it it's one side of your swap mirror, that's bad, M'Kay? :) Nathan. Jorgen Lundman wrote: OK, this is a pretty damn poor panic report if I may say no, not had much sleep. Solaris Express Developer Edition 9/07 snv_70b X86 Copyright 2007 Sun Microsystems, Inc. All Rights Reserved. Use is subject to license terms. Assembled 30 August 2007 SunOS x4500-01.unix 5.11 snv_70b i86pc i386 i86pc Even though it dumped, it wrote nothing to /var/crash/. Perhaps because swap is mirrored. Jorgen Lundman wrote: We had a panic around noon on Saturday, which it mostly recovered itself. All ZFS NFS exports just remounted, but the UFS on zdev NFS exports did not, needed manual umount mount on all clients for some reason. Is this a known bug we should consider a patch for? May 10 11:49:46 x4500-01.unix ufs: [ID 912200 kern.notice] quota_ufs: over hard disk limit (pid 477, uid 127409, inum 1047211, fs /export/zero1) May 10 11:51:26 x4500-01.unix unix: [ID 836849 kern.notice] May 10 11:51:26 x4500-01.unix ^Mpanic[cpu3]/thread=17b8c820: May 10 11:51:26 x4500-01.unix genunix: [ID 335743 kern.notice] BAD TRAP: type=e (#pf Page fault) rp=ff001f4ca220 addr=0 occurred in module unknown due t o a NULL pointer dereference May 10 11:51:26 x4500-01.unix unix: [ID 10 kern.notice] May 10 11:51:26 x4500-01.unix unix: [ID 839527 kern.notice] nfsd: May 10 11:51:26 x4500-01.unix unix: [ID 753105 kern.notice] #pf Page fault May 10 11:51:26 x4500-01.unix unix: [ID 532287 kern.notice] Bad kernel fault at addr=0x0 May 10 11:51:26 x4500-01.unix unix: [ID 243837 kern.notice] pid=477, pc=0x0, sp= 0xff001f4ca318, eflags=0x10246 May 10 11:51:26 x4500-01.unix unix: [ID 211416 kern.notice] cr0: 8005003bpg,wp, ne,et,ts,mp,pe cr4: 6f8xmme,fxsr,pge,mce,pae,pse,de May 10 11:51:26 x4500-01.unix unix: [ID 354241 kern.notice] cr2: 0 cr3: 1fcbbc00 0 cr8: c May 10 11:51:26 x4500-01.unix unix: [ID 592667 kern.notice] rdi: fffedef ea000 rsi:9 rdx:0 May 10 11:51:26 x4500-01.unix unix: [ID 592667 kern.notice] rcx: 17b 8c820 r8:0 r9: ff054797dc48 May 10 11:51:26 x4500-01.unix unix: [ID 592667 kern.notice] rax: 0 rbx: 97eaffc rbp: ff001f4ca350 May 10 11:51:26 x4500-01.unix unix: [ID 592667 kern.notice] r10: 0 r11: fffec8b93868 r12: 27991000 May 10 11:51:27 x4500-01.unix unix: [ID 592667 kern.notice] r13: fffed1b 59c00 r14: fffecf8d8cc0 r15: 1000 May 10 11:51:27 x4500-01.unix unix: [ID 592667 kern.notice] fsb: 0 gsb: fffec3d5a580 ds: 4b May 10 11:51:27 x4500-01.unix unix: [ID 592667 kern.notice] es: 4b fs:0 gs: 1c3 May 10 11:51:27 x4500-01.unix unix: [ID 592667 kern.notice] trp: e err: 10 rip:0 May 10 11:51:27 x4500-01.unix unix: [ID 592667 kern.notice] cs: 30 rfl:10246 rsp: ff001f4ca318 May 10 11:51:27 x4500-01.unix unix: [ID 266532 kern.notice] ss: 38 May 10 11:51:27 x4500-01.unix unix: [ID 10 kern.notice] May 10 11:51:27 x4500-01.unix genunix: [ID 655072 kern.notice] ff001f4ca100 unix:die+c8 () May 10 11:51:27 x4500-01.unix genunix: [ID 655072 kern.notice] ff001f4ca210 unix:trap+135b () May 10 11:51:27 x4500-01.unix genunix: [ID 655072 kern.notice] ff001f4ca220 unix:_cmntrap+e9 () May 10 
11:51:27 x4500-01.unix genunix: [ID 802836 kern.notice] ff001f4ca350 0 () May 10 11:51:27 x4500-01.unix genunix: [ID 655072 kern.notice] ff001f4ca3d0 ufs:top_end_sync+cb () May 10 11:51:27 x4500-01.unix genunix: [ID 655072 kern.notice] ff001f4ca440 ufs:ufs_fsync+1cb () May 10 11:51:27 x4500-01.unix genunix: [ID 655072 kern.notice] ff001f4ca490 genunix:fop_fsync+51 () May 10 11:51:27 x4500-01.unix genunix: [ID 655072 kern.notice] ff001f4ca770 nfssrv:rfs3_create+604 () May 10 11:51:27 x4500-01.unix genunix: [ID 655072 kern.notice] ff001f4caa70 nfssrv:common_dispatch+444 () May 10 11:51:27 x4500-01.unix genunix: [ID 655072 kern.notice] ff001f4caa90 nfssrv:rfs_dispatch+2d () May 10 11:51:27 x4500-01.unix genunix: [ID 655072 kern.notice] ff001f4cab80 rpcmod:svc_getreq+1c6 () May 10 11:51:27 x4500-01.unix genunix: [ID 655072 kern.notice] ff001f4cabf0
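For the archives, the sort of checks being suggested look roughly like this. The output is abbreviated and from memory, and the device is simply the one named above:

  # dumpadm
        Dump content: kernel pages
         Dump device: /dev/dsk/c6t0d0s1 (swap)
  Savecore directory: /var/crash/x4500-01
    Savecore enabled: yes

  (tell savecore to ignore the dump-valid flag, or point it straight at the device)
  # savecore -vd
  # savecore -vf /dev/dsk/c6t0d0s1 /var/crash/x4500-01

  (a deliberate test dump can be taken with reboot -d, when the box can stand a bounce)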
Re: [zfs-discuss] zfs data corruption
Note: IANATZD (I Am Not A Team-ZFS Dude) Speaking as a Hardware Guy, knowing that something is happening, has happened or is indicated to happen is a Good Thing (tm). Begin unlikely, but possible scenario: If, for instance, I'm getting a cluster of read errors (or, perhaps bad blocks), I could: - See it as it's happening - See the block number for each error - already know the rate at which the errors are happening - Be able to determine that it's not good, and it's time to replace the disk. - You get the picture... And based on this information, I could feel confident that I have the right information at hand to be able to determine that it is or is not time to replace this disk. Of course, that assumes: - I know anything about disks - I know anything about the error messages - I have some sort of logging tool that recognises the errors (and does not just throw out the 'retryable ones', as most I have seen are configured to do) - I care - The folks watching the logs in the enterprise management tool care - My storage even bothers to report the errors Certainly, for some organisations, all of the above are exactly how it works, and it works well for them. Looking at the ZFS/FMA approach, it certainly is somewhat different. The (very) rough concept is that FMA gets pretty much all errors reported to it. It logs them, in a persistent store, which is always available to view. It also makes diagnoses on the errors, based on the rules that exist for that particular style of error. Once enough (or the right type of) errors happen, it'll then make a Fault Diagnosis for that component, and log a message, loud and proud into the syslog. It may also take other actions, like, retire a page of memory, offline a CPU, panic the box, etc. So - That's the rough overview. It's worth noting up front that we can *observe* every event that has happened. Using fmdump and fmstat we can immediately see if anything interesting has been happening, or we can wait for a Fault Diagnosis, in which case, we can just watch /var/adm/messages. I also *believe* (though am not certain - Perhaps someone else on the list might be?) it would be possible to have each *event* (so - the individual events that lead to a Fault Diagnosis) generate a message if it was required, though I have never taken the time to do that one... There are many advantages to this approach - It does not rely on logfiles, offsets into logfiles, counters of previously processes messages and all of the other doom and gloom that comes with scraping logfiles. It's something you can simply ask: Any issues, chief? The answer is there in a flash. You will also be less likely to have the messages rolled out of the logs before you get to them (another classic...). And - You get some great details from fmdump showing you what's really going on, and it's something that's really easy to parse to look for patterns. All of this said, I understand if you feel things are being 'hidden' from you until it's *actually* busted that you are having some of your forward vision obscured 'in the name of a quiet logfile'. I felt much the same way for a period of time. (Though, I live more in the CPU / Memory camp...) But - Once I realised what I could do with fmstat and fmdump, I was not the slightest bit unhappy (Actually, that's not quite true... Even once I knew what they could do, it still took me a while to work out the options I cared about for fmdump / fmstat), but I now trust FMA to look after my CPU / Memory issues better than I would in real life. 
I can still get what I need when I want to, and the data is actually more accessible and interesting. I just needed to know where to go looking. All this being said, I was not actually aware that many of our disk / target drivers were actually FMA'd up yet. heh - Shows what I know. Does any of this make you feel any better (or worse)? Nathan. Mark A. Carlson wrote: fmd(1M) can log faults to syslogd that are already diagnosed. Why would you want the random spew as well? -- mark Carson Gaspar wrote: [EMAIL PROTECTED] wrote: It's not safe to jump to this conclusion. Disk drivers that support FMA won't log error messages to /var/adm/messages. As more support for I/O FMA shows up, you won't see random spew in the messages file any more. mode=large financial institution paying support customer That is a Very Bad Idea. Please convey this to whoever thinks that they're helping by not sysloging I/O errors. If this shows up in Solaris 11, we will Not Be Amused. Lack of off-box error logging will directly cause loss of revenue. /mode ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss
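For anyone who wants the short version of the incantations, these are the stock FMA observability commands, nothing ZFS-specific:

  # fmstat            (per-module event and diagnosis-engine activity at a glance)
  # fmdump            (list of diagnosed faults)
  # fmdump -e         (the underlying error events - ereport.io.*, ereport.fs.zfs.*, ...)
  # fmdump -eV        (the same, in full gory detail)
  # fmadm faulty      (anything currently diagnosed as faulty)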
Re: [zfs-discuss] zfs data corruption
c4t60A9800043346859444A476B2D485872d0 ONLINE 0 0 0 c4t60A9800043346859444A476B2D485758d0 ONLINE 0 0 0 c4t60A9800043346859444A476B2D485642d0 ONLINE 0 0 0 c4t60A9800043346859444A476B2D485471d0 ONLINE 0 0 0 c4t60A9800043346859444A476B2D485357d0 ONLINE 0 0 0 c4t60A9800043346859444A476B2D485241d0 ONLINE 0 0 0 c4t60A9800043346859444A476B2D485071d0 ONLINE 0 0 0 c4t60A9800043346859444A476B2D484F56d0 ONLINE 0 0 0 c4t60A9800043346859444A476B2D484E41d0 ONLINE 0 0 0 c4t60A9800043346859444A476B2D484C70d0 ONLINE 0 0 0 c4t60A9800043346859444A476B2D484B56d0 ONLINE 0 0 0 c4t60A9800043346859444A476B2D484A2Dd0 ONLINE 0 0 0 c4t60A9800043346859444A476B2D484870d0 ONLINE 0 0 0 c4t60A9800043346859444A476B2D484755d0 ONLINE 0 0 0 c4t60A9800043346859444A476B2D48462Dd0 ONLINE 0 0 0 errors: The following persistent errors have been detected: DATASET OBJECT RANGE zpool1 17 2428895232-2429026304 zpool1 17 2429026304-2429157376 zpool1 17 2429157376-2429288448 zpool1 17 2429288448-2429419520 zpool1 17 2429419520-2429550592 zpool1 17 2463629312-2463760384 zpool1 17 2463760384-2463891456 zpool1 17 2463891456-2464022528 zpool1 17 2464022528-2464153600 zpool1 17 2464153600-2464284672 zpool1 18 2397700096-2397831168 zpool1 18 2397831168-2397962240 zpool1 18 2397962240-2398093312 zpool1 18 2398093312-2398224384 zpool1 18 2398224384-2398355456 zpool1 18 2432434176-2432565248 zpool1 18 2432565248-2432696320 zpool1 18 2432696320-2432827392 zpool1 18 2432827392-2432958464 zpool1 18 2432958464-2433089536 zpool1 19 2418933760-2419064832 zpool1 19 2419064832-2419195904 zpool1 19 2419195904-2419326976 zpool1 19 2419326976-2419458048 zpool1 19 2453798912-2453929984 zpool1 19 2453929984-2454061056 zpool1 19 2454061056-2454192128 zpool1 19 2454192128-2454323200 This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- // // Nathan Kroenert [EMAIL PROTECTED] // // Technical Support Engineer Phone: +61 3 9869-6255 // // Sun Services Fax:+61 3 9869-6288 // // Level 3, 476 St. Kilda Road // // Melbourne 3004 VictoriaAustralia // // ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Zfs send takes 3 days for 1TB?
Indeed - If it was 100Mb/s ethernet, 1TB would take near enough 24 hours just to push that much data... Would be great to see some details of the setup and where the bottleneck was. I'd be surprised if ZFS has anything to do with the transfer rate... But an interesting read anyways. :) Nathan. Nicolas Williams wrote: On Wed, Apr 09, 2008 at 11:38:03PM -0400, Jignesh K. Shah wrote: Can zfs send utilize multiple-streams of data transmission (or some sort of multipleness)? Interesting read for background http://people.planetpostgresql.org/xzilla/index.php?/archives/338-guid.html Note: zfs send takes 3 days for 1TB to another system Huh? That article doesn't describe how they were moving the zfs send stream, whether the limit was the network, ZFS or disk I/O. In fact, it's bereft of numbers. It even says that the transfer time is not actually three days but upwards of 24 hours. Nico -- // // Nathan Kroenert [EMAIL PROTECTED] // // Technical Support Engineer Phone: +61 3 9869-6255 // // Sun Services Fax:+61 3 9869-6288 // // Level 3, 476 St. Kilda Road // // Melbourne 3004 VictoriaAustralia // // ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
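Back-of-the-envelope: 100Mb/s ethernet is ~12.5MB/s of payload, so 1TB / 12.5MB/s is ~80,000 seconds, or a bit over 22 hours before ZFS even enters the picture. A sketch of the usual way such a stream gets moved between hosts follows - host and dataset names are invented, and ssh itself can easily become the bottleneck:

  # zfs snapshot tank/data@migrate
  # zfs send tank/data@migrate | ssh otherhost zfs receive -d backup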
Re: [zfs-discuss] ZFS on Sun X2100?
Did you do anything specific with the drive caches? How is your ZFS performance? Nathan. :) Rich Teer wrote: On Wed, 19 Mar 2008, Terence Ng wrote: I am new to Solaris. I have Sun X2100 with 2 x 80G harddisks (run as email server, run tomcat, jboss and postgresql) and want to run as mirror to secure the data. Since ZFS cannot be used as a root file system , does that mean I am no way can benefit from using ZFS? Nope, in fact I have set up an X2100 pretty much exactly as you want. set up 5 partitions: /, swap, space for live upgrade, a small partition for the SVM metadbs, and the rest of the disk. This last one is used as that machines zdev for its ZFS pool. So, root and swap mirrored using SVM, and everything else on a mirrored ZFS pool. HTH, ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs 32bits
Paul - Don't substitute redundancy for backup... If your data is important to you, for the love of steak, make sure you have a backup that would not be destroyed by, say, a lightning strike, fire or stray 747. For what it's worth, I'm also using ZFS on 32-bit and have yet to experience any sort of issue. An external 500GB disk + external USB enclosure runs for what - $150? That's what I use anyways. :) Nathan. Paul Kraus wrote: On Thu, Mar 6, 2008 at 10:22 AM, Brian D. Horn [EMAIL PROTECTED] wrote: ZFS is not 32-bit safe. There are a number of places in the ZFS code where it is assumed that a 64-bit data object is being read atomically (or set atomically). It simply isn't true and can lead to weird bugs. This is disturbing, especially as I have not seen this documented anywhere. I have a dual P-III 550 Intel system with 1 GB of RAM (Intel L440GX+ motherboard). I am running Solaris 10U4 and am using ZFS (mirrors and stripes only, no RAIDz). While this is 'only' a home server, I still cannot afford to lose over 500 GB of data. If ZFS isn't supported under 32-bit systems then I need to start migrating to UFS/SLVM as soon as I can. I specifically went with 10U4 so that I would have a stable, supportable environment. Under what conditions are the 32-bit / 64-bit problems likely to occur? I have been running this system for 6 months (a migration from OpenSuSE 10.1) without any issues. The NFS server performance is at least an order of magnitude better than the SuSE server was. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
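A minimal sketch of that kind of poor-man's offsite backup, assuming a single external disk (device, pool and snapshot names are made up):

  # zpool create extbackup c6t0d0
  # zfs snapshot tank@offsite-20080306
  # zfs send tank@offsite-20080306 | zfs receive extbackup/tank
  # zpool export extbackup      (then unplug it and store it somewhere else)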
Re: [zfs-discuss] Dealing with Single Bit Flips - WAS: Cause for data corruption?
Bob Friesenhahn wrote: On Tue, 4 Mar 2008, Nathan Kroenert wrote: It does seem that some of us are getting a little caught up in disks and their magnificence in what they write to the platter and read back, and overlooking the potential value of a simple (though potentially computationally expensive) circus trick, which might, just might, make your broken 1TB archive useful again... The circus trick can be handled via a user-contributed utility. In fact, people can compete with their various repair utilities. There are only 1048576 1-bit permuations to try, and then the various two-bit permutations can be tried. That does not sound 'easy', and I consider that ZFS should be... :) and IMO it's something that should really be built in, not attacked with an addon. I had (as did Jeff in his initial response) considered that we only need to actually try to flip 128KB worth of bits once... That many flips means that we in a way 'processing' some 128GB in the worse case when re-generating checksums. Internal to a CPU, depending on Cache Aliasing, competing workloads, threadedness, etc, this could be dramatically variable... something I guess the ZFS team would want to keep out of the 'standard' filesystem operation... hm. :\ I don't think it's a good idea for us to assume that it's OK to 'leave out' potential goodness for the masses that want to use ZFS in non-enterprise environments like laptops / home PC's, or use commodity components in conjunction with the Big Stuff... (Like white box PC's connected to an EMC or HDS box... ) It seems that goodness for the masses has not been left out. The forthcoming ability to request duplicate ZFS blocks is very good news indeed. We are entering an age where the entry level SATA disk is 1TB and users have more space than they know what to do with. A little replication gives these users something useful to do with their new disk while avoiding the need for unreliable circus tricks to recover data. ZFS goes far beyond MS-DOS's recover command (which should have been called destroy). I never have enough space on my laptop... I guess I'm a freak. But - I am sure that we are *both* right for some subsets of ZFS users, and that the more choice we have built into the filesystem, the better. Thanks again for the comments! Nathan. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Dealing with Single Bit Flips - WAS: Cause for data corruption?
Hey, Bob My perspective on Big reasons for it *to* be integrated would be: - It's tested - By the folks charged with making ZFS good - It's kept in sync with the differing Zpool versions - It's documented - When the system *is* patched, any changes the patch brings are synced with the recovery mechanism - Being integrated, it has options that can be persistently set if required - It's there when you actually need it - It could be integrated with Solaris FMA to take some funky actions based on the nature of the failure, including cool messages telling you what you need to run to attempt a repair etc - It's integrated (recursive, self fulfilling benefit... ;) As for the separate utility for different failure modes, I agree, *development* of these might be faster if everyone chases their own pet failure mode and contributes it, but I still think getting them integrated either as optional actions on error, or as part of zdb or other would be far better than having to go looking for the utility and 'give it a whirl'. But - I'm sure that's a personal preference, and I'm sure that there are those that would love the opportunity to roll their own. OK - I'm going to shutup now. I think I have done this to death, and I don't want to end up in everyone's kill filter. Cheers! Nathan. Bob Friesenhahn wrote: On Tue, 4 Mar 2008, Nathan Kroenert wrote: The circus trick can be handled via a user-contributed utility. In fact, people can compete with their various repair utilities. There are only 1048576 1-bit permuations to try, and then the various two-bit permutations can be tried. That does not sound 'easy', and I consider that ZFS should be... :) and IMO it's something that should really be built in, not attacked with an addon. There are several reasons why this sort of thing should not be in ZFS itself. A big reason is that if it is in ZFS itself, it can only be updated via an OS patch or upgrade, along with a required reboot. If it is in a utility, it can be downloaded and used as the user sees fit without any additional disruption to the system. While some errors are random, others follow well defined patterns, so it may be that one utility is better than another or that user-provided options can help achieve success faster. Bob == Bob Friesenhahn [EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS vs. Novell NSS
Hm - Based on this detail from the page: Change lever for switching between Rotation + Hammering , Neutral and Hammering only I'd hope it could still hammer... Though I'd suspect the size of nails it would hammer would be somewhat limited... ;) Nathan. Boyd Adamson wrote: Richard Elling [EMAIL PROTECTED] writes: Tim wrote: The greatest hammer in the world will be inferior to a drill when driving a screw :) The greatest hammer in the world is a rotary hammer, and it works quite well for driving screws or digging through degenerate granite ;-) Need a better analogy. Here's what I use (quite often) on the ranch: http://www.hitachi-koki.com/powertools/products/hammer/dh40mr/dh40mr.html Hasn't the greatest hammer in the world lost the ability to drive nails? I'll have to start belting them in with the handle of a screwdriver... ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Can ZFS be event-driven or not?
Are you indicating that the filesystem know's or should know what an application is doing?? It seems to me that to achieve what you are suggesting, that's exactly what it would take. Or, you are assuming that there are no co-dependent files in applications that are out there... Whichever the case, I'm confused...! Unless you are perhaps suggesting perhaps an IOCTL that an application could call to indicate I'm done for this round, please snapshot or something to that effect. Even then, I'm still confused as to how I would do anything much useful with this over and above, say, 1 minute snapshots. Nathan. Uwe Dippel wrote: atomic view? Your post was on the gory details on how ZFS writes. Atomic View here is, that 'save' of a file is an 'atomic' operation: at one moment in time you click 'save', and some other moment in time it is done. It means indivisible, and from the perspective of the user this is how it ought to look. The rub is this: how do you know when a file edit/modify has completed? Not to me, I'm sorry, this is task of the engineer, the implementer. (See 'atomic', as above.) It would be a shame if a file system never knew if the operation was completed. If an application has many files then an edit/modify may include updates and/or removals of more than one file. So once again: how do you know when an edit/modify has completed? So an 'edit' fires off a few child processes to do this and that and then you forget about them, hoping for them to do a proper job. Oh, this gives me confidence ;) No, seriously, let's look at some applications: A. User works in Office (Star-Office, sure!) and clicks 'Save' for a current work before making major modifications. So the last state of the document (odt) is being stored. Currently we can set some Backup option to be done regularly. Meaning that the backup could have happened at the very wrong moment; while saving the state on each user request 'Save' is much better. B. A bunch of e-mails are read from the Inbox and stored locally (think Maildir). The user sees the sender, doesn't know her, and deletes all of them. Of course, the deletion process will have fired up the CDP-engine ('event') and retire the files instead of deletion. So when the sender calls, and the user learns that he made a big mistake, he can roll back to before the deletion (event). C. (Sticking with /home/) I agree with you, that the rather continuous changes of the dot-files and dot-directories in the users HOME that serve JDS, and many more, do eventually not necessarily allow to reconstitute a valid state of the settings at all and any moment. Still, chances are high, that they will. In the worst case, the unlucky user can roll back to when he last took a break, if only for grabbing another coffee, because it took a minute, the writes (see above) will hopefully have completed. oh, s***, already messed up the settings? Then try to roll back to lunch break. Works? Okay! But when you roll back to lunch break, where is the stuff done in between? The backup solution means that they are lost. The event-driven (CDP) not: you can roll over all the states of files or directories between lunch break and recover the third latest version of your tendering document (see above), within the settings of the desktop that were valid this morning. Maybe SUN can't do this, but wait for Apple, and OSX10-dot-something (using ZFS as default!) will know how to do it. (And they probably also know, when their 'writes' are done.) 
Uwe This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Can ZFS be event-driven or not?
It occurred to me that we are likely missing the point here because Uwe is thinking of this as a One User on a System sort of perspective, whereas most of the rest of us are thinking of it from a 'Solaris' perspective, where we are typically expecting the system to be running many applications / DB's / users all at the same time. In Uwe's use cases thus far, it seems that he is interested in only the simple single user style applications, if I'm not mistaken, so he's not considering the consequences of what it *really* means to have CDP in the way he wishes. Uwe - am I close here? Nathan. Nicolas Williams wrote: On Tue, Feb 26, 2008 at 06:34:04PM -0800, Uwe Dippel wrote: The rub is this: how do you know when a file edit/modify has completed? Not to me, I'm sorry, this is task of the engineer, the implementer. (See 'atomic', as above.) It would be a shame if a file system never knew if the operation was completed. The filesystem knows if a filesystem operation completed. It can't know application state. You keep missing that. If an application has many files then an edit/modify may include updates and/or removals of more than one file. So once again: how do you know when an edit/modify has completed? So an 'edit' fires off a few child processes to do this and that and then you forget about them, hoping for them to do a proper job. Oh, this gives me confidence ;) You'd rather the filesystem guess application state than have the app tell it? Weird. Your other alternative -- saving a history of every write -- doesn't work because you can't tell what point in time is safe to restore to. No, seriously, let's look at some applications: A. User works in Office (Star-Office, sure!) and clicks 'Save' for a current work before making major modifications. So the last state of the document (odt) is being stored. Currently we can set some Backup option to be done regularly. Meaning that the backup could have happened at the very wrong moment; while saving the state on each user request 'Save' is much better. So modify the office suite to call a new syscall that says I'm internally consistent in all these files and boom, the filesystem can now take a useful snapshot. B. A bunch of e-mails are read from the Inbox and stored locally (think Maildir). The user sees the sender, doesn't know her, and deletes all of them. Of course, the deletion process will have fired up the CDP-engine ('event') and retire the files instead of deletion. So when the sender calls, and the user learns that he made a big mistake, he can roll back to before the deletion (event). Now think of an application like this but which uses, say, SQLite (e.g., Firefox 3.x, Thunderbird, ...). The app might never close the database file, just fsync() once in a while. The DB might have multiple files (in the SQLite case that might be multiple DBs ATTACHed into one database connection). Also, an fsync of a SQLite journal file is not as useful to CDP as an fsync() of a SQLite DB proper. Now add any of a large number of databases and apps to the mix and forget it -- the heuristics become impossible or mostly useless. C. (Sticking with /home/) I agree with you, that the rather continuous changes of the dot-files and dot-directories in the users HOME that serve JDS, and many more, do eventually not necessarily allow to reconstitute a valid state of the settings at all and any moment. Still, chances are high, that they will. In the worst case, the Chances? 
So what, we tell the user try restoring to this snapshot, login again and if stuff doesn't work, then try another snapshot? What if the user discovers too late that the selected snapshot was inconsistent and by then they've made other changes? unlucky user can roll back to when he last took a break, if only for grabbing another coffee, because it took a minute, the writes (see That sounds mighty painful. I'd rather modify some high-profile apps to tell the filesystem that their state is consistent, so take a snapshot. Maybe SUN can't do this, but wait for Apple, and OSX10-dot-something (using ZFS as default!) will know how to do it. (And they probably also know, when their 'writes' are done.) I'm giving you the best answer -- modify the apps -- and you reject it. Given how many important apps Apple controls it wouldn't surprise me if they did what I suggest. We should do it too. But one step at a time. We need to setup a project, gather requirements, design a solution, ... And since the solution will almost certainly entail modifications to apps where heuristics won't help, well, I think this would be a project with fairly wide scope, which means it likely won't go fast. Nico ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Cause for data corruption?
My guess is that you have some defective hardware in the system that's causing bit flips in the checksum or the data payload. I'd suggest running some sort of system diagnostics for a few hours to see if you can locate the bad piece of hardware. My suspicion would be your memory or CPU, but that's just a wild guess, based on the number of errors you have and the number of devices it's spread over. Could it be that you have been corrupting data for some time and now known it? Oh - And i'd also look around based on your disk controller and ensure that there are no newer patches for it, just in case it's one for which there was a known problem. (which was worked around in the driver) I *think* there was an issue with at least one or two... Cheers! Nathan. Sandro wrote: hi folks I've been running my fileserver at home with linux for a couple of years and last week I finally reinstalled it with solaris 10 u4. I borrowed a bunch of disks from a friend, copied over all the files, reinstalled my fileserver and copied the data back. Everything went fine, but after a few days now, quite a lot of files got corrupted. here's the output: # zpool status data pool: data state: ONLINE status: One or more devices has experienced an error resulting in data corruption. Applications may be affected. action: Restore the file in question if possible. Otherwise restore the entire pool from backup. see: http://www.sun.com/msg/ZFS-8000-8A scrub: scrub completed with 422 errors on Mon Feb 25 00:32:18 2008 config: NAMESTATE READ WRITE CKSUM dataONLINE 0 0 5.52K raidz1ONLINE 0 0 5.52K c0t0d0 ONLINE 0 0 10.72 c0t1d0 ONLINE 0 0 4.59K c0t2d0 ONLINE 0 0 5.18K c0t3d0 ONLINE 0 0 9.10K c1t0d0 ONLINE 0 0 7.64K c1t1d0 ONLINE 0 0 3.75K c1t2d0 ONLINE 0 0 4.39K c1t3d0 ONLINE 0 0 6.04K errors: 388 data errors, use '-v' for a list Last night I found out about this, it told me there were errors in like 50 files. So I scrubbed the whole pool and it found a lot more corrupted files. The temporary system which I used to hold the data while I'm installing solaris on my fileserver is running nv build 80 and no errors on there. What could be the cause of these errors?? I don't see any hw errors on my disks.. # iostat -En | grep -i error c3d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 c4d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 c0t0d0 Soft Errors: 574 Hard Errors: 0 Transport Errors: 0 Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 c1t0d0 Soft Errors: 549 Hard Errors: 0 Transport Errors: 0 Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 c0t1d0 Soft Errors: 14 Hard Errors: 0 Transport Errors: 0 Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 c0t2d0 Soft Errors: 549 Hard Errors: 0 Transport Errors: 0 Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 c0t3d0 Soft Errors: 549 Hard Errors: 0 Transport Errors: 0 Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 c1t1d0 Soft Errors: 548 Hard Errors: 0 Transport Errors: 0 Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 c1t2d0 Soft Errors: 14 Hard Errors: 0 Transport Errors: 0 Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 c1t3d0 Soft Errors: 548 Hard Errors: 0 Transport Errors: 0 Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 although a lot of soft errors. 
Linux said that one disk had gone bad, but I figured the sata cable was somehow broken, so I replaced that before installing solaris. And solaris didn't and doesn't see any actual hw errors on the disks, does it? This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Can ZFS be event-driven or not?
And would drive storage requirements through the roof!! I like it! ;) Nathan. Jonathan Loran wrote: David Magda wrote: On Feb 24, 2008, at 01:49, Jonathan Loran wrote: In some circles, CDP is big business. It would be a great ZFS offering. ZFS doesn't have it built-in, but AVS made be an option in some cases: http://opensolaris.org/os/project/avs/ Point in time copy (as AVS offers) is not the same thing as CDP. When you snapshot data as in point in time copies, you predict the future, knowing the time slice at which your data will be needed. Continuous data protection is based on the premise that you don't have a clue ahead of time which point in time you want to recover to. Essentially, for CDP, you need to save every storage block that has ever been written, so you can put them back in place if you so desire. Anyone else on the list think it is worthwhile adding CDP to the ZFS list of capabilities? It causes space management issues, but it's an interesting, useful idea. Jon ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] 100% random writes coming out as 50/50 reads/writes
What about new blocks written to an existing file? Perhaps we could make that clearer in the manpage too... hm. Mattias Pantzare wrote: If you created them after, then no worries, but if I understand correctly, if the *file* was created with 128K recordsize, then it'll keep that forever... Files have nothing to do with it. The recordsize is a file system parameter. It gets a little more complicated because the recordsize is actually the maximum recordsize, not the minimum. Please read the manpage: Changing the file system's recordsize only affects files created afterward; existing files are unaffected. Nothing is rewritten in the file system when you change recordsize so is stays the same for existing files. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] 100% random writes coming out as 50/50 reads/writes
Hey, Richard - I'm confused now. My understanding was that any files created after the recordsize was set would use that as the new maximum recordsize, but files already created would continue to use the old recordsize. Though I'm now a little hazy on what will happen when those existing files are updated as well... hm. Cheers! Nathan. Richard Elling wrote: Nathan Kroenert wrote: And something I was told only recently - It makes a difference if you created the file *before* you set the recordsize property. Actually, it has always been true for RAID-0, RAID-5, RAID-6. If your I/O strides over two sets then you end up doing more I/O, perhaps twice as much. If you created them after, then no worries, but if I understand correctly, if the *file* was created with 128K recordsize, then it'll keep that forever... Files have nothing to do with it. The recordsize is a file system parameter. It gets a little more complicated because the recordsize is actually the maximum recordsize, not the minimum. -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] 100% random writes coming out as 50/50 reads/writes
And something I was told only recently - It makes a difference if you created the file *before* you set the recordsize property. If you created them after, then no worries, but if I understand correctly, if the *file* was created with 128K recordsize, then it'll keep that forever... Assuming I understand correctly. Hopefully someone else on the list will be able to confirm. Cheers! Nathan. Richard Elling wrote: Anton B. Rang wrote: Create a pool [ ... ] Write a 100GB file to the filesystem [ ... ] Run I/O against that file, doing 100% random writes with an 8K block size. Did you set the record size of the filesystem to 8K? If not, each 8K write will first read 128K, then write 128K. Also check to see that your 8kByte random writes are aligned on 8kByte boundaries, otherwise you'll be doing a read-modify-write. -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
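For the record, the usual dance for a database filesystem, given the manpage behaviour quoted above - pool and dataset names are invented:

  # zfs create tank/oradata
  # zfs set recordsize=8k tank/oradata
  # zfs get recordsize tank/oradata

  (datafiles created from here on use 8k records; anything written to the
   dataset before the change keeps its old record size until it is copied
   or recreated)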
[zfs-discuss] ZFS taking up to 80 seconds to flush a single 8KB O_SYNC block.
Hey all - I'm working on an interesting issue where I'm seeing ZFS being quite cranky about writing O_SYNC written blocks. Bottom line is that I have a small test case that does essentially this: open file for writing -- O_SYNC loop( write() 8KB of random data print time taken to write data } It's taking anywhere up to 80 seconds per 8KB block. When the 'problem' is not in evidence, (and it's not always happening), I can do around 1200 O_SYNC writes per second... It seems to be waiting here virtually all of the time: 0t11021::pid2proc | ::print proc_t p_tlist|::findstack -v stack pointer for thread 30171352960: 2a118052df1 [ 02a118052df1 cv_wait+0x38() ] 02a118052ea1 zil_commit+0x44(1, 6b50516, 193, 60005db66bc, 6b50570, 60005db6640) 02a118052f51 zfs_write+0x554(0, 14000, 2a1180539e8, 6000af22840, 2000, 2a1180539d8) 02a118053071 fop_write+0x20(304898cd100, 2a1180539d8, 10, 300a27a9e48, 0, 7b7462d0) 02a118053121 write+0x268(4, 8058, 60051a3d738, 2000, 113, 1) 02a118053221 dtrace_systrace_syscall32+0xac(4, ffbfdaf0, 2000, 21e80, ff3a00c0, ff3a0100) 02a1180532e1 syscall_trap32+0xcc(4, ffbfdaf0, 2000, 21e80, ff3a00c0, ff3a0100) And this also evident in a dtrace of it, following the write in... ... 28- zil_commit 28 - cv_wait 28- thread_lock 28- thread_lock 28- cv_block 28 - ts_sleep 28 - ts_sleep 28 - new_mstate 28- cpu_update_pct 28 - cpu_grow 28- cpu_decay 28 - exp_x 28 - exp_x 28- cpu_decay 28 - cpu_grow 28- cpu_update_pct 28 - new_mstate 28 - disp_lock_enter_high 28 - disp_lock_enter_high 28 - disp_lock_exit_high 28 - disp_lock_exit_high 28- cv_block 28- sleepq_insert 28- sleepq_insert 28- disp_lock_exit_nopreempt 28- disp_lock_exit_nopreempt 28- swtch 28 - disp 28- disp_lock_enter 28- disp_lock_enter 28- disp_lock_exit 28- disp_lock_exit 28- disp_getwork 28- disp_getwork 28- restore_mstate 28- restore_mstate 28 - disp 28 - pg_cmt_load 28 - pg_cmt_load 28- swtch 28- resume 28 - savectx 28- schedctl_save 28- schedctl_save 28 - savectx ... At this point, it waits for up to 80 seconds. I'm also seeing zil_commit() being called around 7-15 times per second. For kicks, I disabled the ZIL: zil_disable/W0t1, and that made not a pinch of difference. :) For what it's worth, this is a T2000, running Oracle, connected to an HDS 9990 (using 2GB fibre), with 8KB record sizes for the oracle filesystems, and I'm only seeing the issue on the ZFS filesystems that have the active oracle tables on them. The O_SYNC test case is just trying to help me understand what's happening. The *real* problem is that oracle is running like rubbish when it's trying to roll forward archive logs from another server. It's an almost 100% write workload. At the moment, it cannot even keep up with the other server's log creation rate, and it's barely doing anything. (The other box is quite different, so not really valid for direct comparison at this point). 6513020 looked interesting for a while, but I already have 120011-14 and 127111-03 and installed. I'm looking into the cache flush settings of the 9990 array to see if it's that killing me, but I'm also looking for any other ideas on what might be hurting me. I also have set zfs:zfs_nocacheflush = 1 in /etc/system The Oracle Logs are on a separate Zpool and I'm not seeing the issue on those filesystems. The lockstats I have run are not yet all that interesting. If anyone has ideas on specific incantations I should use or some specific D or anything else, I'd be most appreciative. Cheers! Nathan. 
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
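In case it helps anyone chasing something similar, one possible incantation for timing zil_commit() calls - assuming fbt can see that function on the build in question, and that this really is where the time is going:

  # dtrace -n '
    fbt::zil_commit:entry  { self->ts = timestamp; }
    fbt::zil_commit:return /self->ts/
    {
            @["zil_commit (ns)"] = quantize(timestamp - self->ts);
            self->ts = 0;
    }'

A distribution with a long tail out past a few seconds per call points at whatever the ZIL is waiting on (cache flushes at the array, in this sort of setup) rather than at the writes themselves.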
Re: [zfs-discuss] Sun 5220 as a ZFS Server?
For what it's worth, I configured a T5220 this week with a 6 disk, three mirror zpool. (three top level mirror vdevs...). Used only internal disks... When pushing to disk, I was seeing bursts of 70 odd MB/s per spindle, with all 6 spindles making the 70MB/s, so 350MB/s ish. Read performance was about the same for large files. (did not do anything with small files, though I expect that with the 2.5 SAS disks, it should be pretty good...). I was not seeing a consistent 70MB/s per spindle, which I put down the the fact that I was only using a single thread to generate the writes. (A single thread of an N2 is only so fast... Just think of what you could do with 64 of them ;) I'll be interested to see what the others have to say. :) Hope this helps. Nathan. Michael Stalnaker wrote: We’re looking at building out sever ZFS servers, and are considering an x86 platform vs a Sun 5520 as the base platform. Any comments from the floor on comparative performance as a ZFS server? We’d be using the LSI 3801 controllers in either case. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
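If anyone wants to repeat the experiment with more than one writer, a quick-and-dirty way is a handful of dd streams in parallel (filesystem path invented):

  # for i in 0 1 2 3 4 5 6 7; do
  >     dd if=/dev/zero of=/tank/fs/stream.$i bs=1024k count=4096 &
  > done; wait

Then watch zpool iostat 1 to see how close the aggregate gets to the per-spindle numbers quoted above.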
Re: [zfs-discuss] 30 seond hang, ls command....
Any chance the disks are being powered down, and you are waiting for them to power back up? Nathan. :) Neal Pollack wrote: I'm running Nevada build 81 on x86 on an Ultra 40. # uname -a SunOS zbit 5.11 snv_81 i86pc i386 i86pc Memory size: 8191 Megabytes I started with this zfs pool many dozens of builds ago, approx a year ago. I do live upgrade and zfs upgrade every few builds. When I have not accessed the zfs file systems for a long time, if I cd there and do an ls command, nothing happens for approx 30 seconds. Any clues how I would find out what is wrong? -- # zpool status -v pool: tank state: ONLINE scrub: none requested config: NAMESTATE READ WRITE CKSUM tankONLINE 0 0 0 raidz2ONLINE 0 0 0 c2d0ONLINE 0 0 0 c3d0ONLINE 0 0 0 c4d0ONLINE 0 0 0 c5d0ONLINE 0 0 0 c6d0ONLINE 0 0 0 c7d0ONLINE 0 0 0 c8d0ONLINE 0 0 0 errors: No known data errors # zfs list NAME USED AVAIL REFER MOUNTPOINT tank 172G 2.04T 52.3K /tank tank/arc 172G 2.04T 172G /zfs/arc # zpool list NAME SIZE USED AVAILCAP HEALTH ALTROOT tank 3.16T 242G 2.92T 7% ONLINE - ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] hardware for zfs home storage
I see a business opportunity for someone... Backups for the masses... of Unix / VMS and other OS/s out there. any takers? :) Nathan. Jonathan Loran wrote: eric kustarz wrote: On Jan 14, 2008, at 11:08 AM, Tim Cook wrote: www.mozy.com appears to have unlimited backups for 4.95 a month. Hard to beat that. And they're owned by EMC now so you know they aren't going anywhere anytime soon. I just signed on and am trying Mozy out. Note, its $5 per computer and its *not* archival. If you delete something on your computer, then 30 days later it is not going to be backed up anymore. eric And they don't support Solaris or Linux, so that means I would have to transfer everything indirectly from my Mac. Or worse yet, run windoz in a VM. Hardly practical. Why is it we always have to be second class citizens! Power to the (*x) people! Jon -- - _/ _/ / - Jonathan Loran - - -/ / /IT Manager - - _ / _ / / Space Sciences Laboratory, UC Berkeley -/ / / (510) 643-5146 [EMAIL PROTECTED] - __/__/__/ AST:7731^29u18e3 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Clearing partition/label info
format -e then from there, re-label using SMI label, versus EFI. Cheers Al Slater wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Hi, What is the quickest way of clearing the label information on a disk that has been previously used in a zpool? regards - -- Al Slater Technical Director SCL Phone : +44 (0)1273 07 Fax : +44 (0)1273 01 email : [EMAIL PROTECTED] Stanton Consultancy Ltd Pavilion House, 6-7 Old Steine, Brighton, East Sussex, BN1 1EJ Registered in England Company number: 1957652 VAT number: GB 760 2433 55 -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (MingW32) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFHZoluz4fTOFL/EDYRAnr5AJ4ie+xFNCi6gA5HLZ8IqI1wHItEEwCgj0ru EwSc9B16io3kBz2wS0LGoEQ= =eaZc -END PGP SIGNATURE- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
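For completeness, the dialogue looks roughly like this - the menus are from memory, so details may differ by build:

  # format -e
  (select the previously-used disk)
  format> label
  [0] SMI Label
  [1] EFI Label
  Specify Label type[1]: 0
  Ready to label disk, continue? y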
Re: [zfs-discuss] ZFS + DB + fragments
This question triggered some silly questions in my mind: Lots of folks are determined that the whole COW to different locations are a Bad Thing(tm), and in some cases, I guess it might actually be... What if ZFS had a pool / filesystem property that caused zfs to do a journaled, but non-COW update so the data's relative location for databases is always the same? Or - What if it did a double update: One to a staged area, and another immediately after that to the 'old' data blocks. Still always have on-disk consistency etc, at a cost of double the I/O's... Of course, both of these would require non-sparse file creation for the DB etc, but would it be plausible? For very read intensive and position sensitive applications, I guess this sort of capability might make a difference? Just some stabs in the dark... Cheers! Nathan. Louwtjie Burger wrote: Hi After a clean database load a database would (should?) look like this, if a random stab at the data is taken... [8KB-m][8KB-n][8KB-o][8KB-p]... The data should be fairly (100%) sequential in layout ... after some days though that same spot (using ZFS) would problably look like: [8KB-m][ ][8KB-o][ ] Is this pseudo logical-physical view correct (if blocks n and p was updated and with COW relocated somewhere else)? Could a utility be constructed to show the level of fragmentation ? (50% in above example) IF the above theory is flawed... how would fragmentation look/be observed/calculated under ZFS with large Oracle tablespaces? Does it even matter what the fragmentation is from a performance perspective? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS very slow under xVM
I observed something like this a while ago, but assumed it was something I did. (It usually is... ;) Tell me - If you watch with an iostat -x 1, do you see bursts of I/O then periods of nothing, or just a slow stream of data? I was seeing intermittent stoppages in I/O, with bursts of data on occasion... Maybe it's not just me... Unfortunately, I'm still running old nv and xen bits, so I can't speak to the 'current' situation... Cheers. Nathan. Martin wrote: Hello I've got Solaris Express Community Edition build 75 (75a) installed on an Asus P5K-E/WiFI-AP (ip35/ICH9R based) board. CPU=Q6700, RAM=8Gb, disk=Samsung HD501LJ and (older) Maxtor 6H500F0. When the O/S is running on bare metal, ie no xVM/Xen hypervisor, then everything is fine. When it's booted up running xVM and the hypervisor, then unlike plain disk I/O, and unlike svm volumes, zfs is around 20 time slower. Specifically, with either a plain ufs on a raw/block disk device, or ufs on a svn meta device, a command such as dd if=/dev/zero of=2g.5ish.dat bs=16k count=15 takes less than a minute, with an I/O rate of around 30-50Mb/s. Similary, when running on bare metal, output to a zfs volume, as reported by zpool iostat, shows a similar high output rate. (also takes less than a minute to complete). But, when running under xVM and a hypervisor, although the ufs rates are still good, the zfs rate drops after around 500Mb. For instance, if a window is left running zpool iostat 1 1000, then after the dd command above has been run, there are about 7 lines showing a rate of 70Mbs, then the rate drops to around 2.5Mb/s until the entire file is written. Since the dd command initially completes and returns control back to the shell in around 5 seconds, the 2 gig of data is cached and is being written out. It's similar with either the Samsung or Maxtor disks (though the Samsung are slightly faster). Although previous releases running on bare metal (with xVM/Xen) have been fine, the same problem exists with the earlier b66-0624-xen drop of Open Solaris This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
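If it helps anyone reproduce the comparison, the sort of thing I'd run in each case looks like this - pool and file names are examples only:
==
# timestamp zpool iostat output so stalls stand out in the log
zpool iostat tank 1 | while read line
do
    echo "$(date '+%H:%M:%S') $line"
done > /tmp/zpool_iostat.log &

# generate the load (about 2GB at 16k, matching the test above)
dd if=/dev/zero of=/tank/fs/2g.dat bs=16k count=131072
==
Long runs of near-zero write bandwidth after the initial burst would line up with what Martin is describing.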
[zfs-discuss] characterizing I/O on a per zvol basis.
Hey all - Time for my silly question of the day, and before I bust out vi and dtrace... Is there a simple, existing way I can observe the read / write / IOPS on a per-zvol basis? If not, is there interest in having one? Cheers! Nathan. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
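In the meantime, here's the kind of DTrace I had in mind - very much a sketch: it assumes the zvol entry points (zvol_read/zvol_write) exist in the zfs module on your build and take a uio as their second argument, and it lumps all zvols together rather than splitting them out by name:
==
# see what's actually there on your build first
dtrace -ln 'fbt:zfs:zvol_*:entry'

# rough call counts and requested bytes over 10 seconds
dtrace -qn '
fbt:zfs:zvol_read:entry,
fbt:zfs:zvol_write:entry
{
    @calls[probefunc] = count();
    @bytes[probefunc] = sum(args[1]->uio_resid);
}
tick-10s { printa("%-12s %@12d calls\n", @calls); printa("%-12s %@12d bytes\n", @bytes); exit(0); }'
==
Mapping each call back to a particular zvol name (via the dev_t) would take a bit more digging.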
Re: [zfs-discuss] When I stab myself with this knife, it hurts... But - should it kill me?
I think it's a little more sinister than that... I'm only just trying to import the pool. Not even yet doing any I/O to it... Perhaps it's the same cause, I don't know... But I'm certainly not convinced that I'd be happy with a 25K, for example, panicing just because I tried to import a dud pool... I'm ok(ish) with the panic on a failed write to a non-redundant storage. I expect it by now... Cheers! Nathan. Victor Engle wrote: Wouldn't this be the known feature where a write error to zfs forces a panic? Vic On 10/4/07, Ben Rockwood [EMAIL PROTECTED] wrote: Dick Davies wrote: On 04/10/2007, Nathan Kroenert [EMAIL PROTECTED] wrote: Client A - import pool make couple-o-changes Client B - import pool -f (heh) Oct 4 15:03:12 fozzie ^Mpanic[cpu0]/thread=ff0002b51c80: Oct 4 15:03:12 fozzie genunix: [ID 603766 kern.notice] assertion failed: dmu_read(os, smo-smo_object, offset, size, entry_map) == 0 (0x5 == 0x0) , file: ../../common/fs/zfs/space_map.c, line: 339 Oct 4 15:03:12 fozzie unix: [ID 10 kern.notice] Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51160 genunix:assfail3+b9 () Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51200 zfs:space_map_load+2ef () Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51240 zfs:metaslab_activate+66 () Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51300 zfs:metaslab_group_alloc+24e () Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b513d0 zfs:metaslab_alloc_dva+192 () Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51470 zfs:metaslab_alloc+82 () Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b514c0 zfs:zio_dva_allocate+68 () Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b514e0 zfs:zio_next_stage+b3 () Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51510 zfs:zio_checksum_generate+6e () Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51530 zfs:zio_next_stage+b3 () Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b515a0 zfs:zio_write_compress+239 () Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b515c0 zfs:zio_next_stage+b3 () Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51610 zfs:zio_wait_for_children+5d () Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51630 zfs:zio_wait_children_ready+20 () Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51650 zfs:zio_next_stage_async+bb () Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51670 zfs:zio_nowait+11 () Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51960 zfs:dbuf_sync_leaf+1ac () Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b519a0 zfs:dbuf_sync_list+51 () Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51a10 zfs:dnode_sync+23b () Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51a50 zfs:dmu_objset_sync_dnodes+55 () Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51ad0 zfs:dmu_objset_sync+13d () Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51b40 zfs:dsl_pool_sync+199 () Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51bd0 zfs:spa_sync+1c5 () Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51c60 zfs:txg_sync_thread+19a () Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51c70 unix:thread_start+8 () Oct 4 15:03:12 fozzie unix: [ID 10 kern.notice] Is this a known issue, already fixed in a later build, or should I bug it? It shouldn't panic the machine, no. 
I'd raise a bug. After spending a little time playing with iscsi, I have to say it's almost inevitable that someone is going to do this by accident and panic a big box for what I see as no good reason. (though I'm happy to be educated... ;) You use ACLs and TPGT groups to ensure 2 hosts can't simultaneously access the same LUN by accident. You'd have the same problem with Fibre Channel SANs. I ran into similar problems when replicating via AVS. benr. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] When I stab myself with this knife, it hurts... But - should it kill me?
Erik - Thanks for that, but I know the pool is corrupted - That was kind of the point of the exercise. The bug (at least to me) is ZFS panicking Solaris just trying to import the dud pool. But, maybe I'm missing your point? Nathan. eric kustarz wrote: Client A - import pool make couple-o-changes Client B - import pool -f (heh) Client A + B - With both mounting the same pool, touched a couple of files, and removed a couple of files from each client Client A + B - zpool export Client A - Attempted import and dropped the panic. ZFS is not a clustered file system. It cannot handle multiple readers (or multiple writers). By importing the pool on multiple machines, you have corrupted the pool. You purposely did that by adding the '-f' option to 'zpool import'. Without the '-f' option ZFS would have told you that it's already imported on another machine (A). There is no bug here (besides admin error :) ). eric ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] When I stab myself with this knife, it hurts... But - should it kill me?
Awesome. Thanks, Eric. :) This type of feature / fix is quite important to a number of the guys in our local OSUG. In particular, they are adamant that they cannot use ZFS in production until it stops panicking the whole box for isolated filesystem / zpool failures. This will be a big step. :) Cheers. Nathan. Eric Schrock wrote: On Fri, Oct 05, 2007 at 08:20:13AM +1000, Nathan Kroenert wrote: Erik - Thanks for that, but I know the pool is corrupted - That was kind of the point of the exercise. The bug (at least to me) is ZFS panicking Solaris just trying to import the dud pool. But, maybe I'm missing your point? Nathan. This is a variation on the read error while writing problem. It is a known issue and a generic solution (to handle any kind of non-replicated writes failing) is in the works (see PSARC 2007/567). - Eric -- Eric Schrock, Solaris Kernel Development http://blogs.sun.com/eschrock ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] When I stab myself with this knife, it hurts... But - should it kill me?
Some people are just dumb. Take me, for instance... :) Was just looking into ZFS on iscsi and doing some painful and unnatural things to my boxes and dropped a panic I was not expecting. Here is what I did. Server: (S10_u4 sparc) - zpool create usb /dev/dsk/c4t0d0s0 (on a 4gb USB stick, if it matters) - zfs create -s -V 200mb usb/is0 - zfs set shareiscsi=on usb/is0 On Client A (nv_72 amd64) - iscsiadm stuff to enable sendtarget and set discovery-address to the server above - svcadm enable iscsiinitator - zpool create server_usb iscsi_target_created_above - created a few files - exported pool On Client B (nv_65 amd64 xen dom0) - iscsiadm stuff and enable service and import pool - import failed due to newer pool version... dang. - re-create pool - create some other files and stuff - export pool Client A - import pool make couple-o-changes Client B - import pool -f (heh) Client A + B - With both mounting the same pool, touched a couple of files, and removed a couple of files from each client Client A + B - zpool export Client A - Attempted import and dropped the panic. Oct 4 15:03:12 fozzie ^Mpanic[cpu0]/thread=ff0002b51c80: Oct 4 15:03:12 fozzie genunix: [ID 603766 kern.notice] assertion failed: dmu_read(os, smo-smo_object, offset, size, entry_map) == 0 (0x5 == 0x0) , file: ../../common/fs/zfs/space_map.c, line: 339 Oct 4 15:03:12 fozzie unix: [ID 10 kern.notice] Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51160 genunix:assfail3+b9 () Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51200 zfs:space_map_load+2ef () Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51240 zfs:metaslab_activate+66 () Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51300 zfs:metaslab_group_alloc+24e () Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b513d0 zfs:metaslab_alloc_dva+192 () Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51470 zfs:metaslab_alloc+82 () Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b514c0 zfs:zio_dva_allocate+68 () Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b514e0 zfs:zio_next_stage+b3 () Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51510 zfs:zio_checksum_generate+6e () Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51530 zfs:zio_next_stage+b3 () Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b515a0 zfs:zio_write_compress+239 () Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b515c0 zfs:zio_next_stage+b3 () Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51610 zfs:zio_wait_for_children+5d () Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51630 zfs:zio_wait_children_ready+20 () Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51650 zfs:zio_next_stage_async+bb () Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51670 zfs:zio_nowait+11 () Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51960 zfs:dbuf_sync_leaf+1ac () Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b519a0 zfs:dbuf_sync_list+51 () Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51a10 zfs:dnode_sync+23b () Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51a50 zfs:dmu_objset_sync_dnodes+55 () Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51ad0 zfs:dmu_objset_sync+13d () Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51b40 zfs:dsl_pool_sync+199 () Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51bd0 
zfs:spa_sync+1c5 () Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51c60 zfs:txg_sync_thread+19a () Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51c70 unix:thread_start+8 () Oct 4 15:03:12 fozzie unix: [ID 10 kern.notice] Yep - Sure I did some boneheaded things here (grin) and deserved a good kick in the groin, however, should I panic a whole box just because I have attempted to import a dud pool?? Without re-creating the pool, I can now panic the system reliably just through attempting to import the pool I was a little surprised, as I would have though that there should have been no chance for really nasty things to have happened at a systemwide level, and we should have just bailed on the mount / import. I see a few bugs that were closeish to this, but not a great match... Is this a known issue, already fixed in a later build, or should I bug it? After spending a little time playing with iscsi, I have to say it's almost inevitable that someone is going to do this by accident and panic a big box for what I see as no good reason. (though I'm happy to be educated... ;) Oh - and also - Kudos to the ZFS team and the other involved in the whole iSCSI thing. So easy
Re: [zfs-discuss] pool is full and cant delete files
And if there is a rubbish file somewhere, I *think* you should be able to cat /dev/null > thatfile which would free up its blocks. Assuming you don't have snapshots... ;) Nathan. Anton B. Rang wrote: At least three alternatives -- 1. If you don't have the latest patches installed, apply them. There have been bugs in this area which have been fixed. 2. If you still can't remove files with the latest patches, and you have a service contract with Sun, open a service request to get help. 3. Add a new device (or RAID group) to the pool, which will give you free space again. You can then delete files, at the cost of having your pool larger, since you can't remove it again. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
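A tiny example of what I mean - paths are made up, and as above, a snapshot still holding the blocks defeats it:
==
# truncate in place instead of rm - no new blocks are needed to free the old ones
cat /dev/null > /tank/home/bigfile
sync
zfs list tank/home    # AVAIL should start creeping back up
==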
[zfs-discuss] NV_65 AMD64 - ZFS seems to write fast and slow to a single spindle
Hey all - Just saw something really weird. I have been playing with by box for a little while now, and just noticed something whilst checking how fast / slow my IDE ports were on a newish motherboard... I had been copying around an image. Not a particularly large one - 500M ISO... I had been observing the read speed off disk, and write speed to disk. When reading from one disk and writing to another, I was seeing about 60MB/s and all was as expected. But, then, I thought I'd do one more run, and copied the *same* image as my last run... Of course, the image was in memory, so I expected there would be no reads and lots of writes. What I saw was lots of not very impressive speed (cmdk1 is the target): extended device statistics devicer/sw/s kr/s kw/s wait actv svc_t %w %b cmdk0 0.00.00.00.0 0.0 0.00.0 0 0 cmdk1 0.0 35.30.0 4514.1 32.4 2.0 975.3 100 100 extended device statistics devicer/sw/s kr/s kw/s wait actv svc_t %w %b cmdk0 0.00.00.00.0 0.0 0.00.0 0 0 cmdk1 0.0 36.70.0 4697.6 32.4 2.0 936.8 100 100 extended device statistics devicer/sw/s kr/s kw/s wait actv svc_t %w %b cmdk0 0.00.00.00.0 0.0 0.00.0 0 0 cmdk1 0.0 36.80.0 4650.6 32.4 2.0 935.1 100 100 extended device statistics devicer/sw/s kr/s kw/s wait actv svc_t %w %b cmdk0 0.00.00.00.0 0.1 0.00.0 0 0 cmdk1 0.0 37.50.0 4424.9 32.3 2.0 913.8 100 100 So my target disk, which is owned exclusively by ZFS, was apparently flat out writing at 4.4MB/s. At the time it goes bad, the svc_t jumps from about 125ms to 950ms. Ouch!! On closer inspection, I see that - - The cp returns almost immediately. (somewhat expected) - ZFS starts writing at about 60MB/s, but only for about 2 seconds (This is changable. Sometimes, it writes the whole image at the slower rate.) - the write rate drops back to 4 - 5MB/s - CPU usage is only 8% - I still have 1.5GB of 4GB free memory (Though I *am* running Xen at the moment. Not sure if that matters) - If I kick off a second copy to a different filename whilst the first is running it does not get any faster. - If I kick off a write to a raw zvol on the same pool, the write rate to the disk jumps back up to the expected 60MB/s, but drops again as soon as it's completed the write to the raw zvol... So, it seems it's not the disk itself. - The zpool *has* been exported and imported this boot. Not sure if that matters either. - I had a hunch that memory availability might be playing a part, so I forced a whole heap to be freed up with a honking big malloc and walk of the pages. I freed up 3GB (box has 4GB total) and it seems that I start to see the problem much more frequently as I get to about 1.5GB free. - It's not entirely predictable. Sometimes, it'll write at 50-60MB/s for up to 8 or so seconds, and others, it'll only write fast for a burst right at the start, then take quite some time to write out the rest. It's almost as if we are being throttled on the rate at which we can push data through the ZFS in-memory cache when writing previously read and written data. Or something equally bogus like me expecting that ZFS would write as fast as it can all the time, which I guess might be an invalid assumption? Now: This is running NV_65 with the Xen bits from back then. Not sure if that really matters. 
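For reference, the comparison that shows it off boils down to something like this - dataset and file names are examples from this box, and the zvol is a throwaway:
==
# the slow case: copy an already-cached image into the pool and watch the spindle
cp /var/tmp/some.iso /disk2/crap/some.iso.copy &
iostat -xn 1    # the target disk sits at ~4-5MB/s with svc_t up around 950ms

# the fast case: push roughly the same amount straight at a raw zvol on the same pool
zfs create -V 1g disk2/testvol
dd if=/dev/zero of=/dev/zvol/rdsk/disk2/testvol bs=1024k count=500 &
iostat -xn 1    # jumps back up to ~60MB/s while the zvol write runs
==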
Does not seem that the disk is having problem - beaker:/disk2/crap # zpool status pool: disk2 state: ONLINE scrub: none requested config: NAMESTATE READ WRITE CKSUM disk2 ONLINE 0 0 0 c3d0 ONLINE 0 0 0 errors: No known data errors pool: zfs state: ONLINE scrub: none requested config: NAMESTATE READ WRITE CKSUM zfs ONLINE 0 0 0 c1d0s7ONLINE 0 0 0 errors: No known data errors c1d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 Model: ST3320620AS Revision: Serial No: 6QF Size: 320.07GB 320070352896 bytes Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 Illegal Request: 0 c3d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 Model: ST3320620AS Revision: Serial No: 6QF Size: 320.07GB 320070352896 bytes Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 Illegal Request: 0 c0t1d0 Soft Errors: 4 Hard Errors: 302 Transport Errors: 0 If anyone within SWAN on the ZFS team wanted to take a look at this box and see if it's a new bug or just me being a bonehead and not understanding what I'm seeing, please respond to me directly, and I can provide access. (I'll make an effort not to reboot the box just in case it's only this
Re: [zfs-discuss] Is there _any_ suitable motherboard?
For what it's worth, I bought a Gigabyte GA-M57SLI-S4 a couple of months ago and it rocks on a reasonably current Nevada. Certainly not the cheapest or most expensive, but I felt a good choice for multiple PCI-E slots and a couple of PCI slots. http://www.gigabyte.com.tw/Products/Motherboard/Products_Spec.aspx?ClassValue=MotherboardProductID=2287ProductName=GA-M57SLI-S4 Everything on it worked a treat for me, and paired with an Nvidia 7900GS, has handled pretty much whatever I have thrown at it, including Second Life on Solaris. :) On Nevada (at the time I build it, it was NV_65), everything just worked straight out of the box. Gig Ethernet, IDE ports, SATA ports (in compatability mode), USB, 1394, audio, dual core stuff, the lot. SATA ports work fine and dandy (up to about 70MB/S per port on the outer edge of the disk using Seagate 320GB 16MB cache 7200RPM disks) using the IDE emulation. I'm waiting for the build of nevada that provides the Nvidia MCP55 SATA controller support for native SATA stuff. Not long now... ZFS seems to be able to write down the channel's at about 80MB/s... (At least on a brand spanking new Zpool. Seems closer to 60MB/s now...) Once the Nvidia SATA stuff goes back, you'll have 6 ports of NVidia SATA goodness straight off the board. (and even now, you have 6 ports of reasonable speedyness in good old ATA mode.) From what I can tell, the Nvidia SATA devices hang straight off the PCI-E bus, so you might even be able to get 'em all running flat out. (Though, I'm basing this on the output of the prtconf, I could be completely wrong.) See bottom of post for the prtconf -D output. I'm also running it as a Solaris Xen Dom0 with other OS/s lurking on top of that, so the HVM support also works great. I have just submitted this board to the HCL for SXDE, and if I get a chance, I'll pull the latest S10 and give that a whirl too. Hope this helps. (and excuse the prtconf being from the Xen boot, rather than bare metal... got a bit of stuff happening on the box at the moment and did not feel like rebooting. 
;) /root # prtconf -D System Configuration: Sun Microsystems i86pc Memory size: 3895 Megabytes System Peripherals (Software Nodes): i86xpv (driver name: rootnex) scsi_vhci, instance #0 (driver name: scsi_vhci) isa, instance #0 (driver name: isa) fdc, instance #0 (driver name: fdc) fd, instance #0 (driver name: fd) asy, instance #0 (driver name: asy) lp, instance #0 (driver name: ecpp) i8042, instance #0 (driver name: i8042) keyboard, instance #0 (driver name: kb8042) motherboard xpvd, instance #0 (driver name: xpvd) xencons, instance #0 (driver name: xencons) xenbus, instance #0 (driver name: xenbus) domcaps, instance #0 (driver name: domcaps) balloon, instance #0 (driver name: balloon) evtchn, instance #0 (driver name: evtchn) privcmd, instance #0 (driver name: privcmd) pci, instance #0 (driver name: npe) pci1458,5001 pci1458,c11 pci1458,c11 pci1458,c11 pci1458,5004, instance #0 (driver name: ohci) mouse, instance #1 (driver name: hid) pci1458,5004, instance #0 (driver name: ehci) pci-ide, instance #0 (driver name: pci-ide) ide, instance #0 (driver name: ata) sd, instance #1 (driver name: sd) sd, instance #0 (driver name: sd) ide (driver name: ata) pci-ide, instance #1 (driver name: pci-ide) ide, instance #2 (driver name: ata) cmdk, instance #0 (driver name: cmdk) ide, instance #3 (driver name: ata) cmdk, instance #1 (driver name: cmdk) pci-ide, instance #2 (driver name: pci-ide) ide (driver name: ata) ide (driver name: ata) pci-ide, instance #3 (driver name: pci-ide) ide (driver name: ata) ide (driver name: ata) pci10de,370, instance #0 (driver name: pci_pci) pci8086,1e, instance #0 (driver name: e1000g) pci1458,1000, instance #0 (driver name: hci1394) pci1458,a002, instance #0 (driver name: audiohd) pci1458,e000, instance #0 (driver name: nge) pci10de,377, instance #0 (driver name: pcie_pci) display, instance #0 (driver name: nvidia) pci1022,1100 (driver name: mc-amd) pci1022,1101 (driver name: mc-amd) pci1022,1102 (driver name: mc-amd) pci1022,1103, instance #0 (driver name: amd64_gart) iscsi, instance #0 (driver name: iscsi) pseudo, instance #0 (driver name: pseudo) options, instance #0 (driver name: options) agpgart, instance #0 (driver name: agpgart) xsvc, instance #0 (driver name: xsvc) used-resources cpus cpu, instance #0 cpu, instance #1 Nathan. Ben Middleton wrote:
Re: [zfs-discuss] SiI 3114 Chipset on Syba Card - Solaris Hangs
Some time ago I encountered issues using the odd numbered ports on my SIL3114 based card. I currently use ports 0 and 2 without issue. I never did get ports 1 and 3 working... If I have a disk connected to ports 1 or 3, it just conks out on the way up when it's initializing the disks. (Unfortunately, I don't remember for sure, but I think it was a hard hang). I should likely have bugged it, but the box on which I was doing the work fills the role of my gateway to the internet, so I was disinclined to spend lots of time trying to break it, when I only needed two disks working anyways... My 2c... Nathan. Blake wrote: I have re-flashed the BIOS. Blake On 8/7/07, *Ian Collins* [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] wrote: Blake wrote: Hi. I'm running snv 65 and having an issue much like this: http://osdir.com/ml/solaris.opensolaris.help/2006-11/msg00047.html http://osdir.com/ml/solaris.opensolaris.help/2006-11/msg00047.html http://osdir.com/ml/solaris.opensolaris.help/2006-11/msg00047.html Has anyone found a workaround? Or is this the issue with the BIOS not liking EFI information that ZFS uses? Are you sure the card doesn't have a RAID BIOS? If it does, it will have to be re-flashed. Ian ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS, iSCSI + Mac OS X Tiger (globalSAN iSCSI)
Hey there - This is very likely completely unrelated, but here goes anyhoo... I have noticed with some particular ethernet adapters (e1000g in my case) and large MTU sizes (8K) that things (most anything that really pushes the interface) sometimes stop for no good reason on my x86 Solaris boxes. After it stops, I'm able to re-connect after a short time and it works for a while again... (Really must get around to properly reproducing the problem and logging a bug too...) I'd be curious to know if setting the MTU to 1500 on both systems makes any difference at all. Note that I have only observed this with my super cheap adapters at home. I'm yet to see if (though also yet to try really hard) on the more expensive ones at work... Again - Likely nothing to do with your problem, but hey. It has made a difference for me before... Cheers. Nathan. George wrote: I have set up an iSCSI ZFS target that seems to connect properly from the Microsoft Windows initiator in that I can see the volume in MMC Disk Management. When I shift over to Mac OS X Tiger with globalSAN iSCSI, I am able to set up the Targets with the target name shown by `iscsitadm list target` and when I actually connect or Log On I see that one connection exists on the Solaris server. I then go on to the Sessions tab in globalSAN and I see the session details and it appears that data is being transferred via the PDUs Sent, PDUs Received, Bytes, etc. HOWEVER the connection then appears to terminate on the Solaris side if I check it a few minutes later it shows no connections, but the Mac OS X initiator still shows connected although no more traffic appears to be flowing in the Session Statistics dialog area. Additionally, when I then disconnect the Mac OS X initiator it seems to drop fine on the Mac OS X side, even though the Solaris side has shown it gone for a while, however when I reconnect or Log On again, it seems to spin infinitely on the Target Connect... dialog. Solaris is, interestingly, showing 1 connection while this apparent issue (spinning beachball of death) is going on with globalSAN. Even killing the Mac OS X process doesn't seem to get me full control again as I have to restart the system to kill all processes (unless I can hunt them down and `kill -9` them which I've not successfully done thus far). Has anyone dealt with this before and perhaps be able to assist or at least throw some further information towards me to troubleshoot this? Thanks much, -George ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
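If you want to try it, the change is quick and non-persistent - the interface name is an example from my boxes:
==
# check what the interface is currently running
ifconfig e1000g0

# drop back to a standard 1500 byte MTU for the test (reverts on reboot)
ifconfig e1000g0 mtu 1500
==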
Re: [zfs-discuss] ZFS not utilizing all disks
Simple test - mkfile 8gb now and see where the data goes... :) Victor Latushkin wrote: Robert Milkowski wrote: Hello Leon, Thursday, May 10, 2007, 10:43:27 AM, you wrote: LM Hello, LM I've got some weird problem: ZFS does not seem to be utilizing LM all disks in my pool properly. For some reason, it's only using 2 of the 3 disks in my pool: LMcapacity operationsbandwidth LM pool used avail read write read write LM -- - - - - - - LM database8.48G 1.35T202 0 12.4M 0 LM c0t1d04.30G 460G103 0 6.21M 0 LM c0t3d04.12G 460G 96 0 6.00M 0 LM c0t2d054.9M 464G 2 0 190K 0 LM -- - - - - - - LM I've added all the disks at the same time, so it's not like the LM last disk was added later. Any ideas on what might be causing this ? I'm using solaris express b62. LM Your third disks is 4GB larger that first two disks and ZFS tries to load-balance data so that you can fill up all devices. As you've already have about 4GB on each of the first two disks ZFS should start to use third disks after copying addtitional data. No, it is not - other two disks have 4G out of 464G used, and disk in question has only 55M used. So for me it does not look like weighting problem. This is something else I believe. I'm not sure but i suspect this may be somehow related to meta data allocation, given that ZFS stores two copies for file system meta data. But this is nothing more than a wild guess. Leon, What kind of data is stored in this pool? What Solaris version are you using? How is your pool configured? Cheers, Victor ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
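i.e. something along these lines, assuming the pool is mounted at /database:
==
mkfile 8g /database/mkfile.test &
zpool iostat -v database 5
==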
Re: [zfs-discuss] Re: [osol-help] How to recover from rm *?
begin crackly, broken record :) I, for one, would love to have similar functionality that we had in good old netware, where we could 'salvage' deleted files. The concept was that when the files were deleted, they were not actually removed, nor were the all important references to the files to allow undeleting them. In the event that a user had an oops, they could just run salvage (or later, filer) and pick the files from the directory in question, and *whammo*, undelete it. I don't recall ever having to do whole directories, nor if it was actually possible... IIRC, you could set policy that determined when the deleted files data blocks became available for overwriting (and hence permanent deletion of the file). If it happened that there was too much space 'used up' by the deleted files, you could run a purge. (I was not a fan of that part). I'd have preferred that the deleted files were simply overwritten in a fifo manner, and left purge out of it. Yes - Snapshots are great, but how often do you run a snapshot? Every 60 seconds? That's going to get real ugly if you have a filesystem per user... I once cobbled up a poor man's version of this sort of thing, aliasing rm to a scripted mv, and pushing everything into a /fs/deleted/* area when someone ran rm (maintaining filesystem directory structure). I then had occasional rm's run through it, once the filesystem reached a certain highwater mark. Something under the covers of ZFS that provided dumb dumb protection would be very cool. I was saved a number of times by the hackery above... cheers! Nathan. Robert Milkowski wrote: Hello Jeremy, Monday, February 19, 2007, 1:58:18 PM, you wrote: Something similar was proposed here before and IIRC someone even has a working implementation. I don't know what happened to it. JT That would be me. AFAIK, no one really wanted it. The problem that it JT solves can be solved by putting snapshots in a cronjob. Not exactly the same. But if people really do not want it... ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
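The wrapper was nothing fancier than something like this - from memory, so treat it as a sketch: no handling of rm's flags, name collisions or quotas:
==
#!/usr/bin/ksh
# "deleted files" holding area instead of a real unlink
TRASH=/fs/deleted
for f in "$@"
do
    d=$(dirname "$f")
    mkdir -p "${TRASH}/${d}"
    mv "$f" "${TRASH}/${d}/"
done
==
Alias rm to the script, and have a cron job prune the oldest entries once /fs/deleted crosses a high-water mark.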
Re: [zfs-discuss] Re: [osol-help] How to recover from rm *?
I'd usually agree with that, but - if we have an opportunity to make users love ZFS even more, why not at least investigate it. A perfect example might be exactly what I did on one occasion, where I copied a bunch of photos off a CF card. I then reformatted the CF card, and cleaned up the crappy photos on my hard disk, but unfortunately, (and stupidly) removed all of them. My photos were gone forever. :( Even a snapshot would not have helped here... I know; stupid stupid stupid, but it happens, and I would have *really* liked to have been able to recover those photos... A salvage / undelete would have been gold. Nathan. James Dickens wrote: Yes - Snapshots are great, but how often do you run a snapshot? Every 60 seconds? That's going to get real ugly if you have a filesystem per user... I'm sure every 15 minutes is sufficient; if the worker doesn't have a slight penalty he won't ever learn to be careful. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Cheap ZFS homeserver.
Urk! Where is this documented? And - is it something you can do nothing about, or are we ultimately trying to address it somewhere / somehow? Thanks!! Nathan. Bill Moore wrote: On Wed, Jan 31, 2007 at 05:01:19AM -0800, Tom Buskey wrote: As a followup, the system I'm trying to use this on is a dual PII 400 with 512MB. Real low budget. 2 500 GB drives with 2 120 GB in a RAIDZ. The idea is that I can get 2 more 500 GB drives later to get full capacity. I tested going from a 20GB to a 120GB and that worked well. I'm finding the throughput just isn't there. 2MB/s compared to 20MB/s on a similar Linux system. Anyone else going this low budget? There are many folks (including myself) that have done similar super-low budget setups. Which SATA controller are you using? If it's the SI3112, it has a documented problem when you try to use both SATA channels simultaneously - it gets less than 2MB/s, compared to about 50MB/s on each drive individually. I'm not sure if the SI3114 has similar problems, or not. This may not be your problem, but I know the SI3112 was popular on machines in that timeframe. --Bill ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] hot spares - in standby?
Random thoughts: If we were to use some intelligence in the design, we could perhaps have a monitor that profiles the workload on the system (a pool, for example) over a [week|month|whatever] and selects a point in time, based on history, when it would expect the disks to be quiet, and can 'pre-build' the spare with the contents of the disk it's about to swap out. At the point of switch-over, it could be pretty much instantaneous... It could also bail if it happened that the system actually started to get genuinely busy... That might actually be quite cool, though, if all disks are rotated, we end up with a whole bunch of disks that are evenly worn out again, which is just what we are really trying to avoid! ;) Nathan. Wee Yeh Tan wrote: On 1/30/07, David Magda [EMAIL PROTECTED] wrote: What about a rotating spare? When setting up a pool a lot of people would (say) balance things around buses and controllers to minimize single points of failure, and a rotating spare could disrupt this organization, but would it be useful at all? The costs involved in rotating spares in terms of IOPS reduction may not be worth it. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS on a damaged disk
On a recent journey of pain and frustration, I had to recover a UFS filesystem from a broken disk. The disk had many bad blocks and more were going bad over time. Sadly, there were just a few files that I wanted, but I could not mount the disk without it killing my system. (PATA disks... PITA if you ask me...) My recovery method, though painful, might be of value in you locating the bad regions of the disk. What I did was to kick off a script that used dd, and did something like this... == #! /usr/bin/ksh SEEK=0 while : do dd if=/dev/rdsk/c0d1s7 of=backup.ufs.s7 bs=8192 \ oseek=${SEEK} iseek=${SEEK} count=1 conv=noerror,sync SEEK=$((SEEK + 1)) done == (Or something to that effect.) Anyhoo - the point is that this hit the disk one block at a time(I chose 8kb, as it was the ufs block size, and 512 byte blocks looked like it would take 3 weeks), and I was ultimately able to get my data back (at least the bits I cared about...) after futzing with fsck and some other novelties. If you were to do something similar to this, but instead of copying the block, send it to /dev/null, and log the result of dd, you could get a complete list of broken blocks. A few botnotes: - Yes. This is slow. WAY slow, and there are thousands of different ways that could have done this better and faster. However, it saved me from having to do anything else, and at the time, I did not feel like breaking out a compiler. Due to the massively large number of bad blocks on my disk, the size of the disk, 160GB, and the number of retries my system made for each bad block, it took 10 days (!!) to read through the whole disk 8kb at a time. - If you are happy to throw away larger blocks of disk, you could use a larger block size, which would speed things up. - If you disk really does have bad blocks that are getting in the way, chances are that it's going to get worse, and pain will ensue. I'd suggest that a new disk might be a better option. - On the new disk front, note that many hard disks come with 5 year warranties these days. If the disk is not super old, you might be able to get it replaced under warranty if you send it directly to the manufacturer... Hope this helps at least provide some ideas. :) Oh - and get a new disk. ;) Nathan. Patrick P Korsnick wrote: i have a machine with a disk that has some sort of defect and i've found that if i partition only half of the disk that the machine will still work. i tried to use 'format' to scan the disk and find the bad blocks, but it didn't work. so as i don't know where the bad blocks are but i'd still like to use some of the rest of the disk, i thought ZFS might be able to help. i partitioned the disk so slices 4,5,6 and 7 are each 5GB. i thought i'd make one or multiple zpools on those slices and then i'd be able to narrow down where the bad sections are. so my question is can i declare a zpool that spans multiple c0d0sXX but isn't a mirror and if i can, then will zfs be able to detect where the problem c0d0sXX is and not use it? if not, i'll have to make 4 different zpools and experiment with storing stuff on each to find the approximate location of the bad blocks. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
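If I were doing it again, I'd add an exit test and a bad-block log to the loop - something like this sketch (device, output file and block size are examples, and I haven't re-run it against a dying disk):
==
#!/usr/bin/ksh
# Block-by-block salvage with a bad-block log.
DEV=/dev/rdsk/c0d1s7
OUT=backup.ufs.s7
BS=8192
SEEK=0
while :
do
    MSG=$(dd if=${DEV} of=${OUT} bs=${BS} iseek=${SEEK} oseek=${SEEK} \
        count=1 conv=noerror,sync 2>&1)
    # past the end of the slice, dd copies nothing - time to stop
    echo "${MSG}" | grep '^0+0 records in' > /dev/null && break
    # a read error still completes thanks to noerror,sync, but shows up on stderr
    echo "${MSG}" | grep -i 'error' > /dev/null && \
        echo "bad block at ${BS}-byte offset ${SEEK}" >> badblocks.log
    SEEK=$((SEEK + 1))
done
==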
Re: [zfs-discuss] weird thing with zfs
Hm. If the disk has no label, why would it have an s0? Or, did you mean p0? Nathan. On Wed, 2006-12-06 at 04:45, Krzys wrote: Does not work :( dd if=/dev/zero of=/dev/rdsk/c3t6d0s0 bs=1024k count=1024 dd: opening `/dev/rdsk/c3t6d0s0': I/O error That is so strange... it seems like I lost another disk... I will try to reboot and see what I get, but I guess I need to order another disk then and give it a try... Chris On Tue, 5 Dec 2006, Al Hopper wrote: On Tue, 5 Dec 2006, Krzys wrote: Thanks, ah another weird thing is that when I run format on that drive I get a coredump :( ... snip Try zeroing out the disk label with something like: dd if=/dev/zero of=/dev/rdsk/c?t?d?p0 bs=1024k count=1024 Regards, Al Hopper Logical Approach Inc, Plano, TX. [EMAIL PROTECTED] Voice: 972.379.2133 Fax: 972.379.2134 Timezone: US CDT OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005 OpenSolaris Governing Board (OGB) Member - Feb 2006 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] snv_51 hangs
Hm. If the system is hung, it's unlikely that a reboot -d will help. You want to be booting into kmdb, then using the F1-A interrupt sequence, then dumping using $<systemdump at the kmdb prompt. See the following documents: Index of lots of useful stuff: http://docs.sun.com/app/docs/doc/817-1985/6mhm8o5p3?a=view Forcing a crashdump on x86 boxes: http://docs.sun.com/app/docs/doc/817-1985/6mhm8o5q5?a=view And booting from grub into kmdb: http://docs.sun.com/app/docs/doc/817-1985/6mhm8o5q2?a=view I'm not sure how the serial console is going to impact you - I believe a BREAK sent over the serial line does the same job as F1-A to drop to the debugger... That's assuming it's not a hard hang. :) Cheers. Nathan. On Wed, 2006-11-15 at 14:16, Sean Ye wrote: Hi, Chris, You may force a panic by reboot -d. Thanks, Sean On Tue, Nov 14, 2006 at 09:11:58PM -0600, Chris Csanady wrote: I have experienced two hangs so far with snv_51. I was running snv_46 until recently, and it was rock solid, as were earlier builds. Is there a way for me to force a panic? It is an x86 machine, with only a serial console. Chris ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
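From memory, the moving parts look roughly like this - double-check the grub syntax for your build against the doc links above, as the boot line below is only an example:
==
# 1. In grub, edit the kernel line and add -k so kmdb is loaded at boot,
#    plus the console setting if you're on serial, e.g. something like:
#      kernel /platform/i86pc/multiboot -k -B console=ttya
#
# 2. When it hangs, break into the debugger:
#      F1-A on the local keyboard, or send a BREAK over the serial line
#
# 3. At the kmdb prompt, force the dump:
[0]> $<systemdump
==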
Re: [zfs-discuss] Best Practices recommendation on x4200
On Thu, 2006-11-09 at 10:21, Richard Elling - PAE wrote: One way to populate an ABE is to mirror slices. However, you cannot mirror between a device that starts at cylinder 0 and one that does not. Where is this restriction documented? It doesn't make sense to me. Maybe you have a scar from running Sybase in a previous life? ;-) IIRC, that's a part of the history of disksuite / SVM. Moreover, it was that you cannot mirror a slice that has a VTOC label on it to one that does not... (hence the understanding of it being a cylinder 0 issue). http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/io/lvm/mirror/mirror_ioctl.c#887 Or, perhaps I need more coffee... Cheers! Nathan. ;) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] OpenSolaris vs. Solaris 10 11/06 (S10u3) for NFS ZFS Server
For me, it came down to - Do I want to patch, or upgrade? My gateway to the internet is a solaris 10 box, patched whenever required. I like that as soon as a security patch is available, I can apply it and reboot. Simple. My laptop runs nevada. I upgrade from network / dvd when I see a new feature that excites me. As far as whiz-bang things that would excite you, only you will know that for sure. :) Cheers! Nathan. On Thu, 2006-11-09 at 08:58, Wes Williams wrote: I'm in the process of building a Solaris NFS server with ZFS and was wondering if any gurus here have any comments as to choosing the upcoming Solairs 10 11/06 [presumably] or OpenSolaris bXX/Solairs Express for this use. Even with my use of OpenSolaris I maintain a service contract to show my support, so bug fixes in a static supported version shouldn't be an issue in picking a version. So, the short question is are there any super-cool must-have ZFS/NFS features in OpenSolaris now that S10u3 won't have right away? Thanks! This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Where is the ZFS configuration data stored?
I'll take a crack at this. First off, I'm assuming that the RAID you are talking about is provided by the hardware and not by ZFS. If that's the case, then it will depend on the way you created the raid set, the BIOS of the controller, and whether or not these two things match up with any other systems. A few of the RAID controllers I have played with have an option to 'rebuild' a raid set, which I get the impression (though have never tried) allows you to essentially tell the controller there is a raid set there, and if you set it up the same way as before, it will just work. Personally, unless I was moving the disks to another system with the same RAID controller and BIOS, I would have no expectation it would work. It might, but I would not be surprised (or disappointed) if it did not. If you are talking about using ZFS's raid, then you won't need to do anything. It should just work, as ZFS will be able to just import the zpool. I hope I understood your question. (And I hope I'm telling no lies... ;) Nathan. Sergey wrote: + a little addition to the original question: Imagine that you have a RAID attached to a Solaris server. There's ZFS on the RAID. And someday you lose your server completely (fried motherboard, physical crash, ...). Is there any way to connect the RAID to another server and restore the ZFS layout (not losing all data on the RAID)? This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
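For the ZFS-on-RAID case, "just work" amounts to this on the replacement server once the array is cabled up - pool name is an example:
==
zpool import            # scans the attached devices and lists any pools it finds
zpool import tank       # clean import if the dead server managed to export it
zpool import -f tank    # force it if the old host died without exporting
==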
Re: [zfs-discuss] zpool list No known data errors
I might be wrong here, but I think it's telling you that there are no errors. Something like: errors: none or errors: None that we know of, but we'll let you know if there are any. At least that is how I'd read it. :) Do you have an actual problem other than the text? Nathan. On Tue, 2006-10-10 at 10:05, ttoulliu2002 wrote: Hi: I have zpool created # zpool list NAME SIZE USED AVAIL CAP HEALTH ALTROOT ktspool 34,5G 33,5K 34,5G 0% ONLINE - However, zpool status shows no known data error. May I know what is the problem # zpool status pool: ktspool state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM ktspool ONLINE 0 0 0 c0t1d0s6 ONLINE 0 0 0 errors: No known data errors This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Significant pauses during zfs writes
Hey, Bob - It might be worth exploring where your data stream for the writes was coming from. Moreover, it might be worth exploring how fast it was filling up caches for writing. Were you delivering enough data to keep the disks busy 100% of the time? I have been tricked by this before... :) Nathan. On Tue, 2006-08-15 at 01:38, James C. McPherson wrote: Bob Evans wrote: Just getting my feet wet with zfs. I set up a test system (Sunblade 1000, dual channel scsi card, disk array with 14x18GB 15K RPM SCSI disks) and was trying to write a large file (10 GB) to the array to see how it performed. I configured the raid using raidz. During the write, I saw the disk access lights come on, but I noticed a peculiar behavior. The system would write to the disk, but then pause for a few seconds, then contineu, then pause for a few seconds. I saw the same behavior when I made a smaller raidz using 4x36 GB scsi drives in a different enclosure. Since I'm new to zfs, and realize that I'm probably missing something, I was hoping somebody might help shed some light on my problem. Hi Bob, I'm pretty sure that's not a problem that you're seeing, just ZFS' normal behaviour. Writes are coalesced as much as possible, so the pauses that you observed are most likely going to be the system waiting for suitable IOs to be gathered up and sent out to your storage. If you want to examine this a bit more then might I suggest the DTrace Toolkit's iosnoop utility. best regards, James C. McPherson ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs sucking down my memory!?
Something I often do when I'm a little suspicious of this sort of activity is to run something that steals vast quantities of memory... eg: something like this:

#include <stdio.h>
#include <stdlib.h>

int main()
{
    int memsize = 0;
    char *input_string;
    char *memory;
    long i = 0;

    input_string = malloc(256 * sizeof(char));
    printf("How much memory? :");
    input_string = fgets(input_string, 255, stdin);
    memsize = atoi(input_string);
    printf("mem_size=%d\n", memsize);
    memory = calloc(memsize * 1024 * 1024, 1);
    printf("Pausing: hit enter to exit\n");
    input_string = fgets(input_string, 255, stdin);
    exit(0);
}

which allows me to request, say, 500mb of memory. Watching vmstat whilst doing this is interesting. It then runs and uses lots of memory, and causes some pressure. If, at the end when it exits, you have lots of memory free, and nothing swapped out, it's all good. :) quick, dirty, possibly even smelly, with no error checking at all... :) Nathan. On Fri, 2006-07-21 at 09:28, Eric Schrock wrote: There are two things to note here: 1. The vast majority of the memory is being used by the ZFS cache, but appears under 'kernel heap'. If you actually need the memory, it _should_ be released. Under UFS, this cache appears as the 'page cache', and users understand that it can be released when needed. The same is true of ZFS, but it's just not accounted for as separate memory. Now, the VM hooks needed to do this are somewhat ad hoc at the moment, but the ZFS cache should keep itself from consuming 100% of the available memory. 2. There is a difference between VA (virtual addressing) and physical memory. See the following thread for a more complete discussion: http://www.opensolaris.org/jive/thread.jspa?threadID=10774&tstart=45&start=15 So the (apparent) high kernel memory consumption is expected, and does not indicate any type of problem. Applications actually receiving ENOMEM should not happen, and may indicate that there are some circumstances where the VM interfaces are currently inadequate. Someone else on the ZFS team may be able to get some more specifics from you to figure out what's really going on. - Eric On Thu, Jul 20, 2006 at 04:03:50PM -0700, Joseph Mocker wrote: So what's going on! Please help. I want my memory back! This is essentially by design, due to the way that ZFS uses kernel memory for caching and other stuff. You can alleviate this somewhat by running a 64bit processor, which has a significantly larger address space to play with. Uhh. If I don't have any more physical memory, how does a 64bit processor help? FWIW, this is on a SunBlade 2000 running in 64bit mode: [EMAIL PROTECTED]: uname -a SunOS watt 5.10 Generic_118833-17 sun4u sparc SUNW,Sun-Blade-1000 [EMAIL PROTECTED]: isainfo sparcv9 sparc -- Eric Schrock, Solaris Kernel Development http://blogs.sun.com/eschrock ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss