Re: [zfs-discuss] Extremely slow zpool scrub performance

2011-05-18 Thread George Wilson
Don,

Try setting the zfs_scrub_delay to 1 but increase the
zfs_top_maxinflight to something like 64.
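
Something along these lines should do it (double-check the tunable names
exist on your build; 0t64 is just mdb's decimal notation):

# echo "zfs_scrub_delay/W1" | mdb -kw
# echo "zfs_top_maxinflight/W0t64" | mdb -kw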

Thanks,
George

On Wed, May 18, 2011 at 5:48 PM, Donald Stahl  wrote:
> Wow- so a bit of an update:
>
> With the default scrub delay:
> echo "zfs_scrub_delay/K" | mdb -kw
> zfs_scrub_delay:20004
>
> pool0       14.1T  25.3T    165    499  1.28M  2.88M
> pool0       14.1T  25.3T    146      0  1.13M      0
> pool0       14.1T  25.3T    147      0  1.14M      0
> pool0       14.1T  25.3T    145      3  1.14M  31.9K
> pool0       14.1T  25.3T    314      0  2.43M      0
> pool0       14.1T  25.3T    177      0  1.37M  3.99K
>
> The scrub continues on at about 250K/s - 500K/s
>
> With the delay set to 1:
>
> echo "zfs_scrub_delay/W1" | mdb -kw
>
> pool0       14.1T  25.3T    272      3  2.11M  31.9K
> pool0       14.1T  25.3T    180      0  1.39M      0
> pool0       14.1T  25.3T    150      0  1.16M      0
> pool0       14.1T  25.3T    248      3  1.93M  31.9K
> pool0       14.1T  25.3T    223      0  1.73M      0
>
> The pool scrub rate climbs to about 800K/s - 1000K/s
>
> If I set the delay to 0:
>
> echo "zfs_scrub_delay/W0" | mdb -kw
>
> pool0       14.1T  25.3T  50.1K    116   392M   434K
> pool0       14.1T  25.3T  49.6K      0   389M      0
> pool0       14.1T  25.3T  50.8K     61   399M   633K
> pool0       14.1T  25.3T  51.2K      3   402M  31.8K
> pool0       14.1T  25.3T  51.6K      0   405M  3.98K
> pool0       14.1T  25.3T  52.0K      0   408M      0
>
> Now the pool scrub rate climbs to 100MB/s (in the brief time I looked at it).
>
> Is there a setting somewhere between slow and ludicrous speed?
>
> -Don
>





Re: [zfs-discuss] Extremely slow zpool scrub performance

2011-05-17 Thread George Wilson
On Tue, May 17, 2011 at 11:48 AM, Jim Klimov  wrote:
>> So if you bump this to 32k then the fragmented size
>> is 512k which tells ZFS to switch to a different metaslab
>> once it drops below this threshold.
>
> Makes sense after some more reading today ;)
>
> What happens if no metaslab has a block this large (or small)
> on a sufficiently full and fragmented system? Will the new writes
> fail altogether, or a sufficient free space block would still be used?

If all the metaslabs on all of your devices won't accommodate the
specified block then you will start to create gang blocks (i.e.
smaller fragments which make up the specified block size).

>
>
>> This is used to add more weight (i.e. preference) to specific
>> metaslabs. A metaslab receives this bonus if it has an offset
>> which is
> lower than a previously used metaslab. Sorry this is somewhat
>> complicated and hard to explain without a whiteboard. :-)
>
> From recent reading on Jeff's blog and links leading from it,
> I might guess this relates to different disk offsets with different
> writing speeds? Yes-no would suffice, as to spare the absent
> whiteboard ;)


No. Imagine you started allocations on a disk and used metaslabs at the
outer edge of the disk as well as some about a third of the way in. You
would then want all the metaslabs that sit between the outer edge and
that one-third point to get the bonus. This keeps the allocations
towards the outer edges of the disk.

- George

>
> Thanks,
> //Jim





Re: [zfs-discuss] Extremely slow zpool scrub performance

2011-05-17 Thread George Wilson
On Tue, May 17, 2011 at 6:49 AM, Jim Klimov  wrote:
> 2011-05-17 6:32, Donald Stahl пишет:
>>
>> I have two follow up questions:
>>
>> 1. We changed the metaslab size from 10M to 4k- that's a pretty
>> drastic change. Is there some median value that should be used instead
>> and/or is there a downside to using such a small metaslab size?
>>
>> 2. I'm still confused by the poor scrub performance and its impact on
>> the write performance. I'm not seeing a lot of IO's or processor load-
>> so I'm wondering what else I might be missing.
>
> I have a third question, following up to the first one above ;)
>
> 3) Is the "4k" size theoretically based on anything?
> Namely, is it a "reasonably large" amount of eight or so
> metadata blocks of 512b size, or is something else in
> play - like a 4Kb I/O?

The 4k blocksize was based on some analysis I had done on some systems
at Oracle. The code uses this shifted by another tuneable (defaults to
4) to determine the "fragmented" minimum size. So if you bump this to
32k then the fragmented size is 512k which tells ZFS to switch to a
different metaslab once it drops below this threshold.
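
To put rough numbers on it (using that shift-by-4 default): 4k << 4 = 64k,
32k << 4 = 512k, and the old 10M default << 4 = 160M.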

>
> In particular, since my system uses 4Kb blocks (ashift=12),
> for similar benefit I should set metaslab size to 32k (4K*8
> blocks) - yes/no?
>
> Am I also correct to assume that if I have a large streaming
> write and ZFS can see or predict that it would soon have to
> reference many blocks, it can allocate a metaslab larger
> that this specified minimum and thus keep fragmentation
> somewhat not extremely high?

The metaslabs are predetermined at config time and their sizes are
fixed. A good way to think about them is as slices of your disk. If
you take your disk size and divide it up into 200 equally sized
sections, you end up with your metaslab size.

>
> Actually, am I understanding correctly that metaslabs are
> large contiguous ranges reserved for metadata blocks?
> If so, and if they are indeed treated specially anyway,
> is it possible to use 512-byte records for metadata even
> on VDEVs with 4kb block size configured by ashift=12?
> Perhaps not today, but as an RFE for ZFS development
> (I posted the idea here https://www.illumos.org/issues/954 )
>

No, metaslabs are for all allocations and not specific to metadata.
There's more work to do to efficiently deal with 4k block sizes.

> Rationale: Very much space is wasted on my box just to
> reference data blocks and keep 3.5kb of trailing garbage ;)
>
> 4) In one internet post I've seen suggestions about this
> value to be set as well:
> set zfs:metaslab_smo_bonus_pct = 0xc8
>
> http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg40765.html

This is used to add more weight (i.e. preference) to specific
metaslabs. A metaslab receives this bonus if it has an offset which is
lower than a previously used metaslab. Sorry this is somewhat
complicated and hard to explain without a whiteboard. :-)

Thanks,
George

>
> Can anybody comment - what it is and whether it would
> be useful? The original post passed the knowledge as-is...
> Thanks
>
>
>


Re: [zfs-discuss] Extremely slow zpool scrub performance

2011-05-17 Thread George Wilson
On Mon, May 16, 2011 at 7:32 PM, Donald Stahl  wrote:
> As a followup:
>
> I ran the same DD test as earlier- but this time I stopped the scrub:
>
> pool0       14.1T  25.4T     88  4.81K   709K   262M
> pool0       14.1T  25.4T    104  3.99K   836K   248M
> pool0       14.1T  25.4T    360  5.01K  2.81M   230M
> pool0       14.1T  25.4T    305  5.69K  2.38M   231M
> pool0       14.1T  25.4T    389  5.85K  3.05M   293M
> pool0       14.1T  25.4T    376  5.38K  2.94M   328M
> pool0       14.1T  25.4T    295  3.29K  2.31M   286M
>
> ~# dd if=/dev/zero of=/pool0/ds.test bs=1024k count=2000 2000+0 records in
> 2000+0 records out
> 2097152000 bytes (2.1 GB) copied, 6.50394 s, 322 MB/s
>
> Stopping the scrub seemed to increase my performance by another 60%
> over the highest numbers I saw just from the metaslab change earlier
> (That peak was 201 MB/s).
>
> This is the performance I was seeing out of this array when newly built.
>
> I have two follow up questions:
>
> 1. We changed the metaslab size from 10M to 4k- that's a pretty
> drastic change. Is there some median value that should be used instead
> and/or is there a downside to using such a small metaslab size?


Unfortunately the default value for metaslab_min_alloc_size is too
high. I've been meaning to rework much of this code to make the change
more dynamic rather than just a hard-coded value. What this is trying
to do is make sure that zfs switches to a different metaslab once it
finds that it can't allocate its desired chunk. With the default value
the desired chunk is 160MB. By taking the value to 4K it now is
looking for 64K chunks which is more reasonable for fuller pools. My
plan is to make these values dynamically change as we start to fill up
the metaslabs. This is a substantial rewhack of the code and not
something that will be available anytime soon.


> 2. I'm still confused by the poor scrub performance and its impact on
> the write performance. I'm not seeing a lot of IO's or processor load-
> so I'm wondering what else I might be missing.

Scrub will impact performance although I wouldn't expect a 60% drop.
Do you mind sharing more data on this? I would like to see the
spa_scrub_* values I sent you earlier while you're running your test
(in a loop so we can see the changes). What I'm looking for is to see
how many inflight scrubs you have at the time of your run.

Thanks,
George

> -Don
>





Re: [zfs-discuss] Extremely slow zpool scrub performance

2011-05-16 Thread George Wilson
You mentioned that the pool was somewhat full, can you send the output
of 'zpool iostat -v pool0'? You can also try doing the following to
reduce 'metaslab_min_alloc_size' to 4K:

echo "metaslab_min_alloc_size/Z 1000" | mdb -kw

NOTE: This will change the running system so you may want to make this
change during off-peak hours.

Then check your performance and see if it makes a difference.
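
If you want to be able to back the change out, you can capture the current
value first and restore it afterwards; something along these lines should
work (the shipped default is 10M, i.e. 0xa00000, but verify against what
you read back):

# echo "metaslab_min_alloc_size/J" | mdb -k
# echo "metaslab_min_alloc_size/Z a00000" | mdb -kw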

- George


On Mon, May 16, 2011 at 10:58 AM, Donald Stahl  wrote:
> Here is another example of the performance problems I am seeing:
>
> ~# dd if=/dev/zero of=/pool0/ds.test bs=1024k count=2000 2000+0 records in
> 2000+0 records out
> 2097152000 bytes (2.1 GB) copied, 56.2184 s, 37.3 MB/s
>
> 37MB/s seems like some sort of bad joke for all these disks. I can
> write the same amount of data to a set of 6 SAS disks on a Dell
> PERC6/i at a rate of 160MB/s and those disks are hosting 25 vm's and a
> lot more IOPS than this box.
>
> zpool iostat during the same time shows:
> pool0       14.2T  25.3T    124  1.30K   981K  4.02M
> pool0       14.2T  25.3T    277    914  2.16M  23.2M
> pool0       14.2T  25.3T     65  4.03K   526K  90.2M
> pool0       14.2T  25.3T     18  1.76K   136K  6.81M
> pool0       14.2T  25.3T    460  5.55K  3.60M   111M
> pool0       14.2T  25.3T    160      0  1.24M      0
> pool0       14.2T  25.3T    182  2.34K  1.41M  33.3M
>
> The zero's and other low numbers don't make any sense. And as I
> mentioned- the busy percent and service times of these disks are never
> abnormally high- especially when compared to the much smaller, better
> performing pool I have.
>





Re: [zfs-discuss] Extremely slow zpool scrub performance

2011-05-16 Thread George Wilson
Don,

Can you send the entire 'zpool status' output? I wanted to see your
pool configuration. Also run the mdb command in a loop (at least 5
times) so we can see if spa_last_io is changing. I'm surprised you're
not finding the symbol for 'spa_scrub_inflight' too.  Can you check
that you didn't mistype this?
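
For example, a quick loop like this would do:

# for i in 1 2 3 4 5; do echo "::walk spa | ::print spa_t spa_name spa_last_io spa_scrub_inflight" | mdb -k; sleep 1; done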

Thanks,
George

On Mon, May 16, 2011 at 7:41 AM, Donald Stahl  wrote:
>> Can you share your 'zpool status' output for both pools?
> Faster, smaller server:
> ~# zpool status pool0
>  pool: pool0
>  state: ONLINE
>  scan: scrub repaired 0 in 2h18m with 0 errors on Sat May 14 13:28:58 2011
>
> Much larger, more capable server:
> ~# zpool status pool0 | head
>  pool: pool0
>  state: ONLINE
>  scan: scrub in progress since Fri May 13 14:04:46 2011
>    173G scanned out of 14.2T at 737K/s, (scan is slow, no estimated time)
>    43K repaired, 1.19% done
>
> The only other relevant line is:
>            c5t9d0          ONLINE       0     0     0  (repairing)
>
> (That's new as of this morning- though it was still very slow before that)
>
>> Also you may want to run the following a few times in a loop and
>> provide the output:
>>
>> # echo "::walk spa | ::print spa_t spa_name spa_last_io
>> spa_scrub_inflight" | mdb -k
> ~# echo "::walk spa | ::print spa_t spa_name spa_last_io
>> spa_scrub_inflight" | mdb -k
> spa_name = [ "pool0" ]
> spa_last_io = 0x159b275a
> spa_name = [ "rpool" ]
> spa_last_io = 0x159b210a
> mdb: failed to dereference symbol: unknown symbol name
>
> I'm pretty sure that's not the output you were looking for :)
>
> On the same theme- is there a good reference for all of the various
> ZFS debugging commands and mdb options?
>
> I'd love to spend a lot of time just looking at the data available to
> me but every time I turn around someone suggests a new and interesting
> mdb query I've never seen before.
>
> Thanks,
> -Don
>





Re: [zfs-discuss] Extremely slow zpool scrub performance

2011-05-15 Thread George Wilson
Can you share your 'zpool status' output for both pools?

Also you may want to run the following a few times in a loop and
provide the output:

# echo "::walk spa | ::print spa_t spa_name spa_last_io
spa_scrub_inflight" | mdb -k

Thanks,
George

On Sat, May 14, 2011 at 8:29 AM, Donald Stahl  wrote:
>> The scrub I/O has lower priority than other I/O.
>>
>> In later ZFS releases, scrub I/O is also throttled. When the throttle
>> kicks in, the scrub can drop to 5-10 IOPS. This shouldn't be much of
>> an issue, scrubs do not need to be, and are not intended to be, run
>> very often -- perhaps once a quarter or so.
> I understand the lower priority I/O and such but what confuses me is this:
> On my primary head:
>  scan: scrub in progress since Fri May 13 14:04:46 2011
>    24.5G scanned out of 14.2T at 340K/s, (scan is slow, no estimated time)
>    0 repaired, 0.17% done
>
> I have a second NAS head, also running OI 147 on the same type of
> server, with the same SAS card, connected to the same type of disk
> shelf- and a zpool scrub over there is showing :
>  scan: scrub in progress since Sat May 14 11:10:51 2011
>    29.0G scanned out of 670G at 162M/s, 1h7m to go
>    0 repaired, 4.33% done
>
> Obviously there is less data on the second server- but the first
> server has 88 x SAS drives and the second one has 10 x 7200 SATA
> drives. I would expect those 88 SAS drives to be able to outperform 10
> SATA drives- but they aren't.
>
> On the first server iostat -Xn is showing 30-40 IOPS max per drive,
> while on the second server iostat -Xn is showing 400 IOPS per drive.
>
> On the first server the disk busy numbers never climb higher than 30%
> while on the secondary they will spike to 96%.
>
> This performance problem isn't just related to scrubbing either. I see
> mediocre performance when trying to write to the array as well. If I
> were seeing hardware errors, high service times, high load, or other
> errors, then that might make sense. Unfortunately I seem to have
> mostly idle disks that don't get used. It's almost as if ZFS is just
> sitting around twiddling its thumbs instead of writing data.
>
> I'm happy to provide real numbers, suffice it to say none of these
> numbers make any sense to me.
>
> The array actually has 88 disks + 4 hot spares (1 each of two sizes
> per controller channel) + 4 Intel X-25E 32GB SSD's (2 x 2 way mirror
> split across controller channels).
>
> Any ideas or things I should test and I will gladly look into them.
>
> -Don


Re: [zfs-discuss] Repairing Faulted ZFS pool when zbd doesn't recognize the pool as existing

2011-02-06 Thread George Wilson
> existed. The reason it jumps to raidz1-6 as far as I know is because
> raidz1-6 is the start of a new backplane. The other raidz1-[0-3] are on a
> different backplane. I wonder if ZFS is suddenly thinking we need those? )
>
>
>
> Tried: zdb -e - 13666181038508963033
>
> Result: (same as before) can't open 'tank': I/O error
>
>
>
> Tried: zpool history tank
>
> Result: no output
>
>
>
> Tried:  zdb -U -lv 13666181038508963033
>
> Result:  zdb: can't open '13666181038508963033': No such file or directory
>
>
>
> Tried: zdb -e 13666181038508963033
>
> Result: lists all the vdevs,  gets an I/O error at the end of c9t15d1s0
>
>
>
>
>
> It thinks we have 7 vdev children, and it lists 7 (0-6)
>
>
>
>
>
> Tried: time zpool import -V -m -d /mytempdev -fFX -o ro -o
> failmode=continue -R /mnt 13666181038508963033
>
> Result: works, took 25 min, and all the vdevs are the proper /mytempdev
> devices, not other ones.
>
>
>
> Tried: zpool clear -F tank
>
> Result: cannot clear errors for tank: I/O error
>
>
>
> Zdb does work, my commands run on “rpool” come back properly.. look:
>
>
>
> solaris:/# zdb -R tank 0:11600:200
>
> zdb: can't open 'tank': No such file or directory
>
> solaris:/# zdb -R rpool 0:11600:200
>
> Found vdev: /dev/dsk/c8d0s0
>
> DVA[0]=<0:11600:200:STD:1> [L0 unallocated] off uncompressed LE contiguous
> unique unencrypted 1-copy size=200L/200P birth=4L/4P fill=0 cksum=0:0:0:0
>
>   0 1 2 3 4 5 6 7   8 9 a b c d e f  0123456789abcdef
>
> 00:  070c89010c000254  1310050680101c00  T...
>
> 10:  58030001001f0728  060d201528830a07  (..X...(. ..
>
> 20:  3bdf081041020c10  0f00cc0c00588256  ...A...;V.X.
>
> 30:  00083d2fe64b  130016580df44f09  K./=.O..X...
>
> 40:  48b49e8ec3ac74c0  42fc03fcff2f0064  .t.Hd./B
>
> 50:  42fc42fc42fc42fc  fc42fcff42fc42fc  .B.B.B.B.B.B..B.
>
> [..snip..]
>
>
>
>
>
> I’ve tried booting into FreeBSD, and I get pretty much the same results as
> I do under Solaris Express. The only diff with FreeBSD seems to be that
> after I import with -V and try a zdb command, the zpool doesn’t exist
> anymore (crashes out?). My advantage in FreeBSD is that I have the source
> code that I can browse through, and possibly edit if I need to.
>
>
>
>
>
>
>
> I’m thinking that if I could try some of the uberblock invalidation tricks,
> I could do something here - but how do I do this without zdb giving me the
> information that I need?
>
>
>
> Hopefully some kind soul will take pity and dive into this mess with me. :)
>
>
>
>
>


Re: [zfs-discuss] zpool-poolname has 99 threads

2011-01-31 Thread George Wilson
The threads associated with the zpool process have special purposes and are
used by the different I/O types of the ZIO pipeline. The number of threads
doesn't change for workstations or servers. They are fixed values per ZIO
type. The new process you're seeing is just exposing the work that has
always been there. Now you can monitor how much CPU is being used by the
underlying ZFS I/O subsystem. If you're seeing a specific performance
problem feel free to provide more details about the issue.
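
For example, for a pool named tank (substitute your own pool name) you can
watch the per-thread CPU usage with something like:

# prstat -mL -p `pgrep zpool-tank`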

- George

On Mon, Jan 31, 2011 at 4:54 PM, Gary Mills  wrote:

> After an upgrade of a busy server to Oracle Solaris 10 9/10, I notice
> a process called zpool-poolname that has 99 threads.  This seems to be
> a limit, as it never goes above that.  It is lower on workstations.
> The `zpool' man page says only:
>
>  Processes
> Each imported pool has an associated process,  named  zpool-
> poolname.  The  threads  in  this process are the pool's I/O
> processing threads, which handle the compression,  checksum-
> ming,  and other tasks for all I/O associated with the pool.
> This process exists to  provides  visibility  into  the  CPU
> utilization  of the system's storage pools. The existence of
> this process is an unstable interface.
>
> There are several thousand processes doing ZFS I/O on the busy server.
> Could this new process be a limitation in any way?  I'd just like to
> rule it out before looking further at I/O performance.
>
> --
> -Gary Mills--Unix Group--Computer and Network Services-


Re: [zfs-discuss] Question about (delayed) block freeing

2010-10-29 Thread George Wilson
This value is hard-coded in.

- George

On Fri, Oct 29, 2010 at 9:58 AM, David Magda  wrote:

> On Fri, October 29, 2010 10:00, Eric Schrock wrote:
> >
> > On Oct 29, 2010, at 9:21 AM, Jesus Cea wrote:
> >
> >> When a file is deleted, its block are freed, and that situation is
> >> committed in the next txg. Fine. Now those blocks are free, and can be
> >> used in new block requests. Now new requests come and the (now free)
> >> blocks are reused for new data.
> >
> > ZFS will not reuse blocks for 3 transaction groups.  This is why
> uberblock
> > rollback will do normally only attempt a rollback of up to two previous
> > txgs.
>
> Just curious: is it run-time tunable, or hard-coded in at compile time?
>
>


Re: [zfs-discuss] Recovering from corrupt ZIL

2010-10-24 Thread George Wilson
The guid is stored on the mirrored pair of the log and in the pool config.
If your log device was not mirrored then you can only find it in the pool
config.

- George

On Sun, Oct 24, 2010 at 9:34 AM, David Ehrmann  wrote:

> How does ZFS detect that there's a log device attached to a pool?  I
> couldn't actually find the GUID of the log device anywhere on the other
> devices in the pool.


Re: [zfs-discuss] Recovering from corrupt ZIL

2010-10-23 Thread George Wilson
If your pool is on version > 19 then you should be able to import a pool
with a missing log device by using the '-m' option to 'zpool import'.
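
i.e. something along the lines of (the pool name below is just a placeholder):

# zpool import -m yourpool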

- George

On Sat, Oct 23, 2010 at 10:03 PM, David Ehrmann  wrote:

> > > From: zfs-discuss-boun...@opensolaris.org
> > [mailto:zfs-discuss-
> > > boun...@opensolaris.org] On Behalf Of Roy Sigurd
> > Karlsbakk
> > >
> > > Last I checked, you lose the pool if you lose the
> > slog on zpool
> > > versions < 19. I don't think there is a trivial way
> > around this.
> >
> > The actual data on disk
> > hasn't disappeared.  The only problem is the fact
> > that you can't import it.
>
> That's what's so frustrating.


Re: [zfs-discuss] Is there any way to stop a resilver?

2010-09-29 Thread George Wilson

Can you post the output of 'zpool status'?

Thanks,
George

LIC mesh wrote:

Most likely an iSCSI timeout, but that was before my time here.

Since then, there have been various individual drives lost along the way 
on the shelves, but never a whole LUN, so, theoretically, /except/ for 
iSCSI timeouts, there has been no great reason to resilver.




On Wed, Sep 29, 2010 at 11:51 AM, Lin Ling wrote:



What caused the resilvering to kick off in the first place?

Lin

On Sep 29, 2010, at 8:46 AM, LIC mesh wrote:


It's always running less than an hour.

It usually starts at around a 300,000h estimate (at 1m in), goes up
to an estimate in the millions (about 30 mins in) and restarts.

Never gets past 0.00% completion, and K resilvered on any LUN.

64 LUNs, 32x5.44T, 32x10.88T in 8 vdevs.




On Wed, Sep 29, 2010 at 11:40 AM, Scott Meilicke
<scott.meili...@craneaerospace.com> wrote:

Has it been running long? Initially the numbers are *way* off.
After a while it settles down into something reasonable.

How many disks, and what size, are in your raidz2?  


-Scott


On 9/29/10 8:36 AM, "LIC mesh" <licm...@gmail.com> wrote:

Is there any way to stop a resilver?

We gotta stop this thing - at minimum, completion time is
300,000 hours, and maximum is in the millions.

Raidz2 array, so it has the redundancy, we just need to
get data off.






Re: [zfs-discuss] Resilver endlessly restarting at completion

2010-09-29 Thread George Wilson

Answers below...

Tuomas Leikola wrote:
The endless resilver problem still persists on OI b147. Restarts when it 
should complete.


I see no other solution than to copy the data to safety and recreate the 
array. Any hints would be appreciated as that takes days unless i can 
stop or pause the resilvering.


On Mon, Sep 27, 2010 at 1:13 PM, Tuomas Leikola 
mailto:tuomas.leik...@gmail.com>> wrote:


Hi!

My home server had some disk outages due to flaky cabling and
whatnot, and started resilvering to a spare disk. During this
another disk or two dropped, and were reinserted into the array. So
no devices were actually lost, they just were intermittently away
for a while each.

The situation is currently as follows:
  pool: tank
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
attempt was made to correct the error.  Applications are
unaffected.
action: Determine if the device needs to be replaced, and clear the
errors
using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver in progress for 5h33m, 22.47% done, 19h10m to go
config:

NAME   STATE READ WRITE CKSUM
tank   ONLINE   0 0 0
  raidz1-0 ONLINE   0 0 0
c11t1d0p0  ONLINE   0 0 0
c11t2d0ONLINE   0 0 5
c11t6d0p0  ONLINE   0 0 0
spare-3      ONLINE   0 0 0
  c11t3d0p0  ONLINE   0 0 0  106M resilvered
  c9d1       ONLINE   0 0 0  104G resilvered
c11t4d0p0  ONLINE   0 0 0
c11t0d0p0  ONLINE   0 0 0
c11t5d0p0  ONLINE   0 0 0
c11t7d0p0  ONLINE   0 0 0  93.6G resilvered
  raidz1-2 ONLINE   0 0 0
c6t2d0 ONLINE   0 0 0
c6t3d0 ONLINE   0 0 0
c6t4d0 ONLINE   0 0 0  2.50K resilvered
c6t5d0 ONLINE   0 0 0
c6t6d0 ONLINE   0 0 0
c6t7d0 ONLINE   0 0 0
c6t1d0 ONLINE   0 0 1
logs
  /dev/zvol/dsk/rpool/log  ONLINE   0 0 0
cache
  c6t0d0p0 ONLINE   0 0 0
spares
  c9d1 INUSE currently in use

errors: No known data errors

And this has been going on for a week now, always restarting when it
should complete.

The questions in my mind atm: 


1. How can i determine the cause for each resilver? Is there a log?


If you're running OI b147 then you should be able to do the following:

# echo "::zfs_dbgmsg" | mdb -k > /var/tmp/dbg.out

Send me the output.



2. Why does it resilver the same data over and over, and not just
the changed bits?


If you're having drives fail prior to the initial resilver finishing 
then it will restart and do all the work over again. Are drives still 
failing randomly for you?




3. Can i force remove c9d1 as it is no longer needed but c11t3 can
be resilvered instead?


You can detach the spare and let the resilver work on only c11t3. Can 
you send me the output of 'zdb - tank 0'?


Thanks,
George


Re: [zfs-discuss] resilver that never finishes

2010-09-18 Thread George Wilson

Tom Bird wrote:

On 18/09/10 09:02, Ian Collins wrote:


In my case, other than an hourly snapshot, the data is not significantly 
changing.


It'd be nice to see a response other than "you're doing it wrong", 
rebuilding 5x the data on a drive relative to its capacity is clearly 
erratic behaviour, I am curious as to what is actually happening.


All said and done though, we will have to live with snv_134's bugs from 
now on, or perhaps I could try Sol 10.


Tom



It sounds like you're hitting '6891824 7410 NAS head "continually 
resilvering" following HDD replacement'. If you stop taking and 
destroying snapshots you should see the resilver finish.


Thanks,
George


Re: [zfs-discuss] Hang on zpool import (dedup related)

2010-09-12 Thread George Wilson

Chris Murray wrote:

Another hang on zpool import thread, I'm afraid, because I don't seem to have 
observed any great successes in the others and I hope there's a way of saving 
my data ...

In March, using OpenSolaris build 134, I created a zpool, some zfs filesystems, 
enabled dedup on them, moved content into them and promptly discovered how slow 
it was because I only have 4GB RAM. Even with 30GB L2ARC, the performance was 
unacceptable. The trouble started when the machine hung one day. Ever since, 
I've been unable to import my pool without it hanging again. At the time I saw 
posts from others who had run into similar problems, so I thought it best that 
I wait until a later build, on the assumption that some ZFS dedup bug would be 
fixed and I could see my data again. I've been waiting ever since, and only 
just had a chance to try build 147, thanks to illumos and a schillix live CD.

However, the pool still won't import, so I'd much appreciate any 
troubleshooting hints and tips to help me on my way.

schillix b147i

My process is:
1. boot the live CD.
2. on the console session, run  vmstat 1
3. from another machine, SSH in with multiple sessions and:
vmstat 60
vmstat 1
zpool import -f zp
zpool iostat zp 1
zpool iostat zp -v 5
4. wait until it all stops

What I observe is that the zpool import command never finishes, there will be a 
lengthy period of read activity made up of very small reads which then stops 
before an even longer period of what looks like no disk activity.

zp   512G  1.31T  0  0  0  0

The box will be responsive for quite some time, seemingly doing not a great 
deal:

 kthr  memorypagedisk  faults  cpu
 r b w   swap  free  re  mf pi po fr de sr cd cd rm s0   in   sy   cs us sy id
 0 0 0 2749064 3122988 0  7  0  0  0  0  0  0  1  0  0  365  218  714  0  1 99

Then after a matter of hours it'll hang. SSH sessions are no longer responsive. 
On the console I can press return which creates a new line, but vmstat will 
have stopped updating.

Interestingly, what I observed in b134 was the same thing, however the free 
memory would slowly decrease over the course of hours, before a sudden 
nose-dive right before the lock up. Now it appears to hang without that same 
effect.

While the import appears to be working, I can cd to /zp and look at content of the 
filesystems of 5 of the 9 "esx*" directories.
Coincidence or not, it's the last four which appear to be empty - esx_prod 
onward.

# zfs list
NAME   USED  AVAIL  REFER  MOUNTPOINT
zp 905G  1.28T23K  /zp
zp/nfs 889G  1.28T32K  /zp/nfs
zp/nfs/esx_dev 264G  1.28T   264G  /zp/nfs/esx_dev
zp/nfs/esx_hedgehog   25.8G  1.28T  25.8G  /zp/nfs/esx_hedgehog
zp/nfs/esx_meerkat 223G  1.28T   223G  /zp/nfs/esx_meerkat
zp/nfs/esx_meerkat_dedup   938M  1.28T   938M  /zp/nfs/esx_meerkat_dedup
zp/nfs/esx_page   8.90G  1.28T  8.90G  /zp/nfs/esx_page
zp/nfs/esx_prod306G  1.28T   306G  /zp/nfs/esx_prod
zp/nfs/esx_skunk21K  1.28T21K  /zp/nfs/esx_skunk
zp/nfs/esx_temp   45.5G  1.28T  45.5G  /zp/nfs/esx_temp
zp/nfs/esx_template   15.2G  1.28T  15.2G  /zp/nfs/esx_template

Any help would be appreciated. What could be going wrong here? Is it getting 
progressively closer to becoming imported each time I try this, or will it be 
starting from scratch? Feels to me like there's an action in the 
/zp/nfs/esx_prod filesystem it's trying to replay and never getting to the end 
of, for some reason. In case it was getting in a muddle with the l2arc, I 
removed the cache device a matter of minutes into this run. It hasn't hung yet, 
vmstat is still updating, but I tried a 'zpool import' in one of the windows to 
see if I could even see a pool on another disk, and that hasn't returned me 
back to the prompt yet. Also tried to SSH in with another session, and that 
hasn't produced the login prompt.

Thanks in advance,
Chris


It looks like you may be past the import phase and into the mounting 
phase. What I would recommend is that you 'zpool import -N zp' so that 
none of the datasets get mounted and only the import happens. Then one 
by one you can mount the datasets in order (starting with 'zp') so you 
can find out which one may be hanging.
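
For example, using the dataset names from your 'zfs list' output, something
like:

# zpool import -N zp
# zfs mount zp
# zfs mount zp/nfs
# zfs mount zp/nfs/esx_dev
...and so on, until one of the mounts hangs.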


- George


Re: [zfs-discuss] what is zfs doing during a log resilver?

2010-09-06 Thread George Wilson

Arne Jansen wrote:

Giovanni Tirloni wrote:



On Thu, Sep 2, 2010 at 10:18 AM, Jeff Bacon wrote:


So, when you add a log device to a pool, it initiates a resilver.

What is it actually doing, though? Isn't the slog a copy of the
in-memory intent log? Wouldn't it just simply replicate the data 
that's

in the other log, checked against what's in RAM? And presumably there
isn't that much data in the slog so there isn't that much to check?

Or is it just doing a generic resilver for the sake of argument 
because

you changed something?


Good question. Here it takes little over 1 hour to resilver a 32GB SSD 
in a mirror. I've always wondered what exactly it was doing since it 
was supposed to be 30 seconds worth of data. It also generates lots of 
checksum errors.


Here it takes more than 2 days to resilver a failed slog-SSD. I'd also
expect it to finish in a few seconds... It seems it resilvers the whole 
pool,

35T worth of data on 22 spindels (RAID-Z2).

We don't get any errors during resilver.

--
Arne




Resilvering log devices should really be handled differently than other 
devices in the pool but we don't do that today. This is documented in 
CR: 6899591. As a workaround you can first remove the log device and 
then re-add it to the pool as a mirror-ed log device.
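
Roughly (the pool and device names here are just placeholders):

# zpool remove tank c3t0d0
# zpool add tank log mirror c3t0d0 c3t1d0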


- George


Re: [zfs-discuss] ZFS offline ZIL corruption not detected

2010-08-27 Thread George Wilson

Bob Friesenhahn wrote:

On Thu, 26 Aug 2010, George Wilson wrote:


What gets "scrubbed" in the slog?  The slog contains transient data 
which exists for only seconds at a time.  The slog is quite likely to be 
empty at any given point in time.


Bob


Yes, the typical ZIL block never lives long enough to scrub but if there 
are any blocks which have not been replayed (i.e. zil blocks for an 
unmounted filesystem) then those will get scrubbed.


- George


Re: [zfs-discuss] ZFS offline ZIL corruption not detected

2010-08-26 Thread George Wilson

Edward Ned Harvey wrote:


Add to that:

During scrubs, perform some reads on log devices (even if there's nothing to
read).


We do read from log device if there is data stored on them.

In fact, during scrubs, perform some reads on every device (even if it's
actually empty.)


Reading from the data portion of an empty device wouldn't really show us 
much as we're going to be reading a bunch of non-checksummed data. The 
best we can do is to "probe" the device's label region to determine its
health. This is exactly what we do today.


- George





Re: [zfs-discuss] ZFS offline ZIL corruption not detected

2010-08-26 Thread George Wilson

David Magda wrote:

On Wed, August 25, 2010 23:00, Neil Perrin wrote:

Does a scrub go through the slog and/or L2ARC devices, or only the
"primary" storage components?


A scrub will go through slogs and primary storage devices. The L2ARC 
device is considered volatile and data loss is not possible should it fail.


- George


Re: [zfs-discuss] ZFS offline ZIL corruption not detected

2010-08-26 Thread George Wilson

Edward Ned Harvey wrote:

From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
boun...@opensolaris.org] On Behalf Of Neil Perrin

This is a consequence of the design for performance of the ZIL code.
Intent log blocks are dynamically allocated and chained together.
When reading the intent log we read each block and checksum it
with the embedded checksum within the same block. If we can't read
a block due to an IO error then that is reported, but if the checksum
does
not match then we assume it's the end of the intent log chain.
Using this design means the minimum number of writes needed to add
an intent log record is just one write.

So corruption of an intent log is not going to generate any errors.


I didn't know that.  Very interesting.  This raises another question ...

It's commonly stated, that even with log device removal supported, the most
common failure mode for an SSD is to blindly write without reporting any
errors, and only detect that the device is failed upon read.  So ... If an
SSD is in this failure mode, you won't detect it?  At bootup, the checksum
will simply mismatch, and we'll chug along forward, having lost the data ...
(nothing can prevent that) ... but we don't know that we've lost data?


If the drive's firmware isn't returning back a write error of any kind 
then there isn't much that ZFS can really do here (regardless of whether 
this is an SSD or not). Turning every write into a read/write operation 
would totally defeat the purpose of the ZIL. It's my understanding that 
SSDs will eventually transition to read-only devices once they've 
exceeded their spare reallocation blocks. This should propagate to the 
OS as an EIO which means that ZFS will instead store the ZIL data on the 
main storage pool.




Worse yet ... In preparation for the above SSD failure mode, it's commonly
recommended to still mirror your log device, even if you have log device
removal.  If you have a mirror, and the data on each half of the mirror
doesn't match each other (one device failed, and the other device is good)
... Do you read the data from *both* sides of the mirror, in order to
discover the corrupted log device, and correctly move forward without data
loss?


Yes, we read all sides of the mirror when we claim (i.e. read) the log 
blocks for a log device. This is exactly what a scrub would do for a 
mirrored data device.


- George





Re: [zfs-discuss] How do I Import rpool to an alternate location?

2010-08-16 Thread George Wilson

Robert Hartzell wrote:

On 08/16/10 07:47 PM, George Wilson wrote:

The root filesystem on the root pool is set to 'canmount=noauto' so you
need to manually mount it first using 'zfs mount '. Then
run 'zfs mount -a'.

- George



mounting the dataset failed because the /mnt dir was not empty and "zfs 
mount -a" failed I guess because the first command failed.





It's possible that, as part of the initial import, one of the mount
points tried to create a directory under /mnt. You should first unmount
everything associated with that pool, then ensure that /mnt is empty and 
mount the root filesystem first. Don't mount anything else until the 
root is mounted.
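
Something along these lines should work, assuming your build's 'zpool import'
supports -N (import without mounting); double-check /mnt before removing
anything from it:

# zpool export bertha
# ls -A /mnt        (clean out any leftover empty directories)
# zpool import -N -R /mnt bertha
# zfs mount bertha/ROOT/snv_134
# zfs mount -a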


- George


Re: [zfs-discuss] How do I Import rpool to an alternate location?

2010-08-16 Thread George Wilson
 The root filesystem on the root pool is set to 'canmount=noauto' so 
you need to manually mount it first using 'zfs mount '. 
Then run 'zfs mount -a'.
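
In your case that should be something like:

# zfs mount bertha/ROOT/snv_134
# zfs mount -a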


- George


On 08/16/10 07:30 PM, Robert Hartzell wrote:
I have a disk which is 1/2 of a boot disk mirror from a failed system 
that I would like to extract some data from. So i install the disk to 
a test system and do:


zpool import -R /mnt -f rpool bertha

which gives me:


bertha102G   126G84K  /mnt/bertha
bertha/ROOT  34.3G   126G19K  legacy
bertha/ROOT/snv_134  34.3G   126G  10.9G  /mnt
bertha/Vbox  46.9G   126G  46.9G  /mnt/export/Vbox
bertha/dump  2.00G   126G  2.00G  -
bertha/export8.05G   126G31K  /mnt/export
bertha/export/home   8.05G  52.0G  8.01G  /mnt/export/home
bertha/mail  1.54M  5.00G  1.16M  /mnt/var/mail
bertha/swap 4G   130G   181M  -
bertha/zones 6.86G   126G24K  /mnt/export/zones
bertha/zones/bz1 6.05G   126G24K  
/mnt/export/zones/bz1

bertha/zones/bz1/ROOT6.05G   126G21K  legacy
bertha/zones/bz1/ROOT/zbe6.05G   126G  6.05G  legacy
bertha/zones/bz2  821M   126G24K  
/mnt/export/zones/bz2

bertha/zones/bz2/ROOT 821M   126G21K  legacy
bertha/zones/bz2/ROOT/zbe 821M   126G   821M  legacy




cd /mnt ; ls
bertha export var
ls bertha
boot etc

where is the rest of the file systems and data?





Re: [zfs-discuss] problem with zpool import - zil and cache drive are not displayed?

2010-08-03 Thread George Wilson

Darren,

It looks like you've lost your log device. The newly integrated missing 
log support will help once it's available. In the meantime, you should 
run 'zdb -l' on your log device to make sure the label is still intact.


Thanks,
George

Darren Taylor wrote:
I'm at a loss, I've managed to get myself into a fix. I'm not sure where the problem is, but essentially i have a zpool i cannot import. This particular pool used to have a two drives (not shown below), one for cache and another for log. I'm unsure why they are no longer detected on zpool import...  the disks are still connected to the system and show up when running "format" for a list. 


dar...@lexx:~# zpool import
  pool: tank
id: 15136317365944618902
 state: UNAVAIL
status: The pool was last accessed by another system.
action: The pool cannot be imported due to damaged devices or data.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

tank UNAVAIL  missing device
  raidz1-0   ONLINE
c6t4d0   ONLINE
c6t5d0   ONLINE
c6t6d0   ONLINE
c6t7d0   ONLINE
  raidz1-1   ONLINE
c6t0d0   ONLINE
c6t1d0   ONLINE
c6t2d0   ONLINE
c6t3d0   ONLINE
dar...@lexx:~# 

The above disks are the data disks which appear to be online without issue. i was running version 22 on this pool. 


Any help appreciated




Re: [zfs-discuss] Fwd: zpool import despite missing log [PSARC/2010/292Self Review]

2010-07-30 Thread George Wilson

Dmitry Sorokin wrote:



Thanks for the update Robert.

 

Currently I have a failed zpool with the slog missing, which I was unable to
recover, although I was able to find out what the GUID was for the slog
device (below is the output of the zpool import command).


I couldn't compile the logfix binary either, so I've run out of ideas for
how to recover this zpool.


So for now it just sits there untouched.

This proposed improvement to zfs is definetely a hope for me.

When do you think it’ll be implemented (roughly - this year, early next
year…) and would I be able to import this pool at its current version
22 (snv_129)?



Dmitry,

I can't comment on when this will be available but I can tell you that 
it will work with version 22. This requires that you have a pool that is 
running a minimum of version 19.


Thanks,
George



 

 


[r...@storage ~]# zpool import

pool: tank

id: 1346464136813319526

state: UNAVAIL

status: The pool was last accessed by another system.

action: The pool cannot be imported due to damaged devices or data.

   see: http://www.sun.com/msg/ZFS-8000-EY

config:

 


tank UNAVAIL  missing device

  raidz2-0   ONLINE

c4t0d0   ONLINE

c4t1d0   ONLINE

c4t2d0   ONLINE

c4t3d0   ONLINE

c4t4d0   ONLINE

c4t5d0   ONLINE

c4t6d0   ONLINE

c4t7d0   ONLINE

[r...@storage ~]#

 


Best regards,

Dmitry

 

 

From: zfs-discuss-boun...@opensolaris.org
[mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Robert Milkowski

Sent: Wednesday, July 28, 2010 7:12 PM
To: ZFS Discussions
Subject: [zfs-discuss] Fwd: zpool import despite missing log
[PSARC/2010/292 Self Review]


 



fyi

--
Robert Milkowski
http://milek.blogspot.com


 Original Message 

Subject: zpool import despite missing log [PSARC/2010/292 Self Review]
Date: Mon, 26 Jul 2010 08:38:22 -0600
From: Tim Haley <tim.ha...@oracle.com>
To: psarc-...@sun.com
CC: zfs-t...@sun.com

 

I am sponsoring the following case for George Wilson.  Requested binding   

is micro/patch.  Since this is a straight-forward addition of a command   

line option, I think it qualifies for self review.  If an ARC member

disagrees, let me know and I'll convert to a fast-track.  

   

Template Version: @(#)sac_nextcase 1.70 03/30/10 SMI  

This information is Copyright (c) 2010, Oracle and/or its affiliates.   

All rights reserved.  

1. Introduction  

1.1. Project/Component Working Name:  

 zpool import despite missing log  

1.2. Name of Document Author/Supplier:  

     Author:  George Wilson  

1.3  Date of This Document:  

26 July, 2010  

   

4. Technical Description  

   

OVERVIEW:  

   

 ZFS maintains a GUID (global unique identifier) on each device and  

 the sum of all GUIDs of a pool are stored into the ZFS uberblock.  

 This sum is used to determine the availability of all vdevs  

 within a pool when a pool is imported or opened.  Pools which  

 contain a separate intent log device (e.g. a slog) will fail to  

 import when that device is removed or is otherwise unavailable.  

 This proposal aims to address this particular issue.  

   

PROPOSED SOLUTION:  

   

 This fast-track introduce a new command line flag to the  

 'zpool import' sub-command.  This new option, '-m', allows  

 pools to import even when a log device is missing.  The contents  

 of that log device are obviously discarded and the pool will  

 operate as if the log device were offlined.  

   

MANPAGE DIFFS:  

   

   zpool import [-o mntopts] [-p property=value] ... [-d dir | -c  

cachefile]  

-  [-D] [-f] [-R root] [-n] [-F] -a  

+  [-D] [-f] [-m] [-R root] [-n] [-F] -a  

   

   

   zpool import [-o mntopts] [-o property=value] ... [-d dir | -c  

cachefile]  

-  [-D] [-f] [-R root] [-n] [-F] pool |id [newpool]  

+  [-D] [-f] [-m] [-R root] [-n] [-F] pool |id [newpool]  

   

   zpool import [-o mntopts] [ -o property=value] ... [-d dir |  

- -c cachefile] [-D] [-f] [-n] [-F] [-R root] -a  

+ -c cachefile] [-D] [-f] [-m] [-n] [-F] [-R root] -a  

   

   Imports all  pools  found  in  the  search  directories.  

   Identical to the previous command, except that all pools  

   

+ -m  

+  

+Allows a pool to import when there is a missing log device  

   

EXAMPLES:  

   

1). Configuration with a single intent log device:  

   

# zpool status tank  

   pool: tank  

state: ONLINE  

 scan: none requested  

 config:  

   

 NAMESTATE READ WRITE 

Re: [zfs-discuss] zfs hangs with B141 when filebench runs

2010-07-15 Thread George Wilson
I don't recall seeing this issue before. Best thing to do is file a bug 
and include a pointer to the crash dump.


- George

zhihui Chen wrote:

Looks like the txg_sync_thread for this pool has been blocked and
never returns, which leads to many other threads being blocked. I have
tried to change the zfs_vdev_max_pending value from 10 to 35 and
retested the workload several times, and the issue does not happen.
But if I change it back to 10, it happens very easily. Any known bug
on this, or any suggestion to solve this issue?


ff0502c3378c::wchaninfo -v

ADDR TYPE NWAITERS   THREAD   PROC
ff0502c3378c cond 1730:  ff051cc6b500 go_filebench
 ff051ce61020 go_filebench
 ff051cc4e4e0 go_filebench
 ff051d115120 go_filebench
 ff051e9ed000 go_filebench
 ff051bf644c0 go_filebench
 ff051c65b000 go_filebench
 ff051c728500 go_filebench
 ff050d83a8c0 go_filebench
 ff051c528c00 go_filebench
 ff051b750800 go_filebench
 ff051cdd7520 go_filebench
 ff051ce71bc0 go_filebench
 ff051cb5e840 go_filebench
 ff051cbdec60 go_filebench
 ff0516473c60 go_filebench
 ff051d132820 go_filebench
 ff051d13a400 go_filebench
 ff050fbf0b40 go_filebench
 ff051ce7a400 go_filebench
 ff051b781820 go_filebench
 ff051ce603e0 go_filebench
 ff051d1bf840 go_filebench
 ff051c6c24c0 go_filebench
 ff051d204100 go_filebench
 ff051cbdf160 go_filebench
 ff051ce52c00 go_filebench
 ...

ff051cc6b500::findstack -v

stack pointer for thread ff051cc6b500: ff0020a76ac0
[ ff0020a76ac0 _resume_from_idle+0xf1() ]
  ff0020a76af0 swtch+0x145()
  ff0020a76b20 cv_wait+0x61(ff0502c3378c, ff0502c33700)
  ff0020a76b70 zil_commit+0x67(ff0502c33700, 6b255, 14)
  ff0020a76d80 zfs_write+0xaaf(ff050b5c9140, ff0020a76e40,
40, ff0502dab258, 0)
  ff0020a76df0 fop_write+0x6b(ff050b5c9140, ff0020a76e40,
40, ff0502dab258, 0)
  ff0020a76ec0 pwrite64+0x244(1a, b6f2a000, 800, b841a800, 0)
  ff0020a76f10 sys_syscall32+0xff()

From the zil_commit code, I try to find the thread whose stack have
function call zil_commit_writer. This thread did not
return back from zil_commit_write so that it will not call
cv_broadcast to wake up the waiting threads.


ff051d10fba0::findstack -v

stack pointer for thread ff051d10fba0: ff0021ab9a10
[ ff0021ab9a10 _resume_from_idle+0xf1() ]
  ff0021ab9a40 swtch+0x145()
  ff0021ab9a70 cv_wait+0x61(ff051ae1b988, ff051ae1b980)
  ff0021ab9ab0 zio_wait+0x5d(ff051ae1b680)
  ff0021ab9b20 zil_commit_writer+0x249(ff0502c33700, 6b250, e)
  ff0021ab9b70 zil_commit+0x91(ff0502c33700, 6b250, e)
  ff0021ab9d80 zfs_write+0xaaf(ff050b5c9540, ff0021ab9e40,
40, ff0502dab258, 0)
  ff0021ab9df0 fop_write+0x6b(ff050b5c9540, ff0021ab9e40,
40, ff0502dab258, 0)
  ff0021ab9ec0 pwrite64+0x244(14, bfbfb800, 800, 88f3f000, 0)
  ff0021ab9f10 sys_syscall32+0xff()


ff051ae1b680::zio -r

ADDRESS  TYPE  STAGEWAITER
ff051ae1b680 NULL  CHECKSUM_VERIFY  ff051d10fba0
 ff051a9c1978WRITE VDEV_IO_START-
  ff052454d348   WRITE VDEV_IO_START-
 ff051572b960WRITE VDEV_IO_START-
  ff050accb330   WRITE VDEV_IO_START-
 ff0514453c80WRITE VDEV_IO_START-
  ff0524537648   WRITE VDEV_IO_START-
 ff05090e9660WRITE VDEV_IO_START-
  ff05151cb698   WRITE VDEV_IO_START-
 ff0514668658WRITE VDEV_IO_START-
  ff0514835690   WRITE VDEV_IO_START-
 ff05198979a0WRITE VDEV_IO_START-
  ff0507e1d038   WRITE VDEV_IO_START-
 ff0510727028WRITE VDEV_IO_START-
  ff0523a25018   WRITE VDEV

Re: [zfs-discuss] Scrub issues

2010-06-14 Thread George Wilson

Richard Elling wrote:

On Jun 14, 2010, at 2:12 PM, Roy Sigurd Karlsbakk wrote:

Hi all

It seems zfs scrub is taking a big bit out of I/O when running. During a scrub, 
sync I/O, such as NFS and iSCSI is mostly useless. Attaching an SLOG and some 
L2ARC helps this, but still, the problem remains in that the scrub is given 
full priority.


Scrub always runs at the lowest priority. However, priority scheduling only
works before the I/Os enter the disk queue. If you are running Solaris 10 or
older releases with HDD JBODs, then the default zfs_vdev_max_pending 
is 35. This means that your slow disk will have 35 I/Os queued to it before

priority scheduling makes any difference.  Since it is a slow disk, that could
mean 250 to 1500 ms before the high priority I/O reaches the disk.


Is this problem known to the developers? Will it be addressed?


In later OpenSolaris releases, the zfs_vdev_max_pending defaults to 10
which helps.  You can tune it lower as described in the Evil Tuning Guide.

Also, as Robert pointed out, CR 6494473 offers a more resource management
friendly way to limit scrub traffic (b143).  Everyone can buy George a beer for
implementing this change :-)



I'll glad accept any beer donations and others on the ZFS team are happy 
to help consume it. :-)


I look forward to hearing people's experience with the new changes.

- George


Of course, this could mean that on a busy system a scrub that formerly took
a week might now take a month.  And the fix does not directly address the 
tuning of the queue depth issue with HDDs.  TANSTAAFL.

 -- richard






Re: [zfs-discuss] dedup status

2010-05-20 Thread George Wilson

Roy Sigurd Karlsbakk wrote:

Hi all

I've been doing a lot of testing with dedup and concluded it's not really ready
for production. If something fails, it can render the pool unusable for hours
or maybe days, perhaps due to single-threaded stuff in zfs. There is also very
little data available in the docs (apart from what I've got on this list)
on how much memory one should have for deduping an xTiB dataset.

Does anyone know what the status of dedup is now? In 134 it doesn't work very 
well, but is it better in ON140 etc.?

Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is 
an elementary imperative for all pedagogues to avoid excessive use of 
idioms of foreign origin. In most cases adequate and 
relevant synonyms exist in Norwegian.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



I just integrated a performance improvement for dedup which will 
dramatically help when the dedup table does not fit in memory. For more 
details take a look at:


6938089 dedup-induced latency causes FC initiator logouts/FC port resets

This will improve performance for such tasks as rm-ing files in a dedup 
enabled dataset, and destroying a dedup enabled dataset. It's still a 
best practice to size your system accordingly such that the dedup table 
can stay resident in the ARC or L2ARC.
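As a rough, hedged illustration of that sizing exercise ('tank' is a placeholder pool name):

  zdb -DD tank                      # DDT entry counts plus on-disk and in-core sizes
  kstat -p zfs:0:arcstats:size      # current ARC size, for comparison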


- George
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Heads-Up: Changes to the zpool(1m) command

2009-12-02 Thread George Wilson

Some new features have recently been integrated into ZFS which change
the output of the zpool(1m) command. Here's a quick recap:

1) 6574286 removing a slog doesn't work

This change added the concept of named top-level devices for the purpose
of device removal. The named top-levels are constructed by using the
logical name (mirror, raidz2, etc.) and appending a unique numeric
identifier. After this change, 'zpool status' and 'zpool import' will
print the configuration using this new naming convention:

jaggs# zpool status tank
  pool: tank
 state: ONLINE
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
tankONLINE   0 0 0
  raidz2-0  ONLINE   0 0 0
c0t0d0  ONLINE   0 0 0
c1t0d0  ONLINE   0 0 0
c4t0d0  ONLINE   0 0 0
c6t0d0  ONLINE   0 0 0
c7t0d0  ONLINE   0 0 0

errors: No known data errors

2) 6677093 zfs should have dedup capability

This project modified the default 'zpool list' to show the "dedupratio"
for each pool. Subsequently a new property, "dedupratio", is available
when using 'zpool get':

jaggs# zpool list
NAME SIZE  ALLOC   FREECAP  DEDUP  HEALTH  ALTROOT
export   928G  47.5G   881G 5%  1.77x  ONLINE  -
rpool928G  25.7G   902G 2%  1.40x  ONLINE  -

jaggs# zpool get dedup rpool
NAME   PROPERTYVALUE  SOURCE
rpool  dedupratio  1.40x  -

3) 6897693 deduplication can only go so far

The integration of dedup changed the way we report "used" and
"available" space in a pool. In particular, 'zpool list' reports
"allocated" and "free" physical blocks, whereas 'zfs list' shows
"used" and "available" space at the filesystem level. This change replaced
the "used" property with "allocated" and "available" with "free", which
should help clarify the accounting difference reported by the two
utilities. This does, however, impact any scripts which utilized the old
"used" and "available" properties of the zpool command. Those scripts
should be updated to use the new naming convention:

jaggs# zpool list
NAME   SIZE  ALLOC   FREECAP  DEDUP  HEALTH  ALTROOT
rpool  464G  64.6G   399G13%  1.00x  ONLINE  -
tank  2.27T   207K  2.27T 0%  1.00x  ONLINE  -

jaggs# zpool get allocated,free rpool
NAME   PROPERTY   VALUE  SOURCE
rpool  allocated  64.6G  -
rpool  free   399G   -
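For scripts, a hedged before/after sketch (the exact invocation will depend on how the script parses output):

  # old style, written against the removed property names:
  #   zpool list -H -o name,used,available
  # new style, using the renamed properties:
  zpool list -H -o name,allocated,free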


We realize that these changes may impact some user scripts and we 
apologize for any inconvenience this may cause.


Thanks,
George
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] CR# 6574286, remove slog device

2009-11-30 Thread George Wilson

Moshe Vainer wrote:

I am sorry, I think I confused matters a bit. I meant the bug that prevents 
importing with the slog device missing, 6733267.
I am aware that one can remove a slog device, but if you lose your rpool and 
the device goes missing while you rebuild, you will lose your pool in its 
entirety. Not a situation to be tolerated in production.


Expect the fix for this issue this month.

Thanks,
George
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] dedupe question

2009-11-08 Thread George Wilson

Dennis Clarke wrote:

On Sat, 2009-11-07 at 17:41 -0500, Dennis Clarke wrote:

Does the dedupe functionality happen at the file level or a lower block
level?

it occurs at the block allocation level.


I am writing a large number of files that have the fol structure :

-- file begins
1024 lines of random ASCII chars 64 chars long
some tilde chars .. about 1000 of them
some text ( English ) for 2K
more text ( English ) for 700 bytes or so
--

ZFS's default block size is 128K and is controlled by the "recordsize"
filesystem property.  Unless you changed "recordsize", each of the files
above would be a single block distinct from the others.

you may or may not get better dedup ratios with a smaller recordsize
depending on how the common parts of the file line up with block
boundaries.

the cost of additional indirect blocks might overwhelm the savings from
deduping a small common piece of the file.

- Bill
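If you want to experiment with that trade-off, a hedged example of adjusting the property (it only affects newly written files; the dataset name is a placeholder):

  zfs get recordsize tank/testfs
  zfs set recordsize=8K tank/testfs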


Well, I was curious about these sorts of things and figured that a simple
test would show me the behavior.

Now the first test I did was to write 26^2 files [a-z][a-z].dat in 26^2
directories named [a-z][a-z] where each file is 64K of random
non-compressible data and then some English text.

I guess I was wrong about the 64K random text chunk also .. because I
wrote out that data as chars from the set { [A-Z][a-z][0-9] } and thus ..
compressible ASCII data as opposed to random binary data.

So ... after doing that a few times I now see something fascinating :

$ ls -lo /tester/foo/*/aa/aa.dat
-rw-r--r--   1 dclarke    68330 Nov  7 22:38 /tester/foo/1/aa/aa.dat
-rw-r--r--   1 dclarke    68330 Nov  7 22:45 /tester/foo/2/aa/aa.dat
-rw-r--r--   1 dclarke    68330 Nov  7 22:43 /tester/foo/3/aa/aa.dat
-rw-r--r--   1 dclarke    68330 Nov  7 22:43 /tester/foo/4/aa/aa.dat
$ ls -lo /tester/foo/*/zz/az.dat
-rw-r--r--   1 dclarke    68330 Nov  7 22:39 /tester/foo/1/zz/az.dat
-rw-r--r--   1 dclarke    68330 Nov  7 22:47 /tester/foo/2/zz/az.dat
-rw-r--r--   1 dclarke    68330 Nov  7 22:45 /tester/foo/3/zz/az.dat
-rw-r--r--   1 dclarke    68330 Nov  7 22:47 /tester/foo/4/zz/az.dat

$ find /tester/foo -type f | wc -l
   70304

Those files, all 70,000+ of them, are unique and smaller than the
filesystem blocksize.

However :

$ zfs get
used,available,referenced,compressratio,recordsize,compression,dedup
zp_dd/tester
NAME  PROPERTY   VALUE SOURCE
zp_dd/tester  used   4.51G -
zp_dd/tester  available  3.49G -
zp_dd/tester  referenced 4.51G -
zp_dd/tester  compressratio  1.00x -
zp_dd/tester  recordsize 128K  default
zp_dd/tester  compressionoff   local
zp_dd/tester  dedup  onlocal

Compression factors don't interest me at the moment .. but see this :

$ zpool get all zp_dd
NAME   PROPERTY   VALUE   SOURCE
zp_dd  size   67.5G   -
zp_dd  capacity   6%  -
zp_dd  altroot-   default
zp_dd  health ONLINE  -
zp_dd  guid   14649016030066358451  default
zp_dd  version21  default
zp_dd  bootfs -   default
zp_dd  delegation on  default
zp_dd  autoreplaceoff default
zp_dd  cachefile  -   default
zp_dd  failmode   waitdefault
zp_dd  listsnapshots  off default
zp_dd  autoexpand off default
zp_dd  dedupratio 1.95x   -
zp_dd  free   63.3G   -
zp_dd  allocated  4.22G   -

The dedupe ratio has climbed to 1.95x with all those unique files that are
less than %recordsize% bytes.



You can get more dedup information by running 'zdb -DD zp_dd'. This 
should show you how we break things down. Add more 'D' options and get 
even more detail.


- George

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS dedup issue

2009-11-03 Thread George Wilson

Eric Schrock wrote:


On Nov 3, 2009, at 12:24 PM, Cyril Plisko wrote:


I think I'm observing the same (with changeset 10936) ...


  # mkfile 2g /var/tmp/tank.img
  # zpool create tank /var/tmp/tank.img
  # zfs set dedup=on tank
  # zfs create tank/foobar


This has to do with the fact that dedup space accounting is charged to all
filesystems, regardless of whether blocks are deduped.  To do otherwise is
impossible, as there is no true "owner" of a block, and the fact that it may
or may not be deduped is often beyond the control of a single filesystem.


This has some interesting pathologies as the pool gets full.  Namely, that
ZFS will artificially enforce a limit on the logical size of the pool based
on non-deduped data.  This is obviously something that should be addressed.




Eric,

Many people (me included) perceive deduplication as a means to save
disk space and allow more data to be squeezed into storage. What you
are saying is that ZFS dedup effectively does a wonderful job of
detecting duplicate blocks and goes to all the trouble of removing
the extra copies and keeping accounting of everything. However, when it
comes to letting me use the freed space, I will be plainly denied.
If that is so, what would be the reason to use ZFS deduplication at
all?


Please read my response before you respond.  What do you think "this is 
obviously something that should be addressed" means?  There is already a 
CR filed and the ZFS team is working on it.


We have a fix for this and it should be available in a couple of days.

- George



- Eric




--
Regards,
   Cyril


--
Eric Schrock, Fishworks
http://blogs.sun.com/eschrock




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool export taking hours

2009-07-29 Thread George Wilson

fyleow wrote:

fyleow wrote:

I have a raidz1 tank of 5x 640 GB hard drives on my

newly installed OpenSolaris 2009.06 system. I did a
zpool export tank and the process has been running
for 3 hours now taking up 100% CPU usage.

When I do a zfs list tank it's still shown as

mounted. What's going on here? Should it really be
taking this long?

$ zfs list tank
NAME   USED  AVAIL  REFER  MOUNTPOINT
tank  1.10T  1.19T  36.7K  /tank

$ zpool status tank
  pool: tank
 state: ONLINE
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
tankONLINE   0 0 0
  raidz1ONLINE   0 0 0
c7t0d0  ONLINE   0 0 0
c7t1d0  ONLINE   0 0 0
c7t2d0  ONLINE   0 0 0
c7t3d0  ONLINE   0 0 0
c7t4d0  ONLINE   0 0 0

errors: No known data errors

Can you run the following command and post the
output:

# echo "::pgrep zpool | ::walk thread | ::findstack
-v" | mdb -k


Thanks,
George
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discu
ss


Here's what I get

# echo "::pgrep zpool | ::walk thread | ::findstack -v" | mdb -k
stack pointer for thread ff00f717b020: ff0003684cf0
  ff0003684d60 restore_mstate+0x129(fb8568ee)


It might be best to generate a live crash dump so we can see what might 
be hanging up. You can also try running the command above multiple times 
and even run 'pstack ' to see if we get additional 
information.
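For reference, a hedged sketch of capturing such a live dump (assumes a dump device is already configured; see dumpadm(1M)):

  dumpadm           # confirm the dump device and savecore directory
  savecore -L       # write a dump of the running system for later analysis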


Thanks,
George
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool export taking hours

2009-07-27 Thread George Wilson

fyleow wrote:

I have a raidz1 tank of 5x 640 GB hard drives on my newly installed OpenSolaris 
2009.06 system. I did a zpool export tank and the process has been running for 
3 hours now taking up 100% CPU usage.

When I do a zfs list tank it's still shown as mounted. What's going on here? 
Should it really be taking this long?

$ zfs list tank
NAME   USED  AVAIL  REFER  MOUNTPOINT
tank  1.10T  1.19T  36.7K  /tank

$ zpool status tank
  pool: tank
 state: ONLINE
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
tankONLINE   0 0 0
  raidz1ONLINE   0 0 0
c7t0d0  ONLINE   0 0 0
c7t1d0  ONLINE   0 0 0
c7t2d0  ONLINE   0 0 0
c7t3d0  ONLINE   0 0 0
c7t4d0  ONLINE   0 0 0

errors: No known data errors


Can you run the following command and post the output:

# echo "::pgrep zpool | ::walk thread | ::findstack -v" | mdb -k


Thanks,
George
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Another user looses his pool (10TB) in this case and 40 days work

2009-07-22 Thread George Wilson
Once these bits are available in Opensolaris then users will be able to 
upgrade rather easily. This would allow you to take a liveCD running 
these bits and recover older pools.


Do you currently have a pool which needs recovery?

Thanks,
George

Alexander Skwar wrote:

Hi.

Good to Know!

But how do we deal with that on older systems, which don't have the
patch applied, once it is out?

Thanks, Alexander

On Tuesday, July 21, 2009, George Wilson  wrote:
  

Russel wrote:

OK.

So do we have an zpool import --xtg 56574 mypoolname
or help to do it (script?)

Russel


We are working on the pool rollback mechanism and hope to have that soon. The 
ZFS team recognizes that not all hardware is created equal and thus the need 
for this mechanism. We are using the following CR as the tracker for this work:

6667683 need a way to rollback to an uberblock from a previous txg

Thanks,
George
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss




  


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Another user looses his pool (10TB) in this case and 40 days work

2009-07-21 Thread George Wilson

Russel wrote:

OK.

So do we have an zpool import --xtg 56574 mypoolname
or help to do it (script?)

Russel
  
We are working on the pool rollback mechanism and hope to have that 
soon. The ZFS team recognizes that not all hardware is created equal and 
thus the need for this mechanism. We are using the following CR as the 
tracker for this work:


6667683 need a way to rollback to an uberblock from a previous txg

Thanks,
George
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Increase size of ZFS mirror

2009-06-24 Thread George Wilson

Ben wrote:

Hi all,

I have a ZFS mirror of two 500GB disks, and I'd like to up these to 1TB disks; how 
can I do this?  I must break the mirror, as I don't have enough controllers on my 
system board.  My current mirror looks like this:

[b]r...@beleg-ia:/share/media# zpool status share
pool: share
state: ONLINE
scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
share   ONLINE   0 0 0
mirrorONLINE   0 0 0
c5d0s0  ONLINE   0 0 0
c5d1s0  ONLINE   0 0 0

errors: No known data errors[/b]

If I detach c5d1s0, add a 1TB drive, attach that, wait for it to resilver, then 
detach c5d0s0 and add another 1TB drive and attach that to the zpool, will that 
up the storage of the pool?

Thanks very much,
Ben
  


The following changes, which went into snv_116, change this behavior:

PSARC 2008/353 zpool autoexpand property
6475340 when lun expands, zfs should expand too
6563887 in-place replacement allows for smaller devices
6606879 should be able to grow pool without a reboot or export/import
6844090 zfs should be able to mirror to a smaller disk

With this change we introduced a new property ('autoexpand') which you must 
enable if you want devices to automatically grow (this includes replacing them 
with larger ones). You can alternatively use the '-e' (expand) option to 'zpool 
online' to grow individual drives even if 'autoexpand' is disabled. The reason 
we made this change was so that all device expansion would be managed in the 
same way. I'll try to blog about this soon but for now be aware that post 
snv_116 the typical method of growing pools by replacing devices will require 
at least one additional step.
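A hedged example of the two approaches on the mirror above (device name taken from the earlier output; requires the snv_116 bits):

  zpool set autoexpand=on share     # grow automatically when larger devices are attached
  zpool online -e share c5d1s0      # or expand one device by hand after it resilvers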

Thanks,
George


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Issues with slightly different sized drives in raidz pool?

2009-06-08 Thread George Wilson

David Bryan wrote:

Sorry if the question has been discussed before...did a pretty extensive 
search, but no luck...

Preparing to build my first raidz pool. Plan to use 4 identical drives in a 3+1 
configuration.

My question is -- what happens if one drive dies, and when I replace it, design 
has changed slightly and the drive is (very slightly) different sized. Still a 
1TB or what have you, but not identical. I'm guessing if it is slightly larger, 
no problem, slightly smaller is trouble, but that isn't always obvious when you 
buy a drive. My concern is that in a year when the drive blows, XYZ brand's 
model 1000 will be replaced by XYZ model 1001 that formats to 1MB less (or 
worse, I need to replace an XYZ brand 1TB with a similar 1TB ABC brand)

Is there a best practices suggestion here? Is this a real problem? Can I force 
format the drives very slightly less than full capacity before adding them to 
the pool to prevent such an issue?

Thanks.
  
I just integrated changes which address '6844090 zfs should be able to 
mirror to a smaller disk'. This allows ZFS to deal with slightly 
different sized devices as long as we can create the same number of 
metaslabs. With this change you should be okay.


Thanks,
George
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] LUN expansion

2009-06-08 Thread George Wilson

Leonid Zamdborg wrote:

George,

Is there a reasonably straightforward way of doing this partition table edit 
with existing tools that won't clobber my data?  I'm very new to ZFS, and 
didn't want to start experimenting with a live machine.
  

Leonid,

What you could do is to write a program which calls 
efi_use_whole_disk(3EXT) to re-write the label for you. Once you have a 
new label you will be able to export/import the pool and it will pick up 
the new size.
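A minimal, untested sketch of such a program, assuming the usual libefi interface (link with -lefi; the device path is an example and error handling is abbreviated):

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/efi_partition.h>

int
main(int argc, char **argv)
{
	int fd, err;

	if (argc != 2) {
		(void) fprintf(stderr, "usage: %s /dev/rdsk/cXtYdZ\n", argv[0]);
		return (1);
	}
	/* open the whole-disk raw device read/write */
	if ((fd = open(argv[1], O_RDWR | O_NDELAY)) < 0) {
		perror("open");
		return (1);
	}
	/* ask libefi to rewrite the EFI label so it covers the full LUN */
	err = efi_use_whole_disk(fd);
	if (err != 0)
		(void) fprintf(stderr, "efi_use_whole_disk: error %d\n", err);
	(void) close(fd);
	return (err != 0);
}

After exporting the pool, running something like this against the device and then importing again should let ZFS pick up the larger size, as described above.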


BTW, the LUN expansion project was just integrated today.

Thanks,
George
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] LUN expansion

2009-06-04 Thread George Wilson

Leonid,

I will be integrating this functionality within the next week:

PSARC 2008/353 zpool autoexpand property
6475340 when lun expands, zfs should expand too

Unfortunately, they won't help you until they get pushed to OpenSolaris. 
The problem you're facing is that the partition table needs to be 
expanded to use the newly created space. This all happens automatically 
with my code changes, but if you want to do this now you'll have to change 
the partition table and export/import the pool.


Your other option is to wait till these bits show up in Opensolaris.

Thanks,
George

Leonid Zamdborg wrote:

Hi,

I have a problem with expanding a zpool to reflect a change in the underlying 
hardware LUN.  I've created a zpool on top of a 3Ware hardware RAID volume, 
with a capacity of 2.7TB.  I've since added disks to the hardware volume, 
expanding the capacity of the volume to 10TB.  This change in capacity shows up 
in format:

0. c0t0d0 
/p...@0,0/pci10de,3...@e/pci13c1,1...@0/s...@0,0

When I do a prtvtoc /dev/dsk/c0t0d0, I get:

* /dev/dsk/c0t0d0 partition map
*
* Dimensions:
* 512 bytes/sector
* 21484142592 sectors
* 5859311549 accessible sectors
*
* Flags:
*   1: unmountable
*  10: read-only
*
* Unallocated space:
*   First SectorLast
*   Sector CountSector 
*  34   222   255

*
*  First SectorLast
* Partition  Tag  FlagsSector CountSector  Mount Directory
   0  400256 5859294943 5859295198
   8 1100  5859295199 16384 5859311582

The new capacity, unfortunately, shows up as inaccessible.  I've tried exporting and 
importing the zpool, but the capacity is still not recognized.  I kept seeing things 
online about "Dynamic LUN Expansion", but how do I do this?


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Replacing HDD with larger HDD..

2009-05-23 Thread George Wilson

Jorgen Lundman wrote:



Rob Logan wrote:

you meant to type
zpool import -d /var/tmp grow



Bah - of course, I can not just expect zpool to know what random 
directory to search.


You Sir, are a genius.

Works like a charm, and thank you.

Lund

I will be integrating some changes soon which will automatically grow 
the pool without having to export/reboot.


You can look up the any of the following for more info:

PSARC 2008/353 zpool autoexpand property
6475340 when lun expands, zfs should expand too
6563887 in-place replacement allows for smaller devices
6606879 should be able to grow pool without a reboot or export/import

Thanks,
George

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] CR# 6574286, remove slog device

2009-05-23 Thread George Wilson

Mike Gerdts wrote:
On Tue, May 19, 2009 at 2:16 PM, Paul B. Henson > wrote:

>
> I was checking with Sun support regarding this issue, and they say 
"The CR
> currently has a high priority and the fix is understood. However, 
there is

> no eta, workaround, nor IDR."
>
> If it's a high priority, and it's known how to fix it, I was curious 
as to

> why has there been no progress? As I understand, if a failure of the log
> device occurs while the pool is active, it automatically switches 
back to
> an embedded pool log. It seems removal would be as simple as 
following the

> failure path to an embedded log, and then update the pool metadata to
> remove the log device. Is it more complicated than that? We're about 
to do

> some testing with slogs, and it would make me a lot more comfortable to
> deploy one in production if there was a backout plan :)...
>

A rather interesting putback just happened...

http://hg.genunix.org/onnv-gate.hg/rev/cc5b64682e64

6803605 should be able to offline log devices
6726045 vdev_deflate_ratio is not set when offlining a log device
6599442 zpool import has faults in the display


I love comments that tell you what is really going on...

     /*
      * If this device has the only valid copy of some data,
-     * don't allow it to be offlined.
+     * don't allow it to be offlined. Log devices are always
+     * expendable.
      */


For some reason, the CRs listed above are not available through 
bugs.opensolaris.org.  However, at least 
6803605 is available through SunSolve if you have a support contract.


--
Mike Gerdts
http://mgerdts.blogspot.com/

This putback is the precursor to slog device removal and the ability to 
import pools with failed slogs. I'll provide more details as we get 
closer to integrating the slog removal feature. We are working on it; it 
is one of our top priorities.


Stay tuned for more details...

George
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] vdev_disk_io_start() sending NULL pointer in ldi_ioctl()

2009-04-10 Thread George Wilson

shyamali.chakrava...@sun.com wrote:

Hi All,

I have a corefile where we see a NULL pointer dereference panic because we 
have (deliberately) sent a NULL pointer for the return value.



vdev_disk_io_start()
...
...

error = ldi_ioctl(dvd->vd_lh, zio->io_cmd,
   (uintptr_t)&zio->io_dk_callback,
   FKIOCTL, kcred, NULL);


ldi_ioctl() expects the last parameter to be an integer pointer (int 
*rvalp).  I see that in strdoictl().  The corefile I am analysing has a 
similar BAD TRAP while trying to execute 'stw %g0, [%i5]' (i.e. 'clr 
[%i5]').


This doesn't make sense, as strdoictl() should only be called on a 
stream. The normal call path should be to cdev_ioctl() and eventually to 
sdioctl(). Can you provide the stack?


- George


	/*
	 * Set return value.
	 */
	*rvalp = iocbp->ioc_rval;

Is it a bug?  The ldi_ioctl() call above is all we do in
vdev_disk_io_start().  I would appreciate any feedback on this.


regards,
--shyamali
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS hanging on import

2009-04-03 Thread George Wilson

Arne Schwabe wrote:

Am 03.04.2009 2:42 Uhr, schrieb George Wilson:

Arne Schwabe wrote:

Hi,

I have a zpool in a degraded state:

[19:15]{1}a...@charon:~% pfexec zpool import

  pool: npool
id: 5258305162216370088
 state: DEGRADED
status: The pool is formatted using an older on-disk version.
action: The pool can be imported despite missing or damaged 
devices.  The

fault tolerance of the pool may be compromised if imported.
config:

npool   DEGRADED
  raidz1DEGRADED
c3d0ONLINE
c7d0ONLINE
replacing   UNAVAIL  insufficient replicas
  c7d0s0/o  FAULTED  corrupted data
  c7d0  FAULTED  corrupted data



This output looks really busted. What were the original disks that 
you used? Looks like you may have lost one of them.


The following output will be useful:

# zdb -l /dev/dsk/c3d0s0
# zdb -l /dev/dsk/c7d0s0
Yes, the "failed" c7d0 is beyond hope and is at the moment not 
physically connected to the system. If I reconnect the drive it will 
show up as shown below. I removed the drive because I wanted to try to import the 
pool with the two good drives, but it makes no difference whether I 
import it with or without it. Is there any guide/howto on how to start 
debugging the zfs/Solaris kernel? I have done kernel debugging before, but not 
with Solaris.


  pool: npool
    id: 5258305162216370088
 state: DEGRADED
status: The pool is formatted using an older on-disk version.
action: The pool can be imported despite missing or damaged devices.  The
fault tolerance of the pool may be compromised if imported.
config:

npool   DEGRADED
  raidz1DEGRADED
c3d0ONLINE
c7d0ONLINE
replacing   DEGRADED
  c7d0s0/o  FAULTED  corrupted data
  c6d0  ONLINE

I think at one moment I must have switched the drives physically when 
I replaced the failed drive (ironically with another bad drive, which 
is why I disconnected it again)


zdb -l /dev/dsk/c3d0s0 gives:


LABEL 0

version=13
name='npool'
state=1
txg=10901730
pool_guid=5258305162216370088
hostid=5148207
hostname='charon'
top_guid=17800957881283225684
guid=5737717478922700505
vdev_tree
type='raidz'
id=0
guid=17800957881283225684
nparity=1
metaslab_array=14
metaslab_shift=33
ashift=9
asize=1500262957056
is_log=0
children[0]
type='disk'
id=0
guid=5737717478922700505
path='/dev/dsk/c3d0s0'

devid='id1,c...@awdc_wd5000aajb-00yra0=_wd-wcas81952111/a'

phys_path='/p...@0,0/pci-...@2,5/i...@0/c...@0,0:a'
whole_disk=1
DTL=85
children[1]
type='disk'
id=1
guid=17036915785869798182
path='/dev/dsk/c6d0s0'
devid='id1,c...@asamsung_hd501lj=s0muj1fpc08381/a'
phys_path='/p...@0,0/pci-...@5/i...@0/c...@0,0:a'
whole_disk=1
DTL=35
children[2]
type='replacing'
id=2
guid=10545980583204781570
whole_disk=0
children[0]
type='disk'
id=0
guid=232847032327795094
path='/dev/dsk/c7d0s0/old'
phys_path='/p...@0,0/pci-...@5/i...@1/c...@0,0:a'
whole_disk=1
DTL=339
children[1]
type='disk'
id=1
guid=13182214352713316760
path='/dev/dsk/c7d0s0'

devid='id1,c...@asamsung_hd501lj=s0muj1fpc08380/a'

phys_path='/p...@0,0/pci-...@5/i...@1/c...@0,0:a'
whole_disk=1
DTL=123
faulted=1

LABEL 1

version=13
name='npool'
state=1
txg=10901730
pool_guid=5258305162216370088
hostid=5148207
hostname='charon'
top_guid=17800957881283225684
guid=5737717478922700505
vdev_tree
type='raidz'
id=0
guid=17800957881283225684
nparity=1
metaslab_array=14
metaslab_shift=33
ashift=9
asize=1500262957056
is_log=0
children[0]
type

Re: [zfs-discuss] ZFS hanging on import

2009-04-02 Thread George Wilson

Arne Schwabe wrote:

Hi,

I have a zpool in a degraded state:

[19:15]{1}a...@charon:~% pfexec zpool import

  pool: npool
id: 5258305162216370088
 state: DEGRADED
status: The pool is formatted using an older on-disk version.
action: The pool can be imported despite missing or damaged devices.  The
fault tolerance of the pool may be compromised if imported.
config:

npool   DEGRADED
  raidz1DEGRADED
c3d0ONLINE
c7d0ONLINE
replacing   UNAVAIL  insufficient replicas
  c7d0s0/o  FAULTED  corrupted data
  c7d0  FAULTED  corrupted data



This output looks really busted. What were the original disks that you 
used? Looks like you may have lost one of them.


The following output will be useful:

# zdb -l /dev/dsk/c3d0s0
# zdb -l /dev/dsk/c7d0s0

Thanks,
George



When I try to import the pool, the OpenSolaris box simply hangs. I can 
still ping it, but nothing else works; ssh only works to the point where 
the 3-way handshake is established. The strange thing is that there is 
no kernel panic or any log message.


The last messages from 'truss zpool import npool' are:

open("/dev/dsk/c7d0s0", O_RDONLY)   = 6
fxstat(2, 6, 0x08043250)= 0
modctl(MODSIZEOF_DEVID, 0x01980080, 0x0804324C, 0xFEA41239, 0xFE8E92C0) = 0
modctl(MODGETDEVID, 0x01980080, 0x002A, 0x080D18D0, 0xFE8E92C0) = 0
fxstat(2, 6, 0x08043250)= 0
modctl(MODSIZEOF_MINORNAME, 0x01980080, 0x6000, 0x0804324C, 
0xFE8E92C0) = 0

modctl(MODGETMINORNAME, 0x01980080, 0x6000, 0x0002, 0x0808DFC8) = 0
close(6)= 0
ioctl(3, ZFS_IOC_POOL_STATS, 0x080423B0)Err#2 ENOENT
ioctl(3, ZFS_IOC_POOL_TRYIMPORT, 0x08042420)= 0
open("/usr/lib/locale/de_DE.UTF-8/LC_MESSAGES/SUNW_OST_OSLIB.mo", 
O_RDONLY) Err#2 ENOENT


I think the system hangs on ZFS_IOC_POOL_TRYIMPORT.

Any pointers on where I could try to debug/diagnose the problem further? 
The system is already at b110, but b109 behaved the same.


Arne
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS crashes on boot

2009-04-02 Thread George Wilson

Cyril Plisko wrote:

On Tue, Mar 31, 2009 at 11:01 PM, George Wilson  wrote:

Cyril Plisko wrote:

On Thu, Mar 26, 2009 at 8:45 PM, Richard Elling
 wrote:


assertion failures are bugs.


Yup, I know that.



 Please file one at http://bugs.opensolaris.org


Just did.


Do you have a crash dump from this issue?


George,

Getting a crash dump turned out to be somewhat problematic. Apparently it
panics before the dump volume is activated (or so it seems).
Moreover, the machine's owners decided to put other disks inside and
get it working (they were planning to put in bigger disks anyhow). I have
kept aside the original disks with the pool that causes the panic, but I am
looking for a machine to put these disks in. Since there are 10 disks
it is not trivial to find a suitable machine. I think, however, that I
may try having only 5 disks (one half of each mirror). Do you
think it is ok to try it that way?


Given that the panic was pretty reproducible I would think that having 
half the mirrors would be sufficient. BTW, did you ever try to import 
the root pool by booting failsafe or off the DVD/CD?


Thanks,
George





___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS crashes on boot

2009-03-31 Thread George Wilson

Cyril Plisko wrote:

On Thu, Mar 26, 2009 at 8:45 PM, Richard Elling
 wrote:
  

assertion failures are bugs.



Yup, I know that.

  

 Please file one at http://bugs.opensolaris.org



Just did.
  


Do you have a crash dump from this issue?

- George
  

You may need to try another version of the OS, which may not have
the bug.



Well, I kinda guessed that. I hoped, may be wrongly, to hear something
more concrete... Tough luck, I guess...

  

-- richard

Cyril Plisko wrote:


Hello !

I have a machine that started to panic on boot (see panic message
below). I think it panics when it imports the pool (5 x 2 mirror).
Are there any ways to recover from that ?

Some history: the machine was upgraded a couple of days ago from
snv78 to snv110. This morning the zpool was upgraded to v14 and a scrub was
run to verify data health. After 3 or 4 hours the scrub was stopped
(the I/O impact was considered too high for the moment). A short time
after that, someone rebooted it (because it felt sluggish [I hope that
person never gets root access again!]). On reboot the machine
panicked. I had another boot disk with a fresh b110, so I booted from
it, only to see it panicking again on zpool import.

So, any ideas how to get this pool imported? This particular
organization uses Linux everywhere except fileservers, thanks to ZFS. It
would be a pity to let them lose their trust.

Here is the panic.

panic[cpu2]/thread=ff000c697c60: assertion failed: 0 ==
zap_remove_int(mos, ds_prev->ds_phys->ds_next_clones_obj, obj, tx),
file: ../../common/fs/zfs/dsl_dataset.c, line: 1493

ff000c6978d0 genunix:assfail+7e ()
ff000c697a50 zfs:dsl_dataset_destroy_sync+84b ()
ff000c697aa0 zfs:dsl_sync_task_group_sync+eb ()
ff000c697b10 zfs:dsl_pool_sync+112 ()
ff000c697ba0 zfs:spa_sync+32a ()
ff000c697c40 zfs:txg_sync_thread+265 ()
ff000c697c50 unix:thread_start+8 ()




  




  


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] GSoC 09 zfs ideas?

2009-03-03 Thread George Wilson

Richard Elling wrote:

David Magda wrote:


On Feb 27, 2009, at 18:23, C. Bergström wrote:


Blake wrote:

Care to share any of those in advance?  It might be cool to see input
from listees and generally get some wheels turning...


raidz boot support in grub 2 is pretty high on my list to be honest..

Which brings up another question of where is the raidz stuff mostly?

usr/src/uts/common/fs/zfs/vdev_raidz.c ?

Any high level summary, docs or blog entries of what the process 
would look like for a raidz boot support is also appreciated.


Given the threads that have appeared on this list lately, how about 
codifying / standardizing the output of "zfs send" so that it can be 
backed up to tape? :)


It wouldn't help.  zfs send is a data stream which contains parts of 
files,

not files (in the usual sense), so there is no real way to take a send
stream and extract a file, other than by doing a receive.

At the risk of repeating the Best Practices Guide (again):
The zfs send and receive commands do not provide an enterprise-level 
backup solution.

-- richard
Along these lines you can envision a restore tool that is capable of 
reading multiple 'zfs send' streams to construct the various versions of 
files which are available. In addition, it would be nice if the tool 
could read in the streams and then make it easy to traverse and 
construct a single file from all available streams. For example, if I 
have 5 send streams then the tool would be able to ingest all the data 
and provide a directory structure similar to .zfs which would allow you 
to restore any file which is completely intact.


- George
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] GSoC 09 zfs ideas?

2009-03-03 Thread George Wilson

Matthew Ahrens wrote:

Blake wrote:

zfs send is great for moving a filesystem with lots of tiny files,
since it just handles the blocks :)



I'd like to see:

pool-shrinking (and an option to shrink disk A when i want disk B to
become a mirror, but A is a few blocks bigger)


I'm working on it.


install to mirror from the liveCD gui

zfs recovery tools (sometimes bad things happen)


We've actually discussed this at length and there will be some work 
started soon.


automated installgrub when mirroring an rpool

I'm working on it.

- George


--matt
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Repairing a Root Pool

2008-08-29 Thread George Wilson
Krister Joas wrote:
> Hello.
> 
> I have a machine at home on which I have SXCE B96 installed on a root  
> zpool mirror.  It's been working great until yesterday.  The root pool  
> is a mirror with two identical 160GB disks.  The other day I added a  
> third disk to the mirror, a 250 GB disk.  Soon after, the third disk  
> developed some hardware problem and this is now preventing the system  
> from booting from the root pool.  It panics early on and reboots.
> 
> I'm trying to repair the system by dropping into single user mode  
> after booting from a DVD-ROM.  I had to yank the third disk in order  
> for the machine to boot successfully at all.  However, in single user  
> mode I'm so far unable to do anything useful with the pool.  Using  
> "zpool import" the pool is listed as being DEGRADED, with one device  
> being UNAVAILABLE (cannot open).  The pool is also shown to be last  
> accessed by another system.  All this is as expected.  Any command  
> other than "zpool import" knows nothing about the pool "rpool", e.g.  
> "zpool status".  Assuming I have to import the pool before doing  
> anything like detaching any bad devices I try importing it using  
> "zpool import -f rpool".  This displays an error:
> 
>  cannot import 'rpool': one or more devices is currently unavailable
> 
> At this point I'm stuck.  I can't boot from the pool and I can't  
> access the pool after booting into single user mode from a DVD-ROM.   
> Does anyone have any advice on how to repair my system?
> 
> Krister
> 
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Krister,

When you boot off of the DVD, there should be an option to go into 
single-user shell. This will search for root instances. Does this get 
displayed?

Once you get to the DVD shell prompt, can you try to run 'zpool import 
-F rpool'?

thanks,
George
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [caiman-discuss] swap & dump on ZFS volume

2008-07-02 Thread George Wilson
Kyle McDonald wrote:
> David Magda wrote:
>   
>> Quite often swap and dump are the same device, at least in the  
>> installs that I've worked with, and I think the default for Solaris  
>> is that if dump is not explicitly specified it defaults to swap, yes?  
>> Is there any reason why they should be separate?
>>
>>   
>> 
> I believe there are technical limitations with ZFS Boot that stop them 
> from sharing the same zvol.
>   
Yes, there is. Swap zvols are ordinary zvols which still COW their 
blocks and leverage checksumming, etc. Dump zvols don't have this luxury 
because when the system crashes you are limited in the number of tasks 
that you can perform. So we solved this by changing the personality of a 
zvol when it's added as a dump device. In particular, we needed to make 
sure that all the blocks that the dump device cared about were available 
at the time of a system crash. So we preallocate the dump device when it 
gets created. We also follow a different I/O path when writing to a dump 
device allowing us to behave as if we were a separate partition on the 
disk. The dump subsystem doesn't know the difference which is exactly 
what we wanted. :-)
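For context, a hedged example of how the two kinds of zvols are typically set up (sizes and dataset names are illustrative):

  zfs create -V 2G rpool/swap
  swap -a /dev/zvol/dsk/rpool/swap
  zfs create -V 1G rpool/dump
  dumpadm -d /dev/zvol/dsk/rpool/dump   # adding it as the dump device is what switches its personality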

>> Having two just seems like a waste to me, even with disk sizes being  
>> what they are (and growing). A separate dump device is only really  
>> needed if something goes completely wrong, otherwise it's just  
>> sitting there "doing nothing". If you're panicing, then whatever is  
>> in swap is now no longer relevant, so over writing it is no big deal.
>>   
>> 
> That said, with all the talk of dynamic sizing: if, during normal 
> operation, the swap zvol has space allocated and the dump zvol is sized 
> to 0, then during a panic could the swap volume be sized to 0 and the 
> dump volume expanded to whatever size is needed?
>   

Unfortunately that's not possible for the reasons I mentioned. You can 
resize the dump zvol to a smaller size but unfortunately you can't make 
it a size 0 as there is a minimum size requirement.

Thanks,
George
> This, while still requiring 2 zvols, would at least allow (even when the 
> rest of the pool is short on space) a close approximation of the old 
> behavior of sharing the same slice for both swap and dump.
>
>   -Kyle
>
>   
>> ___
>> zfs-discuss mailing list
>> zfs-discuss@opensolaris.org
>> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>>   
>> 
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs write cache enable on boot disks ?

2008-04-24 Thread George Wilson
Just to clarify a bit, ZFS will not enable the write cache for the root 
pool. That said, there are disk drives which have the write cache 
enabled by default. That behavior remains unchanged.
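For the curious, a hedged way to inspect (and, where supported, change) a drive's own cache setting is format's expert mode; the exact menus vary by drive type:

  format -e c0t0d0
    format> cache
    cache> write_cache
    write_cache> display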

- George

George Wilson wrote:
> Unfortunately not.
>
> Thanks,
> George
>
> Par Kansala wrote:
>   
>> Hi,
>>
>> Will the upcoming zfs boot capabilities also enable write cache on a 
>> boot disk
>> like it does on regular data disks (when whole disks are used) ?
>>
>> //Par
>> -- 
>> -- 
>> <http://www.sun.com> *Pär Känsälä*
>> OEM Engagement Architect
>> *Sun Microsystems*
>> Phone +46 8 631 1782 (x45782)
>> Mobile +46 70 261 1782
>> Fax +46 455 37 92 05
>> Email [EMAIL PROTECTED]
>> <http://www.sun.com>
>>
>> 
>>
>> ___
>> zfs-discuss mailing list
>> zfs-discuss@opensolaris.org
>> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>>   
>> 
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs write cache enable on boot disks ?

2008-04-24 Thread George Wilson
Unfortunately not.

Thanks,
George

Par Kansala wrote:
> Hi,
>
> Will the upcoming zfs boot capabilities also enable write cache on a 
> boot disk
> like it does on regular data disks (when whole disks are used) ?
>
> //Par
> -- 
> -- 
>   *Pär Känsälä*
> OEM Engagement Architect
> *Sun Microsystems*
> Phone +46 8 631 1782 (x45782)
> Mobile +46 70 261 1782
> Fax +46 455 37 92 05
> Email [EMAIL PROTECTED]
> 
>
> 
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to set ZFS metadata copies=3?

2008-02-15 Thread George Wilson
Vincent Fox wrote:
> Let's say you are paranoid and have built a pool with 40+ disks in a Thumper.
>
> Is there a way to set metadata copies=3 manually?
>
> After having built RAIDZ2 sets with 7-9 disks and then pooled these together, 
> it just seems like a little bit of extra insurance to increase metadata 
> copies.  I don't see a need for extra data copies which is currently the only 
> trigger I see for that.
>  
>  
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>   
ZFS already does something like this for metadata by setting either 2 or 
3 copies based on the metadata type. Take a look at 
dmu_get_replication_level().
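For completeness, the user-settable knob is the data 'copies' property; a hedged example (metadata ditto blocks are handled automatically, as noted above):

  zfs set copies=2 tank/important
  zfs get copies tank/important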

- George
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [storage-discuss] Preventing zpool imports on boot

2008-02-15 Thread George Wilson
Mike Gerdts wrote:
> On Feb 15, 2008 2:31 PM, Dave <[EMAIL PROTECTED]> wrote:
>   
>> This is exactly what I want - Thanks!
>>
>> This isn't in the man pages for zfs or zpool in b81. Any idea when this
>> feature was integrated?
>> 
>
> Interesting... it is in b76.  I checked several other releases both
> before and after and they didn't have it either.  Perhaps it is not
> part of the committed interface.  I stumbled upon it because I thought
> that I remembered "zpool import -R / poolname" having the behavior you
> were looking for.  The rather consistent documentation for "zpool
> import -R" mentioned the temporary attribute.
>
>   
We actually changed this to make it more robust. Now the property is 
called 'cachefile' and you can set it to 'none' if you want it to behave 
like the older 'temporary' property.
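A hedged example of using the newer property (pool and device names are placeholders):

  zpool create -o cachefile=none mypool c1t0d0   # never recorded in /etc/zfs/zpool.cache
  zpool set cachefile=none mypool                # or change an existing pool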

- George
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool question

2007-10-30 Thread George Wilson
Krzys wrote:
> Hello folks, I am running Solaris 10 U3 and I have a small problem that I don't 
> know how to fix...
> 
> I had a pool of two drives:
> 
> bash-3.00# zpool status
>pool: mypool
>   state: ONLINE
>   scrub: none requested
> config:
> 
>  NAME  STATE READ WRITE CKSUM
>  mypoolONLINE   0 0 0
>emcpower0a  ONLINE   0 0 0
>emcpower1a  ONLINE   0 0 0
> 
> errors: No known data errors
> 
> I added another drive
> 
> so now I have pool of 3 drives
> 
> bash-3.00# zpool status
>pool: mypool
>   state: ONLINE
>   scrub: none requested
> config:
> 
>  NAME  STATE READ WRITE CKSUM
>  mypoolONLINE   0 0 0
>emcpower0a  ONLINE   0 0 0
>emcpower1a  ONLINE   0 0 0
>emcpower2a  ONLINE   0 0 0
> 
> errors: No known data errors
> 
> Everything is great, but I've made a mistake: I would like to remove 
> emcpower2a from my pool and I cannot do that...
> 
> Well, the mistake that I made is that I did not format my device correctly, so 
> instead of adding 125 GB I added 128 MB.
> 
> here is my partition on that disk:
> partition> print
> Current partition table (original):
> Total disk cylinders available: 63998 + 2 (reserved cylinders)
> 
> Part  TagFlag Cylinders SizeBlocks
>0   rootwm   0 -63  128.00MB(64/0/0)   262144
>1   swapwu  64 -   127  128.00MB(64/0/0)   262144
>2 backupwu   0 - 63997  125.00GB(63998/0/0) 262135808
>3 unassignedwm   00 (0/0/0) 0
>4 unassignedwm   00 (0/0/0) 0
>5 unassignedwm   00 (0/0/0) 0
>6usrwm 128 - 63997  124.75GB(63870/0/0) 261611520
>7 unassignedwm   00 (0/0/0) 0
> 
> partition>
> 
> What I would like to do is to remove my emcpower2a device, format it, and then 
> add a 125 GB one instead of the 128 MB one. Is it possible to do this in Solaris 10 
> U3? If not, what are my options?
> 
> Regards,
> 
> Chris
> 
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

One other (riskier) option would be to export the pool and grow slice 0 
in emcpower2a so that it consumes the entire disk. Then reimport the 
pool and we should detect the new size and grow the pool accordingly. 
You want to make sure you don't change the starting cylinder so that we 
can still see the front half of the labels.

I've been able to successfully do this with EFI labels but have not 
tried this with VTOCs. If you do decide to go this route, a full backup 
is highly recommended.
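A hedged outline of that procedure (names follow the example above; double-check the new label before trusting it):

  zpool export mypool
  format       # select emcpower2, then in the partition menu grow slice 0
               # without changing its starting cylinder, and label the disk
  zpool import mypool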

- George
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Solaris 10u5 Proposed Changes

2007-09-19 Thread George Wilson
Gzip support will be there but it was not a PSARC case. Once we have 
committed the content I will publish a complete list of all the CRs that 
are going into s10u5.

Thanks,
George

Jesus Cea wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
> 
> George Wilson wrote:
>> Here's a list of features that we are proposing for Solaris 10u5. Keep 
>> in mind that this is subject to change.
> 
> No GZIP compression?
> 
> - --
> Jesus Cea Avion _/_/  _/_/_/_/_/_/
> [EMAIL PROTECTED] http://www.argo.es/~jcea/ _/_/_/_/  _/_/_/_/  _/_/
> jabber / xmpp:[EMAIL PROTECTED] _/_/_/_/  _/_/_/_/_/
>_/_/  _/_/_/_/  _/_/  _/_/
> "Things are not so easy"  _/_/  _/_/_/_/  _/_/_/_/  _/_/
> "My name is Dump, Core Dump"   _/_/_/_/_/_/  _/_/  _/_/
> "El amor es poner tu felicidad en la felicidad de otro" - Leibniz
> -BEGIN PGP SIGNATURE-
> Version: GnuPG v1.4.6 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
> 
> iQCVAwUBRvG9D5lgi5GaxT1NAQLh1AP/aQWTkYWzAIJWx7Izbf8PIMAAHn2ha9j7
> BCUueq/MEXNy+iifxPv6/g2977+EmXcBXCOqsPGQ+/qxod/Q9aQgVQaP2mJKGhxq
> YlYlVGzRBjeEDcoku4WaEEd19wA/wNWNrgpHIJSscCcOKDiyE4PLeGrjpzeoM+tF
> +p+yWlbjv74=
> =qNTY
> -END PGP SIGNATURE-
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Solaris 10 Update 4 Patches

2007-09-19 Thread George Wilson
You will also have to issue a 'zpool upgrade -a' for the pools to update 
their version.

- George

Prabahar Jeyaram wrote:
> Yep. Just installing the patch would get you all the U4 ZFS updates.
> 
> -- 
> Prabahar.
> 
> On Sep 19, 2007, at 7:25 AM, Brian H. Nelson wrote:
> 
>> Does simply installing that patch on a U3 machine get you all of the U4
>> zfs updates, or must a full OS upgrade be done?
>>
>> -Brian
>>
>>
>> George Wilson wrote:
>>> The latest ZFS patches for Solaris 10 are now available:
>>>
>>> 120011-14 - SunOS 5.10: kernel patch
>>> 120012-14 - SunOS 5.10_x86: kernel patch
>>>
>>> ZFS Pool Version available with patches = 4
>>>
>>> These patches will provide access to all of the latest features and bug
>>> fixes:
>>>
>>> Features:
>>> PSARC 2006/288 zpool history
>>> PSARC 2006/308 zfs list sort option
>>> PSARC 2006/479 zfs receive -F
>>> PSARC 2006/486 ZFS canmount property
>>> PSARC 2006/497 ZFS create time properties
>>> PSARC 2006/502 ZFS get all datasets
>>> PSARC 2006/504 ZFS user properties
>>> PSARC 2006/622 iSCSI/ZFS Integration
>>> PSARC 2006/638 noxattr ZFS property
>>>
>>> Go to http://www.opensolaris.org/os/community/arc/caselog/ for more
>>> details on the above.
>>>
>>> See http://www.opensolaris.org/jive/thread.jspa?threadID=39903&tstart=0
>>> for complete list of CRs.
>>>
>>>
>>> Thanks,
>>> George
>>> ___
>>> zfs-discuss mailing list
>>> zfs-discuss@opensolaris.org
>>> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>>>
>>
>> -- 
>> ---
>> Brian H. Nelson Youngstown State University
>> System Administrator   Media and Academic Computing
>>   bnelson[at]cis.ysu.edu
>> ---
>>
>> ___
>> zfs-discuss mailing list
>> zfs-discuss@opensolaris.org
>> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
> 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] import zpool error if use loop device as vdev

2007-09-18 Thread George Wilson
By default, 'zpool import' looks only in /dev/dsk. Since you are using 
/dev/lofi you will need to use 'zpool import -d /dev/lofi' to import 
your pool.
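For example, with the pool created in the steps quoted below, a hedged invocation would be:

  zpool import -d /dev/lofi pool_1and2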

Thanks,
George

sunnie wrote:
> Hey, guys,
>   I just did a test using loop devices as vdevs for a zpool.
> Procedure as follows:
> 1)  mkfile -v 100m disk1
>  mkfile -v 100m disk2
> 
> 2)  lofiadm -a disk1 /dev/lofi
>  lofiadm -a disk2 /dev/lofi
> 
> 3)  zpool create pool_1and2  /dev/lofi/1 and /dev/lofi/2
> 
> 4)  zpool export pool_1and2
> 
> 5) zpool  import pool_1and2
> 
> error info here:
> bash-3.00# zpool import pool1_1and2
> cannot import 'pool1_1and2': no such pool available
> 
> So, can anyone help explain the details of how loop devices differ from 
> physical block devices?
>  
>  
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] what patches are needed to enable zfs_nocacheflush

2007-09-18 Thread George Wilson
Bernhard,

Here are the solaris 10 patches:

120011-14 - SunOS 5.10: kernel patch
120012-14 - SunOS 5.10_x86: kernel patch

See http://www.opensolaris.org/jive/thread.jspa?threadID=39951&tstart=0 
for more info.
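Once a patched kernel is running, the tunable itself is set in /etc/system; a hedged example (only appropriate when the storage has nonvolatile, battery-backed cache):

  set zfs:zfs_nocacheflush = 1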

Thanks,
George

Bernhard Holzer wrote:
> Hi,
> 
> this parameter (zfs_nocacheflush) is now integrated into Solaris10/U4. 
> Is it possible to "just install a few patches" to enable this?
> What patches are required?
> 
> Thanks
> Bernhard
> 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS Solaris 10 Update 4 Patches

2007-09-18 Thread George Wilson
The latest ZFS patches for Solaris 10 are now available:

120011-14 - SunOS 5.10: kernel patch
120012-14 - SunOS 5.10_x86: kernel patch

ZFS Pool Version available with patches = 4

These patches will provide access to all of the latest features and bug 
fixes:

Features:
PSARC 2006/288 zpool history
PSARC 2006/308 zfs list sort option
PSARC 2006/479 zfs receive -F
PSARC 2006/486 ZFS canmount property
PSARC 2006/497 ZFS create time properties
PSARC 2006/502 ZFS get all datasets
PSARC 2006/504 ZFS user properties
PSARC 2006/622 iSCSI/ZFS Integration
PSARC 2006/638 noxattr ZFS property

Go to http://www.opensolaris.org/os/community/arc/caselog/ for more 
details on the above.

See http://www.opensolaris.org/jive/thread.jspa?threadID=39903&tstart=0 
for complete list of CRs.


Thanks,
George
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] System hang caused by a "bad" snapshot

2007-09-18 Thread George Wilson
Ben,

Much of this code has been revamped as a result of:

6514331 in-memory delete queue is not needed

Although this may not fix your issue it would be good to try this test 
with more recent bits.

Thanks,
George

Ben Miller wrote:

> Hate to re-open something from a year ago, but we just had this problem 
> happen again.  We have been running Solaris 10u3 on this system for a while.  
> I searched the bug reports, but couldn't find anything on this.  I also think 
> I understand what happened a little more.  We take snapshots at noon and the 
> system hung up during that time.  When trying to reboot, the system would hang 
> on the ZFS mounts.  After I boot into single user mode and remove the snapshot from 
> the filesystem causing the problem, everything is fine.  The filesystem in 
> question is at 100% use with snapshots in use.
> 
> Here's the back trace for the system when it was hung:
>> ::stack
> 0xf0046a3c(f005a4d8, 2a10004f828, 0, 181c850, 1848400, f005a4d8)
> prom_enter_mon+0x24(0, 0, 183b400, 1, 1812140, 181ae60)
> debug_enter+0x118(0, a, a, 180fc00, 0, 183d400)
> abort_seq_softintr+0x94(180fc00, 18a9800, 180c000, 2a10004fd98, 1, 1857c00)
> intr_thread+0x170(2, 30007b64bc0, 0, c001ed9, 110, 6000240)
> 0x985c8(300adca4c40, 0, 0, 0, 0, 30007b64bc0)
> dbuf_hold_impl+0x28(60008cd02e8, 0, 0, 0, 7b648d73, 2a105bb57c8)
> dbuf_hold_level+0x18(60008cd02e8, 0, 0, 7b648d73, 0, 0)
> dmu_tx_check_ioerr+0x20(0, 60008cd02e8, 0, 0, 0, 7b648c00)
> dmu_tx_hold_zap+0x84(60011fb2c40, 0, 0, 0, 30049b58008, 400)
> zfs_rmnode+0xc8(3002410d210, 2a105bb5cc0, 0, 60011fb2c40, 30007b3ff58, 
> 30007b56ac0)
> zfs_delete_thread+0x168(30007b56ac0, 3002410d210, 69a4778, 30007b56b28, 
> 2a105bb5aca, 2a105bb5ac8)
> thread_start+4(30007b56ac0, 0, 0, 489a48, d83a10bf28, 50386)
> 
> Has this been fixed in more recent code?  I can make the crash dump available.
> 
> Ben
>  
>  
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool history not found

2007-09-18 Thread George Wilson
You need to install patch 120011-14. After you reboot you will be able 
to run 'zpool upgrade -a' to upgrade to the latest version.
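A hedged outline of the sequence (patch IDs from above; run patchadd from the directory where the patch was unpacked):

  patchadd 120011-14    # SPARC; 120012-14 on x86
  init 6                # reboot onto the patched kernel
  zpool upgrade -v      # list the versions this software supports
  zpool upgrade -a      # upgrade all pools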

Thanks,
George

sunnie wrote:
> Hey, guys
>  Since the current zfs software only supports ZFS pool version 3, what should I 
> do to upgrade the zfs software or package?
>  PS. my current os: SUNOS 5.10 Generic_118833-33 sun4u sparc
>  
>  
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS Solaris 10u5 Proposed Changes

2007-09-18 Thread George Wilson
ZFS Fans,

Here's a list of features that we are proposing for Solaris 10u5. Keep 
in mind that this is subject to change.

Features:
PSARC 2007/142 zfs rename -r
PSARC 2007/171 ZFS Separate Intent Log
PSARC 2007/197 ZFS hotplug
PSARC 2007/199 zfs {create,clone,rename} -p
PSARC 2007/283 FMA for ZFS Phase 2
PSARC/2006/465 ZFS Delegated Administration
PSARC/2006/577 zpool property to disable delegation
PSARC/2006/625 Enhancements to zpool history
PSARC/2007/121 zfs set copies
PSARC/2007/228 ZFS delegation amendments
PSARC/2007/295 ZFS Delegated Administration Addendum
PSARC/2007/328 zfs upgrade

Stay tuned for a finalized list of RFEs and fixes.

Thanks,
George
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Opensolaris ZFS version & Sol10U4 compatibility

2007-08-16 Thread George Wilson
The on-disk format for s10u4 will be version 4. This is equivalent to 
Opensolaris build 62.

Thanks,
George

David Evans wrote:
> As the release date Solaris 10 Update 4 approaches (hope, hope),  I was 
> wondering if someone could comment on which versions of opensolaris ZFS will 
> seamlessly work when imported.
> 
> Thanks.
> 
> dce
>  
>  
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Again ZFS with expanding LUNs!

2007-08-06 Thread George Wilson
I'm planning on putting back the changes to ZFS into OpenSolaris in the 
upcoming weeks. This will still require a manual step, as the changes 
required in the sd driver are still under development.

The ultimate plan is to have the entire process totally automated.

If you have more questions, feel free to drop me a line.

Thanks,
George

Yan wrote:
> Hey David
> I might need to track the evolution of that size-change utility for ZFS.
> Could I have a contact at Sun who would be able to give me more information 
> on that? 
> Being able to resize LUNs dynamically is a reality here; I currently do it 
> with UFS after an EMC Clariion LUN migration to a larger LUN
> 
> That is our current show-stopper to using ZFS
> thanks 
> Yannick
>  
>  
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS pool fragmentation

2007-07-10 Thread George Wilson
This fix plus the fix for '6495013 Loops and recursion in 
metaslab_ff_alloc can kill performance, even on a pool with lots of free 
data' will greatly help your situation.

Both of these fixes will be in Solaris 10 update 4.

Thanks,
George

Łukasz wrote:
> I have a huge problem with ZFS pool fragmentation. 
> I started investigating problem about 2 weeks ago 
> http://www.opensolaris.org/jive/thread.jspa?threadID=34423&tstart=0
> 
> I found a workaround for now - changing recordsize - but I want a better 
> solution. 
> The best solution would be a defragmentation tool, but I can see that it is 
> not easy.
> 
> When ZFS pool is fragmented then:
>  1. spa_sync function is executing very long ( > 5 seconds )
>  2. spa_sync thread often takes 100% CPU
>  3. metaslab space map is very big
> 
> There are some changes hiding the problem like this
>  http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6512391
> and I hope they will be available in Solaris 10 update 4
> 
> But I suggest that:
> 1. in the sync phase, when for the first time we did not find a block we need 
> ( for example 128k ), the pool should remember this for some time ( 5 minutes ) 
> and stop asking for this kind of blocks. 
> 
> 2. We should be more careful with unloading space maps.
> At the end of sync phase space maps for metaslabs without active flag are 
> unloaded.
> On my fragmented pool a space map with 800MB space available ( from 2GB ) 
> is unloaded because there were no 128K blocks.
>  
>  
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool status -v: machine readable format?

2007-07-03 Thread George Wilson
David Smith wrote:
> I was wondering if anyone had a script to parse the "zpool status -v" output 
> into a more machine readable format?
>
> Thanks,
>
> David
>  
>  
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>   
David,

Are you using the latest Nevada bits? They will actually print out 
the pathname associated with the errors. Take a look at Eric's blog:

http://blogs.sun.com/erickustarz/entry/damaged_files_and_zpool_status

Thanks,
George
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: B62 AHCI and ZFS

2007-04-30 Thread George Wilson

Peter Goodman wrote:

# zpool status -x
  pool: mtf
 state: UNAVAIL
status: One or more devices could not be opened.  There are insufficient
replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-D3
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
mtf UNAVAIL  0 0 0  insufficient replicas
  c1t0d0s6  UNAVAIL  0 0 0  cannot open
  c1t1d0s6  UNAVAIL  0 0 0  cannot open
  c1t2d0s6  UNAVAIL  0 0 0  cannot open
  c1t3d0s6  UNAVAIL  0 0 0  cannot open
  c1t4d0s6  UNAVAIL  0 0 0  cannot open
  c1t5d0s6  UNAVAIL  0 0 0  cannot open
#
 
 
This message posted from opensolaris.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
  
Are you able to see the devices with format? I noticed that you have 
specified slice 6 for these devices; does that mean that you have other 
filesystems on other slices?
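
For example, something like this will list the disks non-interactively, and 
prtvtoc will show the slice layout (the device name is just an example):

# format < /dev/null
# prtvtoc /dev/rdsk/c1t0d0s2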



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] B62 AHCI and ZFS

2007-04-30 Thread George Wilson

Peter,

Can you send the 'zpool status -x' output after your reboot? I suspect 
that the pool error is occurring early in the boot and later the devices 
are all available and the pool is brought into an online state.


Take a look at:

*6401126 ZFS DE should verify that diagnosis is still valid before 
solving cases *


Thanks,
George

Peter Goodman wrote:

Hardware Supermicro X7DAE (AHCI BIOS) dual Intel Woodcrest processors, 6 x 
Western Digital Raptor SATA drives.

I have installed b62 running 64-bit successfully on a PATA drive. The BIOS is 
configured to access the SATA drives in native mode using the AHCI BIOS.
I have 6 SATA II drives accessed via the Solaris AHCI driver. I have created a 
ZFS file system across all 6 drives. This works fine until I reboot. ZFS then 
shows the drives as UNAVAIL. This is repeatable.

Is this a problem with the device names I have used to create the filesystem? 
Have I used an alias for the device names that is not valid at boot time?


# zpool create -f mtf 
/dev/dsk/c1t0d0s6 /dev/dsk/c1t1d0s6 /dev/dsk/c1t2d0s6 /dev/dsk/c1t3d0s6 /dev/dsk/c1t4d0s6 /dev/dsk/c1t5d0s6


Dmesg reports the following:

Apr 27 09:30:03 weston sata: [ID 663010 kern.notice] /[EMAIL 
PROTECTED],0/pci15d9,[EMAIL PROTECTED],2
:
Apr 27 09:30:03 weston sata: [ID 761595 kern.notice]SATA disk device at port
 2
Apr 27 09:30:03 weston sata: [ID 846691 kern.notice]model WDC WD740ADFD-00NL
R1
Apr 27 09:30:03 weston sata: [ID 693010 kern.notice]firmware 20.07P20
Apr 27 09:30:03 weston sata: [ID 163988 kern.notice]serial number  WD-WM
ANS1324283
Apr 27 09:30:03 weston sata: [ID 594940 kern.notice]supported features:
Apr 27 09:30:03 weston sata: [ID 981177 kern.notice] 48-bit LBA, DMA, Native
 Command Queueing, SMART, SMART self-test
Apr 27 09:30:03 weston sata: [ID 674233 kern.notice]SATA1 compatible
Apr 27 09:30:03 weston sata: [ID 349649 kern.notice]queue depth 32
Apr 27 09:30:03 weston sata: [ID 349649 kern.notice]capacity = 145226112 sec
tors
Apr 27 09:30:03 weston scsi: [ID 193665 kern.notice] sd3 at ahci0: target 2 lun
0
Apr 27 09:30:03 weston genunix: [ID 936769 kern.notice] sd3 is /[EMAIL 
PROTECTED],0/pci15d9,
[EMAIL PROTECTED],2/[EMAIL PROTECTED],0
Apr 27 09:30:04 weston genunix: [ID 408114 kern.notice] /[EMAIL 
PROTECTED],0/pci15d9,[EMAIL PROTECTED]
,2/[EMAIL PROTECTED],0 (sd3) online



Apr 27 09:30:04 weston genunix: [ID 408114 kern.notice] /[EMAIL 
PROTECTED],0/pci15d9,[EMAIL PROTECTED]
,2/[EMAIL PROTECTED],0 (sd6) online
Apr 27 09:30:04 weston unix: [ID 469452 kern.notice] NOTICE: ncrs: 64-bit driver
 module not found
Apr 27 09:30:04 weston unix: [ID 469452 kern.notice] NOTICE: hpfc: 64-bit driver
 module not found
Apr 27 09:30:04 weston unix: [ID 469452 kern.notice] NOTICE: adp: 64-bit driver
module not found
Apr 27 09:30:04 weston unix: [ID 469452 kern.notice] NOTICE: mscsi: 64-bit drive
r module not found
Apr 27 09:30:04 weston unix: [ID 469452 kern.notice] NOTICE: i2o_scsi: 64-bit dr
iver module not found
Apr 27 09:30:04 weston e1000g: [ID 801593 kern.notice] NOTICE: pciex8086,1096 -
e1000g[0] : Adapter 1000Mbps full duplex copper link is up.
Apr 27 09:30:04 weston scsi: [ID 193665 kern.notice] sd0 at ata0: target 1 lun 0
Apr 27 09:30:04 weston genunix: [ID 936769 kern.notice] sd0 is /[EMAIL 
PROTECTED],0/pci-ide@
1f,1/[EMAIL PROTECTED]/[EMAIL PROTECTED],0
Apr 27 09:30:04 weston unix: [ID 469452 kern.notice] NOTICE: symhisl: 64-bit dri
ver module not found
Apr 27 09:30:04 weston unix: [ID 469452 kern.notice] NOTICE: cadp: 64-bit driver
 module not found
Apr 27 09:30:05 weston pseudo: [ID 129642 kern.notice] pseudo-device: pm0
Apr 27 09:30:05 weston genunix: [ID 936769 kern.notice] pm0 is /pseudo/[EMAIL 
PROTECTED]
Apr 27 09:30:05 weston pseudo: [ID 129642 kern.notice] pseudo-device: power0
Apr 27 09:30:05 weston genunix: [ID 936769 kern.notice] power0 is /pseudo/power@
0
Apr 27 09:30:05 weston pseudo: [ID 129642 kern.notice] pseudo-device: devinfo0
Apr 27 09:30:05 weston genunix: [ID 936769 kern.notice] devinfo0 is /pseudo/devi
[EMAIL PROTECTED]
Apr 27 09:30:08 weston rootnex: [ID 349649 kern.notice] iscsi0 at root
Apr 27 09:30:08 weston genunix: [ID 936769 kern.notice] iscsi0 is /iscsi
Apr 27 09:30:08 weston rootnex: [ID 349649 kern.notice] xsvc0 at root: space 0 o
ffset 0
Apr 27 09:30:08 weston genunix: [ID 936769 kern.notice] xsvc0 is /[EMAIL 
PROTECTED],0
Apr 27 09:30:08 weston pseudo: [ID 129642 kern.notice] pseudo-device: pseudo1
Apr 27 09:30:08 weston genunix: [ID 936769 kern.notice] pseudo1 is /pseudo/zcons
[EMAIL PROTECTED]
Apr 27 09:30:08 weston pcplusmp: [ID 803547 kern.info] pcplusmp: asy (asy) insta
nce 0 vector 0x4 ioapic 0x2 intin 0x4 is bound to cpu 1
Apr 27 09:30:08 weston isa: [ID 202937 kern.notice] ISA-device: asy0
Apr 27 09:30:08 weston genunix: [ID 936769 kern.notice] asy0 is /isa/[EMAIL 
PROTECTED],3f8
Apr 27 09:30:08 weston pcplusmp: [ID 444295 kern.info] pcplusmp: asy (asy) 

Re: [zfs-discuss] Permanently removing vdevs from a pool

2007-04-19 Thread George Wilson

This is a high priority for us and is actively being worked.

Vague enough for you? :-) Sorry I can't give you anything more exact 
than that.


- George

Matty wrote:

On 4/19/07, Mark J Musante <[EMAIL PROTECTED]> wrote:

On Thu, 19 Apr 2007, Mario Goebbels wrote:

> Is it possible to gracefully and permanently remove a vdev from a pool
> without data loss?

Is this what you're looking for?
http://bugs.opensolaris.org/view_bug.do?bug_id=4852783

If so, the answer is 'not yet'.


Can the ZFS team comment on how far out this feature is? There are a
couple of items (and bugs) that are preventing us from deploying ZFS,
and this is one of them.

Thanks for any insight,
- Ryan

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] misleading zpool state and panic -- nevada b60 x86

2007-04-09 Thread George Wilson

William D. Hathaway wrote:
I'm running Nevada build 60 inside VMWare, it is a test rig with no data of value. 
SunOS b60 5.11 snv_60 i86pc i386 i86pc

I wanted to check out the FMA handling of a serious zpool error, so I did the 
following:

2007-04-07.08:46:31 zpool create tank mirror c0d1 c1d1
2007-04-07.15:21:37 zpool scrub tank
(inserted some errors with dd on one device to see if it showed up, which it 
did, but healed fine)
2007-04-07.15:22:12 zpool scrub tank
2007-04-07.15:22:46 zpool clear tank c1d1
(added a single device without any redundancy)
2007-04-07.15:28:29 zpool add -f tank /var/500m_file
(then I copied data into /tank and removed the /var/500m_file, a panic 
resulted, which was expected)

I created a new /var/500m_file and then decided to destroy the pool and start 
over again.  This caused a panic, which I wasn't expecting.  On reboot, I did a 
zpool -x, which shows:
  pool: tank
 state: ONLINE
status: One or more devices could not be used because the label is missing or
invalid.  Sufficient replicas exist for the pool to continue
functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-4J
 scrub: none requested
config:

NAME  STATE READ WRITE CKSUM
tank  ONLINE   0 0 0
  mirror  ONLINE   0 0 0
c0d1  ONLINE   0 0 0
c1d1  ONLINE   0 0 0
  /var/500m_file  UNAVAIL  0 0 0  corrupted data

errors: No known data errors

Since there was no redundancy for the /var/500m_file vdev, I don't see how a 
replace will help (unless I still had the original device/file with the data 
intact).

When I try to destroy the pool with "zpool destroy tank", I get a panic with:
Apr  7 16:00:17 b60 genunix: [ID 403854 kern.notice] assertion failed: 
vdev_config_sync(rvd, t
xg) == 0, file: ../../common/fs/zfs/spa.c, line: 2910
Apr  7 16:00:17 b60 unix: [ID 10 kern.notice]
Apr  7 16:00:17 b60 genunix: [ID 353471 kern.notice] d893cd0c 
genunix:assfail+5a (f9e87e74, f9
e87e58,)
Apr  7 16:00:17 b60 genunix: [ID 353471 kern.notice] d893cd6c zfs:spa_sync+6c3 
(da89cac0, 1363
, 0)
Apr  7 16:00:17 b60 genunix: [ID 353471 kern.notice] d893cdc8 
zfs:txg_sync_thread+1df (d467854
0, 0)
Apr  7 16:00:18 b60 genunix: [ID 353471 kern.notice] d893cdd8 
unix:thread_start+8 ()
Apr  7 16:00:18 b60 unix: [ID 10 kern.notice]
Apr  7 16:00:18 b60 genunix: [ID 672855 kern.notice] syncing file systems...

My question/comment boil down to:
1) Should the pool state really be 'online' after losing a non-redundant vdev?
  


Yeah, this seems odd and is probably a bug.

2) It seems like a bug if I get a panic when trying to destroy a pool (although 
this clearly may be related to #1).
  


This is a known problem and one that we're working on right now:

6413847 vdev label write failure should be handled more gracefully.

In your case we are trying to update the label to indicate that the pool 
has been destroyed, and this results in a label write failure and thus the 
panic.


Thanks,
George

Am I hitting a known bug (or misconceptions about how the pool should function)?
I will happily provide any debugging info that I can.

I haven't tried a 'zpool destroy -f tank' yet since I didn't know if there was 
any debugging value in my current state.

Thanks,
William Hathaway
www.williamhathaway.com
 
 
This message posted from opensolaris.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
  


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] other panic caused by ZFS

2007-04-09 Thread George Wilson

Gino,

Were you able to recover by setting zfs_recover?

Thanks,
George

Gino wrote:

Hi All,
here is an other kind of kernel panic caused by ZFS that we found.
I have dumps if needed.


#zpool import

pool: zpool8
id: 7382567111495567914
 state: ONLINE
status: The pool is formatted using an older on-disk version.
action: The pool can be imported using its name or numeric identifier, though
some features will not be available without an explicit 'zpool upgrade'.
config:

volume8  ONLINE
  c0t60001FE100118DB9119074440055d0  ONLINE


#zpool import zpool8



Apr  7 22:53:34 SERVER140 ^Mpanic[cpu1]/thread=ff001807dc80: 
Apr  7 22:53:34 SERVER140 genunix: [ID 361072 kern.notice] zfs: freeing free segment (offset=17712545792 size=131072)
Apr  7 22:53:34 SERVER140 unix: [ID 10 kern.notice] 
Apr  7 22:53:34 SERVER140 genunix: [ID 655072 kern.notice] ff001807d380 genunix:vcmn_err+28 ()

Apr  7 22:53:34 SERVER140 genunix: [ID 655072 kern.notice] ff001807d470 
zfs:zfs_panic_recover+b6 ()
Apr  7 22:53:34 SERVER140 genunix: [ID 655072 kern.notice] ff001807d500 
zfs:space_map_remove+147 ()
Apr  7 22:53:34 SERVER140 genunix: [ID 655072 kern.notice] ff001807d5a0 
zfs:space_map_load+1f4 ()
Apr  7 22:53:34 SERVER140 genunix: [ID 655072 kern.notice] ff001807d5e0 
zfs:metaslab_activate+66 ()
Apr  7 22:53:34 SERVER140 genunix: [ID 655072 kern.notice] ff001807d6a0 
zfs:metaslab_group_alloc+1fb ()
Apr  7 22:53:34 SERVER140 genunix: [ID 655072 kern.notice] ff001807d770 
zfs:metaslab_alloc_dva+17d ()
Apr  7 22:53:34 SERVER140 genunix: [ID 655072 kern.notice] ff001807d810 
zfs:metaslab_alloc+6f ()
Apr  7 22:53:34 SERVER140 genunix: [ID 655072 kern.notice] ff001807d850 
zfs:zio_dva_allocate+63 ()
Apr  7 22:53:34 SERVER140 genunix: [ID 655072 kern.notice] ff001807d870 
zfs:zio_next_stage+b3 ()
Apr  7 22:53:34 SERVER140 genunix: [ID 655072 kern.notice] ff001807d8a0 
zfs:zio_checksum_generate+6e ()
Apr  7 22:53:34 SERVER140 genunix: [ID 655072 kern.notice] ff001807d8c0 
zfs:zio_next_stage+b3 ()
Apr  7 22:53:34 SERVER140 genunix: [ID 655072 kern.notice] ff001807d930 
zfs:zio_write_compress+202 ()
Apr  7 22:53:34 SERVER140 genunix: [ID 655072 kern.notice] ff001807d950 
zfs:zio_next_stage+b3 ()
Apr  7 22:53:34 SERVER140 genunix: [ID 655072 kern.notice] ff001807d9a0 
zfs:zio_wait_for_children+5d ()
Apr  7 22:53:34 SERVER140 genunix: [ID 655072 kern.notice] ff001807d9c0 
zfs:zio_wait_children_ready+20 ()
Apr  7 22:53:34 SERVER140 genunix: [ID 655072 kern.notice] ff001807d9e0 
zfs:zio_next_stage_async+bb ()
Apr  7 22:53:34 SERVER140 genunix: [ID 655072 kern.notice] ff001807da00 
zfs:zio_nowait+11 ()
Apr  7 22:53:34 SERVER140 genunix: [ID 655072 kern.notice] ff001807da80 
zfs:dmu_objset_sync+180 ()
Apr  7 22:53:34 SERVER140 genunix: [ID 655072 kern.notice] ff001807dad0 
zfs:dsl_dataset_sync+42 ()
Apr  7 22:53:34 SERVER140 genunix: [ID 655072 kern.notice] ff001807db40 
zfs:dsl_pool_sync+a7 ()
Apr  7 22:53:34 SERVER140 genunix: [ID 655072 kern.notice] ff001807dbd0 
zfs:spa_sync+1c5 ()
Apr  7 22:53:34 SERVER140 genunix: [ID 655072 kern.notice] ff001807dc60 
zfs:txg_sync_thread+19a ()
Apr  7 22:53:34 SERVER140 genunix: [ID 655072 kern.notice] ff001807dc70 
unix:thread_start+8 ()
Apr  7 22:53:34 SERVER140 unix: [ID 10 kern.notice] 
Apr  7 22:53:34 SERVER140 genunix: [ID 672855 kern.notice] syncing file systems...

Apr  7 22:53:35 SERVER140 genunix: [ID 904073 kern.notice]  done
Apr  7 22:53:36 SERVER140 genunix: [ID 111219 kern.notice] dumping to 
/dev/dsk/c2t0d0s3, offset 1677983744, content: kernel
Apr  7 22:54:04 SERVER140 genunix: [ID 409368 kern.notice] ^M100% done: 644612 pages dumped, compression ratio 4.30, 
Apr  7 22:54:04 SERVER140 genunix: [ID 851671 kern.notice] dump succeeded


gino
 
 
This message posted from opensolaris.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Unbelievable. an other crashed zpool :(

2007-04-09 Thread George Wilson

Gino,

Can you send me the corefile from the zpool command? This looks like a 
case where we can't open the device for some reason. Are you using a 
multi-pathing solution other than MPXIO?


Thanks,
George

Gino wrote:
Today we lost an other zpool! 
Fortunately it was only a backup repository. 


SERVER144@/# zpool import zpool3
internal error: unexpected error 5 at line 773 of ../common/libzfs_pool.c

this zpool was a RAID10 from 4 HDS LUN.

trying to import it into snv_60 (recovery mode) doesn't work.

gino
 
 
This message posted from opensolaris.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] panic with zfs

2007-01-29 Thread George Wilson

Ihsan,

If you are running Solaris 10 then you are probably hitting:

6456939 sd_send_scsi_SYNCHRONIZE_CACHE_biodone() can issue TUR which 
calls biowait()and deadlock/hangs host


This was fixed in opensolaris (build 48) but a patch is not yet 
available for Solaris 10.


Thanks,
George

Ihsan Dogan wrote:

Am 24.1.2007 15:49 Uhr, Michael Schuster schrieb:


I am going to create the same conditions here but with snv_55b and
then yank
a disk from my zpool.  If I get a similar response then I will *hope*
for a
crash dump.

You must be kidding about the "open a case" however.  This is
OpenSolaris.

no, I'm not. That's why I said "If you have a supported version of
Solaris". Also, Ihsan seems to disagree about OpenSolaris:


I opened a case this morning. Lets see, what the support guys are saying.



Ihsan


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs services , label and packages

2006-12-28 Thread George Wilson



storage-disk wrote:

Hi there

I have 3 questions regarding zfs.

1. what are zfs packages?


SUNWzfsr, SUNWzfskr, and SUNWzfsu. Note that ZFS has dependencies on 
other components of Solaris, so installing just the packages is not 
supported.
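
You can check whether they are installed with pkginfo, e.g.:

# pkginfo SUNWzfsr SUNWzfskr SUNWzfsu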




2. what services need to be started in order for zfs to work properly?


ZFS relies on three SMF services:

1) svc:/system/filesystem/local:default
2) svc:/system/device/local:default
3) svc:/system/fmd:default

The last service may be disabled, but you lose the ability to receive FMA 
events.
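
You can check their state with svcs, e.g.:

# svcs svc:/system/filesystem/local:default svc:/system/device/local:default svc:/system/fmd:default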




3. Is there an explanation for the zfs label? This is the output from zdb -l /dev/dsk/C_t_d. I'd like to know what each line means. I have some idea what it means but not every line. 



LABEL 3

version=2
name='viper'
state=0
txg=14693
pool_guid=514982409923329758
top_guid=5076607254487322717
guid=9253340189361483228
vdev_tree
type='mirror'
id=0
guid=5076607254487322717
metaslab_array=13
metaslab_shift=20
ashift=9
asize=129499136
children[0]
type='disk'
id=0
guid=9253340189361483228
path='/dev/dsk/c3t53d0s1'
devid='id1,/b'
whole_disk=0
DTL=19
children[1]
type='disk'
id=1
guid=6007798096730764430
path='/dev/dsk/c3t54d0s1'
devid='id1, /b'
whole_disk=0
DTL=18


Take a look at chapter 1 of the On-Disk Format guide (specifically pages 
9-14), as it covers the majority of the lines above.


Thanks,
George



Thank you very much.
Giang
 
 
This message posted from opensolaris.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Solaris 10 11/06

2006-12-28 Thread George Wilson
Now that Solaris 10 11/06 is available, I wanted to post the complete list of 
ZFS features and bug fixes that were included in that release. I'm also 
including the necessary patches for anyone wanting to get all the ZFS features 
and fixes via patches (NOTE: later patch revisions may already be available):

Solaris 10 Update 3 (11/06) Patches
sparc Patches

* 118833-36 SunOS 5.10: kernel patch
* 124204-03 SunOS 5.10: zfs patch
* 122660-07 SunOS 5.10: zones jumbo patch
* 120986-07 SunOS 5.10: mkfs and newfs patch
* 123839-01 SunOS 5.10: Fault Manager patch

i386 Patches

* 118855-36 SunOS 5.10_x86: kernel Patch
* 122661-05 SunOS 5.10_x86: zones jumbo patch
* 124205-04 SunOS 5.10_x86: zfs/zpool patch
* 120987-07 SunOS 5.10_x86: mkfs, newfs, other ufs utils patch
* 123840-01 SunOS 5.10_x86: Fault Manager patch


ZFS Features/Projects

PSARC 2006/223 ZFS Hot Spares
PSARC 2006/303 ZFS Clone Promotion
PSARC 2006/388 snapshot -r

ZFS Bug Fixes/RFEs

4034947 anon_swap_adjust() should call kmem_reap() if availrmem is low.
6276916 support for "clone swap"
6288488 du reports misleading size on RAID-Z
6354408 libdiskmgt needs to handle sysevent failures in miniroot or failsafe 
environments better
6366301 CREATE with owner_group attribute is not set correctly with NFSv4/ZFS
6373978 want to take lots of snapshots quickly ('zfs snapshot -r')
6385436 zfs set  returns an error, but still sets property value
6393490 libzfs should be a real library
6397148 fbufs debug code should be removed from buf_hash_insert()
6401400 zfs(1) usage output is excessively long
6405330 swap on zvol isn't added during boot
6405966 Hot Spare support in ZFS
6409228 typo in aclutils.h
6409302 passing a non-root vdev via zpool_create() panics system
6415739 assertion failed: !(zio->io_flags & 0x00040)
6416482 filebench oltp workload hangs in zfs
6416759 ::dbufs does not find bonus buffers anymore
6416794 zfs panics in dnode_reallocate during incremental zfs restore
6417978 double parity RAID-Z a.k.a. RAID6
6420204 root filesystem's delete queue is not running
6421216 ufsrestore should use acl_set() for setting ACLs
6424554 full block re-writes need not read data in
6425111 detaching an offline device can result in import confusion
6425740 assertion failed: new_state != old_state
6430121 3-way deadlock involving tc_lock within zfs
6433208 should not be able to offline/online a spare
6433264 crash when adding spare: nvlist_lookup_string(cnv, "path", &path) == 0
6433406 zfs_open() can leak memory on failure
6433408 namespace_reload() can leak memory on allocation failure
6433679 zpool_refresh_stats() has poor error semantics
6433680 changelist_gather() ignores libuutil errors
6433717 offline devices should not be marked persistently unavailble
6435779 6433679 broke zpool import
6436502 fsstat needs to support file systems greater than 2TB
6436514 zfs share on /var/mail needs to be run explicitly after system boots
6436524 importing a bogus pool config can panic system
6436526 delete_queue thread reporting drained when it may not be true
6436800 ztest failure: spa_vdev_attach() returns EBUSY instead of ENOTSUP
6439102 assertain failed: dmu_buf_refcount(dd->dd_dbuf) == 2 in 
dsl_dir_destroy_check()
6439370 assertion failures possible in dsl_dataset_destroy_sync()
6440499 zil should avoid txg_wait_synced() and use dmu_sync() to issue parallel 
IOs when fsyncing
6443585 zpool create of poolname > 250 and < 256 characters panics in debug 
printout
6444346 zfs promote fails in zone
6446569 deferred list is hooked on flintstone vitamins
6447377 ZFS prefetch is inconsistant
6447381 dnode_free_range() does not handle non-power-of-two blocksizes correctly
6451860 'zfs rename' a filesystem|clone to its direct child will cause internal error
6447452 re-creating zfs files can lead to failure to unmount
6448371 'zfs promote' of a volume clone fails with EBUSY
6448999 panic: used == ds->ds_phys->ds_unique_bytes
6449033 PIT nightly fails due to the fix for 6436514
6449078 Makefile for fsstat contains '-g' option
6450292 unmount original file system, 'zfs promote' cause system panic.
6451124 assertion failed: rc->rc_count >= number
6451412 renaming snapshot with 'mv' makes unmounting snapshot impossible
6452372 assertion failed: dnp->dn_nlevels == 1
6452420 zfs_get_data() of page data panics when blocksize is less than pagesize
6452923 really out of space panic even though ms_map.sm_space > 0
6453304 s10u3_03 integration for 6405966 breaks on10-patch B3 feature build
6458781 random spurious ENOSPC failures

Thanks,
George
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] using zpool attach/detach to migrate drives from one controller to another

2006-12-27 Thread George Wilson

Derek,

Have you tried doing a 'zpool replace poolname c1t53d0 c2t53d0'? I'm not 
sure if this will work, but it's worth a shot. You may still end up with a 
complete resilver.


Thanks,
George

Derek E. Lewis wrote:

On Thu, 28 Dec 2006, George Wilson wrote:

Your best bet is to export and re-import the pool after moving 
devices. You might also try to 'zpool offline' the device, move it and 
then 'zpool online' it. This should force a reopen of the device and 
then it would only have to resilver the transactions during the time 
that the device was offline. I have not tried the latter but it should 
work.


George,

I haven't moved any devices around. I have two physical paths to the 
JBOD, which allows the system to see all the disks on two different 
controllers (c1t53d0 and c2t53d0 are already there). 'zfs 
online/offline' and 'zfs import/export' aren't going to help at all 
unless I physically swap the fibre paths. This won't work because I have 
other pools on the JBOD.


If this were a production system, exporting the entire pool would not be 
ideal, just to change the controller the mirrored pooldevs are using. If 
ZFS cannot do this without (1) exporting the pool and importing it or 
(2) doing a complete resilver of the disk(s), this sounds like a valid 
RFE for a more intelligent 'zfs replace' or 'zfs attach/detach'.


Thanks,

Derek E. Lewis
[EMAIL PROTECTED]
http://delewis.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Saving scrub results before scrub completes

2006-12-27 Thread George Wilson

Siegfried,

Can you provide the panic string that you are seeing? We should be able 
to pull out the persistent error log information from the corefile. You 
can take a look at the spa_get_errlog() function as a starting point.


Additionally, you can look at the corefile using mdb and take a look at 
the vdev error stats. Here's an example (hopefully the formatting 
doesn't get messed up):


> ::spa -v
ADDR STATE NAME 


060004473680ACTIVE test

ADDR STATE AUX  DESCRIPTION 


060004bcb500 HEALTHY   -root
060004bcafc0 HEALTHY   -  /dev/dsk/c0t2d0s0

> 060004bcb500::vdev -re
ADDR STATE AUX  DESCRIPTION
060004bcb500 HEALTHY   -root

   READWRITE FREECLAIMIOCTL
OPS   00000
BYTES 00000
EREAD 0
EWRITE0
ECKSUM0

060004bcafc0 HEALTHY   -  /dev/dsk/c0t2d0s0

   READWRITE FREECLAIMIOCTL
OPS0x170x1d2000
BYTES  0x19c000 0x11da00000
EREAD 0
EWRITE0
ECKSUM0

This will show you any read/write/cksum errors.

Thanks,
George


Siegfried Nikolaivich wrote:

Hello All,

I am wondering if there is a way to save the scrub results right before the 
scrub is complete.

After upgrading to Solaris 10U3 I still have ZFS panicking right as the scrub completes.  
The scrub results seem to be "cleared" when the system boots back up, so I never 
get a chance to see them.

Does anyone know of a simple way?
 
 
This message posted from opensolaris.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] using zpool attach/detach to migrate drives from one controller to another

2006-12-27 Thread George Wilson

Derek,

I don't think 'zpool attach/detach' is what you want as it will always 
result in a complete resilver.


Your best bet is to export and re-import the pool after moving 
devices. You might also try to 'zpool offline' the device, move it and 
then 'zpool online' it. This should force a reopen of the device and 
then it would only have to resilver the transactions during the time 
that the device was offline. I have not tried the latter, but it should work.
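
Something along these lines (pool and device names here are only examples):

# zpool offline tank c1t53d0
   ... move the device or swap the path ...
# zpool online tank c1t53d0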


Thanks,
George

Derek E. Lewis wrote:

Greetings,

I'm trying to move some of my mirrored pooldevs to another controller. I 
have a StorEdge A5200 (Photon) with two physical paths to it, and 
originally, when I created the storage pool, I threw all of the drives 
on c1. Several days after my realization of this, I'm trying to change 
the mirrored pooldevs to c2 (c1t53d0 -> c2t53d0). At first, 'zpool 
replace' seemed ideal; however, it warned that c2t53d0 was already an 
existing pooldev for the pool. I then tried detaching the c1 device and 
re-attaching the c2 device; however, this caused a complete resilver, 
which is very expensive. This is a Solaris 10 11/06 system -- any chance 
zpool attach/detach has become more intelligent in Solaris Express? 
Perhaps, 'zpool replace' was the right way to go about it?


Thanks,

Derek E. Lewis
[EMAIL PROTECTED]
http://delewis.blogspot.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Uber block corruption?

2006-12-12 Thread George Wilson
Also note that the UB is written to every vdev (4 per disk), so the 
chances of all UBs being corrupted are rather low.
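
You can dump the labels on a device with zdb to see the copies, e.g. (the 
device path is just an example):

# zdb -l /dev/dsk/c0t2d0s0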


Thanks,
George

Darren Dunham wrote:

DD> To reduce the chance of it affecting the integrety of the filesystem,
DD> there are multiple copies of the UB written, each with a checksum and a
DD> generation number.  When starting up a pool, the oldest generation copy
DD> that checks properly will be used.  If the import can't find any valid
DD> UB, then it's not going to have access to any data.  Think of a UFS
DD> filesystem where all copies of the superblock are corrupt.

Actually the latest UB, not the oldest.


My *other* oldest...  yeah.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Corruption

2006-12-12 Thread George Wilson

Bill,

If you want to find the file associated with the corruption, you could do 
a "find /u01 -inum 4741362" or use the output of "zdb -d u01" to 
find the object associated with that id.
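
For example:

# find /u01 -inum 4741362
# zdb -d u01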


Thanks,
George

Bill Casale wrote:

Please reply directly to me. Seeing the message below.

Is it possible to determine exactly which file is corrupted?
I was thinking the OBJECT/RANGE info may be pointing to it
but I don't know how to equate that to a file.


# zpool status -v
  pool: u01
 state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
u01 ONLINE   0 0 6
  c1t102d0  ONLINE   0 0 6

errors: The following persistent errors have been detected:

  DATASET  OBJECT   RANGE
  u01  4741362  600178688-600309760



Thanks,
Bill



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS patches for S10 6/06

2006-10-04 Thread George Wilson

Andreas,

The first ZFS patch will be released in the upcoming weeks. For now, the 
latest available bits are the ones from s10 6/06.


Thanks,
George

Andreas Sterbenz wrote:

Hi,

I am about to create a mirrored pool on an amd64 machine running S10 6/06
(no other patches). I plan to install the latest kernel patch (118855).

Are there any ZFS patches already out that I should also install first?

(No, I don't want to move to Nevada, but I will upgrade to S10 11/06 as
soon as it is out)

Andreas.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] is there any way to merge pools and zfs file systems together?

2006-09-20 Thread George Wilson

Chris,

You could use 'zfs send/recv' to migrate one of the filesystems to the pool 
you want to keep. You will need to make sure that the properties for 
that filesystem are correct after migrating it over. Then you can 
destroy the pool you just migrated off of and add those disks to the 
pool you migrated to. This will prevent you from having to destroy 
everything and restore.
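
A rough sketch of the steps (pool, filesystem, and device names here are just 
placeholders):

# zfs snapshot poolB/data@migrate
# zfs send poolB/data@migrate | zfs recv poolA/data
# zpool destroy poolB
# zpool add poolA mirror c2t0d0 c2t1d0

Double-check the properties on poolA/data after the receive; the disks added 
at the end are the ones that used to belong to poolB.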


Thanks,
George

Krzys wrote:
Weird question, but I have two separate pools with a zfs file system 
on both of them. I wanted to see if there is a way to merge them 
together, or do I have to dump the content to tape (or some other location), 
then destroy both pools, make one big pool of those two, create 
zfs on it, and recover the data?


thanks for help or any info.

Chris

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Re: Re: ZFS causes panic on import

2006-09-04 Thread George Wilson

Stuart,

Issuing a 'zpool import' will show all the pools which are accessible 
for import, and that's why you are seeing them. The fact that a forced 
import results in a panic is indicative of pool corruption that 
resulted from the pool being imported on more than one host.


Thanks,
George

Stuart Low wrote:

[EMAIL PROTECTED] ~]$ zpool status -v
no pools available
[EMAIL PROTECTED] ~]$

[EMAIL PROTECTED] ~]$ zpool status -v
no pools available
[EMAIL PROTECTED] ~]$

It's like it's "not there", but when I do a zpool import it reports it as
there and available, just that I need to use -f. Using -f gives me an
instareboot. :)

Stuart
 
 
This message posted from opensolaris.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Re: Re: ZFS causes panic on import

2006-09-04 Thread George Wilson

Stuart,

The fact that the pool was imported on both nodes simultaneously may have 
corrupted it beyond repair. I'm assuming that the "same problem" is a 
system panic? If so, can you send the panic string from that node?


Thanks,
George

Stuart Low wrote:

I thought that might work too but having tried the move of zpool.cache alas 
same problem. :(

Stuart
 
 
This message posted from opensolaris.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Re: ZFS causes panic on import

2006-09-04 Thread George Wilson

Stuart,

Can you send the output of 'zpool status -v' from both nodes?

Thanks,
George

Stuart Low wrote:

Nada.

[EMAIL PROTECTED] ~]$ zpool export -f ax150s
cannot open 'ax150s': no such pool
[EMAIL PROTECTED] ~]$

I wonder if it's possible to force the pool to be marked as inactive? Ideally 
all I want to do is get it back online then scrub it for errors. :-|

Stuart
 
 
This message posted from opensolaris.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS causes panic on import

2006-09-04 Thread George Wilson

Stuart,

Is it possible that you ran 'zpool import' on node1 and then failed it 
over to the other node which ran 'zpool import' on node2? If so, then 
the pool configuration was automatically added to zpool.cache so that 
the pool could be automatically loaded upon reboot. This may result in 
the pool being imported on both nodes at the same time.


What you need to do is run 'zpool import -R <altroot>' instead, as this 
prevents the pool from being added to the cache. You should also ensure 
that the zpool.cache file does not exist on either node and that the 
import is driven by your failover scripts only.
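
For example (the alternate root here is just an example path):

# rm /etc/zfs/zpool.cache
# zpool import -R /failover ax150s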


Thanks,
George

Stuart Low wrote:

Hi there,

We've been working with ZFS at an initial setup stage hoping to eventually integrate with 
Sun Cluster 3.2 and create a failover fs. Somehow between my two machines I managed to 
get the file system mounted on both. On reboot of both machines I can now no longer 
import my ZFS file systems. ZFS reports they're online (which they were, albeit before 
the reboot) but when I perform an import I'm told the pool "may be active".

When trying a zpool import -f the machine kernel panics and instantly reboots. 
I'm at a loss as to how to mount this file system properly again. Please help! 
:-|

[EMAIL PROTECTED] ~]$ zpool import
  pool: ax150s
id: 1586873787799685
 state: ONLINE
action: The pool can be imported using its name or numeric identifier.
The pool may be active on on another system, but can be imported using
the '-f' flag.
config:

ax150s ONLINE
  mirror   ONLINE
c4t6006016071851800B86C8EE05831DB11d0  ONLINE
c4t6006016031E0180032F8E9868E30DB11d0  ONLINE
  mirror   ONLINE
c4t6006016071851800CA1D94EF5831DB11d0  ONLINE
c4t6006016031E0180026057F9B8E30DB11d0  ONLINE
  mirror   ONLINE
c4t6006016031E018003810E7AC8E30DB11d0  ONLINE
c4t60060160718518009A7926FF5831DB11d0  ONLINE
  mirror   ONLINE
c4t6006016031E01800AC7E34918E30DB11d0  ONLINE
c4t600601607185180010A65FE75831DB11d0  ONLINE
  mirror   ONLINE
c4t6006016031E018005A9B74A48E30DB11d0  ONLINE
c4t600601607185180064063BF85831DB11d0  ONLINE



[EMAIL PROTECTED] ~]$ zpool import ax150s
cannot import 'ax150s': pool may be in use from other system
use '-f' to import anyway
[EMAIL PROTECTED] ~]$ zpool import -f ax150s
Read from remote host solaris1: Connection reset by peer
Connection to solaris1 closed.
[EMAIL PROTECTED] ~]$

Any pointers muchly appreciated! :-|

Stuart
 
 
This message posted from opensolaris.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Significant "pauses" during zfs writes

2006-08-28 Thread George Wilson

A fix for this should be integrated shortly.

Thanks,
George

Michael Schuster - Sun Microsystems wrote:

Robert Milkowski wrote:

Hello Michael,

Wednesday, August 23, 2006, 12:49:28 PM, you wrote:

MSSM> Roch wrote:


MSSM> I sent this output offline to Roch, here's the essential ones 
and (first)

MSSM> his reply:


So it looks like this:

6421427 netra x1 slagged by NFS over ZFS leading to long spins in 
the ATA driver code"

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6421427



Is there any workarounds?
Or maybe some code not yeyt integrated to try?


not that I know of - Roch may be better informed than me though.

cheers
Michael

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Re: zpool status panics server

2006-08-25 Thread George Wilson

Neal,

This is not fixed yet. Your best bet is to run a replicated pool.
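
For example, a simple mirrored pool (device names are just examples):

# zpool create tank mirror c0t0d0 c0t1d0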

Thanks,
George

Neal Miskin wrote:

Hi Dana


It is ZFS bug 6322646; a flaw.


Is this fixed in a patch yet?

nelly_bo
 
 
This message posted from opensolaris.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool hangs

2006-08-24 Thread George Wilson


Robert Milkowski wrote:

Hello George,

Thursday, August 24, 2006, 5:48:08 PM, you wrote:

GW> Robert,

GW> One of your disks is not responding. I've been trying to track down why
GW> the scsi command is not being timed out but for now check out each of 
GW> the devices to make sure they are healthy.


I know - I unmaped LUNs on the array.
But it should time out.


Agreed! I'm trying to track down what changes in the sd driver may be 
contributing to this.


- George



GW> BTW, if you capture a corefile let me know.

Ooopss. already restarted



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool hangs

2006-08-24 Thread George Wilson

Robert,

One of your disks is not responding. I've been trying to track down why 
the SCSI command is not being timed out, but for now check each of 
the devices to make sure they are healthy.
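
A quick way to check is something like:

# iostat -En

which lists the soft/hard/transport error counters for each device.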


BTW, if you capture a corefile let me know.

Thanks,
George

Robert Milkowski wrote:

Hi.

 S10U2 + patches, SPARC, Generic_118833-20

I issued zpool create but possibly some (or all) MPxIO devices aren't there 
anymore.
Now I can't kill zpool.

bash-3.00# zpool create f3-1 mirror c5t600C0FF0098FD5275268D600d0 
c5t600C0FF0098FD564175B0600d0 mirror 
c5t600C0FF0098FD567D3965E00d0 c5t600C0FF0098FD57E58FAEB00d0 
mirror c5t600C0FF0098FD50642D41000d0 
c5t600C0FF0098FD5580A39C100d0 mirror 
c5t600C0FF0098FD53F34388300d0 c5t600C0FF0098FD57A96A41900d0

^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C

bash-3.00# ps -ef|grep zpool
root  2516  2409   0 16:23:02 pts/6   0:00 zpool create f3-1 mirror 
c5t600C0FF0098FD5275268D600d0 c5t600C0FF00
bash-3.00# kill -9 2516
bash-3.00# kill -9 2516
bash-3.00# kill -9 2516
bash-3.00# ps -ef|grep zpool
root  2516  2409   0 16:23:02 pts/6   0:00 zpool create f3-1 mirror 
c5t600C0FF0098FD5275268D600d0 c5t600C0FF00
bash-3.00# pstack 2516
pstack: cannot examine 2516: no such process
bash-3.00#

bash-3.00# mdb -kw
Loading modules: [ unix krtld genunix dtrace specfs ufs sd md ip sctp usba fcp 
fctl qlc ssd lofs zfs random logindmux ptm cpc fcip crypto nfs ipc ]

::ps!grep pool

R   2516   2409   2516   2403  0 0x4a304902 06001f3c8410 zpool

06001f3c8410::walk thread|::findstack -v

stack pointer for thread 3000352e660: 2a1041f0b51
[ 02a1041f0b51 sema_p+0x130() ]
  02a1041f0c01 biowait+0x6c(60017fc1e80, 0, 183d400, 180c000, 790, 
60017fc1e80)
  02a1041f0cb1 ssd_send_scsi_cmd+0x394(760790, 2a1041f1668, 
600018ef580, 1, 1, 0)
  02a1041f0da1 ssd_send_scsi_TEST_UNIT_READY+0x100(600018ef580, 1, f2, 790, 
0, 0)
  02a1041f0eb1 ssd_get_media_info+0x64(760790, ffbfa8c8, 15, 
600018ef580, 790, 8)
  02a1041f0fc1 ssdioctl+0xb28(760790, 198b2800, 600018ef580, 0, 
60006f20860, 2a1041f1adc)
  02a1041f10d1 fop_ioctl+0x20(60007bae340, 42a, ffbfa8c8, 15, 
60006f20860, 11ff9f0)
  02a1041f1191 ioctl+0x184(3, 60007a14a88, ffbfa8c8, fff8, 73, 42a)
  02a1041f12e1 syscall_trap32+0xcc(3, 42a, ffbfa8c8, fff8, 73, 
70)
stack pointer for thread 3fe0c80: 2a104208f11
[ 02a104208f11 cv_wait+0x38() ]
  02a104208fc1 exitlwps+0x11c(0, 3fe0c80, 4a004002, 6001f3c8410, 
20a0, 4a004002)
  02a104209071 proc_exit+0x20(2, 2, 0, 6efc, 6000787b1a8, 1857400)
  02a104209121 exit+8(2, 2, , 0, 6001f3c8410, 2)
  02a1042091d1 post_syscall+0x3e8(4, 0, 3fe0e54, c9, 6000787b1a8, 1)
  02a1042092e1 syscall_trap32+0x18c(0, 0, 0, 0, fed7bfa0, 4)
 
 
This message posted from opensolaris.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: SCSI synchronize cache cmd

2006-08-22 Thread George Wilson



Roch wrote:

Dick Davies writes:
 > On 22/08/06, Bill Moore <[EMAIL PROTECTED]> wrote:
 > > On Mon, Aug 21, 2006 at 02:40:40PM -0700, Anton B. Rang wrote:
 > > > Yes, ZFS uses this command very frequently. However, it only does this
 > > > if the whole disk is under the control of ZFS, I believe; so a
 > > > workaround could be to use slices rather than whole disks when
 > > > creating a ZFS pool on a buggy device.
 > >
 > > Actually, we issue the command no matter if we are using a whole disk or
 > > just a slice.  Short of an mdb script, there is not a way to disable it.
 > > We are trying to figure out ways to allow users to specify workarounds
 > > for broken hardware without getting the ZFS code all messy as a result.
 > 
 > Has that behaviour changed then? I was definitely told (on list) that

 > write cache was only enabled for a 'full ZFS disk'. Am I wrong in
 > thinking this could be risky for UFS slices on the same disk
 > (or does UFS journalling mitigate that)?

There are 2 things: enabling write cache (done once on disk open)
and flushing the write cache every time it's required (say after
O_DSYNC write).

ZFS does a WCE  if it owns a whole disk.

And  it DKIOCFLUSHWRITECACHE  when  needed  on  all   disks,
whether or  not it enabled  the cache  (as somebody else may
have). If the device respond that this DKIOC is unsupported,
ZFS stops issuing the requests.


Also worth noting that some devices may have the write cache enabled by 
default (like SATA/IDE) so we issue the DKIOCFLUSHWRITECACHE even if we 
didn't enable it. Safety first.


Thanks,
George



-r

 > 
 > -- 
 > Rasputin :: Jack of All Trades - Master of Nuns

 > http://number9.hellooperator.net/
 > ___
 > zfs-discuss mailing list
 > zfs-discuss@opensolaris.org
 > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS questions with mirrors

2006-08-21 Thread George Wilson

Peter,

Are you sure your customer is not hitting this:

6456939 sd_send_scsi_SYNCHRONIZE_CACHE_biodone() can issue TUR which 
calls biowait()and deadlock/hangs host


I have a fix that you could have your customer try.

Thanks,
George

Peter Wilk wrote:

IHAC that is asking the following. Any thoughts would be appreciated.

Take two drives, zpool to make a mirror.
Remove a drive - and the server HANGS. Power off and reboot the server,
and everything comes up cleanly.

Take the same two drives (still Solaris 10). Install Veritas Volume
Manager (4.1). Mirror the two drives. Remove a drive - everything is
still running. Replace the drive, everything still working. No outage.

So the big questions to Tech support:
1. Is this a "known property" of ZFS ? That when a drive from a hot swap
system is removed the server hangs ? (We were attempting to simulate a
drive failure)
2. Or is this just because it was an E450 ? Ie, would removing a zfs
mirror disk (unexpected hardware removal as opposed to using zfs to
remove the disk) on a V240 or V480 cause the same problem ?
3. What could we expect if a drive "mysteriously failed" during
operation of a server with a zfs mirror ? Would the server hang like it
did during testing ? How can we test this ?
4. If it is a "known property" of zfs, is there a date when it is
expected to be fixed (if ever) ?



Peter

PS: I may not be on this alias so please respond to me directly

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SPEC SFS97 benchmark of ZFS,UFS,VxFS

2006-08-18 Thread George Wilson

Frank,

The SC 3.2 beta may be closed, but I'm forwarding your request to Eric 
Redmond.


Thanks,
George

Frank Cusack wrote:
On August 10, 2006 6:04:38 PM -0700 eric kustarz <[EMAIL PROTECTED]> 
wrote:
If you're doing HA-ZFS (which is SunCluster 3.2 - only available in 
beta right now),


Is the 3.2 beta publicly available?  I can only locate 3.1.

-frank
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] system unresponsive after issuing a zpool attach

2006-08-16 Thread George Wilson

I believe this is what you're hitting:

6456888 zpool attach leads to memory exhaustion and system hang

We are currently looking at fixing this so stay tuned.

Thanks,
George

Daniel Rock wrote:

Joseph Mocker schrieb:
Today I attempted to upgrade to S10_U2 and migrate some mirrored UFS 
SVM partitions to ZFS.


I used Live Upgrade to migrate from U1 to U2 and that went without a 
hitch on my SunBlade 2000. And the initial conversion of one side of 
the UFS mirrors to a ZFS pool and subsequent data migration went fine. 
However, when I attempted to attach the second side mirrors as a 
mirror of the ZFS pool, all hell broke loose.

 >

9. attach the partition to the pool as a mirror
  zpool attach storage cXtXdXs4 cYtYdYs4

A few minutes after issuing the command the system became unresponsive 
as described above.


Same here. I also did upgrade to S10_U2, and converted my non-root md
similar like you. Everything went fine until the "zpool attach". The system
seemed to be hanging for at least 2-3 minutes. Then I could type something
again. "top" then showed 98% system time.

This was on a SunBlade 1000 with 2 x 750MHz CPUs. The zpool/zfs was 
created with checksum=sha256.




Daniel
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: System hangs on SCSI error

2006-08-10 Thread George Wilson

Brad,

I have a suspicion about what you might be seeing and I want to confirm 
it. If it locks up again you can also collect a threadlist:


"echo $
The core dump timed out (related to the SCSI bus reset?), so I don't
have one.  I can try it again, though, it's easy enough to reproduce. 


I was seeing errors on the fibre channel disks as well, so it's possible
the whole thing was locked up. 

BP 


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: System hangs on SCSI error

2006-08-09 Thread George Wilson

Brad,

I'm investigating a similar issue and would like to get a coredump if 
you have one available.


Thanks,
George

Brad Plecs wrote:

I have similar problems ... I have a bunch of D1000 disk shelves attached via
SCSI HBAs to a V880.  If I do something as simple as unplug a drive in a raidz 
vdev, it generates SCSI errors that eventually freeze the entire system.  I can

access the filesystem okay for a couple minutes until the SCSI bus resets, then
I have a frozen box.  I have to stop-a/sync/reset. 

If I offline the device before unplugging the drive, I have no problems.  

Yeah, sure, I know you're supposed to offline it first, but I'm trying to test 
unexpected failures.  If the power supplies fail on one of my shelves, the data will be intact, but the system will hang.   This is good, but not great, since I really
want this to be a high-availability system. 


I believe this is a failure of the OS, controller, or SCSI driver to isolate 
the bad device and let the rest of the system operate, rather than a ZFS issue.
 
 
This message posted from opensolaris.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Querying ZFS version?

2006-08-08 Thread George Wilson

Luke,

You can run 'zpool upgrade' to see what on-disk version you are capable 
of running. If you have the latest features then you should be running 
version 3:


hadji-2# zpool upgrade
This system is currently running ZFS version 3.

Unfortunately this won't tell you if you are running the latest fixes 
but it does tell you that you have all the latest features (at least up 
through snv_43).


Thanks,
George

Luke Scharf wrote:
Although regular Solaris is good for what I'm doing at work, I prefer 
apt-get or yum for package management for a desktop.  So, I've been 
playing with Nexenta / GnuSolaris -- which appears to be the 
open-sourced Solaris kernel and low-level system utilities with Debian 
package management -- and a bunch of packages from Ubuntu.


The release I'm playing with (Alpha 5) does, indeed, have ZFS.  However, 
I can't determine what version of ZFS is included.  Dselect gives the 
following information, which doesn't ring any bells for me:

*** Req base sunwzfsr 5.11.40-1   5.11.40-1   ZFS (Root)

Is there a zfs version command that I don't see?

Thanks,
-Luke




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



Re: [zfs-discuss] Re: SPEC SFS97 benchmark of ZFS,UFS,VxFS

2006-08-07 Thread George Wilson

Leon,

Looking at the corefile doesn't really show much from the ZFS side. It
looks like you were having problems with your SAN, though:


/scsi_vhci/[EMAIL PROTECTED] (ssd5) offline
/scsi_vhci/[EMAIL PROTECTED] (ssd5) multipath status: failed, path 
/[EMAIL PROTECTED],70/SUNW,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (fp2) to target address: 
w101738043811,3 is offline Load balancing: none

/scsi_vhci/[EMAIL PROTECTED] (ssd6) offline
/scsi_vhci/[EMAIL PROTECTED] (ssd6) multipath status: failed, path 
/[EMAIL PROTECTED],70/SUNW,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (fp2) to target address: 
w101738043811,2 is offline Load balancing: none

/scsi_vhci/[EMAIL PROTECTED] (ssd7) offline
/scsi_vhci/[EMAIL PROTECTED] (ssd7) multipath status: failed, path 
/[EMAIL PROTECTED],70/SUNW,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (fp2) to target address: 
w101738043811,1 is offline Load balancing: none

WARNING: /scsi_vhci/[EMAIL PROTECTED] (ssd8):
transport rejected fatal error

WARNING: fp(0)::GPN_ID for D_ID=10400 failed

WARNING: fp(0)::N_x Port with D_ID=10400, PWWN=101738279c10 
disappeared from fabric


/[EMAIL PROTECTED],70/SUNW,[EMAIL PROTECTED],1/[EMAIL PROTECTED],0 (fcp0):
Lun=0 for target=10400 disappeared
WARNING: /[EMAIL PROTECTED],70/SUNW,[EMAIL PROTECTED],1/[EMAIL PROTECTED],0 
(fcp0):
FCP: target=10400 reported NO Luns
WARNING: fp(0)::GPN_ID for D_ID=10400 failed

WARNING: fp(0)::N_x Port with D_ID=10400, PWWN=101738279c10 
disappeared from fabric


/[EMAIL PROTECTED],70/SUNW,[EMAIL PROTECTED],1/[EMAIL PROTECTED],0 (fcp0):
Lun=0 for target=10400 disappeared
WARNING: /[EMAIL PROTECTED],70/SUNW,[EMAIL PROTECTED],1/[EMAIL PROTECTED],0 
(fcp0):
FCP: target=10400 reported NO Luns
/[EMAIL PROTECTED],70/SUNW,[EMAIL PROTECTED],1/[EMAIL PROTECTED],0 (fcp0):
Lun=0 for target=10400 disappeared
WARNING: /[EMAIL PROTECTED],70/SUNW,[EMAIL PROTECTED],1/[EMAIL PROTECTED],0 
(fcp0):
FCP: target=10400 reported NO Luns
/scsi_vhci/[EMAIL PROTECTED] (ssd5) multipath status: failed, path 
/[EMAIL PROTECTED],70/SUNW,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (fp2) to target address: 
w101738043811,3 is offline Load balancing: none

/scsi_vhci/[EMAIL PROTECTED] (ssd5) offline
/scsi_vhci/[EMAIL PROTECTED] (ssd6) multipath status: failed, path 
/[EMAIL PROTECTED],70/SUNW,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (fp2) to target address: 
w101738043811,2 is offline Load balancing: none

/scsi_vhci/[EMAIL PROTECTED] (ssd6) offline
/scsi_vhci/[EMAIL PROTECTED] (ssd7) multipath status: failed, path 
/[EMAIL PROTECTED],70/SUNW,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (fp2) to target address: 
w101738043811,1 is offline Load balancing: none

/scsi_vhci/[EMAIL PROTECTED] (ssd7) offline
/scsi_vhci/[EMAIL PROTECTED] (ssd8) multipath status: failed, path 
/[EMAIL PROTECTED],70/SUNW,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (fp2) to target address: 
w101738043811,0 is offline Load balancing: none


panic[cpu0]/thread=2a10057dcc0:
BAD TRAP: type=31 rp=2a10057cee0 addr=0 mmu_fsr=0 occurred in module "unix"
due to a NULL pointer dereference
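
A minimal sketch of the sort of mdb session used to pull the console messages
and panic stack out of a saved dump, assuming the default savecore file names
under /var/crash:

cd /var/crash/`hostname`
mdb unix.0 vmcore.0
> ::status    # panic string and dump summary
> ::msgbuf    # console messages leading up to the panic
> $C          # stack backtrace of the panicking thread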

Can you reproduce this hang?

Thanks,
George

Leon Koll wrote:

On 8/7/06, William D. Hathaway <[EMAIL PROTECTED]> wrote:

If this is reproducible, can you force a panic so it can be analyzed?


The core files and explorer output are here:
http://napobo3.lk.net/vinc/
The core files were created after the box was hung (break to OBP ... sync).
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


