Re: [zfs-discuss] Solaris 11 System Reboots Continuously Because of a ZFS-Related Panic (7191375)

2013-01-11 Thread Jamie Krier
It appears this bug has been fixed in Solaris 11.1 SRU 3.4

7191375  15809921  SUNBT7191375  metadata rewrites should coordinate with l2arc
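For anyone wanting to check whether a system already has the fix, the installed SRU level can be read from the 'entire' incorporation. A rough sketch; the branch encoding mentioned in the comment is my assumption, not something taken from the SRU notes:

# The SRU shows up in the Version/Branch of the 'entire' incorporation;
# for 11.1 SRU 3.4 I would expect a branch along the lines of
# 0.175.1.3.0.4.0 (assumed encoding).
pkg info entire | egrep 'Version|Branch'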

Cindy can you confirm?

Thanks


On Fri, Jan 4, 2013 at 3:55 PM, Richard Elling wrote:

> On Jan 4, 2013, at 11:12 AM, Robert Milkowski wrote:
>
>
> Illumos is not so good at dealing with huge memory systems, but perhaps
> it is also more stable.
>
>
> Well, I guess that it depends on your environment, but generally I would
> expect S11 to be more stable, if only because of the sheer number of bugs
> reported by paid customers and fixed by Oracle that Illumos is not
> getting (lack of resources, limited usage, etc.).
>
>
> This is a two-edged sword. Software reliability analysis shows that the
> most reliable software is the software that is oldest and unchanged. But
> people also want new functionality. So while Oracle has more changes
> being implemented in Solaris, those changes are destabilizing even as
> they aim to improve reliability. Unfortunately, it is hard to get both
> wins. What is more likely is that new features being driven into
> Solaris 11 are destabilizing. By contrast, the number of new features
> being added to illumos-gate (not to be confused with illumos-based
> distros) is relatively modest, and none of them are gratuitous.
>  -- richard
>
> --
>
> richard.ell...@richardelling.com
> +1-760-896-4422
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Solaris 11 System Reboots Continuously Because of a ZFS-Related Panic (7191375)

2012-12-14 Thread Jamie Krier
I have removed all L2arc devices as a precaution.  Has anyone seen this
error with no L2arc device configured?


On Thu, Dec 13, 2012 at 9:03 AM, Bob Friesenhahn <bfrie...@simple.dallas.tx.us> wrote:

> On Wed, 12 Dec 2012, Jamie Krier wrote:
>
>>
>>
>> I am thinking about switching to an Illumos distro, but wondering if this
>> problem may be present there
>> as well.
>>
>
> I believe that Illumos was forked before this new virtual memory sub-system
> was added to Solaris.  There have not been such reports on Illumos or
> OpenIndiana mailing lists and I don't recall seeing this issue in the bug
> trackers.
>
> Illumos is not so good at dealing with huge memory systems, but perhaps it
> is also more stable.
>
> Bob
> --
> Bob Friesenhahn
> bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
> GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Solaris 11 System Reboots Continuously Because of a ZFS-Related Panic (7191375)

2012-12-12 Thread Jamie Krier
I've hit this bug on four of my Solaris 11 servers. Looking for anyone else
who has seen it, as well as comments/speculation on cause.

This bug is pretty bad.  If you are lucky you can import the pool read-only
and migrate it elsewhere.

I've also tried setting zfs:zfs_recover=1,aok=1 with varying results.


http://docs.oracle.com/cd/E26502_01/html/E28978/gmkgj.html#scrolltoc
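For reference, those tunables are normally set in /etc/system and picked up on the next boot (a sketch of the fragment I mean; same names as above, and they should be removed once the pool is recovered):

* /etc/system fragment (sketch): enable ZFS recovery mode and ignore
* failed assertions on the next boot. Remove after the pool is recovered.
set zfs:zfs_recover=1
set aok=1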


Hardware platform:

Supermicro X8DAH
144 GB RAM
Supermicro SAS2 JBODs
LSI 9200-8e controllers (Phase 13 firmware)
ZeusRAM log
ZeusIOPS SAS L2ARC
Seagate ST33000650SS SAS drives

All four servers are running the same hardware, so at first I suspected a
problem there. I opened a ticket with Oracle, which ended with this email:

-

We strongly expect that this is a software issue because this problem does
not happen on Solaris 10. On Solaris 11, it happens with both the SPARC and
the X64 versions of Solaris.

We have quite a few customers who have seen this issue and we are in the
process of working on a fix. Because we do not know the source of the
problem yet, I cannot speculate on the time to fix. This particular portion
of Solaris 11 (the virtual memory sub-system) is quite different from the
one in Solaris 10. We re-wrote the memory management in order to get ready
for systems with much more memory than Solaris 10 was designed to handle.

Because this is the memory management system, there is not expected to be
any work-around.

Depending on your company's requirements, one possibility is to use
Solaris 10 until this issue is resolved.

I apologize for any inconvenience that this bug may cause. We are working
on it as a Sev 1 Priority 1 in sustaining engineering.

-


I am thinking about switching to an Illumos distro, but wondering if this
problem may be present there as well.


Thanks


- Jamie
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] No write coalescing after upgrade to Solaris 11 Express

2011-06-28 Thread Jamie Krier
Markus Kovero  nebula.fi> writes:

> 
> 
> > Sync was disabled on the main pool and then left to inherit to everything
> > else. The reason for disabling this in the first place was to fix bad NFS
> > write performance (even with the ZIL on an X25-E SSD it was under 1 MB/s).
> > I've also tried setting the logbias to throughput and latency, but they
> > both perform around the same level.
> >
> > Thanks
> > -Matt
> 
> I believe you're hitting bug "7000208: Space map trashing affects NFS write
> throughput". We also did, and it did impact iSCSI as well.
> 
> If you have enough RAM you can try enabling metaslab debug (which makes
> the problem vanish):
>
> # echo metaslab_debug/W1 | mdb -kw
> 
> And to calculate the amount of RAM needed:
>
> /usr/sbin/amd64/zdb -mm  > /tmp/zdb-mm.out
>
> awk '/segments/ {s+=$2} END {printf("sum=%d\n",s)}' /tmp/zdb-mm.out
>
> 93373117 sum of segments
> 16 VDEVs
> 116 metaslabs per VDEV
> 1856 metaslabs in total
>
> 93373117/1856 = 50308 average number of segments per metaslab
>
> 50308*1856*64 = 5975785472
>
> 5975785472/1024/1024/1024 = 5.56 GB
> 
> Yours
> Markus Kovero
> 

We are running Solaris 11 Express. We hit this same problem when we crossed
80% usage on our mirror pool. Write performance dropped to 3-8 MB/sec across
48 mirror sets!

echo metaslab_debug/W1 | mdb -kw resolves the problem instantly.

What are the ramifications of running this command?
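For what it's worth, the mdb write only lasts until the next reboot; if it turns out to be safe to run long-term, the persistent equivalent would presumably be an /etc/system entry along these lines (a sketch, assuming the tunable keeps the same name on this release):

* /etc/system sketch: keep metaslab space maps loaded, i.e. the same
* effect as the mdb write above, but persistent across reboots.
set zfs:metaslab_debug = 1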

Thanks
- Jamie



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss