Re: [zfs-discuss] reboot when copying large amounts of data

Remco Lengers Wed, 11 Mar 2009 12:41:46 -0700

Looks worth a go. Otherwise:

If the boot disk is also off that controller, it may be too hosed to write anything to the boot disk, hence FMA doesn't see any issue when it comes up. Possible further actions:


- Upgrade FW of controller to highest or known working level
- Upgrade driver or OS level.
- Try another controller (maybe it's broken and barfs under stress?)
- Analyze the crash dump (if any is saved)

- It may be that it's a known Solaris or driver bug and somebody has heard of it before.


hth,

..Remco

Blake wrote:

Could the problem be related to this bug:

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6793353

I'm testing setting the maximum payload size as a workaround, as noted
in the bug notes.



On Wed, Mar 11, 2009 at 3:14 PM, Blake <blake.ir...@gmail.com> wrote:

I think that TMC Research is the company that designed the
Supermicro-branded controller card that has the Marvell SATA
controller chip on it.  Googling around I see connections between
Supermicro and TMC.

This is the card:

http://www.supermicro.com/products/accessories/addon/AOC-SAT2-MV8.cfm

On Wed, Mar 11, 2009 at 3:08 PM, Remco Lengers <re...@lengers.com> wrote:

Something is not right in the IO space. The messages talk about

vendor ID = 11AB

0x11AB  Marvell Semiconductor

TMC Research

Vendor Id: 0x1030
Short Name: TMC

Does "fmdump -eV" give any clue when the box comes back up?

..Remco
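
The vendor ID check above can be sketched as a small lookup. This is only an illustration: the table holds just the two IDs quoted in this thread, and the names `KNOWN_VENDORS` and `vendor_name` are made up for the example, not part of any Solaris tool.

```python
# PCI vendor IDs quoted in this thread only.
# This is NOT a complete registry; a full PCI ID database has many more entries.
KNOWN_VENDORS = {
    0x11AB: "Marvell Semiconductor",
    0x1030: "TMC Research",
}


def vendor_name(vendor_id):
    """Return the vendor name for a PCI vendor ID given as an int or a hex string."""
    if isinstance(vendor_id, str):
        # base 16 accepts both "11AB" and "0x11AB"
        vendor_id = int(vendor_id, 16)
    return KNOWN_VENDORS.get(vendor_id, "unknown vendor")


print(vendor_name("11AB"))  # prints "Marvell Semiconductor"
```

So the 11AB in the console messages points at the Marvell chip, not at a separate TMC device.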


Blake wrote:

I'm attaching a screenshot of the console just before reboot.  The
dump doesn't seem to be working, or savecore isn't working.

On Wed, Mar 11, 2009 at 11:33 AM, Blake <blake.ir...@gmail.com> wrote:

I'm working on testing this some more by doing a savecore -L right
after I start the copy.

BTW, I'm copying to a raidz2 of only 5 disks, not 16 (the chassis
supports 16, but isn't fully populated).

So far as I know, there is no spinup happening - these are not RAID
controllers, just dumb SATA JBOD controllers, so I don't think they
control drive spin in any particular way.  Correct me if I'm wrong, of
course.



On Wed, Mar 11, 2009 at 11:23 AM, Marc Bevand <m.bev...@gmail.com> wrote:

The copy operation will make all the disks start seeking at the same time and
will make your CPU activity jump to a significant percentage to compute the
ZFS checksum and RAIDZ parity. I think you could be overloading your PSU
because of the sudden increase in power consumption...

However if you are *not* using SATA staggered spin-up, then the above theory
is unlikely because spinning up consumes much more power than seeking.
So, in a sense, a successful boot proves your PSU is powerful enough.

Try reproducing the problem by copying data on a smaller number of disks.
You tried 2 and 16. Try 8.

-marc

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

