On Mon, 10 Mar 2008, Lida Horn wrote:

> Paul Raines wrote:
>> Well, I ran updatemanager and started applying about 64 updates.  After
>> the progress meter got about half way it seemed to hang not moving for
>> hours.  I finally gave up and did a reboot.  But the machine would not
>> reboot.  I went in the ILOM and tried 'stop /SYS' but after a few minutes
>> would get back an error on the console saying something like "shutdown 
>> failed".  So I finally just hard power cycled the box.  Luckily, it came
>> back up seemingly okay and I was able to rerun updatemanager and get all
>> updates installed.  However, after rebooting I now note the following
>> error messages on the console:
>> 
>> Mar  9 03:22:16 raidsrv03 sata: NOTICE: /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]:
>> Mar  9 03:22:16 raidsrv03  port 6: device reset
>> Mar  9 03:22:16 raidsrv03 sata: NOTICE: /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]:
>> Mar  9 03:22:16 raidsrv03  port 6: link lost
>> Mar  9 03:22:16 raidsrv03 sata: NOTICE: /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]:
>> Mar  9 03:22:16 raidsrv03  port 6: link established
>> Mar  9 03:22:16 raidsrv03 scsi: WARNING: /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd46):
>> Mar  9 03:22:16 raidsrv03       Error for Command: write(10) Error Level: Retryable
>> Mar  9 03:22:16 raidsrv03 scsi:         Requested Block: 68158362 Error Block: 68158362
>> Mar  9 03:22:16 raidsrv03 scsi:         Vendor: ATA Serial Number:
>> Mar  9 03:22:16 raidsrv03 scsi:         Sense Key: No Additional Sense
>> Mar  9 03:22:16 raidsrv03 scsi:         ASC: 0x0 (no additional sense info), ASCQ: 0x0, FRU: 0x0
>> 
>> 
>> The above repeated a few times but now seems to have stopped.  Running
>> 'hd -c' shows all disks as ok.  But it seems like I do have a disk
>> problem.  Since everything is redundant (zraid), I don't understand why
>> a failed disk should lock up the machine like I saw, unless there is
>> some bigger issue.
>> 
>> Any advice?
>> 
> It is unclear what you are talking about.  Do you have any evidence to
> connect those retryable write errors with the previous hang, or were they
> two independent events?  The retried write error would appear to be normal
> behavior with a bad sector.  If the sector is actually bad, there would be
> the initial write attempt followed by five retries.  The last retry would
> have "Error Level: Fatal" as opposed to "Error Level: Retryable"; otherwise
> one of the retries would have been successful and everything would move on.
>
> Regards,
> Lida

No, I cannot connect the two events.  When the 'zfs create' hang and the
hang while applying updates happened, I could find no error messages at
all.  The errors above appeared only after the reboot, so the connection
is circumstantial.
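
One rough way to check Lida's point, that a truly bad sector should end in
"Error Level: Fatal" after the retries, would be to tally the error levels
logged per requested block. The sketch below is only an illustration: the
function names are hypothetical, and it assumes the messages have been
pasted in unwrapped (one record per line, as in the console output quoted
above) with each "Error for Command" line followed by its "Requested
Block" line.

```python
import re
from collections import defaultdict

# Patterns for the two sd warning lines of interest (format as quoted above).
LEVEL_RE = re.compile(r"Error for Command:\s*(\S+)\s+Error Level:\s*(\w+)")
BLOCK_RE = re.compile(r"Requested Block:\s*(\d+)")


def summarize_errors(log_text):
    """Map each requested block number to the error levels logged for it."""
    levels_by_block = defaultdict(list)
    pending_level = None  # level seen on the most recent "Error for Command" line
    for line in log_text.splitlines():
        match = LEVEL_RE.search(line)
        if match:
            pending_level = match.group(2)
            continue
        match = BLOCK_RE.search(line)
        if match and pending_level is not None:
            levels_by_block[int(match.group(1))].append(pending_level)
            pending_level = None
    return dict(levels_by_block)


def fatal_blocks(levels_by_block):
    """Return the sorted block numbers that logged at least one Fatal error."""
    return sorted(b for b, levels in levels_by_block.items() if "Fatal" in levels)
```

If fatal_blocks() comes back empty for the saved messages, every logged
write was only ever reported as Retryable, which by Lida's description
would mean a retry eventually succeeded.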


_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
