Re: [NILFS users] NILFS hanging SLES 11 - advise on diagnosis needed

Ryusuke Konishi Fri, 02 Oct 2009 04:42:39 -0700

Hi,
On Fri, 2 Oct 2009 12:11:14 +0200, "Barham, David" wrote:
> Hi
> I'm running SLES 11, 2.6.27.19-5-default with NILFS2 nilfs-2.0.16. I have a 
> 1.5Tb NILFS2 partition which I am setting up with the intention of using 
> Robocopy from various PCs via samba. The robocopy scripts run nightly and a 
> checkpoint is taken once a night. A script stops samba, unmounts the previous 
> week's checkpoint, deletes the checkpoint, creates a new one and then mounts 
> it and restarts samba. This should mean that at any time the user can go back 
> to 'snapshot_{DAY}' to get their files back.
> 
> So far so good.
> 
> However, as I copy the previously backed up files from the previous linux 
> machine where I was doing this (only giving a 'current' copy with reiserfs), 
> I'm finding that the new machine is occasionally hanging. The OS just locks 
> up, screen on console frozen but host still responds to ping. 
> 
> I'm trying to work out what is causing the hang, I'm getting various messages 
> in the log from smartd relating to the disk which houses the NILFS along the 
> lines of:
> 
>  Oct  2 09:56:59 cpli6008 syslog-ng[1933]: Log statistics; 
> dropped='pipe(/dev/xconsole)=0', dropped='pipe(/dev/tty10)=0', 
> processed='center(queued)=947', processed='center(received)=478', 
> processed='destination(newsnotice)=0', processed='destination(acpid)=0', 
> processed='destination(firewall)=0', processed='destination(mail)=12', 
> processed='destination(mailinfo)=12', processed='destination(console)=151', 
> processed='destination(newserr)=0', processed='destination(newscrit)=0', 
> processed='destination(messages)=466', processed='destination(mailwarn)=0', 
> processed='destination(localmessages)=0', processed='destination(netmgm)=0', 
> processed='destination(mailerr)=0', processed='destination(xconsole)=151', 
> processed='destination(warn)=155', processed='source(src)=478'
> Oct  2 09:57:25 cpli6008 smartd[3473]: Device: /dev/sda [SAT], SMART Usage 
> Attribute: 194 Temperature_Celsius changed from 110 to 112
> Oct  2 09:57:25 cpli6008 smartd[3473]: Device: /dev/sdb [SAT], SMART 
> Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 115 to 117
> Oct  2 09:57:25 cpli6008 smartd[3473]: Device: /dev/sdb [SAT], SMART Usage 
> Attribute: 189 High_Fly_Writes changed from 88 to 87
> Oct  2 09:57:25 cpli6008 smartd[3473]: Device: /dev/sdb [SAT], SMART Usage 
> Attribute: 190 Airflow_Temperature_Cel changed from 60 to 61
> Oct  2 09:57:25 cpli6008 smartd[3473]: Device: /dev/sdb [SAT], SMART Usage 
> Attribute: 194 Temperature_Celsius changed from 40 to 39
> Oct  2 09:57:25 cpli6008 smartd[3473]: Device: /dev/sdb [SAT], SMART Usage 
> Attribute: 195 Hardware_ECC_Recovered changed from 50 to 51
> 
> {machine stops responding and gets power cycled}
>
> Oct  2 10:10:58 cpli6008 syslog-ng[1948]: syslog-ng starting up; 
> version='2.0.9'
> 
> Do folks think that the hang is NILFS or dodgy hardware/reporting
> from smartd? Is there any advice on getting some debug or status
> information from NILFS to help show it isn't the cause of the
> problem? I would have expected that if it went bang I'd have seen
> something 'worrying' in the log.


The nilfs2 standalone module has a debug mode.  You can enable it by
uncommenting the CONFIG_NILFS_DEBUG=y line in
nilfs2-module/fs/Makefile before compiling, so that it reads:

ifndef CONFIG_NILFS
  EXTERNAL_BUILD=y
  CONFIG_NILFS=m
  # Uncomment below to do debug build.
  CONFIG_NILFS_DEBUG=y
  # Uncomment below to enable bmap validity check.
  #CONFIG_NILFS_BMAP_DEBUG=y
endif
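
If the line is still commented out in your copy (as the "Uncomment
below" note suggests), the edit can be scripted.  This is only a
minimal sketch: enable_nilfs_debug is a hypothetical helper name, not
something shipped with nilfs, and it assumes the line appears as
"#CONFIG_NILFS_DEBUG=y" with optional leading whitespace.

```shell
#!/bin/sh
# Hypothetical helper (not part of the nilfs package): uncomment the
# CONFIG_NILFS_DEBUG=y line in the Makefile given as the first argument.
enable_nilfs_debug() {
    makefile="$1"
    # Keep a backup of the original, then strip the leading '#' from the
    # debug line only.  CONFIG_NILFS_BMAP_DEBUG is left untouched.
    cp "$makefile" "$makefile.orig" &&
    sed 's/^\([[:space:]]*\)#[[:space:]]*\(CONFIG_NILFS_DEBUG=y\)/\1\2/' \
        "$makefile.orig" > "$makefile"
}
```

Call it as "enable_nilfs_debug nilfs2-module/fs/Makefile", then rebuild
and reinstall the module the same way you built it originally.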

By the way, I'm planning to release nilfs-2.0.17 tomorrow in order to
solve file system corruption problems which infrequently happen and
were reported on this list.

The bugfix has already been merged into mainline and has also been sent
to the -stable trees for 2.6.30 and 2.6.31, but it has not landed there yet.

Your problem looks like a hardware problem to me, but I think the new
version is worth a try.

Cheers,
Ryusuke Konishi
_______________________________________________
users mailing list
[email protected]
https://www.nilfs.org/mailman/listinfo/users
