> > Shot in the dark here:
> >
> > What are you using for the sharenfs value on the ZFS filesystem? Something
> > like rw=.mydomain.lan ?
They are IP blocks or hosts specified as FQDNs, e.g.,

pptank/home/tcrane  sharenfs  rw=@192.168.101/24,rw=serverX.xx.rhul.ac.uk:serverY.xx.rhul.ac.uk

> > I've had issues where a ZFS server loses connectivity to the primary DNS
> > server and as a result the reverse lookups used to validate the identity
> > of client systems fail and the connections hang.

It was using our slave DNS, but there have been no recent problems with it.
I've switched it to the primary DNS.

> > Any chance there's a planned reboot of the DNS server Sunday morning? That
> > sounds like the kind of preventative maintenance that might be happening
> > in that time window.

No. The only things tied to Sunday morning are these two (Solaris
factory-installed?) cronjobs:

root@server5:/# grep nfsfind /var/spool/cron/crontabs/root
15 3 * * 0 /usr/lib/fs/nfs/nfsfind

root@server5:/# grep 13 /var/spool/cron/crontabs/lp
# At 03:13am on Sundays:
13 3 * * 0 cd /var/lp/logs; if [ -f requests ]; then if [ -f requests.1 ]; then /bin/mv requests.1 requests.2; fi; /usr/bin/cp requests requests.1; >requests; fi

The lp job does not access the main ZFS pool, but nfsfind does. However,
AFAICT it has usually finished before the problem manifests itself.

Cheers
Tom.

> > Cheers,
> > Erik
> >
> > On 13 June 2012, at 12:47, tpc...@mklab.ph.rhul.ac.uk wrote:
> >
> > Dear All,
> > I have been advised to enquire here on zfs-discuss about the
> > ZFS problem described below, following discussion on the Usenet NG
> > comp.unix.solaris. The full thread should be available here:
> > https://groups.google.com/forum/#!topic/comp.unix.solaris/uEQzz1t-G1s
> >
> > Many thanks
> > Tom Crane
> >
> > -- forwarded message
> >
> > cindy.swearin...@oracle.com wrote:
> > : On Tuesday, May 29, 2012 5:39:11 AM UTC-6, (unknown) wrote:
> > : > Dear All,
> > : > Can anyone give any tips on diagnosing the following recurring
> > : > problem?
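Erik's reverse-lookup hypothesis can be checked directly on the server, both before and during an incident. A minimal sketch, assuming a glibc-style `getent` is available (the `client_ip` value here is a placeholder, not an address from the thread):

```shell
#!/bin/sh
# Check that reverse DNS answers at all for an NFS client's address.
# A hang or failure here would support the reverse-lookup theory.
client_ip="127.0.0.1"   # placeholder: substitute a real NFS client's address

# Reverse lookup: roughly what the NFS server does to validate clients
# named in a sharenfs access list.
if getent hosts "$client_ip" >/dev/null; then
    echo "reverse lookup for $client_ip OK"
else
    echo "reverse lookup for $client_ip FAILED"
fi
```

Running this in a loop across the Sunday-morning window would show whether the lookups stall at the same time the NFS writes do.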
> > : >
> > : > I have a Solaris box (server5, SunOS server5 5.10 Generic_147441-15
> > : > i86pc i386 i86pc) whose NFS service for its exported ZFS filesystems
> > : > fails every so often, always in the early hours of Sunday morning. I
> > : > am barely familiar with Solaris, but here is what I have managed to
> > : > discern when the problem occurs:
> > : >
> > : > Jobs on other machines which access server5's shares (via the
> > : > automounter) hang, and attempts to manually remote-mount shares just
> > : > time out.
> > : >
> > : > Remotely, 'showmount -e server5' shows all the exported FSs are
> > : > available.
> > : >
> > : > On server5, the following services are running:
> > : >
> > : > root@server5:/var/adm# svcs | grep nfs
> > : > online  May_25  svc:/network/nfs/status:default
> > : > online  May_25  svc:/network/nfs/nlockmgr:default
> > : > online  May_25  svc:/network/nfs/cbd:default
> > : > online  May_25  svc:/network/nfs/mapid:default
> > : > online  May_25  svc:/network/nfs/rquota:default
> > : > online  May_25  svc:/network/nfs/client:default
> > : > online  May_25  svc:/network/nfs/server:default
> > : >
> > : > On server5, I can list and read files on the affected FSs without
> > : > problem, but any attempt to write to the FS (e.g. copy a file to, or
> > : > rm a file on, the FS) just hangs the cp/rm process.
> > : >
> > : > On server5, 'zfs get sharenfs pptank/local_linux' displays the
> > : > expected list of hosts/IPs with remote ro & rw access.
> > : >
> > : > Here is the output from some other, hopefully relevant, commands:
> > : >
> > : > root@server5:/# zpool status
> > : >   pool: pptank
> > : >  state: ONLINE
> > : > status: The pool is formatted using an older on-disk format. The pool
> > : >         can still be used, but some features are unavailable.
> > : > action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
> > : >         pool will no longer be accessible on older software versions.
> > : >   scan: none requested
> > : > config:
> > : >
> > : >         NAME        STATE     READ WRITE CKSUM
> > : >         pptank      ONLINE       0     0     0
> > : >           raidz1-0  ONLINE       0     0     0
> > : >             c3t0d0  ONLINE       0     0     0
> > : >             c3t1d0  ONLINE       0     0     0
> > : >             c3t2d0  ONLINE       0     0     0
> > : >             c3t3d0  ONLINE       0     0     0
> > : >             c3t4d0  ONLINE       0     0     0
> > : >             c3t5d0  ONLINE       0     0     0
> > : >             c3t6d0  ONLINE       0     0     0
> > : >
> > : > errors: No known data errors
> > : >
> > : > root@server5:/# zpool list
> > : > NAME    SIZE   ALLOC  FREE   CAP  HEALTH  ALTROOT
> > : > pptank  12.6T  384G   12.3T  2%   ONLINE  -
> > : >
> > : > root@server5:/# zpool history
> > : > History for 'pptank':
> > : > <just hangs here>
> > : >
> > : > root@server5:/# zpool iostat 5
> > : >                capacity     operations    bandwidth
> > : > pool        alloc   free   read  write   read  write
> > : > ----------  -----  -----  -----  -----  -----  -----
> > : > pptank       384G  12.3T     92    115  3.08M  1.22M
> > : > pptank       384G  12.3T  1.11K    629  35.5M  3.03M
> > : > pptank       384G  12.3T    886    889  27.1M  3.68M
> > : > pptank       384G  12.3T    837    677  24.9M  2.82M
> > : > pptank       384G  12.3T  1.19K    757  37.4M  3.69M
> > : > pptank       384G  12.3T  1.02K    759  29.6M  3.90M
> > : > pptank       384G  12.3T    952    707  32.5M  3.09M
> > : > pptank       384G  12.3T  1.02K    831  34.5M  3.72M
> > : > pptank       384G  12.3T    707    503  23.5M  1.98M
> > : > pptank       384G  12.3T    626    707  20.8M  3.58M
> > : > pptank       384G  12.3T    816    838  26.1M  4.26M
> > : > pptank       384G  12.3T    942    800  30.1M  3.48M
> > : > pptank       384G  12.3T    677    675  21.7M  2.91M
> > : > pptank       384G  12.3T    590    725  19.2M  3.06M
> > : >
> > : > top shows the following runnable processes. Nothing excessive here,
> > : > AFAICT?
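Since `zpool history` hangs indefinitely here, it can help to wrap further diagnostic commands in a deadline so one stuck command does not stall the whole troubleshooting session. A sketch using GNU coreutils `timeout` (an assumption on my part; stock Solaris 10 may not ship it, and the wrapper is illustrative, not from the thread):

```shell
#!/bin/sh
# Run a possibly-hanging diagnostic with a deadline.
# GNU timeout exits with status 124 when it kills the command.
probe() {
    timeout 10 "$@"
    if [ $? -eq 124 ]; then
        echo "TIMED OUT: $*"
    fi
}

# Example use against the affected pool (commented out here):
# probe zpool history pptank
# probe zpool status pptank
probe sleep 0   # harmless stand-in so the sketch runs anywhere
```

Which commands time out (history, status, or plain writes) helps narrow whether the hang is in the pool itself or in the NFS/DNS layer above it.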
> > : >
> > : > last pid: 25282;  load avg: 1.98, 1.95, 1.86;  up 1+09:02:05  07:46:29
> > : > 72 processes: 67 sleeping, 1 running, 1 stopped, 3 on cpu
> > : > CPU states: 81.5% idle, 0.1% user, 18.3% kernel, 0.0% iowait, 0.0% swap
> > : > Memory: 2048M phys mem, 32M free mem, 16G total swap, 16G free swap
> > : >
> > : >   PID USERNAME LWP PRI NICE  SIZE   RES STATE   TIME    CPU COMMAND
> > : >   748 root      18  60  -20  103M 9752K cpu/1  78:44  6.62% nfsd
> > : > 24854 root       1  54    0 1480K  792K cpu/1   0:42  0.69% cp
> > : > 25281 root       1  59    0 3584K 2152K cpu/0   0:00  0.02% top
> > : >
> > : > The cp job above is the one mentioned earlier, attempting to copy a
> > : > file to an affected FS; I've noticed it is apparently not completely
> > : > hung.
> > : >
> > : > The only thing that appears specific to Sunday morning is a cronjob to
> > : > remove old .nfs* files:
> > : >
> > : > root@server5:/# crontab -l | grep nfsfind
> > : > 15 3 * * 0 /usr/lib/fs/nfs/nfsfind
> > : >
> > : > Any suggestions on how to proceed?
> > : >
> > : > Many thanks
> > : > Tom Crane
> > : >
> > : > Ps. The email address in the header is just a spam-trap.
> > : > --
> > : > Tom Crane, IT support, RHUL Particle Physics,
> > : > Dept. Physics, Royal Holloway, University of London, Egham Hill,
> > : > Egham, Surrey, TW20 0EX, England.
> > : > Email: T.Crane at rhul dot ac dot uk
> >
> > : Hi Tom,
> >
> > Hi Cindy,
> > Thanks for the follow-up.
> >
> > : I think SunOS server5 5.10 Generic_147441-15 is the Solaris 10 8/11
> > : release. Is this correct?
> >
> > I think so...
> >
> > root@server5:/# cat /etc/release
> >                    Solaris 10 10/08 s10x_u6wos_07b X86
> >         Copyright 2008 Sun Microsystems, Inc. All Rights Reserved.
> >                     Use is subject to license terms.
> >                        Assembled 27 October 2008
> >
> > : We looked at your truss output briefly and it looks like it is hanging
> > : trying to allocate memory. At least, that's what the "br ...."
> > : statements at the end suggest.
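For reference, the nfsfind cronjob quoted above removes stale .nfs* temporaries (files left behind when an NFS client deletes a file another client still holds open). A rough, hypothetical sketch of that kind of cleanup, not the actual Sun script, runnable against a scratch directory:

```shell
#!/bin/sh
# Hypothetical stand-in for /usr/lib/fs/nfs/nfsfind: delete .nfs*
# temporaries older than a week from a given directory tree.
dir="${1:-/tmp/nfsfind-demo}"

mkdir -p "$dir"
# Create one stale and one fresh .nfs temporary for the demo.
touch -t 201201010000 "$dir/.nfsA123"   # old: should be removed
touch "$dir/.nfsB456"                   # fresh: should survive

find "$dir" -name '.nfs*' -mtime +7 -exec rm -f {} \;

ls -A "$dir"
```

If the real nfsfind walks a very large pool, its find traversal could still be running (or contending with nfsd) later than the 03:15 start time suggests, which is worth verifying before ruling it out.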
> >
> > : I will see if I can find out what diagnostic info would help in
> > : this case.
> >
> > Thanks. That would be much appreciated.
> >
> > : You might get a faster response on zfs-discuss, as John suggested.
> >
> > I will CC to zfs-discuss.
> >
> > Best regards
> > Tom.
> >
> > : Thanks,
> > : Cindy
> >
> > Ps. The email address in the header is just a spam-trap.
> > --
> > Tom Crane, Dept. Physics, Royal Holloway, University of London, Egham Hill,
> > Egham, Surrey, TW20 0EX, England.
> > Email: T.Crane at rhul dot ac dot uk
> >
> > -- end of forwarded message --
> > _______________________________________________
> > zfs-discuss mailing list
> > zfs-discuss@opensolaris.org
> > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

--
Tom Crane, Dept. Physics, Royal Holloway, University of London, Egham Hill,
Egham, Surrey, TW20 0EX, England.
Email: t.cr...@rhul.ac.uk
Fax:   +44 (0) 1784 472794