Hello Roy Do you see anything of interest in the netapp /etc/messages file?
Might it be a problem with the tape cart or the tape drive? What is the make and model of the lto-4 tape drive and the library? Do you see anything of interest in the tape drive/library error logs. Is it always the same tape cart or same tape drive that reports the i/o error? -----Original Message----- From: veritas-bu-boun...@mailman.eng.auburn.edu [mailto:veritas-bu-boun...@mailman.eng.auburn.edu] On Behalf Of Roy McMorran Sent: Wednesday, June 23, 2010 3:29 PM To: Veritas-bu@mailman.eng.auburn.edu Subject: [Veritas-bu] NDMP Fulls fail w/error 99 I have a case open with Symantec, but I thought I'd try to tap the collective wisdom of this list to see if anyone has encountered this before. I am consistently seeing a failure of full (only full) NDMP backups on one volume on a NetApp filer. The last successful full backup of this volume was three weekends ago. However, full backups of four other (smaller) volumes on the same filer run successfully, as do all incrementals on all volumes. This is what we are seeing in the detailed status report: 06/16/2010 04:27:43 - Error bpbrm (pid=12985) db_FLISTsend failed: premature eof encountered (233) 06/16/2010 04:27:44 - Error ndmpagent (pid=12992) write socket err 4 (Interrupted system call) 06/16/2010 04:27:45 - Error ndmpagent (pid=12992) FH send to BRM failed - Ino - 2884 0 2884 0 2454527 0 0 8 33261 mcmorran staff 2884 1267655394 1195639440 1239117488 0000020e1df4040000000000019fda9b 27253403 06/16/2010 04:27:45 - Error ndmpagent (pid=12992) NDMP backup failed, path = /vol/vol4/ 06/16/2010 04:28:28 - Error bptm (pid=12993) io_ioctl_ndmp (MTBSF) failed on media id DC4000, drive index 5, return code 18 (NDMP_XDR_DECODE_ERR) (bptm.c.8703) 06/16/2010 04:28:29 - end writing; write time: 2:56:58 NDMP backup failure (99) So far we have increased MAX_FILES_PER_ADD up to 25000 and then 40000 but no joy there. The LAN and fibre channel connections look to be OK. No errors or collisions visible on any interface or switch. There are no errors in the messages files of the server or the filer. Symantec has verbose logs for bptm, bpbrm, ndmp, ndmpagent, etc. and the ndmpd log from the filer, but so far hasn't found anything helpful. I've included some excerpts below. A test of dump to null (from the command line) on the filer ran to completion without errors. The filer is a NetApp 3020 running OnTap 7.2.5. It has one LTO4 drive fibre-attached. In the same (SL-500) library there are three other drives that are attached to the master/media server. Netbackup is Version 7. The Master/media server is running Solaris 10. Any ideas, or tips for troubleshooting? Thanks much. Some log excerpts: bpbrm log: 04:27:03.230 [12985] <2> db_end: Need to collect reply 04:27:42.832 [12985] <2> get_long: (2) premature end of file (byte 1) 04:27:42.952 [12985] <2> db_getdata: get_string() failed: I/O error (5) premature end of file encountered (-9) 04:27:42.953 [12985] <2> db_end: no DONE from db_getreply(): premature eof encountered 04:27:42.953 [12985] <2> db_FLISTsend: db_end_sts() failed: premature eof encountered (233) 04:27:42.987 [12985] <16> bpbrm main: db_FLISTsend failed: premature eof encountered (233) 04:27:43.348 [12985] <2> vnet_vnetd_service_socket: ../../libvlibs/vnet_vnetd.c.2054: VN_REQUEST_SERVICE_SOCKET: 6 0x00000006 04:27:43.348 [12985] <2> vnet_vnetd_service_socket: ../../libvlibs/vnet_vnetd.c.2068: service: bpjobd 04:27:43.363 [12985] <2> job_connect: SO_KEEPALIVE set on socket 5 for client nbusvr 04:27:43.363 [12985] <2> logconnections: BPJOBD CONNECT FROM 209.222.195.241.37331 TO 209.222.195.241.13724 04:27:43.363 [12985] <2> job_authenticate_connection: ignoring VxSS authentication check for now... 04:27:43.364 [12985] <2> job_connect: Connected to the host nbusvr contype 10 jobid <143965> socket <5> 04:27:43.364 [12985] <2> job_connect: Connected on port 37331 04:27:43.484 [12985] <2> job_monitoring_exex: ACK disconnect 04:27:43.484 [12985] <2> job_disconnect: Disconnected 04:27:43.508 [12985] <4> db_error_add_to_file: db_FLISTsend failed: premature eof encountered (233) 04:27:43.551 [12985] <2> signal_ndmpagent: sending signal=1 to ndmpagent on nbusvr, client_pid=12992 04:27:43.551 [12985] <2> bpcr_send_signal: Ignoring connect_opts = 0x01030202 04:27:43.551 [12985] <2> read_client: ? 04:27:43.551 [12985] <2> read_client: opendir() failed: nbusvr: No such file or directory (2) bptm log: 04:27:25.432 [12993] <2> NdmpSession: [1164] Sending 15 (GET_KBYTES) "" 04:27:45.068 [12993] <2> NdmpSession: [1164] Reply 15 "786532096 0 12289564", error = 0 04:28:15.072 [12993] <2> NdmpSession: [1165] Sending 15 (GET_KBYTES) "" 04:28:28.005 [12993] <2> NdmpSession: [1166] Received 8 (FAILED_ALL_PATHS) "" 04:28:28.005 [12993] <2> NdmpSession: [1166] Replying error = 0 04:28:28.006 [12993] <2> NdmpSession: [1165] Reply 15 "791310912 0 12364233", error = 0 04:28:28.006 [12993] <16> write_data: ndmpagent reports failure - all paths 04:28:28.006 [12993] <2> write_backup: write_data() returned, exit_status = 99, CINDEX = 0, TWIN_INDEX = 0, backup_status = -8 04:28:28.006 [12993] <2> write_backup: tp.tv_sec = 1276676908, stp.tv_sec = 1276666291, tp.tv_usec = 6090, stp.tv_usec = 921462, et = 37732520, mpx_total_kbytes[TWIN_INDEX = 0] = 2211481024 04:28:28.006 [12993] <2> io_terminate_tape: writing empty backup header, drive index 5, copy 1 04:28:28.006 [12993] <2> io_terminate_tape: reposition to previous tapemark and rewrite header 04:28:28.006 [12993] <2> io_ioctl: command (2)MTBSF 1 from (bptm.c.8703) on drive index 5 04:28:28.715 [12993] <16> io_ioctl: io_ioctl_ndmp (MTBSF) failed on media id DC4000, drive index 5, return code 18 (NDMP_XDR_DECODE_ERR) (bptm.c.8703) 04:28:28.715 [12993] <2> send_MDS_msg: DEVICE_STATUS 1 7884 nbusvr DC4000 4000449 LTO4-4 2000022 POSITION_ERROR 0 0 04:28:29.444 [12993] <2> log_media_error: successfully wrote to error file - 06/16/10 04:28:29 DC4000 5 POSITION_ERROR LTO4-4 04:28:29.444 [12993] <2> io_close: closing /usr/openv/netbackup/db/media/tpreq/drive_LTO4-4, from bptm.c.18975 04:28:29.556 [12993] <2> io_close: ignore ndmp close error, NDMP_EOF_ERR 04:28:29.556 [12993] <2> drivename_write: Called with mode 1 04:28:29.556 [12993] <2> drivename_unlock: unlocked 04:28:29.556 [12993] <2> drivename_close: Called for file LTO4-4 04:28:29.557 [12993] <2> tpunmount: NOP: MEDIA_DONE 0 143965 0 DC4000 4000449 0 {4A61577C-7908-11DF-BD6F-0003BAC24451} 04:28:29.557 [12993] <2> send_brm_msg: EXIT netapp-e0d_1276638844 99 04:28:29.560 [12993] <2> NdmpSession: ndmp_public_session_destory: destroying session 0x100937f90 04:28:29.750 [12993] <2> NdmpSession: ndmp_net_read: connection 1009b3b20: (9) end of file detected 04:28:29.750 [12993] <2> NdmpSession: ndmp_message_receive: xdr_ndmp_header failed, unable to XDR decode 04:28:29.750 [12993] <2> NdmpSession: ndmp_net_read: connection 1009b3b20: (9) end of file detected 04:28:29.750 [12993] <2> NdmpSession: ndmp_xm_wait_for_connect_close: ERROR ndmp_process_message failed, status = 18 04:28:29.750 [12993] <2> NdmpSession: 1009b3b20 -> (9) 1174 [0] 04:28:29 NDMP_CONNECT_CLOSE (0x902) NDMP_NO_ERR (0x0) 04:28:29.944 [12993] <2> bptm: EXITING with status 99 <---------- filer ndmpd.log: Jun 16 04:27:45 EDT [ndmpd:47]: NDMP message type: NDMP_MOVER_GET_STATE Jun 16 04:27:45 EDT [ndmpd:47]: NDMP message replysequence: 1212 Jun 16 04:27:45 EDT [ndmpd:47]: Message Header: Jun 16 04:27:45 EDT [ndmpd:47]: Sequence 0 Jun 16 04:27:45 EDT [ndmpd:47]: Timestamp 0 Jun 16 04:27:45 EDT [ndmpd:47]: Msgtype 1 Jun 16 04:27:45 EDT [ndmpd:47]: Method 2560 Jun 16 04:27:45 EDT [ndmpd:47]: ReplySequence 1212 Jun 16 04:27:45 EDT [ndmpd:47]: Error NDMP_NO_ERR Jun 16 04:27:45 EDT [ndmpd:47]: Error code: NDMP_NO_ERR Jun 16 04:27:45 EDT [ndmpd:47]: Record number: 34554391 Jun 16 04:27:45 EDT [ndmpd:47]: Mover state: NDMP_MOVER_STATE_ACTIVE Jun 16 04:27:45 EDT [ndmpd:48]: NDMP message type: NDMP_DATA_GET_STATE_V2 Jun 16 04:27:45 EDT [ndmpd:48]: NDMP message replysequence: 1131 Jun 16 04:27:45 EDT [ndmpd:48]: Message Header: Jun 16 04:27:45 EDT [ndmpd:48]: Sequence 0 Jun 16 04:27:45 EDT [ndmpd:48]: Timestamp 0 Jun 16 04:27:45 EDT [ndmpd:48]: Msgtype 1 Jun 16 04:27:45 EDT [ndmpd:48]: Method 1024 Jun 16 04:27:45 EDT [ndmpd:48]: ReplySequence 1131 Jun 16 04:27:45 EDT [ndmpd:48]: Error NDMP_NO_ERR Jun 16 04:27:45 EDT [ndmpd:48]: Error code: NDMP_NO_ERR Jun 16 04:27:45 EDT [ndmpd:48]: NDMP message type: NDMP_DATA_ABORT Jun 16 04:27:45 EDT [ndmpd:48]: NDMP message replysequence: 1132 Jun 16 04:27:45 EDT [ndmpd:48]: Message Header: Jun 16 04:27:45 EDT [ndmpd:48]: Sequence 0 Jun 16 04:27:45 EDT [ndmpd:48]: Timestamp 0 Jun 16 04:27:45 EDT [ndmpd:48]: Msgtype 1 Jun 16 04:27:45 EDT [ndmpd:48]: Method 1027 Jun 16 04:27:45 EDT [ndmpd:48]: ReplySequence 1132 Jun 16 04:27:45 EDT [ndmpd:48]: Error NDMP_NO_ERR Jun 16 04:27:45 EDT [ndmpd:47]: Message NDMP_NOTIFY_MOVER_HALTED sent Jun 16 04:27:45 EDT [ndmpd:47]: Message Header: Jun 16 04:27:45 EDT [ndmpd:47]: Sequence 1217 Jun 16 04:27:45 EDT [ndmpd:47]: Timestamp 1276676865 Jun 16 04:27:45 EDT [ndmpd:47]: Msgtype 0 Jun 16 04:27:45 EDT [ndmpd:47]: Method 1283 Jun 16 04:27:45 EDT [ndmpd:47]: ReplySequence 0 Jun 16 04:27:45 EDT [ndmpd:47]: Error NDMP_NO_ERR Jun 16 04:27:45 EDT [ndmpd:47]: Reason: 1 Jun 16 04:28:27 EDT [ndmpd:48]: Message: NDMP_NOTIFY_DATA_HALTED sent Jun 16 04:28:27 EDT [ndmpd:48]: Message Header: Jun 16 04:28:27 EDT [ndmpd:48]: Sequence 2766 Jun 16 04:28:27 EDT [ndmpd:48]: Timestamp 1276676907 Jun 16 04:28:27 EDT [ndmpd:48]: Msgtype 0 Jun 16 04:28:27 EDT [ndmpd:48]: Method 1281 Jun 16 04:28:27 EDT [ndmpd:48]: ReplySequence 0 Jun 16 04:28:27 EDT [ndmpd:48]: Error NDMP_NO_ERR Jun 16 04:28:27 EDT [ndmpd:48]: Reason: 2 Jun 16 04:28:27 EDT [ndmpd:48]: Message NDMP_LOG_MESSAGE sent Jun 16 04:28:27 EDT [ndmpd:48]: Message Header: Jun 16 04:28:27 EDT [ndmpd:48]: Sequence 2767 Jun 16 04:28:27 EDT [ndmpd:48]: Timestamp 1276676907 Jun 16 04:28:27 EDT [ndmpd:48]: Msgtype 0 Jun 16 04:28:27 EDT [ndmpd:48]: Method 1539 Jun 16 04:28:27 EDT [ndmpd:48]: ReplySequence 0 Jun 16 04:28:27 EDT [ndmpd:48]: Error NDMP_NO_ERR Jun 16 04:28:27 EDT [ndmpd:48]: Log type: 0 Jun 16 04:28:27 EDT [ndmpd:48]: Message id: 0 Jun 16 04:28:27 EDT [ndmpd:48]: Log message: Aborted by client Jun 16 04:28:27 EDT [ndmpd:48]: Associated message valid: 0 Jun 16 04:28:27 EDT [ndmpd:48]: Associated message sequence: 0 Jun 16 04:28:27 EDT [ndmpd:48]: Error code: NDMP_NO_ERR Jun 16 04:28:27 EDT [ndmpd:48]: NDMP message type: NDMP_DATA_STOP Jun 16 04:28:27 EDT [ndmpd:48]: NDMP message replysequence: 1133 Jun 16 04:28:27 EDT [ndmpd:48]: Message Header: Jun 16 04:28:27 EDT [ndmpd:48]: Sequence 0 Jun 16 04:28:27 EDT [ndmpd:48]: Timestamp 0 Jun 16 04:28:27 EDT [ndmpd:48]: Msgtype 1 Jun 16 04:28:27 EDT [ndmpd:48]: Method 1031 Jun 16 04:28:27 EDT [ndmpd:48]: ReplySequence 1133 Jun 16 04:28:27 EDT [ndmpd:48]: Error NDMP_NO_ERR Jun 16 04:28:27 EDT [ndmpd:48]: Error code: NDMP_NO_ERR filer backup log: dmp Tue Jun 15 18:01:12 EDT /vol/vol4/(0) Phase_change (III) dmp Tue Jun 15 18:46:31 EDT /vol/vol4/(0) Phase_change (IV) dmp Wed Jun 16 04:27:45 EDT /vol/vol4/(0) Tape_close (ndmp) dmp Wed Jun 16 04:27:45 EDT /vol/vol4/(0) Abort (2159666 MB) dmp Wed Jun 16 04:27:45 EDT /vol/vol4/(0) Log_msg (reg inodes: 2860708 other inodes: 7324 dirs: 127786 nt dirs: 31694 nt inodes: 32206 acls: 801) dmp Wed Jun 16 04:27:45 EDT /vol/vol4/(0) Log_msg (Phase 3: directories dumped: 159481) -- Roy McMorran Systems Administrator MDI Biological Laboratory mcmor...@mdibl.org _______________________________________________ Veritas-bu maillist - Veritas-bu@mailman.eng.auburn.edu http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu _______________________________________________ Veritas-bu maillist - Veritas-bu@mailman.eng.auburn.edu http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu