I'm setting up a new cell, and I think I've run into either a bug or a piece of documentation that needs to be clarified (http://www.openafs.org/pages/doc/QuickStartUnix/auqbg005.htm#HDRWQ80, steps 5-8).
The behavior is that after creating read-only replicas of root.cell and root.afs on the same server as the read-write volumes, /afs becomes inaccessible. At this point you can still use vos remove on the read-only copies and get back to normal. However, if you reboot instead, invoking the rc.afs script causes a system dump, crash, and reboot, and some server files get corrupted (for example, local/BosConfig gets zero-byted). I couldn't figure out how to recover from that, so I did a clean reinstall. (The utilities all complained about something regarding symbols in /unix when I tried to use them. I can reproduce that bit of the bug if it's important, but I assumed it was just because the kernel extensions were not loaded.)

This is on AIX 5.3 ML4 plus the most recent patches as of yesterday, with the OpenAFS 1.4.1 rs_aix53 binaries as distributed from openafs.org. Hardware is a pSeries 570 DLPAR (virtual machine) with 2GB RAM and one 64-bit POWER5 processor.

Below are more details on the problem: a pseudo-workflow of everything leading up to the problem, then a demo of the problem, then the errpt output. I also have core/dump files I could provide.
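For anyone trying to spot the zero-byted server files mentioned above after a crash, here is a minimal sketch. The function name empty_config_files is made up for illustration, and the default directory assumes the server tree was copied to /usr/afs, so that BosConfig lives in /usr/afs/local.

```shell
#!/bin/sh
# Print every zero-length regular file under a directory.
# Hypothetical helper; defaults to /usr/afs/local (assumed BosConfig location).
empty_config_files() {
    dir=${1:-/usr/afs/local}
    # -size 0 matches files that occupy zero blocks, i.e. empty files.
    find "$dir" -type f -size 0 2>/dev/null
}
```

Running empty_config_files with no argument checks /usr/afs/local; an empty result means no zero-length files were found there.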
# Make sure /vicepa exists
# Make sure /usr/vice/cache is of type jfs
DIST=/root/afs/afs-1.4.1/rs_aix53
chown -R 0.0 $DIST
umask 022
mkdir /usr/afs
mkdir /usr/vice
mkdir /usr/vice/etc
cd $DIST/root.client/usr/vice/etc
cp -rp dkload /usr/vice/etc
cp -p dkload/rc.afs /etc/rc.afs
vi /etc/rc.afs
chmod 755 /etc/rc.afs
/etc/rc.afs
cd $DIST/root.server/usr/afs
cp -rp * /usr/afs
/usr/afs/bin/bosserver -noauth &
cd /usr/afs/bin
MACH=afsdb1.dclark.us
CELL=dclark.us
./bos setcellname $MACH $CELL -noauth
./bos listhosts $MACH -noauth
./bos create $MACH kaserver simple /usr/afs/bin/kaserver -cell $CELL -noauth
./bos create $MACH buserver simple /usr/afs/bin/buserver -cell $CELL -noauth
./bos create $MACH ptserver simple /usr/afs/bin/ptserver -cell $CELL -noauth
./bos create $MACH vlserver simple /usr/afs/bin/vlserver -cell $CELL -noauth
printf "create afs ; create admin ; setfields admin -flags admin ; quit\n"
./kas -cell $CELL -noauth
./bos adduser $MACH admin -cell $CELL -noauth
./bos addkey $MACH -kvno 0 -cell $CELL -noauth
./bos listkeys $MACH -cell $CELL -noauth
./pts createuser -name admin -cell $CELL -noauth
./pts adduser admin system:administrators -cell $CELL -noauth
./pts membership admin -cell $CELL -noauth
./bos restart $MACH -all -cell $CELL -noauth
./bos create $MACH fs fs /usr/afs/bin/fileserver /usr/afs/bin/volserver \
    /usr/afs/bin/salvager -cell $CELL -noauth
./bos status $MACH fs -long -noauth
./vos create $MACH vicepa root.afs -cell $CELL -noauth
./bos create $MACH upserver simple \
    "/usr/afs/bin/upserver -crypt /usr/afs/etc -clear /usr/afs/bin" \
    -cell $CELL -noauth

##########################
cd $DIST/root.client/usr/vice/etc
cp -p * /usr/vice/etc
cp -rp C /usr/vice/etc
cd /usr/vice/etc
vi CellServDB
echo "/afs:/usr/vice/cache:50000" > /usr/vice/etc/cacheinfo
mkdir /afs
printf "afs 4 none none # Needs to be in /etc/vfs"
grep afs /etc/vfs
vi /etc/rc.afs
/usr/afs/bin/bos shutdown $MACH -wait -noauth
ps auxw | grep bosserver
cd /
shutdown -r now

##########################
/etc/rc.afs
/usr/afs/bin/klog admin
/usr/afs/bin/tokens
/usr/afs/bin/bos status $MACH
cd /
/usr/afs/bin/fs checkvolumes
/usr/afs/bin/fs setacl /afs system:anyuser rl
/usr/afs/bin/vos create $MACH vicepa root.cell
/usr/afs/bin/fs mkmount /afs/$CELL root.cell
/usr/afs/bin/fs setacl /afs/$CELL system:anyuser rl
cd /usr/afs/bin
./fs mkmount /afs/.${CELL} root.cell -rw
./vos addsite $MACH vicepa root.afs
./vos addsite $MACH vicepa root.cell
./fs examine /afs
./fs examine /afs/$CELL

###################
# - Fine at this point - #
###################

./vos release root.afs
./vos release root.cell
./fs checkvolumes

bash-3.00# ./fs checkvolumes
All volumeID/name mappings checked.

######################
# - Broken at this point - #
#####################

./fs examine /afs
./fs examine /afs/$CELL

bash-3.00# ./fs examine /afs
fs: File '/afs' doesn't exist
bash-3.00# ./fs examine /afs
fs: File '/afs' doesn't exist
bash-3.00# cd /afs
bash: cd: /afs: A file or directory in the path name does not exist.
bash-3.00# ls -l / | grep afs
ls: 0653-341 The file /afs does not exist.
bash-3.00# vos listvldb
VLDB entries for all servers

root.afs
    RWrite: 536870912     ROnly: 536870913
    number of sites -> 2
       server tiv570test.dclark.us partition /vicepa RW Site
       server tiv570test.dclark.us partition /vicepa RO Site

root.cell
    RWrite: 536870915     ROnly: 536870916
    number of sites -> 2
       server tiv570test.dclark.us partition /vicepa RW Site
       server tiv570test.dclark.us partition /vicepa RO Site

Total entries: 2
bash-3.00# vos remove $MACH vicepa -id 536870913 -cell $CELL
Volume 536870913 on partition /vicepa server tiv570test.dclark.us deleted
bash-3.00# vos remove $MACH vicepa -id 536870916 -cell $CELL
Volume 536870916 on partition /vicepa server tiv570test.dclark.us deleted
bash-3.00# fs examine /afs
File /afs (536870912.1.1) contained in volume 536870912
Volume status for vid = 536870912 named root.afs
Current disk quota is 5000
Current blocks used are 4
The partition has 426704764 blocks available out of 426770432

bash-3.00# fs examine /afs/$CELL
File /afs/notesdev.ibm.com (536870915.1.1) contained in volume 536870915
Volume status for vid = 536870915 named root.cell
Current disk quota is 5000
Current blocks used are 2
The partition has 426704764 blocks available out of 426770432

Things work again now...
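
The recovery above boils down to finding the read-only clones that share a server and partition with their read-write volume, then removing them. Here is a minimal sketch of the first half. It assumes vos listvldb prints entries in the layout shown above, with the RW site listed before the RO site, and the function name colocated_ro_ids is made up for illustration.

```shell
#!/bin/sh
# Read `vos listvldb` output on stdin and print the ROnly volume ID of every
# entry whose RO site is on the same server and partition as its RW site.
# Hypothetical helper; assumes the output layout shown in the transcript.
colocated_ro_ids() {
    awk '
        /RWrite:/ {
            # New VLDB entry: forget the previous one, then pick up the
            # ID that follows "ROnly:" if this entry has one.
            ro = ""
            rw_site = ""
            for (i = 1; i < NF; i++)
                if ($i == "ROnly:") ro = $(i + 1)
            next
        }
        $1 == "server" && $3 == "partition" {
            site = $2 " " $4
            if ($5 == "RW")
                rw_site = site
            else if ($5 == "RO" && site == rw_site && ro != "")
                print ro
        }
    '
}
```

Piping vos listvldb into colocated_ro_ids on this cell should print 536870913 and 536870916, the two IDs passed to vos remove above. The sketch only reports IDs and leaves the removal as a manual step.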

Here is the output of errpt -a:

---------------------------------------------------------------------------
LABEL:          CORE_DUMP
IDENTIFIER:     A63BEB70

Date/Time:       Wed May 31 18:09:54 EDT 2006
Sequence Number: 145
Machine Id:      00CBDEEA4C00
Node Id:         tiv570test
Class:           S
Type:            PERM
Resource Name:   SYSPROC

Description
SOFTWARE PROGRAM ABNORMALLY TERMINATED

Probable Causes
SOFTWARE PROGRAM

User Causes
USER GENERATED SIGNAL

        Recommended Actions
        CORRECT THEN RETRY

Failure Causes
SOFTWARE PROGRAM

        Recommended Actions
        RERUN THE APPLICATION PROGRAM
        IF PROBLEM PERSISTS THEN DO THE FOLLOWING
        CONTACT APPROPRIATE SERVICE REPRESENTATIVE

Detail Data
SIGNAL NUMBER
          11
USER'S PROCESS ID:
      311342
FILE SYSTEM SERIAL NUMBER
           2
INODE NUMBER
       65696
PROCESSOR ID
           0
CORE FILE NAME
/usr/afs/bin/core
PROGRAM NAME
bos
STACK EXECUTION DISABLED
           0
ADDITIONAL INFORMATION
??
??
??

Unable to generate symptom string.
---------------------------------------------------------------------------
---------------------------------------------------------------------------
LABEL:          DUMP_STATS
IDENTIFIER:     67145A39

Date/Time:       Wed May 31 17:46:49 EDT 2006
Sequence Number: 142
Machine Id:      00CBDEEA4C00
Node Id:         tiv570test
Class:           S
Type:            UNKN
Resource Name:   SYSDUMP

Description
SYSTEM DUMP

Probable Causes
UNEXPECTED SYSTEM HALT

User Causes
SYSTEM DUMP REQUESTED BY USER

        Recommended Actions
        PERFORM PROBLEM DETERMINATION PROCEDURES

Failure Causes
UNEXPECTED SYSTEM HALT

        Recommended Actions
        PERFORM PROBLEM DETERMINATION PROCEDURES

Detail Data
DUMP DEVICE
/dev/lg_dumplv
DUMP SIZE
              32018432
TIME
Wed May 31 17:45:15 2006
DUMP TYPE (1 = PRIMARY, 2 = SECONDARY)
           0
DUMP STATUS
           0
ERROR CODE
0
DUMP INTEGRITY
Compressed dump - Run dmpfmt with -c flag on dump after uncompressing.
FILE NAME

PROCESSOR ID
           0
---------------------------------------------------------------------------
LABEL:          MINIDUMP_LOG
IDENTIFIER:     F48137AC

Date/Time:       Wed May 31 17:46:33 EDT 2006
Sequence Number: 141
Machine Id:      00CBDEEA4C00
Node Id:         tiv570test
Class:           O
Type:            UNKN
Resource Name:   minidump

Description
COMPRESSED MINIMAL DUMP

Probable Causes
System dumped. Minimal Dump collected in Non-Volatile Memory.

        Recommended Actions
        PERFORM PROBLEM DETERMINATION PROCEDURES

Detail Data
Minidump Data:
4D32 039B 082E 0010 0035 003B 0030 0058 0000 0000 01E8 9000 0000 0000 13EF C1BA
0000 0003 4000 0000 447E 0E95 1737 A43F 0165 6E64 0074 6100 0041 0A31 8C72 3771
CE00 04C6 C400 6563 DDBA 0108 0040 0180 0944 FC38 5803 A0A1 C387 101B 3600 E000
C68C 1630 7060 84D1 8206 0B18 005E 9029 63E7 059B 335F C8D4 6903 878D 1D65 6345
A4C0 8801 2000 002C 44FC 3858 03A0 218E 8610 234A 6CD8 6980 9726 53B2 4CB1 D2A4
0988 8920 232A 0809 8040 4893 130D 0165 6300 4004 0863 66CE 9736 6FEA B8A1 B322
4690 2100 2246 2490 030B 1600 8686 3C69 0205 4080 1104 2402 A088 0504 3C89 2FE8
C901 2492 800E 6AA8 444A 0C20 B3A6 4814 F13C 02D8 E08C 151F 9B40 4306 9D06 69CB
301D E072 D284 7000 0C90 9639 B2F0 53BA 1318 1BA0 0A44 BE90 C782 9856 AE27 814A
F4D6 CAD7 3B91 64CD D60C 8041 2488 804F 25BE 0541 C32D 0052 98EC 7641 22F2 9FDF
7F54 31B0 0B81 4D64 001F 84C2 463C BC00 4DDF BF54 3740 33C1 52A2 4B54 E82A 4B84
244B 6681 012C 4492 4041 4484 5D00 1000 C411 40D0 2042 850C 63DC 8021 9324 073B
050F 265C B8E2 06C4 9632 32A0 C3ED 7A77 6C1C 3579 6082 36E7 35EF 1846 68CB 0412
0294 1DE7 0C65 14F9 2D12 480A CD12 6FE0 0B50 A000 00C5 232B 5E14 2901 D295 9144
8DF2 5051 1941 6A00 03C4 4A44 8000 BC7E A0ED BD47 407C C3FC 504F 6520 6C20 9D48
FCF9 3751 5036 1922 5380 13BA 075F 51C3 0821 4B65 286C 805E 449D 0CE0 4513 150A
48E0 109E D4D7 0104 0DC8 14E2 8825 5E68 9411 24A8 C854 4D14 0E58 9431 2698 2052
0720 2490 DF7F 86CD 6000 9101 1CD2 0B24 8625 C101 0E91 D521 1491 8B29 9652 1C0B
8AF4 6144 0BD8 4413 9511 5949 0F96 5186 24A6 1D32 0510 402A C018
7688 2F00 44F6 0838 71EA B401 2474 F023 661A 0B3E 2091 0060 06BA 9897 321D 0043
3D67 8A95 A84D 4346 1417 00FE 41D0 594D 0888 2682 2042 9C56 A948 1498 C224 A7A4
7001 C010 8D8A 1540 0918 64E9 A508 2888 2044 6493 0102 2B65 3299 F0C0 7D13 E500
0B3A A396 BA15 0BF0 44D6 131F 810D 56D8 4C88 A1C7 9863 12FD F58F 9874 BC05 14A9
C512 26AC 4F76 8180 1CB2 8919 E643 638F F935 AB83 1251 6BE7 B07A 1191 EDB6 110D
B08B 20E0 05D0 4332 78D8 04C2 B836 8544 AA5E 7CC9 15C0 1F31 C915 10BB 0101 1046
47FF 49CB 6C44 A961 4088 A907 7C34 A54C 0FFC 35CF 3FFD B439 9393 04FF 1740 6521
2452 012F 1311 280B 2582 1A16 402E 1ACB 44EE 97DD D549 D305 D848 3B56 5967 DDAC
56CA 1215 A00F 1C35 D180 8C61 48A8 818B 610F E0A1 AA44 54D1 0CC7 C401 54D0 8BCC
A829 1041 9A23 FB83 6400 C530 A272 311D 0340 039D 33AD 0C8C 7F90 C800 1E07 5F64
2902 088F DA24 8400 7784 9106 1D70 C8F1 C618 2BD0 105D 4D04 7042 4EDD 77E7 BD77
DFBE 01CE 091C 868F 51C6 4272 E4D1 D0D2 13C4 420F 79E6 6D09 800C 3665 4525 3C1F
C8E8 CB1E 7070 785F 0C32 A590 031F 6873 2E53 9722 5195 5344 A044 A0C7 7A5A 86AE
E316 BEF8 710B 8736 D726 3850 8BF2 2C11 E832 1AE3 8134 A647 64DE 4C47 9453 AF4C
C8EF 6E4C 0857 4564 5FF0 4456 1F9F 3122 6C51 1F04 4B53 B9AF CE39 4794
---------------------------------------------------------------------------
LABEL:          DSI_PROC
IDENTIFIER:     9D035E4D

Date/Time:       Wed May 31 17:45:15 EDT 2006
Sequence Number: 140
Machine Id:      00CBDEEA4C00
Node Id:         tiv570test
Class:           S
Type:            PERM
Resource Name:   SYSVMM

Description
DATA STORAGE INTERRUPT, PROCESSOR

Probable Causes
SOFTWARE PROGRAM

Failure Causes
SOFTWARE PROGRAM

        Recommended Actions
        IF PROBLEM PERSISTS THEN DO THE FOLLOWING
        CONTACT APPROPRIATE SERVICE REPRESENTATIVE

Detail Data
DATA STORAGE INTERRUPT STATUS REGISTER
0000 0000 0000 0000
SEGMENT REGISTER, SEGREG
0A00 0000 0000 0000
DATA STORAGE INTERRUPT ADDRESS REGISTER
0000 0400 0000 0000
EXVAL
0000 0004 0000 0000
---------------------------------------------------------------------------
_______________________________________________
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info