While trying to investigate one problem[1] I ran into another. Seemingly unrelated in every way other than the OS rev.

    titan pcn: [ID 298640 kern.info] NOTICE: pcn: rx ring broken?

This machine is 32-bit and not at all modern. It has AMD PCNet ethernet NICs which seem to always get the very shoddy pcn driver. Even the man page for pcn(7D) seems to suggest there have been problems in the past ...

    Known Problems and Limitations

      o  Occasional data corruption has occurred when pcn and pcscsi
         drivers in HP Vectra XU 5/90 and Compaq Deskpro XL systems
         are used under high network and SCSI loads. These drivers do
         not perform well in a production server. A possible
         workaround is to disable the pcn device with the system BIOS
         and use a separate add-in network interface.

      o  The Solaris pcn driver does not support IRQ 4.

Nonetheless, I have a machine which is the very definition of a stable workhorse. It is an HP Kayak XU with 768MB of RAM and it refuses to die. Sort of like my Sparc20 in that regard. It is a great test machine for lowest common denominator type things and it also has a multitude ( 5 ) of SCSI bus adapters in it as well as IDE and USB etc etc. It always gets saddled with the pcn driver. It has been that way since Solaris 8 at least. Thus :

    # modinfo | grep pcn
    142 f9b38000 4898 10 1 pcn (PC-Net (Generic) 1.50)

That thing is the problem.

I set moddebug to 0xe0000000 and kmem_flags to 0xf and booted via the serial console. A long and verbose process, let me tell you. I tried to unplumb the pcnX interfaces, modunload that module and then reload it .. to see what it had to say for itself. Not much, as it turns out. If I tried to do any sort of work with the interfaces I saw nothing but hang time and a message :

    Feb 4 20:56:37 titan pcn: [ID 298640 kern.info] NOTICE: pcn: rx ring broken?

...then no more packets flowed. No throughput. Not even a ping.
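
For anyone wanting to reproduce this, here is a minimal sketch of how the moddebug and kmem_flags values mentioned above could be made persistent. It assumes they go into /etc/system and take effect at the next boot; they can also be set another way, for instance from the kernel debugger at boot time. The comments are only a rough description of what each tunable does.

```
* /etc/system fragment (sketch; lines starting with '*' are comments)
*
* Verbose kernel module load/unload messages on the console.
set moddebug=0xe0000000
*
* Enable the kernel memory allocator debugging checks.
set kmem_flags=0xf
```

Both settings make the boot slower and much noisier, so they are best removed again once the debugging is done.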

I pulled that thing ( the driver, not the NICs ) out of there and then installed Masayuki Murayama's ae driver ( under a BSD License ) : see http://homepage2.nifty.com/mrym3/taiyodo/eng/

    # modinfo | grep pcnet
    140 f9ac6000 c05c 236 1 ae (pcnet driver v2.6.0)

With that in place everything works flawlessly again.

I think that we need a better driver, and an open source one, for future OpenSolaris releases ( Project Indiana ) because pcn will not fly very well .. or at all :

    See BUG 66 http://defect.opensolaris.org/bz/show_bug.cgi?id=66

Any thoughts on the ae driver ?

- Dennis Clarke

[1] The original problem was something I could not reproduce.
    http://mail.opensolaris.org/pipermail/opensolaris-discuss/2008-February/038465.html

    kernel memory allocator: buffer freed to wrong cache
    buffer was allocated from kmem_alloc_320,
    caller attempting free to kmem_alloc_8.
    buffer=d36f2400 bufctl=0 cache: kmem_alloc_8

    panic[cpu1]/thread=d3738de0: kernel heap corruption detected

    d3738cc0 genunix:kmem_error+416 (6, d3036030, d36f24)
    d3738cf0 genunix:kmem_slab_free+21a (d3036030, d36f2400)
    d3738d20 genunix:kmem_magazine_destroy+b9 (d3036030, d459ed80,)
    d3738d58 genunix:kmem_cache_magazine_purge+8d (d3036030)
    d3738d78 genunix:kmem_cache_magazine_resize+23 (d3036030, 0, 0, 0, )
    d3738dc8 genunix:taskq_thread+176 (d36d8f08, 0)
    d3738dd8 unix:thread_start+8 ()

    syncing file systems... done