While trying to investigate one problem[1] I ran into another, seemingly
unrelated in every way other than the OS rev.

  titan pcn: [ID 298640 kern.info] NOTICE: pcn: rx ring broken?

This machine is 32-bit and not at all modern. It has AMD PCNet ethernet NICs
which seem to always get the very shoddy pcn driver. Even the man page for
pcn(7D) seems to suggest there have been problems in the past ...

  Known Problems and Limitations
       o  Occasional data corruption has occurred  when  pcn  and
          pcscsi  drivers in HP Vectra XU 5/90 and Compaq Deskpro
          XL systems are used under high network and  SCSI loads.
          These  drivers  do  not  perform  well  in a production
          server. A possible workaround is to  disable  the   pcn
          device  with  the system BIOS and use a separate add-in
          network interface.

       o  The Solaris pcn driver does not support IRQ 4.

Nonetheless, I have a machine which is the very definition of a stable
workhorse. It is an HP Kayak XU with 768MB of RAM and it refuses to die. Sort
of like my Sparc20 in that regard. It is a great test machine for lowest
common denominator type things and it also has a multitude ( 5 ) of SCSI bus
adapters in it as well as IDE and USB etc.

It always gets saddled with the pcn driver. It has been that way since
Solaris 8 at least. Thus :

# modinfo | grep pcn
142 f9b38000   4898  10   1  pcn (PC-Net (Generic) 1.50)

That thing is the problem.

I set moddebug to 0xe0000000 and kmem_flags to 0xf and booted via the serial
console. A long and verbose process, let me tell you. I tried to unplumb the
pcnX interfaces and modunload that module and then reload it .. to see what
it had to say for itself.
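
For anyone who wants to repeat that unload and reload cycle, here is a rough
sketch and not a transcript of what I typed. The helper name module_id, the
interface name pcn0 and the driver path /kernel/drv/pcn are assumptions for
illustration only. The helper just pulls the module id ( first column ) out
of modinfo output by exact module name, so that a driver like ae with
"pcnet" in its description is not matched by accident.

```shell
#!/bin/sh
# Print the module id (first modinfo column) for an exact module name.
# Reads modinfo-style lines on stdin; the module name is the sixth field.
# module_id is a hypothetical helper, not a Solaris command.
module_id() {
    awk -v name="$1" '$6 == name { print $1 }'
}

# The cycle itself is left commented out: it needs root on the affected
# host, and pcn0 and /kernel/drv/pcn are assumed names.
#   ifconfig pcn0 unplumb
#   modunload -i "$(modinfo | module_id pcn)"
#   modload /kernel/drv/pcn
```

Run against the modinfo output shown above, "module_id pcn" would print the
id of the pcn module, which is what modunload -i expects.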

Not much, as it turns out.

If I tried to do any sort of work with the interfaces I saw nothing but
hang time and a message :

Feb  4 20:56:37 titan pcn: [ID 298640 kern.info] NOTICE: pcn: rx ring broken?

...then no more packets flowed. No throughput. Not even a ping.

I pulled that thing ( the driver, not the NICs ) out of there and then
installed Masayuki Murayama's ae driver ( under a BSD License ) which works
flawlessly :

see http://homepage2.nifty.com/mrym3/taiyodo/eng/

# modinfo | grep pcnet
140 f9ac6000   c05c 236   1  ae (pcnet driver v2.6.0)

With that in place everything works flawlessly again.

I think that we need a better driver, and an open source one for future
OpenSolaris releases ( Project Indiana ) because pcn will not fly very well
.. or at all :

See BUG 66
http://defect.opensolaris.org/bz/show_bug.cgi?id=66

Any thoughts on the ae driver ?

-
Dennis Clarke

[1] the original problem was something I could not reproduce.

http://mail.opensolaris.org/pipermail/opensolaris-discuss/2008-February/038465.html

kernel memory allocator: buffer freed to wrong cache
buffer was allocated from kmem_alloc_320,
caller attempting free to kmem_alloc_8.
buffer=d36f2400  bufctl=0  cache: kmem_alloc_8

panic[cpu1]/thread=d3738de0: kernel heap corruption detected

d3738cc0 genunix:kmem_error+416 (6, d3036030, d36f24)
d3738cf0 genunix:kmem_slab_free+21a (d3036030, d36f2400)
d3738d20 genunix:kmem_magazine_destroy+b9 (d3036030, d459ed80,)
d3738d58 genunix:kmem_cache_magazine_purge+8d (d3036030)
d3738d78 genunix:kmem_cache_magazine_resize+23 (d3036030, 0, 0, 0, )
d3738dc8 genunix:taskq_thread+176 (d36d8f08, 0)
d3738dd8 unix:thread_start+8 ()

syncing file systems... done

_______________________________________________
opensolaris-discuss mailing list
opensolaris-discuss@opensolaris.org
