Re: TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-20 Thread Tony Battersby
The following patch fixes the problem for me. Do we want to accept this patch and call it a day or continue investigating the source of the problem? Patch applies to 2.6.24.2, but doesn't apply to 2.6.25-rc. If everyone agrees that this is the right solution, I will resubmit with a proper subjec

Re: TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-20 Thread Tony Battersby
Update: Herbert's patch alters the arguments to alloc_skb_fclone() and skb_reserve() from within sk_stream_alloc_pskb(). This changes the skb_headroom() and skb_tailroom() of the returned skb. I decided to see if I could detect the precise point at which data corruption started to happen. The r

Re: TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-20 Thread Tony Battersby
Matt Carlson wrote: > Hi Tony. Can you give us the output of : > > sudo lspci -vvv - -s 03:01.0' > 03:01.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5701 Gigabit Ethernet (rev 15) Subsystem: Compaq Computer Corporation NC7770 Gigabit Server Adapter (PCI-X, 10/100/1000

Re: TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-20 Thread Tony Battersby
Herbert Xu wrote: > On Tue, Feb 19, 2008 at 05:14:26PM -0500, Tony Battersby wrote: > >> Update: when I revert Herbert's patch in addition to applying your >> patch, the iSCSI performance goes back up to 115 MB/s again in both >> directions. So it looks like turnin

Re: TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-20 Thread Tony Battersby
Michael Chan wrote: > On Tue, 2008-02-19 at 17:14 -0500, Tony Battersby wrote: > > >> Update: when I revert Herbert's patch in addition to applying your >> patch, the iSCSI performance goes back up to 115 MB/s again in both >> directions. So it looks like tur

Re: TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-19 Thread Tony Battersby
Michael Chan wrote: > On Tue, 2008-02-19 at 11:16 -0500, Tony Battersby wrote: > >> iSCSI >> performance drops to 6 - 15 MB/s when the 3Com NIC is doing heavy rx >> with light tx, >> > > That's strange. The patch should only affect TX performance

Re: TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-19 Thread Tony Battersby
Michael Chan wrote: >> The SysKonnect NIC that does not exhibit this problem has a chip that >> says "BCM5411KQM" "TT0128 P2Q" and "56975E". > I think this is the 5700, but please send me the tg3 output that > identifies the chip and the revision. Something like this: > > eth2: Tigon3 [partno(BCM9

Re: TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-19 Thread Tony Battersby
Michael Chan wrote: > On Mon, 2008-02-18 at 16:35 -0800, David Miller wrote: > > >> One consequence of Herbert's change is that the chip will see a >> different datastream. The initial skb->data linear area will be >> smaller, and the transition to the fragmented area of pages will be >> quicke

TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-18 Thread Tony Battersby
11 dma_readq_full: 2188114 dma_read_prioq_full: 162588 tx_comp_queue_full: 0 ring_set_send_prod_index: 2901128 ring_status_update: 218885 nic_irqs: 146494 nic_avoided_irqs: 72391 nic_tx_threshold_hit: 103584 Tony Battersby Cybernetics

Re: sky2: tx hang on dual-port Yukon XL when rx csum disabled

2008-01-29 Thread Tony Battersby
Tony Battersby wrote: > "iperf -c 192.168.1.1" never finishes, but "iperf -c 192.168.2.1" does > finish. Press Ctrl-C to abort the hung iperf. Ping 192.168.1.1 does > not respond. Ping 192.168.2.1 does respond, but each ping has almost > exactly 1 second laten

Re: sky2: tx hang on dual-port Yukon XL when rx csum disabled

2008-01-28 Thread Tony Battersby
Brandeburg, Jesse wrote: > make sure to disable the default Linux arp behavior for this kind of > test on PC3 by* > [EMAIL PROTECTED] echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter > [EMAIL PROTECTED] echo 1 > /proc/sys/net/ipv4/conf/eth0/arp_filter > [EMAIL PROTECTED] echo 1 > /proc/sys/net/ipv4/

Re: sky2: tx hang on dual-port Yukon XL when rx csum disabled

2008-01-28 Thread Tony Battersby
> What bus and chipset is in use on the systems with sky2? > I have seen problems when using PCI-X on AMD systems (documented in AMD > errata) > due to multiple outstanding transactions. Motherboard: SuperMicro PDSME Chipset: Intel E7230 Processor: Intel Pentium D 3.4 GHz (note: tried both SMP a

sky2: tx hang on dual-port Yukon XL when rx csum disabled

2008-01-28 Thread Tony Battersby
I also tried booting with "maxcpus=1 pci=nomsi", but that didn't affect the problem. As a temporary workaround, I will use ethtool to turn on rx checksumming and live with the "hw csum failure" messages, since they are better than network lockups. Let me know if I can be of

Re: BUG: sky2: hw csum failure with dual-port copper NIC on SMP

2007-11-14 Thread Tony Battersby
>> Do you want the test program I am using? It is a pretty basic >> send()/recv() program, ~650 lines of C. >> >> Tony >> >> > > Not really, iperf drives the problem fine. > > Thanks for the tip; I hadn't tried iperf before today. I can reproduce the problem with "iperf -s" on the system

Re: BUG: sky2: hw csum failure with dual-port copper NIC on SMP

2007-11-13 Thread Tony Battersby
Stephen Hemminger wrote: > I can reproduce the problem under load with only a single port on 2.6.23. > I haven't been able to reproduce it on 2.6.24-rc2 (latest) but that maybe > because of either insufficient stress or another bug fix correcting the > problem. There is an issue with Yukon XL upda

BUG: sky2: hw csum failure with dual-port copper NIC on SMP

2007-11-13 Thread Tony Battersby
I am getting "hw csum failure" messages with sky2. I have seen this problem reported elsewhere with a fibre NIC, but I am using a copper NIC. It seems to be triggered by SMP. It is easy to reproduce in 2.6.23. 2.6.24-rc2-git3 still has the problem, but it happens less frequently. To reproduce

Re: [PATCH] net: fix kernel_accept() error path

2007-10-04 Thread Tony Battersby
James Morris wrote: > On Thu, 4 Oct 2007, Tony Battersby wrote: > > >> If accept() returns an error, kernel_accept() releases the new socket >> but passes a pointer to the released socket back to the caller. Make it >> pass back NULL instead. >> >>

[PATCH] net: fix kernel_accept() error path

2007-10-04 Thread Tony Battersby
If accept() returns an error, kernel_accept() releases the new socket but passes a pointer to the released socket back to the caller. Make it pass back NULL instead. Signed-off-by: Tony Battersby <[EMAIL PROTECTED]> --- --- linux-2.6.23-rc9/net/socket.c.bak 2007-10-04 15:21:17.0