Guys, I'm really desperate :( Last week I replaced the Intel Dual NIC with a new one of the same kind (82546GB). During a week of low load (6kpps on average) I never saw a single error on the interfaces, but yesterday the high load came back and it happened again. So I'm totally out of ideas.
The main problem remains: the minute I get high load (about 14-18kpps, 250000 states, 120Mb traffic), the em0 and em1 taskq processes each lock at 100% and the website becomes unresponsive or very slow. I also started to see errors on the interfaces again. The moment I release some of that load, everything is back to normal.

Just to remind you, my hardware is an IBM x335 server, 2 x Xeon 3.06GHz CPU, 2GB RAM, Intel Dual NIC PCI-X.

By the way, the total CPU load I see in these situations is 40-50%. It's an SMP setup, so the taskq processes lock up 2 of the 4 CPUs available.

Should I go on and mess with the em drivers? If so, what should I change there?

Please, please help!

Lenny.

On Tue, Feb 10, 2009 at 7:49 PM, Lenny <five2one.le...@gmail.com> wrote:
>
> Hi,
>
> apparently my last few emails were only between me and Curtis, so I'm
> attaching them all.
>
> so as far as I understand, my problem is either with one of the cables
> (which is less likely, as I see errors on both interfaces) or with the
> NIC itself?
>
> Can anyone confirm that?
>
> Thanks a lot,
>
> Lenny.
>
>
> Lenny wrote:
>
> I drew you the diagram you asked for:
> http://rapidshare.com/files/195843186/file3.jpg.html
>
> Hope it makes things clearer, and also explains why I'm a bit skeptical
> about the switch/cable issues...
>
> I ran the command you asked me to and these are the results.
> seems OK, doesn't it?
>
> 2948-cis> show port counters 2/49
>
> Port  Align-Err  FCS-Err    Xmit-Err   Rcv-Err    UnderSize
> ----- ---------- ---------- ---------- ---------- ---------
> 2/49  -          0          0          0          0
>
> Port  Single-Col Multi-Coll Late-Coll  Excess-Col Carri-Sen Runts     Giants
> ----- ---------- ---------- ---------- ---------- --------- --------- ---------
> 2/49  0          0          0          0          0         0         0
>
> Last-Time-Cleared
> --------------------------
> Mon Aug 4 2008, 09:03:45
>
>
> 2948-cis> show port counters 2/50
>
> Port  Align-Err  FCS-Err    Xmit-Err   Rcv-Err    UnderSize
> ----- ---------- ---------- ---------- ---------- ---------
> 2/50  -          0          0          0          0
>
> Port  Single-Col Multi-Coll Late-Coll  Excess-Col Carri-Sen Runts     Giants
> ----- ---------- ---------- ---------- ---------- --------- --------- ---------
> 2/50  0          0          0          0          0         0         0
>
> Last-Time-Cleared
> --------------------------
> Mon Aug 4 2008, 09:03:45
>
>
> Regarding the NICs - the Broadcom NICs are on the PCI bus and I had the
> CPU loaded with interrupts, so I've never even had a chance to reach this
> kind of load without hitting 80% CPU (even with device polling). On the
> other hand, I don't remember the blank spaces on RRD graphs. This is why
> I'm not throwing the Intel Dual NIC out of the equation just yet.
>
> Curtis LaMasters wrote:
>
> A static route should be enough. If they are both plugged into the same
> LAN you may want to enable the checkbox that says suppress ARP messages.
> Do you have a little diagram available of this setup? IPs do not have to
> be included. I am not versed with CatOS but Google brought me to this
> http://www.cisco.com/en/US/products/hw/switches/ps700/products_tech_note09186a008010e9d5.shtml
> that says you should do "show port counters". You've tested both Intel
> and Broadcom NICs, right? This would lead me to a switch or cable issue
> 100%. Let me know what the Cisco switch says. Do you have anything
> plugged into LAN?
>
> Curtis LaMasters
> http://www.curtis-lamasters.com
> http://www.builtnetworks.com
>
>
> On Sun, Feb 8, 2009 at 3:15 PM, Lenny <five2one.le...@gmail.com> wrote:
>
>> another thing I just thought of:
>>
>> Is it possible I need a VLAN in my configuration or is the static route
>> enough for this?
>>
>>
>> Curtis LaMasters wrote:
>>
>> I would have to say bad hardware or cable, or speed/duplex issue. The
>> traffic difference is probably due to blocked traffic. If you have cli
>> access to the cisco switch run "show int | i errors" and report the output.
>>
>> Curtis LaMasters
>> http://www.curtis-lamasters.com
>> http://www.builtnetworks.com
>>
>>
>> On Sun, Feb 8, 2009 at 2:54 PM, Lenny <five2one.le...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> actually, it's a good point about the errors!
>>>
>>> I'm way far from "0".
>>>
>>> WAN:
>>> Media           1000baseTX <full-duplex>
>>> In/out packets  2865480509/3025905907 (792.79 MB/2.11 GB)
>>> In/out errors   6041699/0
>>> Collisions      0
>>>
>>> OPT1:
>>> Media           1000baseTX <full-duplex>
>>> In/out packets  3044923904/2862204565 (1.23 GB/688.88 MB)
>>> In/out errors   13720077/0
>>> Collisions      0
>>>
>>> also makes me wonder about the difference 2.11GB against 1.23 GB.
>>> there are no other connected interfaces... where does it go?
>>>
>>> anyway, please share your ideas.
>>>
>>> thank you,
>>>
>>> Lenny.
>>>
>>>
>>> Curtis LaMasters wrote:
>>>
>>> I apologize, I was not stating that your network is overly complex,
>>> simply that the solutions that the others were stating were more than I
>>> think you needed. I have a total of 65 deployed pfSense solutions around
>>> the midwest. Nearly any of them that are connected to Cisco have a
>>> speed/duplex issue out of the box with autonegotiation. I only wanted to
>>> make sure that the simple stuff was out of the way before you got too far
>>> deep into customization where upgrades would prove to be difficult.
>>> I'm going to assume that you have zero for both collisions and errors
>>> on your interfaces on pf under "status>interfaces"? If that is the case
>>> and your ISP says all is well, then I can only assume it's another issue
>>> requiring much more complex solutions.
>>>
>>> Curtis LaMasters
>>> http://www.curtis-lamasters.com
>>> http://www.builtnetworks.com
>>>
>>>
>>> On Sun, Feb 8, 2009 at 10:05 AM, <five2one.le...@gmail.com> wrote:
>>>
>>>> Hi,
>>>> thanks for answering.
>>>>
>>>> Actually, the network has not changed and I don't think it's too complex
>>>> either.
>>>> And I do know that my kind of load is supposed to be handled with "out
>>>> of the box" configuration. That's why I'm asking you and not starting
>>>> to tweak the sysctl just yet.
>>>>
>>>> Regarding your suggestion, you're right - I'm not a Cisco guy, but I
>>>> asked one of the guys at the ISP to check it for errors and he said
>>>> everything's OK.
>>>> Plus, when I bypassed the firewall, the Cisco switch was still in the
>>>> game.
>>>> It's set to auto negotiate and it seemed to be fine with Alteon, so I'd
>>>> rather believe it's fine with pfSense too.
>>>>
>>>> thanks,
>>>>
>>>> Lenny.
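
As a rough sanity check on the interface counters quoted above, here is a minimal Python sketch that turns the "In/out packets" and "In/out errors" figures into an input error rate per interface. The numbers are copied from the WAN and OPT1 status output in this thread; the function name and the dictionary layout are only illustrative assumptions, not anything pfSense itself provides.

```python
# Input error rate per interface, using the counters quoted in this thread.
# error_rate_percent, COUNTERS and RATES are illustrative names (assumptions),
# not part of pfSense or FreeBSD.

def error_rate_percent(errors: int, packets: int) -> float:
    """Return errors as a percentage of packets (0.0 if there were no packets)."""
    if packets == 0:
        return 0.0
    return 100.0 * errors / packets

# name -> (input packets, input errors), as shown under status > interfaces
COUNTERS = {
    "WAN": (2865480509, 6041699),
    "OPT1": (3044923904, 13720077),
}

RATES = {
    name: error_rate_percent(errors, packets)
    for name, (packets, errors) in COUNTERS.items()
}

for name, rate in RATES.items():
    print(f"{name}: {rate:.2f}% of input packets had errors")
```

Both interfaces come out well under 1% when averaged over the whole counter lifetime. An average like this hides bursts, so it would be consistent with errors that only show up under high load, though it does not prove that.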