I don't want to sound stupid or make this very complex network sound like
it has a simple issue, but the support on this jumped straight from "I have a
speed problem" to some fairly complex stuff.  I have sites with the same
throughput needs that run without any debugging or CLI changes to the
firewalls.  Lenny, you said that you are connected to a Cisco switch on the
GBIC interfaces, right?  Have you statically set speed and duplex on those
interfaces as well as on pfSense?  I don't know what your skill level is on
Cisco, but check the interface for errors, resets, CRCs and watchdogs.  Do the
same on your pfSense box under the interface diagnostics.  If all are 0s
then I'll shut up, but if you have anything besides that, then please consider
these changes.
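
For the pfSense side, the "all 0s" check can be sketched roughly as below. This is only an illustration: it assumes the FreeBSD 7 style `netstat -i` column layout (Name, Mtu, Network, Address, Ipkts, Ierrs, Opkts, Oerrs, Coll), the sample output and its numbers are made up, and `nonzero_error_counters` is a hypothetical helper name, not part of pfSense.

```python
# Made-up sample in the assumed FreeBSD 7 `netstat -i` layout.
SAMPLE = """\
Name    Mtu Network       Address              Ipkts Ierrs    Opkts Oerrs  Coll
em0    1500 <Link#1>      00:11:22:33:44:55  1234567    12  2345678     0     0
em1    1500 <Link#2>      00:11:22:33:44:56  7654321     0  8765432     0     0
"""


def nonzero_error_counters(netstat_output):
    """Return {interface: {counter: value}} for link-level rows with errors."""
    problems = {}
    for line in netstat_output.splitlines():
        fields = line.split()
        # Only the per-NIC <Link#N> rows carry numeric error counters.
        if len(fields) < 8 or "<Link#" not in line:
            continue
        name = fields[0]
        # Read from the right so a missing Address column does not shift things.
        _ipkts, ierrs, _opkts, oerrs, coll = fields[-5:]
        counters = {"Ierrs": int(ierrs), "Oerrs": int(oerrs), "Coll": int(coll)}
        bad = {key: value for key, value in counters.items() if value != 0}
        if bad:
            problems[name] = bad
    return problems


print(nonzero_error_counters(SAMPLE))
```

With the made-up sample this flags em0 only. On the Cisco side the equivalent counters are in the `show interfaces` output for the port in question.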

Curtis LaMasters
http://www.curtis-lamasters.com
http://www.builtnetworks.com


On Sun, Feb 8, 2009 at 7:22 AM, Lenny <five2one.le...@gmail.com> wrote:

> Hi,
>
> so after a long time of trying different things and some tests, I'm back to
> square one, but with some additional info.
> things I've done:
> -Replaced the server. It's also an IBM x335, but with 2 Xeon 3.06GHz now.
> 2GB RAM.
> The only thing that's left from the old one is the Dual Intel NIC. I know
> it may very well be the reason for failures, but before I go buy a new one,
> I wanted to try everything else.
> -I installed pfSense 1.2.2 and left the old configuration.
> -Tried it with and without polling, and with and without checksum
> offloading.
> Nothing helped.
> -I also bypassed the firewall and saw that without it everything works
> perfect.
>
> Now for the things I noticed yesterday (we had a high load):
> -It was almost impossible to get through the firewall, good thing I had
> polling enabled, so I could ssh and see "top -S".
> I saw that em0 and em1 taskq were 100% CPU each.
> -The RRD graphs showed blank spaces as always in these situations.
> -At one point I noticed that the states came up as high as 997000 out of
> 1000000. So I increased the value to 2000000, but the second I did that it
> dropped to around 450000 (weird or what?). Also, it's strange that I still
> have around 250000 states, even when the actual session count is near
> 20000 (according to the Alteon). Isn't it supposed to be somewhere near
> 60000-80000 states? And I'm talking about 15 hours after the change.
>
> The load I'm talking about (that was yesterday) 18-20 kpps, around 150Mb/s
> traffic.
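
For scale, the load figures above can be turned into an average packet size and a rough state-table memory estimate. A minimal sketch: the 1 KB per state figure is the commonly quoted pfSense rule of thumb and is an assumption here, not something measured on this box, and both function names are made up for illustration.

```python
def avg_packet_bytes(bits_per_second, packets_per_second):
    """Average packet size in bytes for a given bit rate and packet rate."""
    return bits_per_second / 8 / packets_per_second


def state_table_mib(max_states, bytes_per_state=1024):
    """Rough RAM needed for a full state table (assumes ~1 KB per state)."""
    return max_states * bytes_per_state / 2**20


# 150 Mb/s at 18-20 kpps, as reported in the post.
for kpps in (18, 20):
    size = avg_packet_bytes(150e6, kpps * 1000)
    print(f"{kpps} kpps -> about {size:.0f} bytes per packet")

# The old and new state limits from the post.
for limit in (1_000_000, 2_000_000):
    print(f"{limit} states -> about {state_table_mib(limit):.0f} MiB if full")
```

By this arithmetic the average packet is roughly 940 to 1040 bytes, so this is not a flood of tiny packets. If the rule of thumb holds, a full 2000000-entry table would need on the order of 1.9 GiB, which is close to the 2GB of RAM in the box.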
>
> I also started reading about "em taskq on freebsd" and I saw a couple of
> other guys having this problem. Those guys were advised to start tweaking
> sysctl and loader.conf. No success stories were published though. But before
> I do that, I was wondering if there is anything else I can do.
>
> The last, but definitely not least, is that I realized that a static route
> was on the wrong interface.
> Here's how:
> My setup goes like this.
>
> [squids]10.0.0.160/27<----10.0.0.161[alteon]192.168.5.2<------192.168.5.1[pfSense]11.11.11.11<-----Internet
>
> obviously the IPs are fictional.
> Now the route I had on the firewall is 10.0.0.160/27 through gateway
> 192.168.5.2, but it was on the WAN interface!
> Yesterday I changed the interface to OPT1, which is the one connected to
> the Alteon.
> But I won't be able to see the effect of it till Saturday (this is my
> biggest problem - I can only test it on Saturdays, because this is when our
> website is loaded). Is there any chance that it was the solution to my
> problem?
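
One way to sanity-check the route change: a gateway has to sit on the subnet of the interface the route is bound to. A small sketch using the fictional addresses from the diagram. The /24 masks on WAN and OPT1 are assumptions (the post does not give them), and `interface_for_gateway` is a hypothetical helper.

```python
import ipaddress

# Fictional addresses from the post; the /24 prefix lengths are assumed.
INTERFACES = {
    "WAN": ipaddress.ip_interface("11.11.11.11/24"),
    "OPT1": ipaddress.ip_interface("192.168.5.1/24"),
}


def interface_for_gateway(gateway, interfaces):
    """Return the name of the interface whose subnet contains the gateway."""
    gw = ipaddress.ip_address(gateway)
    for name, iface in interfaces.items():
        if gw in iface.network:
            return name
    return None


# The static route from the post: 10.0.0.160/27 via 192.168.5.2.
route_dest = ipaddress.ip_network("10.0.0.160/27")
print("gateway is on:", interface_for_gateway("192.168.5.2", INTERFACES))
print("alteon inside route:", ipaddress.ip_address("10.0.0.161") in route_dest)
```

Under those assumptions 192.168.5.2 is only directly reachable via OPT1, which agrees with the change. Whether the wrong binding caused the packet loss is a separate question this does not answer.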
>
> Sorry for the long post.
>
> Thanks,
>
> Lenny.
>
>
> On Sun, Dec 21, 2008 at 12:45 AM, Lenny <five2one.le...@gmail.com> wrote:
>
>> Hi,
>>
>>
>> I'm kind of desperate here, so please try to help me.
>>
>> Here's my problem:
>>
>> I have a setup in production (a very dynamic website).
>>
>> It consists of pfsense-->Alteon Load Balancer-->IBM Bladecenter(with a
>> Squids cluster on it).
>>
>> pfsense is installed on IBM x335 with 2 Xeon 2.4GHz, 2GB RAM, and Dual
>> Intel NIC PCI-X 1Gb.
>>
>> I'm connected with 1Gb to the ISP.
>>
>> The problem is that no matter what I do, I can't get more than 15kpps.
>>
>> After that I start to get a lot of packet loss.
>>
>> At first I was sure that the ISP had me on QoS, because I never saw
>> traffic going over 100Mb/s, but then, to convince me, they downloaded
>> some large files from my servers and came up as high as 170Mb/s.
>>
>> So that one was out.
>>
>>
>> Next I changed the NICs (I used the onboard Broadcom at first) and it did
>> save me from the need to do Device Polling, and I no longer have interrupts
>> using half the CPU, but not more than that.
>>
>> So I upgraded to 1.2.1 RC3. And still - the most I saw was 14kpps and 102
>> Mb/s.
>>
>> I have 700000 states entered, while I never saw it going over 250000 in
>> reality.
>>
>> The files transferred are rather small, 600KB being the largest.
>>
>> As for the Alteon, at first it was connected via another Broadcom fibre
>> NIC (Alteon only has 1 fibre uplink that's 1Gb),
>>
>> but now that I use an Intel Dual - I connected it to a Cisco Gbic and from
>> there to the Alteon by another fibre Gbic (don't judge me - I don't have a
>> giga switch). I know it's another possible trap, but right now I don't have
>> any other choice.
>>
>>
>> 99% of the traffic is port 80.
>>
>> I don't use NAT. All the IPs are public.
>>
>> WAN is static. LAN is not used. OPT1 is used and is also static.
>>
>> WAN and OPT1 are on different subnets of course. With additional static
>> route (the squids cluster is on the third subnet).
>>
>> CPU doesn't go over 30%. RAM is about 20-30. I'm talking peaks now.
>>
>> sysctl net.inet.ip.intr_queue_drops shows 0.
>>
>> I have no more than 15 rules while the first one should take care of most
>> of the traffic.
>>
>> I tried Aggressive mode with 1.2 and it didn't help. With the current
>> version I'm using the Normal mode.
>>
>> The biggest problem with our website is that people start to hit
>> refresh when the site is not functioning properly, and it's kind of
>> killing our web servers. Plus it adds traffic to the firewall, thus
>> loading it even more.
>>
>>
>> Another weird thing I noticed is that when looking at RRD graphs I
>> suddenly see a blank space, like this:
>>
>> ------  ------   --------. And it shows on all the graphs at the same
>> time.
>>
>> I've also noticed that it's about the same time as the load kills the
>> website. Must be related.
>>
>> Quality graphs are not showing. They did in the 1.2 version.
>>
>> SNMP is not enabled. DHCP is (it was on by default and I just left it
>> there).
>>
>>
>> With version 1.2 I had ACPI disabled (long boot), now I have it
>> enabled (seems to work fine with 1.2.1), although I should mention that I
>> never checked the ACPI setting in the BIOS (I saw a post by someone who
>> had this problem).
>>
>>
>> I've read hundreds of topics here and on the forum and I saw that with my
>> setup I can handle a lot more than I do now.
>>
>> So what could be wrong?
>>
>>
>> Please help!
>>
>>
>> Thanks,
>>
>> Lenny.
>>
>>
>> P.S. Sorry for the size of this mail, but I figured I'd rather tell you
>> all the details ahead.
>>
>>
>
