FYI, I can regularly hit 9.3 Gb/s with my Intel X520-DA2s and FreeBSD
10.1. Before 10.1 it was less.
I used to tweak the card settings, but now it's just stock. You may want to
check your settings; the Mellanox may just have better defaults for your
switch.
On Mon, Aug 17, 2015 at 6:41 AM,
Adrian Chadd wrote:
On 12 June 2015 at 10:57, Christopher Forgeron csforge...@gmail.com
wrote:
I agree it shouldn't run out of memory. Here's what mine does under
network
load, or rsync load:
2 0 9 1822M 1834M 0 0 0 0 14 8 0 0 22750 724 136119 0
before the network outage.
On 2015.06.12. 14:37, Christopher Forgeron wrote:
rsync burns memory - I'd say there's a good chance you're running out of
mem before it's replenished.
For vmstat 5 - Don't run it on console. Connect via a second box with
ssh, and run it there - That way
is such crap.
On 2015.06.08. 5:01, Christopher Forgeron wrote:
You know what helped me:
'vmstat 5'
Leave that running. If the last thing on the console after a crash/hang is
vmstat showing 8k of memory left, then you're in the same problem-park as
me.
My 10.1 96GiB RAM box is chewing ~8
that em driver is such crap.
I agree it shouldn't run out of memory. Here's what mine does under network
load, or rsync load:
2 0 9 1822M 1834M 0 0 0 0 14 8 0 0 22750 724 136119
0 23 77
0 0 9 1822M 1823M 0 0 0 0 0 8 0 0 44317 347 138151
0 16 84
0 0 9 1822M 1761M 0 0
Thanks for keeping up on this Rick. TSO needs to be fixed on FreeBSD, but
it does not seem to be a hot-button topic.
I still suffer from TSO issues, and thus turn it off in a standard build -
but I noticed that machines on VMware 5.1-6.0 running the vmx
driver did not have problems
A few things:
1) How long before you have this behaviour?
2) What's the output of 'netstat -m' when you have the problem?
3) What is your MTU set to, and do you have TSO on or off?
On Thu, May 21, 2015 at 10:33 AM, Guy Helmer guy.hel...@gmail.com wrote:
I’ve noticed that there have been
I get these messages on every reboot, but I haven't seen it during runtime
yet.
What's your MTU set to? Do you have TSO on bce?
On Fri, May 15, 2015 at 8:50 AM, Dominic Blais dbl...@interplex.ca wrote:
Hi,
I see these logs prior to a server hang in userspace.
May 15 01:05:47 pppoe01
I'd go a step further and say it's the _exact_ same problem.
If you're using anything other than 4k clusters on a heavily loaded system,
you'll probably have issues.
It's not just the MTU - Case in point - I set my MTU to 4000, but since my
iSCSI block size is 8k, I noticed that I still had
Mark, did switching to a MTU of 1500 ever help?
I'm currently reliving a problem with this - I'm down to a MTU of 4000, but
I still see jumbo pages being allocated - I believe it's my iSCSI setup
(using 4k block size, which means the packet is bigger than 4k), but I'm
not sure where it's all
I've been using:
make buildkernel -DKERNFAST
which is quite fast compared to the regular buildkernel. There may be a
faster way yet.
On Thu, Aug 14, 2014 at 3:28 PM, Russell L. Carter rcar...@pinyon.org
wrote:
ps: a quick question about quickly building modules: Suppose I have
a fully
On Wed, Mar 26, 2014 at 9:31 PM, Rick Macklem rmack...@uoguelph.ca wrote:
ie. I've suggested:
ifp->if_hw_tsomax = min(32 * MCLBYTES - (ETHER_HDR_LEN +
ETHER_VLAN_ENCAP_LEN),
IP_MAXPACKET);
- I put the min() in just so it wouldn't break if MCLBYTES is increased
someday.
I like
On Wed, Mar 26, 2014 at 9:35 PM, Rick Macklem rmack...@uoguelph.ca wrote:
I've suggested in the other thread what you suggested in a recent
post...ie. to change the default, at least until the propagation
of driver set values is resolved.
rick
I wonder if we need to worry about
I'm quite sure the problem is in 9.2-RELEASE, not 9.1-RELEASE or earlier,
as a 9.2-STABLE build from last year that I have doesn't exhibit the problem.
New code in if.c at line 660 looks to be what is starting this, which makes me
wonder how TSO was being handled before 9.2.
I also like Rick's NFS patch
, the
problem still exists. Last night's tests never went above a packet of
65530. Now with lagg enabled, I'm seeing packets of 65543 within 5 minutes,
so we're already breaking.
On Tue, Mar 25, 2014 at 11:33 PM, Christopher Forgeron csforge...@gmail.com
wrote:
On Tue, Mar 25, 2014 at 8:21 PM
at 11:27 PM, Christopher Forgeron csforge...@gmail.com
wrote:
That's interesting. I see here in the r251296 commit Andre says :
Drivers can set ifp->if_hw_tsomax before calling ether_ifattach() to
change the limit.
I wonder if we add your same TSO patch to if_lagg.c before line 356's
Hi Simon,
Try checking out the '9.2 ixgbe tx queue hang' thread here, and see if it
applies to you.
On Tue, Mar 25, 2014 at 1:55 AM, k simon chio1...@gmail.com wrote:
Hi,Lists:
I'm getting lots of 'no buffer available' errors on 10-stable with an igb NIC,
but em and bce work well. And I tried force
fix it for me, but of course that's not a solid
fix.
On Tue, Mar 25, 2014 at 9:16 AM, Markus Gebert
markus.geb...@hostpoint.ch wrote:
On 25.03.2014, at 02:18, Rick Macklem rmack...@uoguelph.ca wrote:
Christopher Forgeron wrote:
This is regarding the TSO patch that Rick suggested
was on lagg, but have dropped down to a
single ix for testing.
Thanks for your continued help.
On Mon, Mar 24, 2014 at 10:04 PM, Rick Macklem rmack...@uoguelph.ca wrote:
Markus Gebert wrote:
On 24.03.2014, at 16:21, Christopher Forgeron csforge...@gmail.com
wrote:
This is regarding
Update:
I'm changing my mind, and I believe Rick's TSO patch is fixing things
(sorry). In looking at my notes, it's possible I had lagg on for those
tests. lagg does seem to negate the TSO patch in my case.
kernel.10stable_basicTSO_65535/
- IP_MAXPACKET = 65535;
- manually forced (no if
and carp as well, I'm not sure yet.
On Tue, Mar 25, 2014 at 8:16 PM, Rick Macklem rmack...@uoguelph.ca wrote:
Christopher Forgeron wrote:
Update:
I'm changing my mind, and I believe Rick's TSO patch is fixing
things
(sorry). In looking at my notes, it's possible I had lagg
On Tue, Mar 25, 2014 at 8:21 PM, Markus Gebert
markus.geb...@hostpoint.ch wrote:
Is 65517 correct? With Rick's patch, I get this:
dev.ix.0.hw_tsomax: 65518
Perhaps a difference between 9.2 and 10 for one of the macros? My code is:
ifp->if_hw_tsomax = IP_MAXPACKET - (ETHER_HDR_LEN +
-ennvvXS greater 65495
I'll report in on this again once I have new info.
Thanks for reading.
On Mon, Mar 24, 2014 at 2:14 AM, Christopher Forgeron
csforge...@gmail.com wrote:
Hi,
I'll follow up more tomorrow, as it's late and I don't have time for
detail.
The basic TSO patch didn't work
This is regarding the TSO patch that Rick suggested earlier. (With many
thanks for his time and suggestion)
As I mentioned earlier, it did not fix the issue on a 10.0 system. It did
make it less of a problem on 9.2, but either way, I think it's not needed,
and shouldn't be considered as a patch
, Markus Gebert
markus.geb...@hostpoint.ch wrote:
On 24.03.2014, at 16:21, Christopher Forgeron csforge...@gmail.com
wrote:
This is regarding the TSO patch that Rick suggested earlier. (With many
thanks for his time and suggestion)
As I mentioned earlier, it did not fix the issue on a 10.0
with networking.
On Mon, Mar 24, 2014 at 1:23 PM, Christopher Forgeron
csforge...@gmail.com wrote:
I think making hw_tsomax a sysctl would be a good patch to commit - It
could enable easy debugging/performance testing for the masses.
I'm curious to hear how your environment is working with TSO turned off
Hi Rick, very helpful as always.
On Sat, Mar 22, 2014 at 6:18 PM, Rick Macklem rmack...@uoguelph.ca wrote:
Christopher Forgeron wrote:
Well, you could try making if_hw_tsomax somewhat smaller. (I can't see
how the packet including ethernet header would be more than 64K with the
patch
On Sat, Mar 22, 2014 at 6:41 PM, Rick Macklem rmack...@uoguelph.ca wrote:
Christopher Forgeron wrote:
#if defined(INET) || defined(INET6)
/* Initialize to max value. */
if (ifp->if_hw_tsomax == 0)
ifp->if_hw_tsomax = IP_MAXPACKET;
KASSERT(ifp->if_hw_tsomax <= IP_MAXPACKET &&
ifp
On Sat, Mar 22, 2014 at 11:58 PM, Rick Macklem rmack...@uoguelph.ca wrote:
Christopher Forgeron wrote:
Also, shouldn't we subtract ETHER_VLAN_ENCAP_LEN from tsomax to
make sure VLANs fit?
I took a look and, yes, this does seem to be needed. It will only be
needed for the case
command I could use to watch these functions and compare
the new ip_len with ip->ip_len or other variables?
On Sun, Mar 23, 2014 at 12:25 PM, Christopher Forgeron csforge...@gmail.com
wrote:
On Sat, Mar 22, 2014 at 11:58 PM, Rick Macklem rmack...@uoguelph.ca wrote:
Christopher Forgeron wrote
will resume tomorrow as I feel we're maddeningly
close. It's been ages since I had to do this much TCP, but I'm starting to
get the hang of it again.
Will resume tomorrow...
On Sun, Mar 23, 2014 at 9:47 PM, Rick Macklem rmack...@uoguelph.ca wrote:
Christopher Forgeron wrote
Status Update: Hopeful, but not done.
So the 9.2-STABLE ixgbe with Rick's TSO patch has been running all night
while iometer hammered away at it. It's got over 8 hours of test time on
it.
It's still running, the CPU queues are not clogged, and everything is
functional.
However, my
.
Markus: Do your systems show denied mbufs at boot like mine does?
Turning off TSO works for me, but at a performance hit.
I'll compile Rick's patch (and extra debugging) this morning and let you
know soon.
On Thu, Mar 20, 2014 at 11:47 PM, Christopher Forgeron csforge...@gmail.com
wrote:
BTW
Hi Markus,
Yes, we may have different problems, or perhaps the same problem is
manifesting itself in different ways in our systems.
Have you tried a 10.0-RELEASE system yet? If we were on the same OS
version, we could then compare system specs a bit deeper, and see what is
different. Perhaps
Ah, I understand the difficulties of testing production systems.
However, if you can make a spare tester of the same hardware, that's
perfect - And you can generate all the load you need with benchmark
software like iometer, large NFS copies, or perhaps a small replica of your
network. Synthetic
...@uoguelph.ca wrote:
Christopher Forgeron wrote:
On Thu, Mar 20, 2014 at 7:40 AM, Markus Gebert
markus.geb...@hostpoint.ch wrote:
Possible. We still see this on nfsclients only, but I'm not convinced
that nfs is the only trigger.
Since Christopher is getting a bunch
(Pardon me, for some reason my gmail is sending on my cut-n-pastes if I cr
down too fast)
First set of logs:
Mar 21 11:07:00 SAN0 kernel: before pklen=65542 actl=65542 csum=4116
Mar 21 11:07:00 SAN0 kernel: after mbcnt=33 pklen=65542 actl=65542
Mar 21 11:07:00 SAN0 kernel: before pklen=65542
Markus,
I don't know why I didn't notice this before.. I copied your cpuset ping
verbatim, not realizing that I should be using 172.16.0.x as that's my
network on the ix's
On this tester box, 10.0.0.1 goes out a different interface, thus it never
reported back any problems.
Now that I've
2.4.8 works under 10.0
On Fri, Mar 21, 2014 at 1:40 PM, Markus Gebert
markus.geb...@hostpoint.ch wrote:
On 21.03.2014, at 15:49, Christopher Forgeron csforge...@gmail.com
wrote:
However, if you can make a spare tester of the same hardware, that's
perfect - And you can generate all the load
Update:
I've noticed a fair number of differences in the ixgbe driver between 9.2
and 10.0-RELEASE, even though they have the same 2.5.15 version. Mostly
Netmap integration.
I've loaded up a 9.2-STABLE ixgbe driver from Dec 25th as it was handy (I
had to hack the source a bit since some #def's
Ah, I appreciate your efforts Rick - If you have any final parting hints,
please let me know. I'm opening up access from my IDE so I can look at this
with something a bit more advanced than the default vi.
For the record, I was printing the flags out as an unsigned long, so that
should be
Good point - I'm printing where Rick asked, in the 'before' printf
statement, which comes before the m = m_defrag(*m_headp, M_NOWAIT); command
in ixgbe_xmit
I'm going to be adding more printf's to the code to see if I can find
anything interesting, your suggestions would be welcome.
..and I
, 2014 at 11:47 PM, Christopher Forgeron
csforge...@gmail.com wrote:
BTW - I think this will end up being a TSO issue, not the patch that
Jack applied.
When I boot Jack's patch (MJUM9BYTES removal) this is what netstat -m
shows:
21489/2886/24375 mbufs in use (current
is once it's been
set, and I'll print it out in the trouble spot (ixgbe_xmit) as well.
On Fri, Mar 21, 2014 at 8:44 PM, Rick Macklem rmack...@uoguelph.ca wrote:
Christopher Forgeron wrote:
Hello all,
I ran Jack's ixgbe MJUM9BYTES removal patch, and let iometer hammer
away
move into the base
10.0 code tomorrow.
On Fri, Mar 21, 2014 at 8:44 PM, Rick Macklem rmack...@uoguelph.ca wrote:
Christopher Forgeron wrote:
Hello all,
I ran Jack's ixgbe MJUM9BYTES removal patch, and let iometer hammer
away at the NFS store overnight - But the problem is still
.
On Fri, Mar 21, 2014 at 10:25 PM, Christopher Forgeron csforge...@gmail.com
wrote:
It may be a little early, but I think that's it!
It's been running without error for nearly an hour - It's very rare it
would go this long under this much load.
I'm going to let it run longer, then abort
, Rick Macklem rmack...@uoguelph.ca wrote:
Christopher Forgeron wrote:
It may be a little early, but I think that's it!
It's been running without error for nearly an hour - It's very rare
it
would go this long under this much load.
I'm going to let it run longer, then abort and install
.
On Wed, Mar 19, 2014 at 6:08 PM, Johan Kooijman m...@johankooijman.com wrote:
Hey Christopher,
Can you paste the output of uname -a as a reply to this email? Thx!
On Wed, Mar 19, 2014 at 8:48 PM, Christopher Forgeron
csforge...@gmail.com wrote:
Hello,
I'm replying on the 9.2 ixgbe tx
On Thu, Mar 20, 2014 at 7:40 AM, Markus Gebert
markus.geb...@hostpoint.ch wrote:
Possible. We still see this on nfsclients only, but I'm not convinced that
nfs is the only trigger.
Just to clarify, I'm experiencing this error with NFS, but also with iSCSI
- I turned off my NFS server in
Markus,
I just wanted to clarify what dtrace will output in a 'no-error'
situation. I'm seeing the following during a normal ping (no errors) on
ix0, or even on a non-problematic bge NIC:
On Thu, Mar 20, 2014 at 7:40 AM, Markus Gebert
markus.geb...@hostpoint.ch wrote:
Also, if you have
, 2014 at 12:50 PM, Christopher Forgeron csforge...@gmail.com
wrote:
Markus,
I just wanted to clarify what dtrace will output in a 'no-error'
situation. I'm seeing the following during a normal ping (no errors) on
ix0, or even on a non-problematic bge NIC:
On Thu, Mar 20, 2014 at 7:40 AM
: after mbcnt=33 pklen=65542 actl=65
On Wed, Mar 19, 2014 at 11:29 PM, Rick Macklem rmack...@uoguelph.ca wrote:
Christopher Forgeron wrote:
Hello,
I can report this problem as well on 10.0-RELEASE.
I think it's the same as kern/183390?
I have two physically identical
BTW,
When I have the problem, this is what I see from netstat -m
4080/2956/7036/6127254 mbuf clusters in use (current/cache/total/max)
4080/2636 mbuf+clusters out of packet secondary zone in use (current/cache)
0/50/50/3063627 4k (page size) jumbo clusters in use
(current/cache/total/max)
Re: cpuset ping
I can report that I do not get any fails with this ping - I have screens of
failed flood pings on the ix0 nic, but these always pass (i have that
cpuset ping looping constantly).
I can't report about the dtrace yet, as I'm running Rick's ixgbe patch, and
there seems to be a .ko
I have found this:
http://lists.freebsd.org/pipermail/freebsd-net/2013-October/036955.html
I think what you're saying is that:
- a MTU of 9000 doesn't need to equal a 9k mbuf / jumbo cluster
- modern NIC drivers can gather 9000 bytes of data from various memory
locations
- The fact that I'm
Any recommendations on what to do? I'm experimenting with disabling TSO
right now, but it's too early to tell if it fixes my problem.
On my 9.2 box, we don't see this number climbing. With TSO off on 10.0, I
also see the number is not climbing.
I'd appreciate any links you may have so I can read
, Christopher Forgeron
csforge...@gmail.com wrote:
I have found this:
http://lists.freebsd.org/pipermail/freebsd-net/2013-October/036955.html
I think what you're saying is that:
- a MTU of 9000 doesn't need to equal a 9k mbuf / jumbo cluster
- modern NIC drivers can gather 9000 bytes of data from
! adapter->rx_mbuf_sz = MJUMPAGESIZE;
/* Prepare receive descriptors and buffers */
if (ixgbe_setup_receive_structures(adapter)) {
On Thu, Mar 20, 2014 at 3:12 PM, Christopher Forgeron
csforge...@gmail.com wrote:
Hi Jack,
I'm on ixgbe 2.5.15
I
.
Jack
On Thu, Mar 20, 2014 at 3:32 PM, Christopher Forgeron
csforge...@gmail.com wrote:
I agree, performance is noticeably worse with TSO off, but I thought it
would be a good step in troubleshooting. I'm glad you're a regular reader
of the list, so I don't have to settle for slow
that I have issues.
I'll be following up tomorrow with info on either outcome.
Thanks for your help.. your rusty networking is still better than mine. :-)
On Thu, Mar 20, 2014 at 11:13 PM, Rick Macklem rmack...@uoguelph.ca wrote:
Christopher Forgeron wrote:
Output from the patch you gave me
routines
Lastly, please note this link:
http://lists.freebsd.org/pipermail/freebsd-net/2012-October/033660.html
It's so old that I assume the TSO leak that he speaks of has been patched,
but perhaps not. More things to look into tomorrow.
On Thu, Mar 20, 2014 at 11:32 PM, Christopher Forgeron
Hello,
I can report this problem as well on 10.0-RELEASE.
I think it's the same as kern/183390?
I have two physically identical machines, one running 9.2-STABLE, and one
on 10.0-RELEASE.
My 10.0 machine used to be running 9.0-STABLE for over a year without any
problems.
I'm not
(Sorry for the formatting on that last message, that was weird)
Today I wanted to test the assertion that this is a NFS issue, since we all
seem to be running NFS.
I shut down my NFS daemon in rc.conf, configured the FreeBSD10 iSCSI ctld,
rebooted, and then ran all my tests exclusively from
Hello,
I'm replying on the 9.2 ixgbe tx queue hang thread, and wanted to tie in
here the fact that I was able to replicate this problem with iSCSI only, no
NFS.
I assume the conversation will continue on the other thread.
Johan Kooijman @ Sat Mar 1 08:55:14 UTC 2014
I think NFS is part of