I tried using the packet_resizer block, which improved things a little but not significantly. It seems that as soon as the difference between the spp coming from the Radio and the packet size configured in the resizer block becomes too large, I get overruns again. I increased the STR_SINK_FIFOSIZE of the packet_resizer from 11 to 14 and can now run [Radio, spp=64] -> [Resizer, pkt_size=6000] -> [FIFO] -> [Null Sink] without problems. However, with pkt_size=7000 I get overruns again, and the same happens when any RFNoC block is connected between the Radio and Resizer blocks with spp=64.
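For reference, this is roughly how the resizer's packet size gets set from the host side. It is a sketch only: the device args, the block ID "0/PacketResizer_0", and the "pkt_size" arg name are assumptions from my setup (the arg name would come from the block's XML description), so the actual IDs on a given image may differ.

    #include <uhd/usrp/multi_usrp.hpp>
    #include <uhd/device3.hpp>
    #include <uhd/rfnoc/block_ctrl_base.hpp>
    #include <uhd/rfnoc/block_id.hpp>

    int main()
    {
        // Open the device and look up the packet_resizer in the running image.
        auto usrp = uhd::usrp::multi_usrp::make(uhd::device_addr_t("type=x300"));
        auto resizer = usrp->get_device3()->get_block_ctrl<uhd::rfnoc::block_ctrl_base>(
            uhd::rfnoc::block_id_t("0/PacketResizer_0"));

        // Same value as in my working test above; the arg name comes from the
        // block's XML description and may differ in other images.
        resizer->set_arg<int>("pkt_size", 6000);
        return 0;
    }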
By enabling the UHD trace messages and digging into some of its code, I found that the RFNoC flow control is configured such that it sends at least one ACK for every two packets on the bus (one ACK per packet for a smaller input FIFO or a larger pkt_size).
See /uhd/host/lib/rfnoc/graph_impl.cpp:
> // On the same crossbar, use lots of FC packets
> size_t pkts_per_ack = std::min(
>     uhd::rfnoc::DEFAULT_FC_XBAR_PKTS_PER_ACK,
>     buf_size_pkts - 1
> );
DEFAULT_FC_XBAR_PKTS_PER_ACK is set to 2 in constants.hpp.
From my understanding, we should really take the max here instead of the min: buf_size_pkts was already calculated from the input FIFO size of the next downstream block. Am I missing something here?
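To make the difference concrete, here is a minimal standalone sketch; the value 2 is DEFAULT_FC_XBAR_PKTS_PER_ACK from constants.hpp, while the downstream buffer size of 32 packets is just a made-up example, not a number from my setup:

    #include <algorithm>
    #include <cstddef>
    #include <cstdio>

    int main()
    {
        // From constants.hpp (rfnoc-devel): one flow-control ACK every 2 packets.
        const size_t DEFAULT_FC_XBAR_PKTS_PER_ACK = 2;

        // Hypothetical example: the downstream input FIFO holds 32 of our small packets.
        const size_t buf_size_pkts = 32;

        // Current code: ACK every min(2, 31) = 2 packets, no matter how big the FIFO is.
        const size_t ack_every_min = std::min(DEFAULT_FC_XBAR_PKTS_PER_ACK, buf_size_pkts - 1);
        // What I would expect: ACK every max(2, 31) = 31 packets.
        const size_t ack_every_max = std::max(DEFAULT_FC_XBAR_PKTS_PER_ACK, buf_size_pkts - 1);

        std::printf("min(): ACK every %zu packets, max(): ACK every %zu packets\n",
                    ack_every_min, ack_every_max);
        return 0;
    }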
Furthermore, UHD assumes the maximum packet size (8000 bytes) for the Radio block even though I set spp=64. Next week I will try to also set the pkt_size explicitly in the Radio block's stream args or, if that does not help, in its port definition in the UHD block description .xml file.
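Roughly what I have in mind, as a sketch only: the "spp" stream arg is what I already set today, while the "pkt_size" key is just my guess at a name for next week's experiment and may well not exist.

    #include <uhd/usrp/multi_usrp.hpp>

    int main()
    {
        auto usrp = uhd::usrp::multi_usrp::make(uhd::device_addr_t("type=x300"));

        // "spp" is the stream arg I already set; "pkt_size" is hypothetical and untested.
        uhd::stream_args_t stream_args("sc16", "sc16");
        stream_args.args["spp"] = "64";
        // stream_args.args["pkt_size"] = "272"; // hypothetical: 64 samples * 4 B + 16 B CHDR header
        uhd::rx_streamer::sptr rx_stream = usrp->get_rx_stream(stream_args);
        (void)rx_stream;
        return 0;
    }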
I am not absolutely sure that the flow control is the only reason for my problems, but it definitely increases the number of ACK messages for smaller spp. Another workaround for me would probably be to increase the STR_SINK_FIFOSIZE for all the RFNoC blocks I'm using.
Regarding the bus_clk rate: it is set to 166.67 MHz in rfnoc-devel at the moment (= 10.67 Gbps raw throughput). Accounting for the additional overhead from CHDR headers when using spp=64, I get 6.80 Gbps (compared to the raw, un-packetized 6.40 Gbps). From my understanding, the crossbar should be able to handle this amount, since I have only one stream direction through my blocks (please correct me if I'm wrong here).
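For completeness, the numbers above come from this simple calculation (sc16 = 4 bytes per sample, a 16-byte CHDR header with timestamp per packet, and an 8-byte-wide crossbar at bus_clk):

    #include <cstdio>

    int main()
    {
        // Raw sample stream: 200 MSps * 4 bytes/sample (sc16) = 800 MB/s = 6.40 Gbps.
        const double sample_rate     = 200e6;
        const double bytes_per_sample = 4.0;
        const double raw_gbps = sample_rate * bytes_per_sample * 8 / 1e9;

        // With spp=64: 256 bytes of payload + 16 bytes CHDR header (with timestamp) per packet.
        const double spp = 64, chdr_header = 16;
        const double packetized_gbps = raw_gbps * (spp * bytes_per_sample + chdr_header)
                                                / (spp * bytes_per_sample);

        // Crossbar: 8 bytes per cycle at bus_clk = 166.67 MHz.
        const double bus_gbps = 166.67e6 * 8 * 8 / 1e9;

        std::printf("raw %.2f Gbps, with CHDR @ spp=64 %.2f Gbps, bus %.2f Gbps\n",
                    raw_gbps, packetized_gbps, bus_gbps); // 6.40, 6.80, 10.67
        return 0;
    }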
Thanks for your ideas and feedback!
EJ Kreinar wrote:

Hi Sebastian,

> Do you think that it would suffice to change the packet size at my last
> RFNoC block before the host? I will try out the already available
> packet_resizer block tomorrow.

Yes, this is probably the easiest solution. But, if you're not opposed
to custom HDL, an alternate option could be to create a modified FFT
block that simply outputs an integer number of FFTs within a single packet.
> So the question would be if RFNoC can handle passing packets with spp=64
> at 200 MSps between RFNoC blocks.
That's a good question... RFNoC blocks all share a crossbar, which runs
at a particular bus_clk rate, so there is a max throughput that the bus
can handle... Each sample on the crossbar is 8 bytes, so you get a total
throughput of bus_clk*8 bytes/second. There's also a header overhead of
16 bytes per packet (or 8 bytes if there's no timestamp).
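To put rough numbers on that, here is a back-of-the-envelope sketch only; it takes the 8 bytes per sample and 16-byte header figures above at face value, and uses the 187.5 MHz bus_clk from the commit linked below.

    #include <cstdio>

    int main()
    {
        // Crossbar capacity: 8 bytes per cycle at bus_clk = 187.5 MHz (per the commit below).
        const double bus_clk = 187.5e6;
        const double capacity_gbps = bus_clk * 8 * 8 / 1e9; // = 12.0 Gbps

        // Load of one 200 Msps stream at spp=64, taking 8 bytes/sample at face value,
        // plus a 16-byte header per packet. (If samples actually occupy 4 bytes, as
        // with sc16 over the wire, roughly halve this figure.)
        const double sample_rate = 200e6, spp = 64;
        const double load_gbps = (sample_rate * 8 + (sample_rate / spp) * 16) * 8 / 1e9;

        std::printf("bus capacity %.1f Gbps, one-stream load %.1f Gbps\n",
                    capacity_gbps, load_gbps); // 12.0 vs 13.2
        return 0;
    }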
I'm actually not sure what the current X310 bus_clk rate is set to... I just noticed a recent commit that supposedly changes bus_clk to 187.5 MHz (https://github.com/EttusResearch/fpga/commit/d08203f60d3460a170ad8b3550b478113b7c5968). So I'm not exactly clear what the bus_clk was set to before that, or on the rfnoc-devel branch...
But unless I'm misunderstanding, having multiple RFNoC blocks running at
a full 200 Msps might saturate the bus? Is that correct?
EJ
On Thu, Mar 22, 2018 at 3:33 PM, Sebastian Leutner via USRP-users <usrp-users@lists.ettus.com> wrote:
Hi all,

when working with RFNoC at 200 MSps on the X310 using 10GbE, I experience overruns when using less than 512 samples per packet (spp). A simple flow graph [RFNoC Radio] -> [RFNoC FIFO] -> [Null sink] with the spp stream arg set at the RFNoC Radio block shows the following network utilization:

 spp | throughput [Gbps]
------------------------
1024 | 6.49
 512 | 6.58
 256 | 3.60
  64 | 0.70

Although I understand that the total load will increase a little bit for smaller packets due to the increased overhead (headers), as seen from spp=1024 to spp=512, I find it confusing that so many packets are dropped for spp <= 256. Total goodput should be 200 MSps * 4 bytes per sample (sc16) = 800 MBps = 6.40 Gbps.

Is RFNoC somehow limited to a certain number of packets per second (regardless of their size)? Could this be resolved by increasing the STR_SINK_FIFOSIZE noc_shell parameter of any blocks connected to the RFNoC Radio?

I would like to use spp=64 because that is the size of the RFNoC FFT I want to use. I am using UHD 4.0.0.rfnoc-devel-409-gec9138eb.

Any help or ideas appreciated!

Best,
Sebastian
This is almost certainly an interrupt-rate issue having to do with your ethernet controller, and nothing to do with RFNoC, per se. If you're on Linux, try:

ethtool --coalesce <device-name-here> adaptive-rx on
ethtool --coalesce <device-name-here> adaptive-tx on
Thanks Marcus for your quick response. Unfortunately, that did not help. Also, `ethtool -c enp1s0f0` still reports "Adaptive RX: off TX: off" afterwards. I also tried changing `rx-usecs`, which was reported back correctly but did not help either. I am using an Intel 82599ES 10-Gigabit SFI/SFP+ controller with the ixgbe driver (version 5.1.0-k) on Ubuntu 16.04.
Do you know anything else I could try?
Thanks,
Sebastian
The basic problem is that in order to achieve good performance at very high sample rates, jumbo frames are required, and using a very small SPP implies very small frames, which necessarily leads to poor ethernet performance.
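To put a number on "very small frames": at 200 MSps the packet rate the NIC has to absorb scales inversely with spp. A quick sketch, assuming sc16 samples and the 16-byte CHDR header mentioned above:

    #include <cstdio>

    int main()
    {
        // Packets per second the host must absorb for a 200 MSps sc16 stream,
        // as a function of spp (CHDR payload + 16-byte header; Ethernet/IP/UDP
        // framing comes on top of that).
        const double sample_rate = 200e6;
        const double spps[] = {1024, 512, 256, 64};
        for (double spp : spps) {
            const double pkts_per_sec = sample_rate / spp;
            const double chdr_bytes   = spp * 4 + 16;
            std::printf("spp=%4.0f -> %.2f Mpkt/s, %4.0f-byte CHDR packets\n",
                        spp, pkts_per_sec / 1e6, chdr_bytes);
        }
        // spp=64 works out to ~3.1 million packets per second, vs ~0.2 Mpkt/s at spp=1024.
        return 0;
    }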
Do you actually need the FFT results to appear at the host at "real-time" rates, or can you do an integrate-and-dump within RFNoC, to reduce host-side traffic?
Yes, I need all the samples. Since it will be a full receiver implementation in RFNoC, the output to the host will be much less than 6.40 Gbps, but still a decent amount and definitely more than the 0.7 Gbps I was able to achieve with spp=64.

Do you think that it would suffice to change the packet size at my last RFNoC block before the host? I will try out the already available packet_resizer block tomorrow.

So the question would be if RFNoC can handle passing packets with spp=64 at 200 MSps between RFNoC blocks. If this is likely to be a problem, I could try wrapping all my HDL code into one RFNoC block and handling the packet resizing at the input and output of this block. However, I would like to avoid this step if possible.
Thanks for your help!
_______________________________________________
USRP-users mailing list
USRP-users@lists.ettus.com
http://lists.ettus.com/mailman/listinfo/usrp-users_lists.ettus.com