[USRP-users] Re: Replay Block Command Rate Bottleneck at High PRF

Brian Padalino Tue, 19 Aug 2025 08:08:36 -0700

On Tue, Aug 19, 2025 at 4:46 AM Arthur Gerst via USRP-users <
usrp-users@lists.ettus.com> wrote:


> Hi,
>
>
>
> My goal is to transmit a stored signal at pulse repetition frequencies
> (PRF) in the multi-kHz range with the x440. I'm working with a modified
> CG_1600 FPGA (using 100 Gbit Ethernet and 1600 MHz bandwidth) where I
> incorporated the Replay Block.
>
>
>
> From my tests, the Replay Block itself works as expected. I’ve used the
> rfnoc_replay_samples_from_file example to verify both continuous streaming
> and finite sample replay, confirming the output of the radio with a
> spectrum analyzer. I’m able to output the desired signal at 2Gsps, with the
> 40000 samples that I put in the replay block (for now it’s a fixed value).
> But I want to have a duty cycle of 0.1 without zero padding the rest of my
> signal.
>
>
>
> However, due to the limited depth of the Replay Block's command queue, I
> must issue a stream timed command for each pulse and at least at the
> desired PRF. If the queue is full, attempting to issue a new command
> results in an error. To avoid using try/catch for this, I’ve added a
> function that peeks at REG_PLAY_CMD_FIFO_SPACE_ADDR and sends the next
> pulse only if space is available. This loop runs in its own thread.
>
>
>
> Despite all of this, I'm seeing many 'L' characters printed pointing out
> that incoming stream commands are considered late by the Replay Block.
>
>
>
> Based on my experience with UHD and Ettus hardware, I suspect the
> bottleneck could lie in one of the following areas:
>
>    - The host cannot issue commands quickly enough
>    - The transport layer is unable to sustain the required command rate
>    (NIC/sockets…)
>    - The Replay Block cannot ingest commands at the desired speed
>
>
>
> I analysed execution times in the loop and issue_stream_command()
> function. They appeared fast enough for my application (tens of
> microseconds). However, the command queue remains nearly empty. Even after
> optimizing the issuing logic by eliminating unnecessary operations and
> reducing loop time by half, the maximum command rate the system can
> generate/consume remains the same (around 1250 commands per second).
>
>
>
> This leads me to believe the transport layer may be the limiting factor.
> One suspicion is that if TCP is used for command transmission, the Nagle
> algorithm might be delaying packets by attempting to concatenate them. If
> that’s the case, is there a way to explicitly disable it (e.g. using
> TCP_NODELAY)? Could you maybe briefly explain how the transport layer works
> for RFNoC blocks?
>
>
>
> Alternatively, the issue could come from an entirely different source, and
> I would appreciate any insights or recommendations on how to proceed.
>
>
>
> Here are some code snippets/contexts:
>
>
>
> *In the replay_block_control  :*
>
>     uint64_t get_fifo_fullness(const size_t port) {
>         return _replay_reg_iface.peek32(REG_PLAY_CMD_FIFO_SPACE_ADDR, port);
>     }
>
>
>
> *My loop (might have some issues due to copy/pasting ; added variables for
> better context):*
>
> // Set up the static part of the stream command
>
> uhd::stream_cmd_t stream_cmd(uhd::stream_cmd_t::STREAM_MODE_NUM_SAMPS_AND_DONE);
>
> stream_cmd.num_samps = 40000; // Fixed value for now
>
> stream_cmd.stream_now = false;
>
>
>
> const double prf = 10000; //1000 works but 1500 doesn’t
>
> const auto prf_interval_us = static_cast<int>(1e6 / prf);
>
> const double prf_interval = 1.0 / prf;
>
>
>
> // Loop through each "second" of pulses
>
> for (size_t second = 0; second < number_pulses / prf; ++second) {
>
>     LOG_F(INFO, "Sending commands for second %lu", second);
>
>
>
>     // Loop through pulses within this second
>
>     for (size_t pulse = 0; pulse < prf; ++pulse) {
>
>         // Calculate the scheduled time for this pulse
>
>         stream_cmd.time_spec = uhd::time_spec_t(start_time + second + pulse * prf_interval);
>
>
>
>         // Query the FIFO space before sending
>
>         uint32_t fullness = m_streamer.block->get_fifo_fullness(0);
>
>
>
>         if (fullness != 0) {
>
>             // FIFO has space — issue the command
>
>             m_streamer.block->issue_stream_cmd(stream_cmd, m_streamer.channel);
>
>             ++n_total_pulses;
>
>
>
>         } else {
>
>             // FIFO is full — back off briefly and retry this pulse
>
>
>             std::this_thread::sleep_for(std::chrono::microseconds(prf_interval_us));
>
>             --pulse;  // Retry current pulse
>
>         }
>
>         // Exit early if total pulse count reached
>
>         if (n_total_pulses >= number_pulses) {
>
>             break;
>
>         }
>
>     }
>
> }
>
> LOG_F(INFO, "Finished sending pulses!");
>
>
>
> *Software infos :*
>
> UHD 4.8.0 commit 308126a479ca19dfaebfe4784b375e608788d763
>
> Ubuntu 22.04 jammy
>
>
>
> *Hardware infos :*
>
> CPU Intel(R) Xeon(R) Gold 5317 CPU @ 3.00GHz
>
> For Data Plane : Ethernet controller: Intel Corporation Ethernet
> Controller E810-C (not used for this part)
>
> For Command : TP-LINK USB 3.0 Gigabit Ethernet Adapter (UE306) -> *Cannot
> control the coalescing settings, maybe there is something also here?*
>

One thing I didn't appreciate before working with the x440 is that you do
not need the 1GbE PS Ethernet connected at all. All your UHD operations can
go over the 100G connection, even ssh-ing into the PS side of things.

If you're willing to modify the HDL and rebuild the image, another thing to
note is that you could make the FIFO deeper if you want. It's defined here:


https://github.com/EttusResearch/uhd/blob/9ab7d07b37a81f86f5d2fe04fb49ebd259794125/fpga/usrp3/lib/rfnoc/blocks/rfnoc_block_replay/axis_replay.v#L529

axi_fifo_short is a 32 entry SRL based FIFO. If you changed the
instantiation to an axi_fifo (like the one on line 554) with a SIZE
parameter (note it's the log2 of the size) of 10 then you'd get 1024
entries and utilize some BRAM but not a ton. Note you'll need to change the
fullness register readback logic as well, but that seems easy enough to do.
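
A rough sketch of what the deeper FIFO buys in terms of scheduling slack.
It assumes one timed command per pulse and a host that keeps the FIFO full;
fifo_entries and lookahead_seconds are hypothetical helper names for
illustration, not part of UHD or the FPGA source.

```cpp
#include <cassert>
#include <cstddef>

// Number of entries in a FIFO whose SIZE parameter is the log2 of its depth.
constexpr std::size_t fifo_entries(unsigned log2_size)
{
    return std::size_t{1} << log2_size;
}

// SIZE = 10 gives the 1024 entries mentioned above.
static_assert(fifo_entries(10) == 1024, "2^10 entries");

// How far ahead (in seconds) a full FIFO lets the host queue pulses,
// assuming exactly one stream command per pulse at prf_hz.
inline double lookahead_seconds(std::size_t entries, double prf_hz)
{
    assert(prf_hz > 0.0);
    return static_cast<double>(entries) / prf_hz;
}
```

At the 10 kHz PRF from the loop above, 32 entries cover about 3.2 ms of
pulses and 1024 entries about 102.4 ms, which is how much host or transport
jitter can be absorbed before a command arrives late.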

Lastly, and maybe I am reading your loop wrong, but is there a reason why
you aren't using the space you just read to know how many entries you can
shove into the command FIFO? It seems like if you read that you have 20
spaces, you should be able to quickly send 20 of those commands knowing
they won't get lost, and then come back to the start of the loop checking
for fullness again. Otherwise, you're sending out a request packet for the
register information, waiting for that to come back, then sending out a
single timed command packet, waiting for the response acknowledgement for
that, then repeating.
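
The batching idea could look roughly like the sketch below. It is not UHD
code: issue_in_batches and its callback parameters are hypothetical, and it
assumes the space register reports how many more commands the FIFO can
accept and that nothing else is writing to the same FIFO in the meantime.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>

// Issue total_cmds commands, reading the free space once per burst instead
// of once per command. Returns how many times the space register was read.
//   read_space: returns the number of free command FIFO entries (assumed)
//   issue_cmd:  issues the command for the given pulse index
//   backoff:    called when the FIFO is full, e.g. a short sleep
std::size_t issue_in_batches(std::size_t total_cmds,
    const std::function<std::uint32_t()>& read_space,
    const std::function<void(std::size_t)>& issue_cmd,
    const std::function<void()>& backoff)
{
    assert(read_space && issue_cmd && backoff);
    std::size_t next  = 0; // index of the next pulse to schedule
    std::size_t polls = 0; // number of register reads performed
    while (next < total_cmds) {
        const std::size_t space = read_space();
        ++polls;
        if (space == 0) {
            backoff(); // FIFO full: wait, then read the space again
            continue;
        }
        // Send as many commands as the FIFO can take right now.
        const std::size_t burst = std::min(space, total_cmds - next);
        for (std::size_t i = 0; i < burst; ++i) {
            issue_cmd(next++);
        }
    }
    return polls;
}
```

In the loop from the original post, read_space would wrap the
get_fifo_fullness(0) call, and issue_cmd would set stream_cmd.time_spec for
the given pulse index before calling issue_stream_cmd. With 32 free entries
that is one register round trip per 32 commands instead of one per command.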

Hopefully these are useful insights.

Good luck.

Brian

_______________________________________________
USRP-users mailing list -- usrp-users@lists.ettus.com
To unsubscribe send an email to usrp-users-le...@lists.ettus.com
