Re: [USRP-users] x300 unrecoverable timeouts

Brophy, William via USRP-users Tue, 17 Jul 2018 05:05:54 -0700

Hi Dario,

Are you saying you patched UHD to wait for ACKs? Would you be able to provide a
patch for this?

Thanks
Will

From: Dario Pennisi <da...@iptronix.com>
Sent: Friday, July 13, 2018 1:49 AM
To: Brophy, William <wbro...@lgsinnovations.com>; Keith k <keithko...@gmail.com>
Cc: usrp-users@lists.ettus.com
Subject: Re: [USRP-users] x300 unrecoverable timeouts

Hi,
We recently investigated a similar issue and have a clear understanding of where
it comes from.
Every command the PC sends to the USRP device is answered with an
acknowledgement. Each command carries a sequence number and is sent
asynchronously. The receiving side checks the sequence number of each
acknowledgement, and if one is lost the system basically gives up. Packets can
get lost simply because the communication uses UDP, which gives no guarantee of
delivery: Linux may drop packets, incoming or outgoing, at any time, and it is
even worse if you go through a switch instead of a 1:1 link.
We fixed this by patching the code so that after sending a command we
immediately wait for its acknowledgement, and if it does not come back in time
we retry. This of course rules out pipelined command transfers, but it is
reliable. Caching commands and resending them when an ack arrives out of
sequence won't work, since resending at that point would change the order in
which commands are executed, which could be very wrong.
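
A minimal sketch of the send-then-wait-then-retry idea described above. It is
not the actual UHD patch: `CommandLink`, `send_command_reliable`, the timeout
and the retry count are all hypothetical names and values chosen for
illustration, and the transport is abstracted behind two callbacks.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical transport hooks; none of these names come from UHD.
struct CommandLink {
    // Sends one command tagged with a sequence number (it may be silently lost).
    std::function<void(std::uint32_t seq, std::uint32_t payload)> send;
    // Waits up to timeout_ms for an ACK. Returns true and fills acked_seq if
    // one arrived, false if the wait timed out.
    std::function<bool(int timeout_ms, std::uint32_t& acked_seq)> wait_ack;
};

// Stop-and-wait: send one command, block until its ACK arrives, and resend the
// same command with the same sequence number on timeout. Returns false once
// max_tries attempts have gone unacknowledged.
inline bool send_command_reliable(CommandLink& link, std::uint32_t seq,
                                  std::uint32_t payload, int timeout_ms = 100,
                                  int max_tries = 5)
{
    for (int attempt = 0; attempt < max_tries; ++attempt) {
        link.send(seq, payload);
        std::uint32_t acked_seq = 0;
        if (link.wait_ack(timeout_ms, acked_seq) && acked_seq == seq) {
            return true; // acknowledged; the caller may issue the next command
        }
        // Timeout, or a stale ACK for an earlier command: try again.
    }
    return false;
}
```

Because only one command is ever in flight, a retry cannot reorder commands.
One caveat with any scheme like this: if the ACK is lost rather than the
command, the device sees the same command twice, so the receiving side has to
tolerate or discard a repeated sequence number.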
It would be great if someone from usrp could discuss this a bit further and
come up with a better solution...
Best regards,
Dario Pennisi

On Fri, Jul 13, 2018 at 2:29 AM +0200, "Keith k via USRP-users" 
<usrp-users@lists.ettus.com> wrote:
Hello Will
This sounds eerily similar to issues I've had using N200s. I basically found
that at high rates, using either STREAM_MODE_NUM_SAMPS_AND_DONE or repeated
starts and stops was completely unusable. The system would go into an
unrecoverable set of timeouts or overflows. I had to switch to uninterrupted
continuous streaming, and I had to make sure that the UHD threads were isolated
on their own CPU cores so that they would not be preempted. This was the only
way I could get a stable rx side in a long-running application.
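
As a rough illustration of the core-isolation step, here is a minimal
Linux-only sketch that pins the calling thread to one CPU with
`sched_setaffinity`. The helper names are hypothetical and this is not Keith's
actual setup. Pinning alone does not keep other tasks off that core; that
needs separate system configuration (for example the `isolcpus` kernel
parameter or a cpuset).

```cpp
#include <sched.h>

// Returns the lowest-numbered CPU the calling thread may run on, or -1 on error.
inline int first_allowed_cpu()
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0) {
        return -1;
    }
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &set)) {
            return cpu;
        }
    }
    return -1;
}

// Restricts the calling thread (pid 0 means "this thread") to a single CPU.
// Returns true on success.
inline bool pin_current_thread_to_cpu(int cpu)
{
    if (cpu < 0 || cpu >= CPU_SETSIZE) {
        return false;
    }
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set) == 0;
}
```

The function would be called at the top of the thread that should stay on that
core, for instance the thread running the receive loop.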

On Thu, Jul 12, 2018 at 1:22 PM, Brophy, William via USRP-users 
<usrp-users@lists.ettus.com> wrote:
While working to get coherent streams working, I ran into an issue using an 
x310 with two TwinRX daughterboards.
The issue starts with a series of "ERROR_CODE_OVERFLOW (Out of sequence error)" 
errors. In an attempt to recover, the rx streamer is destroyed and re-created.
After that, the stream errors change to "ERROR_CODE_TIMEOUT", and once in this
state every subsequent stream ends with that error.
The x310 is connected over 10G ethernet.

I managed to reproduce this error with an example program based off of 
“rx_multi_samples.cpp”. I had to make the following changes:

  1.  STREAM_MODE_START_CONTINUOUS is now used, ending the stream with 
STREAM_MODE_STOP_CONTINUOUS
  2.  The stream delay was set to .01 (mostly to speed up the rate the error 
would occur)
  3.  Multiple stream commands (and stop commands) are issued in repetition
(start, stop, start, stop, etc.) rather than just one long stream
  4.  Each stream uses a different sampling rate (alternates between 25Msps and 
50Msps)
  5.  A small loop was added to the collect loop to slow down the thread enough 
to get overflow errors (but only sometimes, nothing crazy)
  6.  Once the out of sequence error is encountered 10 times in a row, the rx 
streamer is destroyed and re-created
  7.  Every stream command after step 6 ends in a timeout error

It is also worth pointing out that this does not happen if the sample rate does 
not change. The out of sequence errors will still happen until the rx streamer 
is re-created, but the timeout errors do not occur after that…
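
The "10 times in a row" rule from step 6 can be sketched as a small counter.
This is a hypothetical helper, not code from the attached program or from UHD:
the class name and interface are assumptions made for illustration.

```cpp
#include <cstddef>

// Hypothetical helper, not part of UHD: counts consecutive out-of-sequence
// results and reports when the configured limit is reached.
class OutOfSequenceTracker {
public:
    explicit OutOfSequenceTracker(std::size_t limit) : limit_(limit) {}

    // Call once per receive result. Returns true when the limit of consecutive
    // out-of-sequence errors has been hit, i.e. when the rx streamer should be
    // destroyed and re-created. The count restarts after that, and also after
    // any result that is not an out-of-sequence error.
    bool record(bool out_of_sequence)
    {
        if (!out_of_sequence) {
            consecutive_ = 0;
            return false;
        }
        if (++consecutive_ < limit_) {
            return false;
        }
        consecutive_ = 0;
        return true;
    }

private:
    std::size_t limit_;
    std::size_t consecutive_ = 0;
};
```

In a UHD receive loop the input would presumably come from the receive
metadata, for example `md.error_code == uhd::rx_metadata_t::ERROR_CODE_OVERFLOW
&& md.out_of_sequence`, with `OutOfSequenceTracker tracker(10)` matching the
threshold described above.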

I attached the entire example program (with modifications) to this email.
I started it with the args:
rx_multi_samples --args addr=192.168.30.2 --subdev "A:0 A:1 B:0 B:1" --channels 
"0,1,2,3" --dilv --rate 50000000 --nsamps 8000000

Is there something wrong with how we are using the interface? Are there steps we
can take to either avoid or recover from this state?

I appreciate any help we can get…

Will

_______________________________________________
USRP-users mailing list
USRP-users@lists.ettus.com
http://lists.ettus.com/mailman/listinfo/usrp-users_lists.ettus.com

--
-Keith Kotyk
