Hi Dario,

Are you saying you patched UHD to wait for ACKs? Would you be able to provide a patch for this?
Thanks,
Will

From: Dario Pennisi <da...@iptronix.com>
Sent: Friday, July 13, 2018 1:49 AM
To: Brophy, William <wbro...@lgsinnovations.com>; Keith k <keithko...@gmail.com>
Cc: usrp-users@lists.ettus.com
Subject: Re: [USRP-users] x300 unrecoverable timeouts

Hi,

We recently investigated a similar issue and have a clear understanding of where it comes from.

Every command the PC sends to the USRP device is answered with an acknowledgment (ACK). Each command carries a sequence number and is sent asynchronously. On the receiving side the ACK sequence numbers are checked, and if one ACK is lost the system basically gives up. Packets can get lost simply because the communication uses UDP, which gives no guarantee of delivery, and Linux may drop packets, incoming or outgoing, at any time. It is even worse if you are going through a switch instead of a 1:1 link.

We fixed this by patching the code so that after sending each command we immediately wait for its ACK, and if it doesn't come back in time we retry. This of course rules out pipelined command transfers, but it is a reliable solution. Caching commands and resending them when an ACK arrives out of sequence won't work, because resending commands at that point would change the order in which commands are executed, which could be very wrong.

It would be great if someone from the USRP team could discuss this a bit further and come up with a better solution...

Best regards,
Dario Pennisi

On Fri, Jul 13, 2018 at 2:29 AM +0200, "Keith k via USRP-users" <usrp-users@lists.ettus.com> wrote:

Hello Will,

This sounds eerily similar to issues I've had using N200s. I basically found that at high rates, using either STREAM_MODE_NUM_SAMPS_AND_DONE or start/stop commands was completely unusable. The system would go into an unrecoverable series of timeouts or overflows.
I had to switch to uninterrupted continuous streaming, and I had to make sure the UHD threads were isolated on their own CPU cores so they would not be preempted. This was the only way I could get stable RX operation over a long-running application.

On Thu, Jul 12, 2018 at 1:22 PM, Brophy, William via USRP-users <usrp-users@lists.ettus.com> wrote:

While working to get coherent streams working, I ran into an issue using an X310 with two TwinRX daughterboards. The issue starts with a series of "ERROR_CODE_OVERFLOW (Out of sequence error)" errors. In an attempt to recover from that, the rx streamer is thrown out and recreated. After that, the stream errors change to "ERROR_CODE_TIMEOUT". Once in this state, all future streams end with this error. The X310 is connected over 10G Ethernet.

I managed to reproduce this error with an example program based on “rx_multi_samples.cpp”. I made the following changes:

1. STREAM_MODE_START_CONTINUOUS is now used, ending the stream with STREAM_MODE_STOP_CONTINUOUS.
2. The stream delay was set to 0.01 (mostly to speed up the rate at which the error occurs).
3. Multi-stream commands (and stop commands) are issued repeatedly (start, stop, start, stop, etc.) rather than as one long stream.
4. Each stream uses a different sampling rate (alternating between 25 Msps and 50 Msps).
5. A small loop was added to the collect loop to slow the thread down enough to get overflow errors (but only sometimes, nothing crazy).
6. Once the out of sequence error is encountered 10 times in a row, the rx streamer is destroyed and re-created.
7. Every stream command after step 5 ends in a timeout error.

It is also worth pointing out that this does not happen if the sample rate does not change. The out of sequence errors still happen until the rx streamer is re-created, but the timeout errors do not occur after that…

I attached the entire example program (with modifications) to this email.
I started it with the args:

rx_multi_samples --args addr=192.168.30.2 --subdev "A:0 A:1 B:0 B:1" --channels "0,1,2,3" --dilv --rate 50000000 --nsamps 8000000

Is there something wrong with how we are using the interface? Are there steps we can take to either avoid or recover from this state?

I appreciate any help we can get…

Will

_______________________________________________
USRP-users mailing list
USRP-users@lists.ettus.com
http://lists.ettus.com/mailman/listinfo/usrp-users_lists.ettus.com

--
-Keith Kotyk