Re: [ath5k-devel] corrupted data

2010-06-10 Thread Bob Copeland
On Thu, Jun 10, 2010 at 01:04:19PM +0900, Bruno Randolf wrote: great! So the following patch takes care of the reset part. I only boot-tested it so far. (hmm, tasklet_disable() might actually do that indirectly, I think). i dont think so. it just waits for the tasklet to finish and

Re: [ath5k-devel] corrupted data

2010-06-09 Thread Bruno Randolf
On Sunday 06 June 2010 22:07:00 Nick Kossifidis wrote: We read-and-clear RXDP on hw_nic_reset (actually we do that for any pending register writes -we need to ensure that all pending register writes are complete by doing a DMA register read, docs suggest RXDP and that's what we are doing on

Re: [ath5k-devel] corrupted data

2010-06-09 Thread Bob Copeland
On Wed, Jun 9, 2010 at 3:40 AM, Bruno Randolf b...@einfach.org wrote: hey, i guess something like this must be happening... i think we have to avoid tasklets being run concurrently to a reset. so does this patch help? Heh, I made a similar patch this morning before I saw this. I think the

Re: [ath5k-devel] corrupted data

2010-06-09 Thread Robert Brown
I wrote some code to test my hunch. I was under the mistaken impression that each receive tasklet starts looking at receive buffers from the point where the hardware last copied data. That's not the case, as Bob mentioned in one of his emails. All receive tasklets start traversing the receive

Re: [ath5k-devel] corrupted data

2010-06-09 Thread Robert Brown
I don't know anything about work queues, but I did read some documentation about tasklets. Relevant info: tasklets always run on the CPU that schedules them, multiple tasklets can run at the same time on different CPUs. You folks probably know this already bob On Wed, Jun 9, 2010 at 3:07

Re: [ath5k-devel] corrupted data

2010-06-09 Thread Robert Brown
The wireshark experiment I ran earlier is worth considering again. Wireshark saw block 110413 sent, but the receive tasklet saw a stale block, 107223. In the data file, block 110413 appears where block 110412 should be in addition to appearing in its proper spot. The receive tasklet saw data for

Re: [ath5k-devel] corrupted data

2010-06-09 Thread Bruno Randolf
On Wednesday 09 June 2010 22:42:27 Bob Copeland wrote: On Wed, Jun 9, 2010 at 3:40 AM, Bruno Randolf b...@einfach.org wrote: hey, i guess something like this must be happening... i think we have to avoid tasklets being run concurrently to a reset. so does this patch help? Heh, I made

Re: [ath5k-devel] corrupted data

2010-06-09 Thread Nick Kossifidis
2010/6/10 Bruno Randolf b...@einfach.org: On Wednesday 09 June 2010 22:42:27 Bob Copeland wrote: On Wed, Jun 9, 2010 at 3:40 AM, Bruno Randolf b...@einfach.org wrote: hey, i guess something like this must be happening... i think we have to avoid tasklets being run concurrently to a reset.

Re: [ath5k-devel] corrupted data

2010-06-09 Thread Bob Copeland
On Thu, Jun 10, 2010 at 10:21:44AM +0900, Bruno Randolf wrote: hi bob! sounds like a good plan... do you have time to do that? Yeah, I started on this already, will probably have it in a day or two. could we improve background scanning? right now it will make us lose packets... One

Re: [ath5k-devel] corrupted data

2010-06-09 Thread Bruno Randolf
On Thursday 10 June 2010 12:51:07 Bob Copeland wrote: On Thu, Jun 10, 2010 at 10:21:44AM +0900, Bruno Randolf wrote: hi bob! sounds like a good plan... do you have time to do that? Yeah, I started on this already, will probably have it in a day or two. great! could we improve

Re: [ath5k-devel] corrupted data

2010-06-08 Thread Bob Copeland
On Sun, Jun 6, 2010 at 8:16 PM, Robert Brown robert.br...@gmail.com wrote: Sure, but I only see one place in ath5k_rxbuf_setup where it returns an error value (-ENOMEM).  I'm using a 5/18/2010 snapshot.  Should I grab a later version? Are there any cache issues when dealing with hardware

Re: [ath5k-devel] corrupted data

2010-06-08 Thread Bob Copeland
On Mon, Jun 7, 2010 at 10:20 AM, Robert Brown robert.br...@gmail.com wrote: I think corruption of received data can happen whenever ath5k_reset is called when there are pending instances of ath5k_tasklet_rx scheduled, but not yet executed. ath5k_reset calls ath5k_rx_start to initialize the

Re: [ath5k-devel] corrupted data

2010-06-07 Thread Robert Brown
I think corruption of received data can happen whenever ath5k_reset is called when there are pending instances of ath5k_tasklet_rx scheduled, but not yet executed. ath5k_reset calls ath5k_rx_start to initialize the rxbuf queue, then sets RXDP to the head of the queue and turns DMA back on. At

Re: [ath5k-devel] corrupted data

2010-06-06 Thread Bob Copeland
On Sat, Jun 5, 2010 at 3:16 PM, Robert Brown robert.br...@gmail.com wrote: I transferred 1 Gb of data twice while executing the scan command as above. I got 3 corrupted blocks on the first transfer and 2 on the second, which is more than I normally expect.  Usually, I have to transfer 2 or 3 Gb

Re: [ath5k-devel] corrupted data

2010-06-06 Thread Nick Kossifidis
2010/6/6 Nick Kossifidis mickfl...@gmail.com: 2010/6/5 Benoit Papillault benoit.papilla...@free.fr: My feeling is that both the RXDP and TXDP registers are not properly initialized after a reset, causing duplicates in both directions. So I believe you should see duplicates on the TX side as well.

Re: [ath5k-devel] corrupted data

2010-06-06 Thread Benoit Papillault
We read-and-clear RXDP on hw_nic_reset (actually we do that for any pending register writes -we need to ensure that all pending register writes are complete by doing a DMA register read, docs suggest RXDP and that's what we are doing on nic_reset, we need to change that comment-) and restore

Re: [ath5k-devel] corrupted data

2010-06-06 Thread Robert Brown
Sure, but I only see one place in ath5k_rxbuf_setup where it returns an error value (-ENOMEM). I'm using a 5/18/2010 snapshot. Should I grab a later version? Are there any cache issues when dealing with hardware devices like the Atheros chip? For instance, ath5k_rxbuf_setup sets up descriptors

Re: [ath5k-devel] corrupted data

2010-06-05 Thread Benoit Papillault
On 05/06/2010 14:01, Robert Brown wrote: Comments below ... On Sat, Jun 5, 2010 at 4:34 AM, Benoit Papillault benoit.papilla...@free.fr wrote: It helps for sure. But I'm not seeing in your log where you receive the duplicate 110413 110414

Re: [ath5k-devel] corrupted data

2010-06-05 Thread Robert Brown
On Sat, Jun 5, 2010 at 8:29 AM, Benoit Papillault benoit.papilla...@free.fr wrote: On 05/06/2010 14:01, Robert Brown wrote: Comments below ... On Sat, Jun 5, 2010 at 4:34 AM, Benoit Papillault benoit.papilla...@free.fr wrote: It helps for sure. But

Re: [ath5k-devel] corrupted data

2010-06-05 Thread Nick Kossifidis
2010/6/5 Robert Brown robert.br...@gmail.com: On Sat, Jun 5, 2010 at 8:29 AM, Benoit Papillault benoit.papilla...@free.fr wrote: On 05/06/2010 14:01, Robert Brown wrote: Comments below ... On Sat, Jun 5, 2010 at 4:34 AM, Benoit Papillault benoit.papilla...@free.fr

Re: [ath5k-devel] corrupted data

2010-06-05 Thread Robert Brown
I can try changing the DMA size. Can you give some motivation for the change? Are you hopeful that it will fix the problem? If so, what's the rationale? bob On Sat, Jun 5, 2010 at 2:56 PM, Nick Kossifidis mickfl...@gmail.com wrote: 2010/6/5 Robert Brown

Re: [ath5k-devel] corrupted data

2010-06-05 Thread Robert Brown
On Sat, Jun 5, 2010 at 4:34 AM, Benoit Papillault benoit.papilla...@free.fr wrote: It might be likely that the problem is caused by background scan done by network manager since this calls ath5k_reset. So, one way to trigger the problem more often is by doing more scanning. Could you try a

Re: [ath5k-devel] corrupted data

2010-06-05 Thread Benoit Papillault
On 05/06/2010 14:50, Robert Brown wrote: I think I have a likely explanation for the data corruption. It looks like the hardware is not transferring the data to the place the driver expects. I ran Wireshark on a Macintosh portable while transferring a file, so I now know what packets the

Re: [ath5k-devel] corrupted data

2010-06-04 Thread Robert Brown
Here's a progress report on tracking down the data corruption bug. I created a 1 Gb file containing random data, except that every 1000 bytes the file contains a pattern of Xs followed by a block number in the range 0 to 99. I wrote some code to detect the pattern of Xs and print out
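The labelled-block test described in this message can be sketched in userspace C. This is only an illustration of the idea, not the code Robert Brown ran: the marker string of eight Xs, the decimal label format, and the function name count_sequence_breaks are all assumptions, since the original marker and detector are not shown in the preview.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Assumed marker: the real pattern of Xs is elided in the archived message. */
#define MARKER     "XXXXXXXX"
#define MARKER_LEN 8

/*
 * Scan buf for MARKER immediately followed by a decimal block label.
 * Each label is expected to be exactly one greater than the previous one.
 * Returns how many labels break that sequence; a duplicated or stale block
 * in the received file shows up as one or more breaks.
 * If labels_seen is non-NULL it receives the number of labels found.
 */
static size_t count_sequence_breaks(const unsigned char *buf, size_t len,
                                    size_t *labels_seen)
{
    size_t breaks = 0, seen = 0, i = 0;
    long prev = -1;

    assert(buf != NULL);
    while (i + MARKER_LEN <= len) {
        size_t j = i + MARKER_LEN;

        /* Not a marker here, or marker not followed by a digit: move on. */
        if (memcmp(buf + i, MARKER, MARKER_LEN) != 0 ||
            j >= len || !isdigit(buf[j])) {
            i++;
            continue;
        }

        /* Parse the decimal label that follows the marker. */
        long label = 0;
        while (j < len && isdigit(buf[j])) {
            label = label * 10 + (buf[j] - '0');
            j++;
        }

        if (seen > 0 && label != prev + 1)
            breaks++;
        prev = label;
        seen++;
        i = j;
    }

    if (labels_seen != NULL)
        *labels_seen = seen;
    return breaks;
}
```

Run over the received file, a label sequence such as 110411, 110413, 110413, 110414 would report two breaks, which matches the kind of duplicate described later in the thread.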

Re: [ath5k-devel] corrupted data

2010-06-04 Thread Robert Brown
Here's another instance of data corruption. One block containing labels 110413 110414 is duplicated in the output file. I've instrumented ath5k_tasklet_rx to print ATH followed by packet block information. Function __ieee80211_rx_handle_packet prints MACI. Function ieee80211_deliver_skb

Re: [ath5k-devel] corrupted data

2010-06-04 Thread Bob Copeland
On Fri, Jun 4, 2010 at 4:24 PM, Robert Brown robert.br...@gmail.com wrote: Here's another instance of data corruption.  One block containing labels 110413 110414 is duplicated in the output file. I've instrumented ath5k_tasklet_rx to print ATH followed by packet block information. Can you

Re: [ath5k-devel] corrupted data

2010-06-04 Thread Robert Brown
On Fri, Jun 4, 2010 at 4:31 PM, Bob Copeland m...@bobcopeland.com wrote: On Fri, Jun 4, 2010 at 4:24 PM, Robert Brown robert.br...@gmail.com wrote: Here's another instance of data corruption. One block containing labels 110413 110414 is duplicated in the output file. I've instrumented

Re: [ath5k-devel] corrupted data

2010-06-04 Thread Bob Copeland
On Fri, Jun 4, 2010 at 4:24 PM, Robert Brown robert.br...@gmail.com wrote: There seems to be stale data in the skb that ath5k_tasklet_rx passes to __ieee80211_rx_handle_packet.  Note that ieee80211_deliver_skb does not pass it along to netif_rx. It'd be interesting to know where mac80211 drops

Re: [ath5k-devel] corrupted data

2010-06-04 Thread Robert Brown
Answers interleaved ... On Fri, Jun 4, 2010 at 5:39 PM, Bob Copeland m...@bobcopeland.com wrote: On Fri, Jun 4, 2010 at 4:24 PM, Robert Brown robert.br...@gmail.com wrote: There seems to be stale data in the skb that ath5k_tasklet_rx passes to __ieee80211_rx_handle_packet. Note that

Re: [ath5k-devel] corrupted data

2010-06-02 Thread Bob Copeland
On Tue, Jun 01, 2010 at 12:32:52PM -0400, Robert Brown wrote: All the calls to ath5k_reset appear to be from ath5k_chan_set. Yes, I'm using Ubuntu 10.04, so there's a network manager running. bob (stating the obvious) These are no doubt from background scanning... -- Bob Copeland %%

Re: [ath5k-devel] corrupted data

2010-06-01 Thread Robert Brown
Yes, I'm using Ubuntu 10.04, so there's a network manager running. bob == On Mon, May 31, 2010 at 11:58 PM, Bruno Randolf b...@einfach.org wrote: On Tuesday 01 June 2010 06:30:52 Robert Brown wrote: All the calls to ath5k_reset appear to be from ath5k_chan_set. so that must come

Re: [ath5k-devel] corrupted data

2010-06-01 Thread Bruno Randolf
On Wednesday 02 June 2010 01:32:52 Robert Brown wrote: Yes, I'm using Ubuntu 10.04, so there's a network manager running. ok, 2 things to try, please: the patch i mentioned before (ASPM) and disabling network-manager. thanks to your information i found that ath5k_reset is not properly

Re: [ath5k-devel] corrupted data

2010-06-01 Thread Luis R. Rodriguez
On Tue, Jun 01, 2010 at 05:53:33PM -0700, Bruno Randolf wrote: On Wednesday 02 June 2010 01:32:52 Robert Brown wrote: Yes, I'm using Ubuntu 10.04, so there's a network manager running. ok, 2 things to try, please: the patch i mentioned before (ASPM) and disabling network-manager.

Re: [ath5k-devel] corrupted data

2010-05-31 Thread Robert Brown
On Sun, May 30, 2010 at 8:51 PM, Bruno Randolf b...@einfach.org wrote: On Saturday 29 May 2010 02:39:38 Robert Brown wrote: My repeated packet detection hack is not perfect and given how often it fires it's probably showing TCP retransmissions or something similar that's totally normal..

Re: [ath5k-devel] corrupted data

2010-05-31 Thread Robert Brown
All the calls to ath5k_reset appear to be from ath5k_chan_set. bob == On Sun, May 30, 2010 at 8:51 PM, Bruno Randolf b...@einfach.org wrote: On Saturday 29 May 2010 02:39:38 Robert Brown wrote: also i am wondering why you have so many resets? (a review of the reset code is

Re: [ath5k-devel] corrupted data

2010-05-31 Thread Bruno Randolf
On Tuesday 01 June 2010 06:30:52 Robert Brown wrote: All the calls to ath5k_reset appear to be from ath5k_chan_set. so that must come thru mac80211 and finally userspace. do you have anything like network-manager running? bruno ___ ath5k-devel

Re: [ath5k-devel] corrupted data

2010-05-30 Thread Bruno Randolf
On Saturday 29 May 2010 02:39:38 Robert Brown wrote: + #define HIST_SIZE 100 + unsigned char history[HIST_SIZE]; ! #define OFFSET 1000 ! if (skb->len > OFFSET + HIST_SIZE) { ! unsigned char *data = skb->data + OFFSET; ! if (memcmp(sc->history, data, HIST_SIZE) == 0) { !
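The repeated-packet check quoted in this message can be modelled outside the kernel. The sketch below is a userspace approximation under stated assumptions: struct rx_history and is_repeat are hypothetical names, a plain byte buffer stands in for the skb, and the step that saves the current window into the history is a guess at what the truncated part of the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HIST_SIZE 100   /* bytes remembered from the previous frame */
#define OFFSET    1000  /* where in the frame the sample window starts */

/* Hypothetical stand-in for the history field added to the softc. */
struct rx_history {
    unsigned char bytes[HIST_SIZE];
    int valid;          /* 0 until the first frame has been recorded */
};

/*
 * Returns 1 if the HIST_SIZE bytes at OFFSET match the window saved from
 * the previous sufficiently long frame, 0 otherwise. Frames too short to
 * contain the window are ignored and leave the history untouched.
 */
static int is_repeat(struct rx_history *h, const unsigned char *data,
                     size_t len)
{
    assert(h != NULL && data != NULL);

    if (len <= OFFSET + HIST_SIZE)
        return 0;

    const unsigned char *window = data + OFFSET;
    int repeat = h->valid && memcmp(h->bytes, window, HIST_SIZE) == 0;

    /* Assumed behaviour: remember this frame for the next comparison. */
    memcpy(h->bytes, window, HIST_SIZE);
    h->valid = 1;
    return repeat;
}
```

As noted elsewhere in the thread, a check like this also fires on ordinary TCP retransmissions, so a hit is a hint rather than proof of corruption.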

Re: [ath5k-devel] corrupted data

2010-05-28 Thread Bruno Randolf
On Friday 28 May 2010 14:49:12 Robert Brown wrote: I wasn't able to match up instances of data corruption with the repeated packets I was printing or with the device resets. Since the corruption doesn't seem to occur when ath5k_tasklet_rx sends the same data twice, something more

Re: [ath5k-devel] corrupted data

2010-05-28 Thread Bruno Randolf
On Friday 28 May 2010 00:14:18 Bob Copeland wrote: Well the invariant is supposed to be that we are always behind where the hardware is. If that holds, the hardware can't loop around and write into the buffer we are currently processing, because we'd have to add the buffer back to the

Re: [ath5k-devel] corrupted data

2010-05-27 Thread Bruno Randolf
On Thursday 27 May 2010 13:29:17 Robert Brown wrote: I tried more data transfer at home. The patch does not solve the problem. I added a debugging message to see if this break statement in the patch is ever executed: + if (ds->ds_link == bf->daddr) +

Re: [ath5k-devel] corrupted data

2010-05-27 Thread Roman Yepishev
On Thu, 2010-05-27 at 18:41 +0900, Bruno Randolf wrote: On Thursday 27 May 2010 17:02:12 Roman Yepishev wrote: I've been following this thread closely for some time as I have exactly the same experience with my Acer Aspire One (AR5001 card): it strikes me that in both cases it happens on

Re: [ath5k-devel] corrupted data

2010-05-27 Thread Robert Brown
Comments interleaved below On Thu, May 27, 2010 at 5:41 AM, Bruno Randolf b...@einfach.org wrote: On Thursday 27 May 2010 17:02:12 Roman Yepishev wrote: Usually this happens when a continuous block of data is transmitted, i.e. during deb packages updates via http. When corruption happens

Re: [ath5k-devel] corrupted data

2010-05-27 Thread Bob Copeland
On Thu, May 27, 2010 at 11:14 AM, Bob Copeland m...@bobcopeland.com wrote: On Wed, May 26, 2010 at 9:15 PM, Bruno Randolf b...@einfach.org wrote: At least the code for alpha -- I looked at the first no-op implementation :) has this comment:    After this call, reads by the cpu to the buffer

Re: [ath5k-devel] corrupted data

2010-05-27 Thread Benoit Papillault
On 27/05/2010 17:14, Bob Copeland wrote: That said... reset() rebuilds the buffer list and resets the sclink pointers etc. I'm not so sure it does it properly, though last time I reviewed it I didn't find anything. It's worth checking again. I guess RXDP are reset to NULL pointers

Re: [ath5k-devel] corrupted data

2010-05-27 Thread Robert Brown
I wasn't able to match up instances of data corruption with the repeated packets I was printing or with the device resets. Since the corruption doesn't seem to occur when ath5k_tasklet_rx sends the same data twice, something more interesting is happening. bob

Re: [ath5k-devel] corrupted data

2010-05-26 Thread Bob Copeland
On Tue, May 25, 2010 at 11:04 PM, Bruno Randolf b...@einfach.org wrote: please try this patch and tell us if it helped... bruno commit 0379d1e5a850f8e63832516af4245b9824d8623d Author: Bruno Randolf b...@einfach.org Date:   Wed May 26 11:58:11 2010 +0900    ath5k: better checks for rx

Re: [ath5k-devel] corrupted data

2010-05-26 Thread Robert Brown
Thanks very much for the patch suggestion. I ran some large scp commands today with the patch installed. Several died with error message: Corrupted MAC on input. so it's likely the patch has not fixed the problem. I'll run some more tests at home where I can use FTP and have control over the

Re: [ath5k-devel] corrupted data

2010-05-26 Thread Bruno Randolf
On Wednesday 26 May 2010 21:38:00 Bob Copeland wrote: + /* never process the self-linked entry at the end */ + if (ds->ds_link == bf->daddr) + break; + By this you're saying that the RXDP check just above is insufficient? i don't know. i
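The two stop conditions under discussion here (the RXDP check and the self-linked tail check) can be illustrated with a small userspace model. This is a simplified sketch, not the driver's rx loop: struct rx_desc, process_rx, and the done flag are hypothetical, and the descriptor and buffer structures are merged into one so that the self-link test becomes link == daddr.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified descriptor: buffer and descriptor merged. */
struct rx_desc {
    unsigned int daddr; /* DMA address of this descriptor */
    unsigned int link;  /* DMA address of the next descriptor */
    int done;           /* set once the hardware has filled it */
};

/*
 * Walk the ring from the head and count descriptors that are safe to
 * hand up the stack. Processing stops at the first descriptor that
 *   - the hardware is currently pointing at (daddr == rxdp),
 *   - links to itself (the tail entry, never processed), or
 *   - has not been completed yet.
 */
static size_t process_rx(const struct rx_desc *ring, size_t n,
                         unsigned int rxdp)
{
    size_t processed = 0;

    assert(ring != NULL);
    for (size_t i = 0; i < n; i++) {
        const struct rx_desc *ds = &ring[i];

        if (ds->daddr == rxdp)
            break;      /* hardware may still be writing here */
        if (ds->link == ds->daddr)
            break;      /* self-linked entry at the end */
        if (!ds->done)
            break;      /* nothing received yet */
        processed++;
    }
    return processed;
}
```

In this model the self-link test only matters when the RXDP value does not already stop the walk before the tail, which is essentially the question Bob raises about whether the existing check is sufficient.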

Re: [ath5k-devel] corrupted data

2010-05-25 Thread Bob Copeland
On Sun, May 23, 2010 at 11:49 AM, Robert Brown robert.br...@gmail.com wrote: I do think it's worthwhile to think about how a repeated block of data could get into the destination file, assuming something in the networking code is buggy.  TCP/IP should be verifying packet checksums, so isn't it

Re: [ath5k-devel] corrupted data

2010-05-25 Thread Robert Brown
I have not experienced data corruption when transferring data via ethernet. I seem to get one corruption every 2 to 3 Gb over wifi. The symptom is a block of data, roughly 1400 bytes in size, that appears twice in a row in the destination file. The first instance of the repeated data is

Re: [ath5k-devel] corrupted data

2010-05-25 Thread Bob Copeland
On Tue, May 25, 2010 at 1:40 PM, Robert Brown robert.br...@gmail.com wrote: The symptom is a block of data, roughly 1400 bytes in size, that appears twice in a row in the destination file.  The first instance of the repeated data is erroneous.  It does not match the source file.  The second

Re: [ath5k-devel] corrupted data

2010-05-25 Thread Bruno Randolf
please try this patch and tell us if it helped... bruno commit 0379d1e5a850f8e63832516af4245b9824d8623d Author: Bruno Randolf b...@einfach.org Date: Wed May 26 11:58:11 2010 +0900 ath5k: better checks for rx descriptor processing - add check not to process self-linked rx

Re: [ath5k-devel] corrupted data

2010-05-24 Thread Luis R. Rodriguez
On Sat, May 22, 2010 at 08:43:48PM -0700, Robert Brown wrote: It wasn't immediately apparent to me how to undo the Ubuntu backports option Luis suggested in his email, so I opted instead to download source code -- compat-wireless-2010-05-18. I couldn't bring myself to run make install

Re: [ath5k-devel] corrupted data

2010-05-23 Thread Bob Copeland
On Sat, May 22, 2010 at 11:43 PM, Robert Brown robert.br...@gmail.com wrote: What other experiments can I do that will help figure out the cause of this problem? Please show the data in the corruption, and also please enable slab/slub debugging or kmemcheck in your kernel. -- Bob Copeland %%

Re: [ath5k-devel] corrupted data

2010-05-23 Thread Robert Brown
Here's the output of cmp -l good bad for two files that experienced corruption in transfer. I'll investigate turning on the kernel debugging options you suggested. It looks like I may have to compile a kernel to do this, however. I'm not sure right now what's possible with a stock Ubuntu

Re: [ath5k-devel] corrupted data

2010-05-17 Thread Bruno Randolf
On Tuesday 18 May 2010 11:29:27 Robert Brown wrote: I'm running Ubuntu Linux 10.04 on an Acer Aspire ZG5 netbook that contains Atheros wifi hardware. There's a sticker on the back that says Atheros AR5BXB63. The uname command reports the kernel version as 2.6.32-22-generic. I'm using the

Re: [ath5k-devel] corrupted data

2010-05-17 Thread Luis R. Rodriguez
On Mon, May 17, 2010 at 7:51 PM, Bruno Randolf b...@einfach.org wrote: On Tuesday 18 May 2010 11:29:27 Robert Brown wrote: I'm running Ubuntu Linux 10.04 on an Acer Aspire ZG5 netbook that contains Atheros wifi hardware.  There's a sticker on the back that says Atheros AR5BXB63.  The uname