On Thu, Jun 10, 2010 at 01:04:19PM +0900, Bruno Randolf wrote:
great!
So the following patch takes care of the reset part. I only boot-tested
it so far.
(hmm, tasklet_disable() might actually
do that indirectly, I think).
i dont think so. it just waits for the tasklet to finish and
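A minimal userspace sketch of the behaviour under discussion, not driver code. As I understand the kernel tasklet API, tasklet_disable() raises a disable count and waits for a running handler to finish, and a tasklet that was already scheduled stays pending until tasklet_enable() is called. The struct and the model_* functions below are hypothetical names for illustration only; the model is single-threaded, so the "wait for a running handler" part is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded model of one tasklet. */
struct tasklet_model {
    int disable_count;  /* > 0 means the handler must not run */
    bool scheduled;     /* set by schedule, cleared when the handler runs */
    int runs;           /* how many times the handler has run */
};

void model_schedule(struct tasklet_model *t)
{
    t->scheduled = true;
}

void model_disable(struct tasklet_model *t)
{
    t->disable_count++;
}

void model_enable(struct tasklet_model *t)
{
    assert(t->disable_count > 0);
    t->disable_count--;
}

/* One softirq pass: runs the handler only if scheduled and not disabled.
 * Returns true if the handler ran. A disabled tasklet stays scheduled. */
bool model_softirq(struct tasklet_model *t)
{
    if (!t->scheduled || t->disable_count > 0)
        return false;
    t->scheduled = false;
    t->runs++;
    return true;
}
```

In this model a tasklet scheduled before the disable still runs once it is enabled again, so whatever a reset does between disable and enable must leave the receive state consistent for that deferred run.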
On Sunday 06 June 2010 22:07:00 Nick Kossifidis wrote:
We read-and-clear RXDP on hw_nic_reset (actually we do that for any
pending register writes -we need to ensure that all pending register
writes are complete by doing a DMA register read, docs suggest RXDP
and that's what we are doing on
On Wed, Jun 9, 2010 at 3:40 AM, Bruno Randolf b...@einfach.org wrote:
hey, i guess something like this must be happening...
i think we have to avoid tasklets being run concurrently to a reset.
so does this patch help?
Heh, I made a similar patch this morning before I saw this. I think the
I wrote some code to test my hunch. I was under the mistaken impression that
each receive tasklet starts looking at receive buffers from the point where the
hardware last copied data. That's not the case, as Bob mentioned in one of his
emails. All receive tasklets start traversing the receive
I don't know anything about work queues, but I did read some documentation
about tasklets. Relevant info: tasklets always run on the CPU that schedules
them; multiple tasklets can run at the same time on different CPUs. You folks
probably know this already
bob
On Wed, Jun 9, 2010 at 3:07
The Wireshark experiment I ran earlier is worth considering again. Wireshark
saw block 110413 sent, but the receive tasklet saw a stale block, 107223.
In the data file, block 110413 appears where block 110412 should be in addition
to appearing in its proper spot. The receive tasklet saw data for
On Wednesday 09 June 2010 22:42:27 Bob Copeland wrote:
On Wed, Jun 9, 2010 at 3:40 AM, Bruno Randolf b...@einfach.org wrote:
hey, i guess something like this must be happening...
i think we have to avoid tasklets being run concurrently to a reset.
so does this patch help?
Heh, I made
2010/6/10 Bruno Randolf b...@einfach.org:
On Wednesday 09 June 2010 22:42:27 Bob Copeland wrote:
On Wed, Jun 9, 2010 at 3:40 AM, Bruno Randolf b...@einfach.org wrote:
hey, i guess something like this must be happening...
i think we have to avoid tasklets being run concurrently to a reset.
On Thu, Jun 10, 2010 at 10:21:44AM +0900, Bruno Randolf wrote:
hi bob!
sounds like a good plan... do you have time to do that?
Yeah, I started on this already, will probably have it in a day or two.
could we improve background scanning? right now it will make us lose
packets...
One
On Thursday 10 June 2010 12:51:07 Bob Copeland wrote:
On Thu, Jun 10, 2010 at 10:21:44AM +0900, Bruno Randolf wrote:
hi bob!
sounds like a good plan... do you have time to do that?
Yeah, I started on this already, will probably have it in a day or two.
great!
could we improve
On Sun, Jun 6, 2010 at 8:16 PM, Robert Brown robert.br...@gmail.com wrote:
Sure, but I only see one place in ath5k_rxbuf_setup where it returns
an error value (-ENOMEM). I'm using a 5/18/2010 snapshot. Should I
grab a later version?
Are there any cache issues when dealing with hardware
On Mon, Jun 7, 2010 at 10:20 AM, Robert Brown robert.br...@gmail.com wrote:
I think corruption of received data can happen whenever
ath5k_reset is called when there are pending instances of ath5k_tasklet_rx
scheduled, but not yet executed.
ath5k_reset calls ath5k_rx_start to initialize the
I think corruption of received data can happen whenever
ath5k_reset is called when there are pending instances of ath5k_tasklet_rx
scheduled, but not yet executed.
ath5k_reset calls ath5k_rx_start to initialize the rxbuf queue, then sets
RXDP to the head of the queue and turns DMA back on. At
On Sat, Jun 5, 2010 at 3:16 PM, Robert Brown robert.br...@gmail.com wrote:
I transferred 1 GB of data twice while executing the scan command as above.
I got 3 corrupted blocks on the first transfer and 2 on the second, which is
more than I normally expect. Usually, I have to transfer 2 or 3 GB
2010/6/6 Nick Kossifidis mickfl...@gmail.com:
2010/6/5 Benoit Papillault benoit.papilla...@free.fr:
My feeling is that both the RXDP and TXDP registers are not properly
initialized after a reset, causing duplicates in both directions. So I
believe you should see duplicates on the TX side as well.
We read-and-clear RXDP on hw_nic_reset (actually we do that for any
pending register writes -we need to ensure that all pending register
writes are complete by doing a DMA register read, docs suggest RXDP
and that's what we are doing on nic_reset, we need to change that
comment-) and restore
Sure, but I only see one place in ath5k_rxbuf_setup where it returns
an error value (-ENOMEM). I'm using a 5/18/2010 snapshot. Should I
grab a later version?
Are there any cache issues when dealing with hardware devices like
the Atheros chip? For instance, ath5k_rxbuf_setup sets up descriptors
On 05/06/2010 14:01, Robert Brown wrote:
Comments below ...
On Sat, Jun 5, 2010 at 4:34 AM, Benoit Papillault
benoit.papilla...@free.fr wrote:
It helps for sure. But I'm not seeing in your log where you
receive the duplicate 110413 110414
On Sat, Jun 5, 2010 at 8:29 AM, Benoit Papillault benoit.papilla...@free.fr
wrote:
On 05/06/2010 14:01, Robert Brown wrote:
Comments below ...
On Sat, Jun 5, 2010 at 4:34 AM, Benoit Papillault
benoit.papilla...@free.fr wrote:
It helps for sure. But
2010/6/5 Robert Brown robert.br...@gmail.com:
On Sat, Jun 5, 2010 at 8:29 AM, Benoit Papillault
benoit.papilla...@free.fr wrote:
On 05/06/2010 14:01, Robert Brown wrote:
Comments below ...
On Sat, Jun 5, 2010 at 4:34 AM, Benoit Papillault
benoit.papilla...@free.fr
I can try changing the DMA size. Can you give some motivation
for the change? Are you hopeful that it will fix the problem?
If so, what's the rationale?
bob
On Sat, Jun 5, 2010 at 2:56 PM, Nick Kossifidis mickfl...@gmail.com wrote:
2010/6/5 Robert Brown
On Sat, Jun 5, 2010 at 4:34 AM, Benoit Papillault benoit.papilla...@free.fr
wrote:
It is likely that the problem is caused by the background scan done by
network manager since this calls ath5k_reset. So, one way to trigger the
problem more often is by doing more scanning.
Could you try a
On 05/06/2010 14:50, Robert Brown wrote:
I think I have a likely explanation for the data corruption. It looks like
the hardware is not transferring the data to the place the driver expects.
I ran Wireshark on a Macintosh portable while transferring a file, so I now
know what packets the
Here's a progress report on tracking down the data corruption bug.
I created a 1 GB file containing random data, except that every 1000 bytes
the file contains a pattern of Xs followed by a block number in the
range 0 to 99. I wrote some code to detect the pattern of Xs and print out
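A rough sketch of the labeled-block technique described above, for anyone who wants to reproduce the test. The marker length, the zero-padded 10-digit label, and all function names are assumptions for illustration; the original test code is not shown in the thread. Each 1000-byte block starts with a run of Xs and its own block number, so a block that lands in the wrong place in the received file can be identified by its label.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK_SIZE   1000        /* label spacing, from the description above */
#define MARKER       "XXXXXXXX"  /* assumed marker; real length unknown */
#define MARKER_LEN   8
#define LABEL_DIGITS 10          /* assumed fixed-width decimal label */

/* Overwrite the start of a block with the marker and its block number. */
void label_block(unsigned char *block, long block_no)
{
    char label[MARKER_LEN + LABEL_DIGITS + 1];

    assert(block_no >= 0);
    snprintf(label, sizeof(label), "%s%0*ld", MARKER, LABEL_DIGITS, block_no);
    memcpy(block, label, MARKER_LEN + LABEL_DIGITS);
}

/* Return the block number stored in a block, or -1 if the marker is absent. */
long read_label(const unsigned char *block)
{
    char digits[LABEL_DIGITS + 1];

    if (memcmp(block, MARKER, MARKER_LEN) != 0)
        return -1;
    memcpy(digits, block + MARKER_LEN, LABEL_DIGITS);
    digits[LABEL_DIGITS] = '\0';
    return strtol(digits, NULL, 10);
}

/* Return the index of the first block whose label does not match its
 * position in the data, or -1 if every block is where it should be. */
long first_mislabeled(const unsigned char *data, long nblocks)
{
    for (long i = 0; i < nblocks; i++)
        if (read_label(data + i * BLOCK_SIZE) != i)
            return i;
    return -1;
}
```

Running first_mislabeled() over the received file gives the position of the first misplaced block, and read_label() on that block says which block's data actually arrived there.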
Here's another instance of data corruption. One block containing labels
110413 110414 is duplicated in the output file.
I've instrumented ath5k_tasklet_rx to print ATH followed by packet block
information. Function __ieee80211_rx_handle_packet prints MACI. Function
ieee80211_deliver_skb
On Fri, Jun 4, 2010 at 4:24 PM, Robert Brown robert.br...@gmail.com wrote:
Here's another instance of data corruption. One block containing labels
110413 110414 is duplicated in the output file.
I've instrumented ath5k_tasklet_rx to print ATH followed by packet block
information.
Can you
On Fri, Jun 4, 2010 at 4:31 PM, Bob Copeland m...@bobcopeland.com wrote:
On Fri, Jun 4, 2010 at 4:24 PM, Robert Brown robert.br...@gmail.com
wrote:
Here's another instance of data corruption. One block containing labels
110413 110414 is duplicated in the output file.
I've instrumented
On Fri, Jun 4, 2010 at 4:24 PM, Robert Brown robert.br...@gmail.com wrote:
There seems to be stale data in the skb that ath5k_tasklet_rx passes to
__ieee80211_rx_handle_packet. Note that ieee80211_deliver_skb does not pass
it along to netif_rx.
It'd be interesting to know where mac80211 drops
Answers interleaved ...
On Fri, Jun 4, 2010 at 5:39 PM, Bob Copeland m...@bobcopeland.com wrote:
On Fri, Jun 4, 2010 at 4:24 PM, Robert Brown robert.br...@gmail.com
wrote:
There seems to be stale data in the skb that ath5k_tasklet_rx passes to
__ieee80211_rx_handle_packet. Note that
On Tue, Jun 01, 2010 at 12:32:52PM -0400, Robert Brown wrote:
All the calls to ath5k_reset appear to be from ath5k_chan_set.
Yes, I'm using Ubuntu 10.04, so there's a network manager running.
bob
(stating the obvious)
These are no doubt from background scanning...
--
Bob Copeland %%
Yes, I'm using Ubuntu 10.04, so there's a network manager running.
bob
==
On Mon, May 31, 2010 at 11:58 PM, Bruno Randolf b...@einfach.org wrote:
On Tuesday 01 June 2010 06:30:52 Robert Brown wrote:
All the calls to ath5k_reset appear to be from ath5k_chan_set.
so that must come
On Wednesday 02 June 2010 01:32:52 Robert Brown wrote:
Yes, I'm using Ubuntu 10.04, so there's a network manager running.
ok, 2 things to try, please: the patch i mentioned before (ASPM) and disabling
network-manager.
thanks to your information i found that ath5k_reset is not properly
On Tue, Jun 01, 2010 at 05:53:33PM -0700, Bruno Randolf wrote:
On Wednesday 02 June 2010 01:32:52 Robert Brown wrote:
Yes, I'm using Ubuntu 10.04, so there's a network manager running.
ok, 2 things to try, please: the patch i mentioned before (ASPM) and disabling
network-manager.
On Sun, May 30, 2010 at 8:51 PM, Bruno Randolf b...@einfach.org wrote:
On Saturday 29 May 2010 02:39:38 Robert Brown wrote:
My repeated packet detection hack is not perfect and given how often it
fires it's probably showing TCP retransmissions or something similar
that's totally normal..
All the calls to ath5k_reset appear to be from ath5k_chan_set.
bob
==
On Sun, May 30, 2010 at 8:51 PM, Bruno Randolf b...@einfach.org wrote:
On Saturday 29 May 2010 02:39:38 Robert Brown wrote:
also i am wondering why you have so many resets? (a review of the reset code is
On Tuesday 01 June 2010 06:30:52 Robert Brown wrote:
All the calls to ath5k_reset appear to be from ath5k_chan_set.
so that must come thru mac80211 and finally userspace. do you have anything
like network-manager running?
bruno
On Saturday 29 May 2010 02:39:38 Robert Brown wrote:
+ #define HIST_SIZE 100
+ unsigned char history[HIST_SIZE];
! #define OFFSET 1000
! if (skb->len > OFFSET + HIST_SIZE) {
! unsigned char *data = skb->data + OFFSET;
! if (memcmp(sc->history, data, HIST_SIZE) == 0) {
!
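A standalone sketch of the repeated-packet check quoted above, written as plain userspace C so it can be tried outside the driver. HIST_SIZE and OFFSET are taken from the quoted fragment; the struct, the function name, the decision to update the history on every long-enough packet, and the exact length boundary are my assumptions, since the rest of the hack is cut off.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define HIST_SIZE 100   /* bytes of payload remembered, as in the quoted hack */
#define OFFSET    1000  /* where in the packet the comparison window starts */

/* Hypothetical stand-in for the history field kept in the driver state. */
struct repeat_detector {
    unsigned char history[HIST_SIZE];
    bool valid;         /* false until the first long-enough packet is seen */
};

/* Return true if this packet carries the same HIST_SIZE bytes at OFFSET as
 * the previous long-enough packet. Shorter packets are ignored and leave
 * the history untouched (an assumption, not confirmed by the thread). */
bool is_repeat(struct repeat_detector *d, const unsigned char *data, size_t len)
{
    bool repeat;

    assert(d != NULL && data != NULL);
    if (len < OFFSET + HIST_SIZE)
        return false;

    repeat = d->valid && memcmp(d->history, data + OFFSET, HIST_SIZE) == 0;
    memcpy(d->history, data + OFFSET, HIST_SIZE);
    d->valid = true;
    return repeat;
}
```

As noted elsewhere in the thread, a check like this also fires on ordinary retransmissions, so a hit is a hint and not proof of corruption.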
On Friday 28 May 2010 14:49:12 Robert Brown wrote:
I wasn't able to match up instances of data corruption with the repeated
packets I was printing or with the device resets.
Since the corruption doesn't seem to occur when ath5k_tasklet_rx sends the
same data twice, something more
On Friday 28 May 2010 00:14:18 Bob Copeland wrote:
Well the invariant is supposed to be that we are always behind where the
hardware is. If that holds, the hardware can't loop around and write into
the buffer we are currently processing, because we'd have to add the buffer
back to the
On Thursday 27 May 2010 13:29:17 Robert Brown wrote:
I tried more data transfer at home. The patch does not solve the problem.
I added a debugging message to see if this break statement in the patch is ever
executed:
+ if (ds->ds_link == bf->daddr)
+
On Thu, 2010-05-27 at 18:41 +0900, Bruno Randolf wrote:
On Thursday 27 May 2010 17:02:12 Roman Yepishev wrote:
I've been following this thread closely for some time as I have exactly
the same experience with my Acer Aspire One (AR5001 card):
it strikes me that in both cases it happens on
Comments interleaved below
On Thu, May 27, 2010 at 5:41 AM, Bruno Randolf b...@einfach.org wrote:
On Thursday 27 May 2010 17:02:12 Roman Yepishev wrote:
Usually this happens when a continuous block of data is transmitted,
i.e. during deb packages updates via http.
When corruption happens
On Thu, May 27, 2010 at 11:14 AM, Bob Copeland m...@bobcopeland.com wrote:
On Wed, May 26, 2010 at 9:15 PM, Bruno Randolf b...@einfach.org wrote:
At least the code for alpha -- I looked at the first no-op implementation :)
has this comment:
After this call, reads by the cpu to the buffer
On 27/05/2010 17:14, Bob Copeland wrote:
That said... reset() rebuilds the buffer list and resets the sclink pointers
etc. I'm not so sure it does it properly, though last time I reviewed it
I didn't find anything.
It's worth checking it again. I guess RXDP are reset to NULL pointers
I wasn't able to match up instances of data corruption with the repeated
packets I was printing or with the device resets.
Since the corruption doesn't seem to occur when ath5k_tasklet_rx sends the
same data twice, something more interesting is happening.
bob
On Tue, May 25, 2010 at 11:04 PM, Bruno Randolf b...@einfach.org wrote:
please try this patch and tell us if it helped...
bruno
commit 0379d1e5a850f8e63832516af4245b9824d8623d
Author: Bruno Randolf b...@einfach.org
Date: Wed May 26 11:58:11 2010 +0900
ath5k: better checks for rx
Thanks very much for the patch suggestion.
I ran some large scp commands today with the patch installed.
Several died with the error message "Corrupted MAC on input", so it's likely
the patch has not fixed the problem. I'll run some more tests at home
where I can use FTP and have control over the
On Wednesday 26 May 2010 21:38:00 Bob Copeland wrote:
+ /* never process the self-linked entry at the end */
+ if (ds->ds_link == bf->daddr)
+ break;
+
By this you're saying that the RXDP check just above
is insufficient?
i don't know. i
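To make the self-linked check in the quoted hunk concrete, here is a simplified userspace model of a receive descriptor list. It is a sketch under assumptions: the struct, its fields, and process_rx are hypothetical and much simpler than the real ath5k structures, and the order of the two checks is my choice. The last descriptor links to itself, and the walk stops there without processing it. Other messages in the thread report that this patch did not make the corruption go away.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified receive descriptor. */
struct rx_desc {
    uint32_t daddr;  /* bus address of this descriptor */
    uint32_t link;   /* bus address of the next one; equals daddr on the last */
    bool done;       /* set by the "hardware" once the buffer has been filled */
};

/* Walk the list from the head and hand completed buffers to the stack.
 * Stops at the self-linked tail entry or at the first descriptor the
 * hardware has not completed. Returns the number of buffers processed. */
size_t process_rx(struct rx_desc *ring, size_t n)
{
    size_t processed = 0;

    assert(ring != NULL);
    for (size_t i = 0; i < n; i++) {
        /* never process the self-linked entry at the end */
        if (ring[i].link == ring[i].daddr)
            break;
        if (!ring[i].done)
            break;
        ring[i].done = false;  /* buffer consumed, give it back */
        processed++;
    }
    return processed;
}
```

With four descriptors that are all marked done, this processes three and leaves the self-linked tail alone.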
On Sun, May 23, 2010 at 11:49 AM, Robert Brown robert.br...@gmail.com wrote:
I do think it's worthwhile to think about how a repeated block of data could
get into the destination file, assuming something in the networking code is
buggy. TCP/IP should be verifying packet checksums, so isn't it
I have not experienced data corruption when transferring data via ethernet.
I seem to get one corruption every 2 to 3 GB over wifi.
The symptom is a block of data, roughly 1400 bytes in size, that appears
twice in a row in the destination file. The first instance of the repeated data is
On Tue, May 25, 2010 at 1:40 PM, Robert Brown robert.br...@gmail.com wrote:
The symptom is a block of data, roughly 1400 bytes in size, that appears
twice in a row in the destination file. The first instance of the repeated data
is erroneous. It does not match the source file. The second
please try this patch and tell us if it helped...
bruno
commit 0379d1e5a850f8e63832516af4245b9824d8623d
Author: Bruno Randolf b...@einfach.org
Date: Wed May 26 11:58:11 2010 +0900
ath5k: better checks for rx descriptor processing
- add check not to process self-linked rx
On Sat, May 22, 2010 at 08:43:48PM -0700, Robert Brown wrote:
It wasn't immediately apparent to me how to undo the Ubuntu backports option
Luis suggested in his email, so I opted instead to download source code --
compat-wireless-2010-05-18.
I couldn't bring myself to run make install
On Sat, May 22, 2010 at 11:43 PM, Robert Brown robert.br...@gmail.com wrote:
What other experiments can I do that will help figure out the cause of this
problem?
Please show the data in the corruption, and also please enable
slab/slub debugging or kmemcheck in your kernel.
--
Bob Copeland %%
Here's the output of cmp -l good bad for two files that experienced
corruption in transfer.
I'll investigate turning on the kernel debugging options you suggested. It
looks like I may have to compile a kernel to do this, however. I'm not sure
right now what's possible with a stock Ubuntu
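For readers who want to summarize a cmp -l listing like the one mentioned above, a small sketch that reports where two buffers first differ and how far the differing region extends. The function name and interface are hypothetical; it assumes both files have been read into memory and have the same length.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Find the region between the first and last differing bytes of two
 * equal-length buffers. Returns false if the buffers are identical;
 * otherwise stores the start offset and the span length in bytes. */
bool diff_span(const unsigned char *good, const unsigned char *bad,
               size_t len, size_t *start, size_t *span)
{
    size_t first = 0;
    size_t last;

    assert(good != NULL && bad != NULL && start != NULL && span != NULL);

    while (first < len && good[first] == bad[first])
        first++;
    if (first == len)
        return false;

    /* Scan back from the end; stops at or after 'first', which differs. */
    last = len - 1;
    while (good[last] == bad[last])
        last--;

    *start = first;
    *span = last - first + 1;
    return true;
}
```

A span of roughly 1400 bytes would be consistent with the single corrupted block described elsewhere in the thread.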
On Tuesday 18 May 2010 11:29:27 Robert Brown wrote:
I'm running Ubuntu Linux 10.04 on an Acer Aspire ZG5 netbook that contains
Atheros wifi hardware. There's a sticker on the back that says Atheros
AR5BXB63. The uname command reports the kernel version as
2.6.32-22-generic. I'm using the
On Mon, May 17, 2010 at 7:51 PM, Bruno Randolf b...@einfach.org wrote:
On Tuesday 18 May 2010 11:29:27 Robert Brown wrote:
I'm running Ubuntu Linux 10.04 on an Acer Aspire ZG5 netbook that contains
Atheros wifi hardware. There's a sticker on the back that says Atheros
AR5BXB63. The uname