On 02/08/2015 02:25 PM, Jörg Krause wrote:
On Fr, 2015-02-06 at 11:06 -0700, Stephen Warren wrote:
On 02/05/2015 06:06 PM, Jörg Krause wrote:
On Do, 2015-02-05 at 15:23 -0700, Stephen Warren wrote:

b) In ci_bounce(), the bounce buffer is only allocated if the
user-buffer is already aligned, and if a large-enough bounce buffer
wasn't previously allocated. If ci_req->b_buf was uninitialized it could
be non-zero (thus preventing the expected aligned allocation) yet not
actually aligned enough.

I can reproduce this issue now. After some "timeout sending packets to
usb ethernet" messages, the bounce buffer somehow gets corrupted.
ci_bounce() is called with an unaligned input buffer length
'req->length=66', but the bounce buffer length
'ci_req->b_len=1140305940' or in hex 'ci_req->b_len=0x43f7b014'. This
bounce buffer length is obviously an address, as the following
misaligned error message shows: "CACHE: Misaligned operation at range
[43f7b010, 43f7b070]".

Ah, I hadn't realized that was [start, length] rather than [start, end].

The question is: How is ci_req->b_len getting corrupted? Is it simply
never initialized, or does something trash that value later?

ci_ep_alloc_request() appears to calloc() the whole struct ci_req, so I
imagine an initialization/allocating error isn't happening.

The only issue there might be some code somehow creating its own struct
usb_request instead of calling into the controller's ->alloc_request()
function. I vaguely recall fixing some of those, but might have missed
some in protocols that I didn't test (i.e. anything other than USB Mass
Storage or DFU, although I might have very briefly tested netconsole once?).

I would suggest adding a whole ton of printfs() to catch where ci_reqs
are being allocated, and where ci_req->b_len is getting written in which
ci_req objects, and then mapping that back to the ci_req that the cache
alignment error message complains about. Sorry, this will be a bit painful.

If the ci_req is always at the same address on different boots of the
code, that will make it easier, especially if you have a debugger with a
data watchpoint, or can write some code to use any data watchpoint
self-hosted debug capability in your CPU.

I think I found the answer.

I used a lot of debug messages and tried to understand the involved
drivers. I will try to sum up my investigations.

NetLoop is entered first for the ping protocol and then for the
netconsole protocol:
         --- NetLoop Entry (PING)
         --- NetLoop Entry (NETCONSOLE)

ci_udc is probed and ci_ep_alloc_request() is called for EP0:
         ci_ep_alloc_request usb_ep:0x43fd0028 ci_req:0x43b83220

In eth_bind() ci_ep_alloc_request() is called for status EP:
         ci_ep_alloc_request usb_ep:0x43fd00a0 ci_req:0x43b7b4a8

Everything is fine:
         using ci_udc, OUT ep- IN ep- STATUS ep-
         MAC 00:19:b8:00:00:02
         HOST MAC 00:19:b8:00:00:01

In eth_set_config() ci_ep_alloc_request() is invoked again for tx_req
and rx_req:
         ci_ep_alloc_request usb_ep:0x43fd0050 ci_req:0x43b7b568
         ci_ep_alloc_request usb_ep:0x43fd0078 ci_req:0x43b7b5b0

Everything is fine:
         high speed config #1: 2 mA, Ethernet Gadget, using CDC Ethernet
         USB network up!

Now the NetLoop is available for netconsole:
         --- NetState set to 0 (CONTINUE)
         --- NetLoop Init (NETCONSOLE)
         --- NetLoop ARP handler set (43fa3b14)
         [..]
         Got ARP REPLY, set eth addr (00:19:b8:00:00:01)
         --- NetState set to 2 (SUCCESS)
         --- NetLoop UDP handler set (00000000)
         --- NetLoop ARP handler set (00000000)
         --- NetLoop timeout handler cancelled
         --- NetLoop Success! (NETCONSOLE)

Now it's pings turn in the NetLoop. The if (eth_is_on_demand_init())
branch is executed. eth_halt() and later eth_disconnect() and
eth_reset_config() are invoked. There the in and out EPs are freed:
         ci_ep_free_request ci_req:0x43b7b568
         ci_ep_free_request ci_req:0x43b7b5b0

That sounds like a problem.

I would assume that once netconsole is enabled, the Ethernet connection is no longer on-demand, but should be initialized once, and then never taken down.

Take a look at the definition of eth_is_on_demand_init() in include/net.h. It has a special case for CONFIG_NETCONSOLE, based on variable net_loop_last_protocol. Perhaps that isn't getting set correctly, or perhaps the variable is getting set to something other than netconsole, thus causing the connection to get taken down?

However, netconsole tries to send every printf() as an UDP packet to the
host.
         sending UDP to 10.0.0.1/00:19:b8:00:00:01

But the usb_request [1] pointer in usb_eth_send is NULL after the free
request:
         usb_eth_send usb_request: 00000000 length: 82

usb_eth_send() calls ci_ep_queue() which calls ci_bounce() and because
of the corrupted ci_req pointer I get a misaligned cache.

So, as far as I understand, the main problem is that netconsole does not
knows that the connection is disconnected by the NetLoop?

What are your suggestions to this issue? Should we add an addition check
for an initialized usb_request pointer in usb_eth_send()? Or somewhere
else?

It might be a useful debugging/validation aid to add a check somewhere for valid non-free struct usb_requests. However, it sounds like that's just a symptom; the underlying problem is that the Ethernet connection is taken down at all, while netconsole is active.
_______________________________________________
U-Boot mailing list
U-Boot@lists.denx.de
http://lists.denx.de/mailman/listinfo/u-boot

Reply via email to