Benjamin Herrenschmidt writes:
OK, so you'll have to write a workaround in prom_init that looks for
OHCIs in the device tree and disables them.
Check whether the OHCI node has some existing FCode words you can use for
that, for example with "dev /path-to-ohci words" in OF. If not, you may
need
On Wed, Oct 27, pac...@kosh.dhis.org wrote:
|1. How do I locate all usb nodes in the device tree?
|
|2. How do I know if a particular usb node is OHCI?
In the installed system, run 'lspci | grep -i usb'; this gives the PCI
bus numbers. Then run 'find /sys -name devspec' and look for the bus
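For question 2, one possible check, assuming the firmware follows the Open Firmware PCI binding and gives each PCI node a "class-code" property (reading that property from prom_init is not shown): a USB host controller has PCI base class 0x0C and subclass 0x03, and the programming-interface byte tells which kind it is, with 0x10 meaning OHCI. A minimal sketch; the helper names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* A PCI class code is 24 bits: base class, subclass, programming interface. */
#define PCI_BASE_CLASS(cc) (((cc) >> 16) & 0xFFu)
#define PCI_SUBCLASS(cc)   (((cc) >> 8) & 0xFFu)
#define PCI_PROG_IF(cc)    ((cc) & 0xFFu)

/* Hypothetical helper: base class 0x0C (serial bus), subclass 0x03 (USB). */
static bool is_usb_host(uint32_t cc)
{
    return PCI_BASE_CLASS(cc) == 0x0C && PCI_SUBCLASS(cc) == 0x03;
}

/* Hypothetical helper: OHCI is prog-if 0x10 within the USB subclass. */
static bool is_ohci(uint32_t cc)
{
    return is_usb_host(cc) && PCI_PROG_IF(cc) == 0x10;
}

/* Hypothetical helper: name the host controller interface. */
static const char *usb_hc_kind(uint32_t cc)
{
    if (!is_usb_host(cc))
        return "not-usb";
    switch (PCI_PROG_IF(cc)) {
    case 0x00: return "UHCI";
    case 0x10: return "OHCI";
    case 0x20: return "EHCI";
    case 0x30: return "xHCI";
    default:   return "other";
    }
}
```

For example, is_ohci(0x0C0310) is true, while an EHCI controller at class code 0x0C0320 is not matched.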
Since then, the silence has been deafening.
My assumption now is that this is never getting fixed. I'm certainly not
able to fix it. I'm not even a kernel programmer! I got far enough to
diagnose the cause just with the "add more printk's and boot it again"
technique. Hundreds of reboots
Benjamin Herrenschmidt writes:
On Wed, 2010-10-20 at 13:33 -0500, pac...@kosh.dhis.org wrote:
Just try :-) Quiesce is something that AFAIK only Apple ever
implemented anyways. It uses hooks inside their OF to shut down all
drivers that do bus master (among other HW sanitization tasks).
On Tue, 2010-10-19 at 22:23 -0500, pac...@kosh.dhis.org wrote:
The diff fragment above applied inside prom_close_stdin, but there are some
prom_printf calls after prom_close_stdin. Calling prom_printf after closing
stdout sounds like it could be bad. If I moved it down below all the
On Wed, 2010-10-20 at 13:33 -0500, pac...@kosh.dhis.org wrote:
Just try :-) Quiesce is something that AFAIK only Apple ever
implemented anyways. It uses hooks inside their OF to shut down all
drivers that do bus master (among other HW sanitization tasks).
I booted a version with a
From there, you might be able to close in on the culprit a bit more; for
example, try using the DABR register to set data access breakpoints
shortly before the corruption spot. AFAIK, on those old 32-bit CPUs, you
can set whether you want it to break on a real or a virtual address.
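A sketch of composing such a breakpoint value, assuming the classic 32-bit PowerPC DABR layout: the doubleword-aligned address occupies the upper 29 bits, followed by BT, DW and DR in the low three bits. Check these bit assignments against the 7447/7450 manual before relying on them. The helper name is hypothetical, and loading the value into the register (an mtspr in kernel context) is left out.

```c
#include <stdint.h>

/* Low control bits of the 32-bit PowerPC DABR (IBM bits 29-31). */
#define DABR_DR 0x1u /* DR: break on data reads */
#define DABR_DW 0x2u /* DW: break on data writes */
#define DABR_BT 0x4u /* BT: match only when MSR[DR] equals this bit,
                        i.e. 1 = translated (virtual), 0 = real-mode accesses */

/* Hypothetical helper: build a DABR value that watches the doubleword
 * containing addr. */
static uint32_t dabr_value(uint32_t addr, int on_read, int on_write,
                           int translated)
{
    uint32_t v = addr & ~(uint32_t)7; /* DAB field: doubleword-aligned address */

    if (on_read)
        v |= DABR_DR;
    if (on_write)
        v |= DABR_DW;
    if (translated)
        v |= DABR_BT;
    return v;
}
```

Because DABR only matches the CPU's own loads and stores, a write performed by a bus-mastering device would not trigger it.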
I
On Tue, 19 Oct 2010, Helmut Grohne wrote:
On Mon, Oct 18, 2010 at 11:55:44PM +0200, Thomas Gleixner wrote:
I might be completely one off as usual, but this thing reminds me of a
bug I stared at yesterday night:
This problem is completely unrelated. My problem was caused by using
binutils-gold.
Helmut
Benjamin Herrenschmidt writes:
I thought of that, but as far as I can tell, this CPU doesn't have DABR.
AFAIK, the 7447 is just a derivative of the 7450 design which -does-
have a DABR ... Unless it's broken :-)
Hmm. gdb resorts to single-stepping when I set a watchpoint while debugging
I made a new discovery.
And this nails it :-)
So then I ran
dd if=/dev/mem bs=4 count=1 skip=$((0xfc5c080/4)) | od -t x4
a few times very fast, plucking the first affected word directly out of
memory by its physical address. The result:
The low 16 bits are always zero as before. The
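Suppose the word is the little-endian 16-bit OHCI frame counter identified later in the thread, followed by a 16-bit pad that the controller keeps at zero. Then od -t x4 on the big-endian CPU shows the counter byte-swapped in the upper half and zeros in the lower half. Under that assumption, a small sketch for turning a printed word back into a frame number (the function names are hypothetical):

```c
#include <stdint.h>

/* Hypothetical helper: the word as printed by `od -t x4` on a big-endian
 * host holds a little-endian 16-bit counter in its first two bytes and a
 * zero pad in the last two. */
static uint16_t le16_counter_from_be_word(uint32_t be_word)
{
    uint8_t b0 = (uint8_t)(be_word >> 24); /* byte at the lowest address */
    uint8_t b1 = (uint8_t)(be_word >> 16); /* next byte up */

    return (uint16_t)(b0 | (b1 << 8));     /* little-endian: low byte first */
}

/* Hypothetical helper: frames elapsed between two samples, allowing for
 * 16-bit wraparound. */
static uint16_t frames_between(uint16_t earlier, uint16_t later)
{
    return (uint16_t)(later - earlier);
}
```

At the 1 kHz rate discussed in the thread, two samples taken n milliseconds apart should differ by roughly n.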
On Tue, 2010-10-19 at 13:10 -0500, pac...@kosh.dhis.org wrote:
So what type of driver, firmware, or hardware bug puts a 16-bit 1000Hz timer
in memory, and does it in little-endian instead of the CPU's native byte
order? And why does it stop doing it some time during the early init scripts,
On Tue, 2010-10-19 at 22:47 +0200, Segher Boessenkool wrote:
It looks like it is the frame counter in a USB OHCI HCCA.
16-bit, 1kHz update, offset x'80 in a page.
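The layout described here can be written down as a C struct, following the HCCA defined in the OpenHCI specification. The struct and helper below are an illustrative sketch with hypothetical names, not the kernel's own definition. Note that the physical address read with dd, 0xfc5c080, sits at offset 0x80 within its page, which is where the frame number lives.

```c
#include <stddef.h>
#include <stdint.h>

/* Host Controller Communications Area (HCCA), per the OpenHCI spec.
 * The controller writes it by DMA, always little-endian. */
struct ohci_hcca {
    uint32_t int_table[32]; /* 0x00: heads of the interrupt ED lists */
    uint16_t frame_no;      /* 0x80: HccaFrameNumber, updated every 1 ms frame */
    uint16_t pad1;          /* 0x82: HccaPad1, zeroed when frame_no is written */
    uint32_t done_head;     /* 0x84: HccaDoneHead */
    uint8_t  reserved[120]; /* 0x88: remainder of the 256-byte area */
};

_Static_assert(offsetof(struct ohci_hcca, frame_no) == 0x80, "frame_no at 0x80");
_Static_assert(offsetof(struct ohci_hcca, done_head) == 0x84, "done_head at 0x84");
_Static_assert(sizeof(struct ohci_hcca) == 256, "HCCA is 256 bytes");

/* Hypothetical helper: read the frame number from raw HCCA bytes,
 * independent of the host's byte order. */
static uint16_t hcca_frame_no(const uint8_t *hcca)
{
    return (uint16_t)(hcca[0x80] | (hcca[0x81] << 8));
}
```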
So either the kernel forgot to call quiesce on it, or the firmware
doesn't implement that, or the firmware messed up some
On Wed, Oct 13, 2010 at 12:52:05PM -0500, pac...@kosh.dhis.org wrote:
Mel Gorman writes:
On Mon, Oct 11, 2010 at 02:00:39PM -0700, Andrew Morton wrote:
It's corruption of user memory, which is unusual. I'd be wondering if
there was a pre-existing bug which 6dda9d55bf545013597 has
Mel Gorman writes:
A bit, but I still don't know why it would cause corruption. Maybe this is
still a caching issue, but the difference in timing between list_add and
list_add_tail is enough to hide the bug. It's also possible there are some
registers ioremapped after the memmap array and
On Wed, 2010-10-13 at 15:40 +0100, Mel Gorman wrote:
This is somewhat contrived but I can see how it might happen even on one
CPU particularly if the L1 cache is virtual and is loose about checking
physical tags.
How sensitive/vulnerable is PPC32 to such things?
I can not tell you
On Mon, 2010-10-18 at 12:37 -0700, Andrew Morton wrote:
Well, you've spotted a bug so I'd say we fix it asap.
It's a bit of a shame that we lose the only known way of reproducing a
different bug, but presumably that will come back and bite someone
else
one day, and we'll fix it then :(
On Mon, 2010-10-18 at 14:10 -0500, pac...@kosh.dhis.org wrote:
I've been flailing around quite a bit. Here's my latest result:
Since I can view the corruption with md5sum /sbin/e2fsck, I know it's in a
clean cached page. So I made an extra copy of /sbin/e2fsck, which won't be
loaded into
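Comparing the cached copy against a pristine one pinpoints the damaged offsets; from the shell, cmp -l does this byte by byte. The same idea as a small C helper that reports differing 4-byte words, matching the word granularity of the dd/od probe used elsewhere in the thread. The function name is hypothetical, and reading the two files into memory is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: compare two equally sized buffers word by word and
 * store the byte offsets of differing 4-byte words in out (at most max_out
 * of them). Returns the total number of differing words; trailing bytes
 * beyond the last full word are ignored. */
static size_t diff_words(const uint8_t *good, const uint8_t *seen, size_t len,
                         size_t *out, size_t max_out)
{
    size_t n = 0;

    for (size_t off = 0; off + 4 <= len; off += 4) {
        if (memcmp(good + off, seen + off, 4) != 0) {
            if (n < max_out)
                out[n] = off;
            n++;
        }
    }
    return n;
}
```

Passing max_out = 0 (and a null out) just counts the damaged words.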
Benjamin Herrenschmidt writes:
You can do something fun... like a timer interrupt that peeks at those
physical addresses from the linear mapping for example, and try to find
out when they get set to the wrong value (you should observe the load
from disk, then the corruption, unless they end
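The bookkeeping for that idea can be sketched apart from the kernel plumbing: keep the last value seen at the watched address and record every change together with the tick on which it was noticed, so the load from disk and the later corruption show up as separate transitions. Everything here is hypothetical; in a real module the sample would come from a periodic timer reading the address through the linear mapping.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_EVENTS 8

/* One observed change of the watched word. */
struct change_event {
    uint64_t tick;    /* tick on which the change was first seen */
    uint32_t old_val;
    uint32_t new_val;
};

/* Hypothetical watcher state for a single word. */
struct word_watch {
    uint32_t last;                      /* value seen on the previous tick */
    size_t nevents;                     /* total changes seen so far */
    struct change_event ev[MAX_EVENTS]; /* the first MAX_EVENTS changes */
};

static void watch_init(struct word_watch *w, uint32_t initial)
{
    w->last = initial;
    w->nevents = 0;
}

/* Call once per timer tick with the word's current contents. */
static void watch_sample(struct word_watch *w, uint64_t tick, uint32_t now)
{
    if (now == w->last)
        return;
    if (w->nevents < MAX_EVENTS)
        w->ev[w->nevents] = (struct change_event){ tick, w->last, now };
    w->nevents++;
    w->last = now;
}
```

Only the first few changes are kept, since a live 1 kHz counter would otherwise change on nearly every sample once it starts.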
On Mon, 18 Oct 2010, Andrew Morton wrote:
On Mon, 18 Oct 2010 12:33:31 +0100
Mel Gorman m...@csn.ul.ie wrote:
A bit, but I still don't know why it would cause corruption. Maybe this is
still a caching issue, but the difference in timing between list_add and
list_add_tail is enough
Mel Gorman writes:
On Mon, Oct 11, 2010 at 02:00:39PM -0700, Andrew Morton wrote:
It's corruption of user memory, which is unusual. I'd be wondering if
there was a pre-existing bug which 6dda9d55bf545013597 has exposed -
previously the corruption was hitting something harmless.
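The thread contrasts list_add with list_add_tail. Apart from timing, the insertion point also decides which freed entry is handed out next, which is one way a change could move a stray write onto a different victim. Below is a toy re-implementation of the kernel's circular list semantics (not the kernel source), assuming an allocator that always takes the first entry on a free list; take_first is a hypothetical name.

```c
#include <stddef.h>

/* Minimal circular doubly linked list in the style of struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void insert_between(struct list_head *n, struct list_head *prev,
                           struct list_head *next)
{
    next->prev = n;
    n->next = next;
    n->prev = prev;
    prev->next = n;
}

/* Insert right after the head: the entry is the next one taken. */
static void list_add(struct list_head *n, struct list_head *h)
{
    insert_between(n, h, h->next);
}

/* Insert right before the head: the entry waits behind all the others. */
static void list_add_tail(struct list_head *n, struct list_head *h)
{
    insert_between(n, h->prev, h);
}

/* Hypothetical allocator step: unlink and return the first entry. */
static struct list_head *take_first(struct list_head *h)
{
    struct list_head *n = h->next;

    if (n == h)
        return NULL;
    n->prev->next = n->next;
    n->next->prev = n->prev;
    return n;
}
```

With list_add the most recently freed entry is reused first; with list_add_tail it is reused last, so a page that something is still writing to may be handed to a different user, or sit unused for much longer.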
(cc linuxppc-dev@lists.ozlabs.org)
On Mon, 11 Oct 2010 15:30:22 +0100
Mel Gorman m...@csn.ul.ie wrote:
On Sat, Oct 09, 2010 at 04:57:18AM -0500, pac...@kosh.dhis.org wrote:
(What a big Cc: list... scripts/get_maintainer.pl made me do it.)
This will be a long story with a weak conclusion,