"Stephen C. Tweedie" wrote:
>
> The original multi-page buffers came from the map_user_kiobuf
> interface: they represented a user data buffer. I'm not wedded to
> that format --- we can happily replace it with a fine-grained sg list
>
Could you change that interface?
<<< from Linus mail:
christophe barbe wrote:
>
> I've missed the thread "avoiding bad sleeps" last week. I've had a similar problem
> and I would like to discuss the solution I've used to avoid it.
>
> I want to wake up a sleeping process from an IRQ handler. In the process, if I use
an interruptible_sleep_on(), I
"Stephen C. Tweedie" wrote:
>
> You simply cannot do physical disk IO on
> non-sector-aligned memory or in chunks which aren't a multiple of
> sector size.
Why not?
Obviously the disk access itself must be sector aligned and the total
length must be a multiple of the sector length, but there sh
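The on-disk constraint stated in this reply can be sketched as a small check. This is only an illustration: the 512-byte sector size and the name `disk_range_ok` are assumptions, not taken from the thread, and the sketch deliberately says nothing about the alignment of the memory buffer, which is the point under discussion.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed sector size for illustration; real devices may use other sizes. */
#define SECTOR_SIZE 512u

/*
 * Hypothetical helper: returns true when a disk access starts on a sector
 * boundary and covers a whole number of sectors. Empty requests are
 * rejected. The memory buffer address is intentionally not examined.
 */
bool disk_range_ok(uint64_t byte_offset, uint64_t byte_len)
{
    if (byte_len == 0)
        return false;
    return (byte_offset % SECTOR_SIZE == 0) && (byte_len % SECTOR_SIZE == 0);
}
```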
Thomas Stewart wrote:
>
> Right, I patched the via-diag and it's showing more regs.
>
> I applied Manfred's patch but that changed nothing.
> Then I applied your patch and that still changed nothing, as you suspected.
> But there are regs that are different.
>
Several regs are just the wakeup frames,
Thomas Stewart wrote:
>
> >
> > CmdReset is not instant, it may need a delay. There is also a "force
> > software reset" operation that sounds good, I assume that one also
> > could use a delay so I gave it 6ms.
> >
6 ms is quite long:
I added a reset into tx_timeout, and that function should no
Urban Widmark wrote:
>
> The "transmit timed out" message is simply saying that we told the card to
> send something but it hasn't generated an interrupt or anything allowing
> the driver to know the packet was actually sent.
>
check via_rhine_tx_timeout():
the function is basically empty.
>
>
Manfred Spraul wrote:
>
> But I think we can change the bug description:
>
> If an io apic io redirection entry is unmasked while the irq pin is
> active, then the io apic sends out the interrupt as edge triggered, but
> nevertheless sets the IRR bit.
>
I found anoth
eously delivered as edge-triggered but the
+ * IRR bit gets set nevertheless.
* As a result the I/O unit expects an EOI message but it will never
* arrive and further interrupts are blocked for the source.
*
@@ -126,12 +126,8 @@
* a level-triggered interrupt and to revert the mode when unma
Alan Cox wrote:
>
> > And one more point for the Janitor's list:
> > Get rid of superfluous irqsave()/irqrestore()'s - in 90% of the cases
> > either spin_lock_irq() or spin_lock() is sufficient. That's both faster
> > and better readable.
>
> Expect me to drop any submissions that do this. I'd r
>
>I'm working at a customer site with custom hardware. The 2.4.0 series
> kernel almost works out of the box, but the machine has 52 PCI busses.
> Plans are to produce a 4-way box which would have over 80 PCI busses. The
> file include/asm-i386/mpspec.h allocates space for 32 busses in th
"Maciej W. Rozycki" wrote:
>
> I'll implement an 82489DX update in a few days, but for now I'd like
> everyone interested to test the following patch as much as possible. It
> applies to 2.4.0, 2.4.0-ac12 and 2.4.1-pre11 cleanly.
>
I'm not totally convinced that this fixes all problems:
No loc
David Woodhouse wrote:
>
> -static struct tq_struct dummy_task;
> +static struct tq_struct dummy_task /* = all zero */;
>
That comment is superfluous - that's just C.
The non-obvious part is
+static struct tq_struct dummy_task; /* remains zero, run_task_queue()
supports tqs.routine==NULL*/
BUT:
David Woodhouse wrote:
>
> TIOCMIWAIT does restore_flags() before interruptible_sleep_on(). It's
> broken too.
>
Yes, and I found a second bug: it doesn't sti() immediately after
interruptible_sleep_on(), thus cli() doesn't reacquire the global irq
lock --> the atomic copy won't be atomic on SMP.
Arnaldo Carvalho de Melo wrote:
>
> Em Sun, Jan 28, 2001 at 05:14:37PM +0100, Manfred Spraul escreveu:
> > >
> > > Anything which uses sleep_on() has a 90% chance of being broken. Fix
> > > them all, because we want to remove sleep_on() and friends in 2.5.
> &
Is it intentional that dummy_task is not initialized?
Ok, it won't crash because the current __run_task_queue() implementation
doesn't call tq->routine if it's NULL, but IMHO it's ugly.
Additionally I don't like the loop in flush_scheduled_tasks(), what
about replacing it with a locked semaphore
>
> Anything which uses sleep_on() has a 90% chance of being broken. Fix
> them all, because we want to remove sleep_on() and friends in 2.5.
>
Then you can add 'calling schedule() with local interrupts disabled'
to your list.
--
Manfred
-
To unsubscribe from this list: send the lin
> + *
> + * Changed the slow-down I/O port from 0x80 to 0x19. 0x19 is a
> + * DMA controller scratch register. [EMAIL PROTECTED]
>*/
>
What about making that a config option?
default: delay with 'outb 0x80', other options could be
udelay(n); (n=1,2,3)
outb 0x19
0x80 is
> Yes, Linux is __very__ not right doing this. RFC requires to accept
> ACK, URG and RST on any segment adjacent to window, even if window
> is zero.
>
Interesting: I checked the RFC 793 and came to the conclusion that Linux
is correct. ("special allowance should be made to accept valid ACKs" not
I checked RFC793, and AFAICS Solaris is the culprit:
it sends out invalid packets, Linux ignores them and thus Linux doesn't
receive acks.
Which Solaris version do you use?
* The last valid ack from the Solaris computer is for byte 1583721, win
8760 (line 2078)
* No packet after line 2078 from
I read through the tcpdump, and it seems that Linux completely ignores
packets with out-of-window sequence numbers:
* the Solaris computer (dynamic...) sends further data although the
Linux box (static) says 'win 0'.
See lines 2067, 2069, 2076, ...
2066 16:31:43.108759 eth0 > static.8664 > dyn
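For reference, the segment acceptability test that RFC 793 defines can be sketched as follows. This is a minimal sketch, not the Linux implementation: the function names are hypothetical, and the "special allowance" for valid ACKs, URGs and RSTs on a zero window is not modelled. It only shows that, under the plain test, a data segment sent into a zero window is not acceptable.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if rcv_nxt <= seq < rcv_nxt + rcv_wnd, using modulo-2^32 arithmetic. */
bool seq_in_window(uint32_t seq, uint32_t rcv_nxt, uint32_t rcv_wnd)
{
    return (uint32_t)(seq - rcv_nxt) < rcv_wnd;
}

/*
 * RFC 793 acceptability test for an incoming segment (hypothetical name).
 * seg_len counts the sequence space occupied by the segment.
 */
bool segment_acceptable(uint32_t seg_seq, uint32_t seg_len,
                        uint32_t rcv_nxt, uint32_t rcv_wnd)
{
    if (seg_len == 0) {
        if (rcv_wnd == 0)
            return seg_seq == rcv_nxt;
        return seq_in_window(seg_seq, rcv_nxt, rcv_wnd);
    }
    if (rcv_wnd == 0)
        return false; /* data into a closed window is never acceptable */
    /* Either the first or the last byte must fall inside the window. */
    return seq_in_window(seg_seq, rcv_nxt, rcv_wnd) ||
           seq_in_window(seg_seq + seg_len - 1, rcv_nxt, rcv_wnd);
}
```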
Donald Becker wrote:
>
> > > However, natsemi.c's spinlock needs to be retained, and
> > > extended into start_tx(), because this driver has
> > > a race which has cropped up in a few others:
> > > ...
> > > if (np->cur_tx - np->dirty_tx >= TX_QUEUE_LEN - 1) {
> > > /* WIN
I've attached Holger's testcase (ext2, SMP, raid5)
boot with "mem=64M" and run the attached script.
The script creates and deletes 9 directories with 10,000 files in each dir.
Neil, could you run it? I don't have a RAID 5 array - SMP+ext2 without
raid5 is ok.
Holger, what's your ext2 block size, and d
Russell King wrote:
>
> Manfred Spraul writes:
> > Not yet, but that would be a 2 line patch (currently it's hardcoded to
> > BYTES_PER_WORD align or L1_CACHE_BYTES, depending on the HWCACHE_ALIGN
> > flag).
>
> I don't think there's a problem then.
Russell King wrote:
>
> Johannes Erdfelt writes:
> > They need to be visible via DMA. They need to be 16 byte aligned. We
> > also have QH's which have similar requirements, but we don't use as many
> > of them.
>
> Can we get away from the "16 byte aligned" and make it "n byte aligned"?
> I bel
>
> TD's are around 32 bytes big (actually, they may be 48 or even 64 now, I
> haven't checked recently). That's a waste of space for an entire page.
>
> However, having every driver implement its own slab cache seems a
> complete waste of time when we already have the code to do so in
> mm
[Paul Mackerras and linux-ppp added to the cc list]
It seems that the MPPP reconstruction queue got corrupted:
ppp_mp_reconstruct() called kfree_skb(), and within kfree_skb() the call
to skb->destructor() crashed:
skb->destructor was 0x01010101.
>
>
> I reported this a few months ago without
Holger Kiehl wrote:
> > I'm running your test with 48 MB ram, 12500 files, 9 processes in a 156
> > MB partition (swapoff, here is the test partition ;-).
> > With 192MB Ram I don't see the corruption.
> >
> I am not sure if I understand you correctly: with 48MB you do get
> corruption and with 1
> Another thing I notice is that the responsiveness of the machine
> decreases dramatically as the test progresses until it is nearly
> useless. After the test is done everything is back to normal.
> The same behavior was observed under 2.2.18.
That's expected: ext2 performs linear searches throu
AFAICS mptctl_lock() and mptctl_unlock() are just buggy implementations
of down() and up().
At least the 'current->state = TASK_UNINTERRUPTIBLE' must be moved into
the while(1) loop, or both functions could be replaced with a semaphore.
--
Manfred
The problem is clear:
rtl8139_resume() unconditionally restarts the hardware, even if the
network was not yet started.
The hardware immediately notices something, and sends an interrupt.
The oops happens during rtl8139_open():
the function calls request_irq(), but assumes that the interrupts are
> The only way I have found so far is to have two FIFO buffers in the
> driver (in and out) and use a daemon running in user space to manage the
> disk access.
Have you thought about using mmap and raw-io?
* the kernel driver allocates a fifo (probably a ring?) buffer. The
driver implemen
It seems that no one uses an NE2000-compatible PCI NIC with a newer
motherboard (every K7 board, Intel 8xx boards, VIA Apollo Pro 133), but
I've set up a tiny web site that describes my problem:
colorfullife.com/~manfred/io_apic
--
Manfred
Russell King wrote:
>
> Doesn't the NCR53C9x SCSI drivers use disable_irq() a lot? Do they have
> any problems?
>
It seems that a certain timing is necessary: one flood ping or a single
ncp usually doesn't trigger any problems, but 2 concurrent flood pings
hang the network after 5-10 seconds. I
Frank de Lange wrote:
>
> It could be that people using those cards are not the ones who tend
> to go for the (somewhat tricky) BP6 board...
>
I doubt that it's BP6 specific: I have the problem with a Gigabyte BXD
board and I doubt that Ingo used a BP6. Perhaps 82093AA specific (the
IO APIC ch
Linus Torvalds wrote:
>
> It may well not be disable_irq() that is buggy. In fact, there's good
> reason to believe that it's a hardware problem.
>
Perhaps a problem with the 82093AA external IO APIC used for 440BX
boards? I haven't seen any reports from newer Intel boards (the ICH2
includes an I
Alan Cox wrote:
>
> > Could you disable both bandaids? I disabled them, no problems so far.
> > Now back to the disable_irq_nosync().
>
> Ok so it looks like the disable_irq code is buggy. Unfortunately it's not
> just used by these drivers; they are just the heaviest users.
>
> Given that we ca
Frank de Lange wrote:
>
> On Fri, Jan 12, 2001 at 09:54:31PM +0100, Manfred Spraul wrote:
> > I have found one combination that doesn't hang with the unpatched
> > 8390.c, but network throughput is down to 1/2. I hope that's due to the
> > debugging changes.
Ingo Molnar wrote:
>
>
> okay - i just wanted to hear a definitive word from you that this fixes
> your problem, because this is what we'll have to do as a final solution.
> (barring any other solution.)
>
Ingo, is that possible?
The current fix is "disable_irq_nosync() and enable_irq() cause
Linus Torvalds wrote:
>
>
> I'd like to know _which_ of the two makes a difference (or does it only
> trigger with both of them enabled)? And even then I'm not sure that it is
> "the" solution - both changes to io-apic handling had some reason for
> them. Ingo, what was the focus-cpu thing?
>
Frank de Lange wrote:
>
> On Fri, Jan 12, 2001 at 08:04:24PM +0100, Manfred Spraul wrote:
> > I removed the disable_irq lines from 8390.c, and that fixed the problem:
> > no hang within 2 minutes - the test is still running.
> >
> > Frank, could you double check it?
Linus wrote:
> Does this seem to happen mainly with drivers that use "disable_irq()"
> and "enable_irq()"? I know the ne drivers do (through the 8390 module),
> and some others do too (3c59x).
I removed the disable_irq lines from 8390.c, and that fixed the problem:
no hang within 2 minutes - t
Ingo Molnar wrote:
>
> we *already* reorder vector numbers and spread them out as much as
> possible. We do this in 2.2 as well. We did this almost from day 1 of
> IO-APIC support. If any manually allocated IRQ vector creates a '3 vectors
> in the same 16-vector region' situation then that's a bug
Alan Cox wrote:
>
> > Frank, could you try what happens with the NMI oopser disabled?
> >
> > The second major difference I'm immediately aware of is the number of
> > the reschedule/tlb flush/etc interrupt: 2.2 uses the lowest priority,
> > 2.4 the highest priority.
>
> I'm trying to remember wh
Frank de Lange wrote:
>
> On Fri, Jan 12, 2001 at 06:16:36PM +0100, Manfred Spraul wrote:
> > I would first concentrate on the differences between 2.2 and 2.4:
> >
> > Frank, could you try what happens with the NMI oopser disabled?
>
> Here's the result
>
> [EMAIL PROTECTED] said:
> > IRR for interrupt 19 is set, that means the IO APIC has sent the
> > interrupt to a cpu but not yet received the corresponding EOI.
>
> OK, but couldn't we reset it by sending an extra EOI when the drivers
> decide that they've missed interrupts?
How?
You se
Let's decode it:
> IO APIC #2..
> NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
> 12 0FF 0F 0 1 0 1 0 1 1 91
> 13 0FF 0F 0 1 1 1 0 1 1 99
IRR for interrupt 19 is set, that means the IO APIC has sent the
interrupt to a cpu but not yet received the corresponding EOI.
That bit is read
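The decoding above can be sketched as a small parser for one row of the dump. This is a rough sketch: the struct and function names are hypothetical, and it assumes that the NR, Log, Phy and Vect columns are hexadecimal (the text reads row 13 as interrupt 19, which matches 0x13) and that the remaining columns are single decimal digits.

```c
#include <stdio.h>

/* One row of the IO APIC redirection table dump; field names are hypothetical. */
struct ioapic_row {
    unsigned nr, log_dest, phys_dest;
    unsigned mask, trig, irr, pol, stat, dest_mode, deli;
    unsigned vect;
};

/*
 * Parse a row such as "13 0FF 0F 0 1 1 1 0 1 1 99" (without the leading
 * quote marker). Returns 0 on success and -1 if the row does not have all
 * eleven columns.
 */
int parse_ioapic_row(const char *line, struct ioapic_row *r)
{
    int n = sscanf(line, " %x %x %x %u %u %u %u %u %u %u %x",
                   &r->nr, &r->log_dest, &r->phys_dest,
                   &r->mask, &r->trig, &r->irr, &r->pol,
                   &r->stat, &r->dest_mode, &r->deli, &r->vect);
    return n == 11 ? 0 : -1;
}
```

A caller would check `r.irr` after parsing: a value of 1 marks a level-triggered interrupt that was delivered but not yet acknowledged with an EOI.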
d 6 receivers, and one of them was unreachable for 50
minutes. It seems that sendmail resent the message to all receivers,
although the first 5 were successful.
One retry every 10 minutes --> 6 duplicates
Sorry,
Manfred Spraul
Trond Myklebust wrote:
>
>
> As for the issue of casting 'fh->data' as a 'struct knfsd' then that
> is a perfectly valid operation.
>
No it isn't.
fh->data is an array of characters, thus without any alignment
restrictions.
'struct knfsd' begins with a pointer, thus it must be 4 or 8 byte
align
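The alignment concern can be illustrated in plain C. The sketch below uses hypothetical stand-in types (`struct handle_payload`, `struct file_handle`), not the real NFS structures: instead of casting the character array to a struct pointer, it copies the bytes with memcpy(), which is valid for any alignment of the array.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a struct that begins with a pointer. */
struct handle_payload {
    void *owner;
    uint32_t generation;
};

/* Hypothetical stand-in for a handle whose data is a plain char array. */
struct file_handle {
    unsigned short size;
    char data[64];
};

/* True if p satisfies the given alignment (in bytes). */
int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p % align) == 0;
}

/* Store and load through memcpy(); no alignment assumption on raw. */
void payload_store(char *raw, const struct handle_payload *in)
{
    memcpy(raw, in, sizeof *in);
}

struct handle_payload payload_load(const char *raw)
{
    struct handle_payload out;
    memcpy(&out, raw, sizeof out);
    return out;
}
```

Whether a direct cast happens to work depends on where the array lands inside its containing structure and on the architecture; the copy avoids relying on either.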
Frank de Lange wrote:
>
> Hi'all,
>
> Ever since I put two ethernet-cards (cheap Winbond W89C940 based PCI NE2K
> clones) in my BP-6 system, I've been experiencing intermittent network hangs.
>
Which driver do you use? The driver in 2.4.0 contains several bugfixes.
If that driver still hangs th
Andi Kleen wrote:
>
> On Wed, Jan 10, 2001 at 03:40:50PM -0500, Jonathan Earle wrote:
> > Where do I go from here? Is there info somewhere to help with this? Is
> > this a bigger job than it looks on the surface?
>
> Try http://www.firstfloor.org/~andi/softnet
>
I would ask someone from znxz
Ingo Molnar wrote:
>
> On Wed, 10 Jan 2001, Manfred Spraul wrote:
>
> > That means sendmsg() changes the page tables? I measured
> > smp_call_function on my Dual Pentium 350, and it took around 1950 cpu
> > ticks.
>
> well, this is a performance problem if
> > In user space, how do you know when its safe to reuse the buffer that
> > was handed to sendmsg() with the MSG_NOCOPY flag? Or does sendmsg()
> > with that flag block until the buffer isn't needed by the kernel any
> > more? If it does block, doesn't that defeat the use of non-blocking
> >
sct wrote:
> We've already got measurements showing how insane this is. Raw IO
> requests, plus internal pagebuf contiguous requests from XFS, have to
> get broken down into page-sized chunks by the current ll_rw_block()
> API, only to get reassembled by the make_request code. It's
> *enormous
I would try to:
* implement foo_poll() in the kernel driver.
* the user space app calls select() or poll().
WaitForSingleObject should be easy to replace.
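On the user-space side, the single-object wait could look roughly like this. It is a sketch under assumptions: `wait_for_readable` is a hypothetical helper, and it presumes the driver's poll method reports POLLIN when new data is available.

```c
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait until fd becomes readable.
 * Returns 1 if readable, 0 on timeout, -1 on error or on an unexpected
 * event such as POLLERR or POLLHUP. A negative timeout_ms waits forever.
 */
int wait_for_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc = poll(&pfd, 1, timeout_ms);

    if (rc < 0)
        return -1;
    if (rc == 0)
        return 0;
    return (pfd.revents & POLLIN) ? 1 : -1;
}
```

The same poll() call accepts an array of descriptors, so waiting on several file descriptors at once only needs more `struct pollfd` entries.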
WaitForMultipleObjects could be tricky if you wait for different events
(e.g. wait until either the kernel driver has new data, or another
p
>
> +#ifdef DEBUG_SLAB
> + if (retval < 0 ) {
> + if(kmem_cache_destroy(uhci_desc_kmem))
Why only #ifdef DEBUG_SLAB?
AFAICS the driver should always destroy its slab cache.
Please cc, I'm not subscribed to linux-kernel.
--
Manfred