Re: [Kiobuf-io-devel] RFC: Kernel mechanism: Compound event wait /notify + callback chains

2001-02-05 Thread Manfred Spraul
"Stephen C. Tweedie" wrote: > > The original multi-page buffers came from the map_user_kiobuf > interface: they represented a user data buffer. I'm not wedded to > that format --- we can happily replace it with a fine-grained sg list > Could you change that interface? <<< from Linus mail:

Re: IRQ and sleep_on

2001-02-05 Thread Manfred Spraul
christophe barbe wrote: > > I've missed the thread "avoiding bad sleeps" last week. I've had a similar problem > and I would like to discuss the solution I've used to avoid it. > > I want to wake up a sleeping process from an IRQ handler. In the process, if I use > a interruptible_sleep_on(), I

Re: [Kiobuf-io-devel] RFC: Kernel mechanism: Compound event wait /notify + callback chains

2001-02-05 Thread Manfred Spraul
"Stephen C. Tweedie" wrote: > > You simply cannot do physical disk IO on > non-sector-aligned memory or in chunks which aren't a multiple of > sector size. Why not? Obviously the disk access itself must be sector aligned and the total length must be a multiple of the sector length, but there sh

Re: d-link dfe-530 tx (bug-report)

2001-02-05 Thread Manfred Spraul
Thomas Stewart wrote: > > Right, i patched the via-diag and its showing more regs. > > I applyed Manfred's patch but that changed nothing. > Then I applyed your patch and still changed nothing as you suspected. > But there are regs that are different. > Several regs are just the wakeup frames,

Re: d-link dfe-530 tx (bug-report)

2001-02-05 Thread Manfred Spraul
Thomas Stewart wrote: > > > > > CmdReset is not instant, it may need a delay. There is also a "force > > software reset" operation that sounds good, I assume that one also > > could use a delay so I gave it 6ms. > > 6 ms is quite long: I added a reset into tx_timeout, and that function should no

Re: d-link dfe-530 tx (bug-report)

2001-02-04 Thread Manfred Spraul
Urban Widmark wrote: > > The "transmit timed out" message is simply saying that we told the card to > send something but it hasn't generated an interrupt or anything allowing > the driver to know the packet was actually sent. > check via_rhine_tx_timeout(): the function is basically empty. > >

Re: [patch] 2.4.0, 2.4.0-ac12: APIC lock-ups

2001-02-03 Thread Manfred Spraul
Manfred Spraul wrote: > > But I think we can change the bug description: > > If an io apic io redirection entry is unmasked while the irq pin is > active, then the io apic sends out the interrupt as edge triggered, but > nevertheless sets the IRR bit. > I found anoth

Re: [patch] 2.4.0, 2.4.0-ac12: APIC lock-ups

2001-02-02 Thread Manfred Spraul
eously delivered as edge-triggered but the + * IRR bit gets set nevertheless. * As a result the I/O unit expects an EOI message but it will never * arrive and further interrupts are blocked for the source. * @@ -126,12 +126,8 @@ * a level-triggered interrupt and to revert the mode when unma

Re: [ANNOUNCE] Kernel Janitor's TODO list

2001-01-31 Thread Manfred Spraul
Alan Cox wrote: > > > And one more point for the Janitor's list: > > Get rid of superflous irqsave()/irqrestore()'s - in 90% of the cases > > either spin_lock_irq() or spin_lock() is sufficient. That's both faster > > and better readable. > > Expect me to drop any submissions that do this. I'd r

Re: Request: increase in PCI bus limit

2001-01-31 Thread Manfred Spraul
> >I'm working at a customer site with custom hardware. The 2.4.0 series > kernel almost works out of the box, but the machine has 52 PCI busses. > Plans are to produce a 4-way box which would have over 80 PCI busses. The > file include/asm-i386/mpspec.h allocates space for 32 busses in th

Re: [patch] 2.4.0, 2.4.0-ac12: APIC lock-ups

2001-01-29 Thread Manfred Spraul
"Maciej W. Rozycki" wrote: > > I'll implement an 82489DX update in a few days, but for now I'd like > everyone interested to test the following patch as much as possible. It > applies to 2.4.0, 2.4.0-ac12 and 2.4.1-pre11 cleanly. > I'm not totally convinced that this fixes all problems: No loc

Re: flush_scheduled_tasks() question

2001-01-29 Thread Manfred Spraul
David Woodhouse wrote: > > -static struct tq_struct dummy_task; > +static struct tq_struct dummy_task /* = all zero */; > That comment is superfluous - that's just C. The non-obvious part is +static struct tq_struct dummy_task; /* remains zero, run_task_queue() supports tqs.routine==NULL*/ BUT:

Re: [ANNOUNCE] Kernel Janitor's TODO list

2001-01-28 Thread Manfred Spraul
David Woodhouse wrote: > > TIOCMIWAIT does restore_flags() before interruptible_sleep_on(). It's > broken too. > Yes, and I found a second bug: it doesn't sti() immediately after interruptible_sleep_on(), thus cli() doesn't reacquire the global irq lock --> the atomic copy won't be atomic on SMP.
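The first TIOCMIWAIT problem, like most sleep_on() bugs, is the classic lost-wakeup window between testing a condition and going to sleep: an interrupt that fires inside that window is never seen. In the kernel the usual cure is to put the task on the wait queue and set its state before re-checking the condition (the pattern wait_event() wraps). The same idea as a minimal userspace sketch, assuming POSIX threads and a hypothetical `struct event`, tests the flag under the same lock the waker takes:

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical one-shot event: a flag protected by a mutex, plus a condvar. */
struct event {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool signalled;
};

static void event_init(struct event *ev)
{
    pthread_mutex_init(&ev->lock, NULL);
    pthread_cond_init(&ev->cond, NULL);
    ev->signalled = false;
}

/* Sleep until the event is signalled, then consume it. The flag is tested
 * with the lock held, and pthread_cond_wait() releases the lock and sleeps
 * atomically, so a signal arriving between the test and the sleep is not
 * lost (unlike "test, then sleep_on()"). */
static void event_wait(struct event *ev)
{
    pthread_mutex_lock(&ev->lock);
    while (!ev->signalled)              /* loop: spurious wakeups are allowed */
        pthread_cond_wait(&ev->cond, &ev->lock);
    ev->signalled = false;
    pthread_mutex_unlock(&ev->lock);
}

/* Waker side: the role the IRQ handler plays in the kernel case. */
static void event_signal(struct event *ev)
{
    pthread_mutex_lock(&ev->lock);
    ev->signalled = true;
    pthread_cond_signal(&ev->cond);
    pthread_mutex_unlock(&ev->lock);
}

/* Thread entry point that signals the event passed as its argument. */
static void *event_signal_thread(void *arg)
{
    event_signal(arg);
    return NULL;
}
```

The mutex only stands in for whatever serializes waker and sleeper; an IRQ handler cannot take a sleeping lock, so in-kernel code relies on the wait-queue ordering (or a spinlock) instead.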

Re: [ANNOUNCE] Kernel Janitor's TODO list

2001-01-28 Thread Manfred Spraul
Arnaldo Carvalho de Melo wrote: > > Em Sun, Jan 28, 2001 at 05:14:37PM +0100, Manfred Spraul escreveu: > > > > > > Anything which uses sleep_on() has a 90% chance of being broken. Fix > > > them all, because we want to remove sleep_on() and friends in 2.5. > &

flush_scheduled_tasks() question

2001-01-28 Thread Manfred Spraul
Is it intentional that dummy_task is not initialized? Ok, it won't crash because the current __run_task_queue() implementation doesn't call tq->routine if it's NULL, but IMHO it's ugly. Additionally, I don't like the loop in flush_scheduled_tasks(); what about replacing it with a locked semaphore

Re: [ANNOUNCE] Kernel Janitor's TODO list

2001-01-28 Thread Manfred Spraul
> > Anything which uses sleep_on() has a 90% chance of being broken. Fix > them all, because we want to remove sleep_on() and friends in 2.5. > Then you can add 'calling schedule() with local interrupts disabled' to your list. -- Manfred

Re: Linux Post codes during runtime, possibly OT

2001-01-26 Thread Manfred Spraul
> + * > + * Changed the slow-down I/O port from 0x80 to 0x19. 0x19 is a > + * DMA controller scratch register. [EMAIL PROTECTED] >*/ > What about making that a config option? default: delay with 'outb 0x80', other options could be udelay(n); (n=1,2,3) outb 0x19 0x80 is

Re: Linux 2.2.16 through 2.2.18preX TCP hang bug triggered by rsync

2001-01-24 Thread Manfred Spraul
> Yes, Linux is __very__ not right doing this. RFC requires to accept > ACK, URG and RST on any segment adjacent to window, even if window > is zero. > Interesting: I checked RFC 793 and came to the conclusion that Linux is correct. ("special allowance should be made to accept valid ACKs" not
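For reference, RFC 793 decides whether an incoming segment is acceptable from RCV.NXT, RCV.WND, SEG.SEQ and SEG.LEN using four cases. A minimal C sketch of that test follows, with sequence numbers compared modulo 2^32; the function names are hypothetical, and the "special allowance" for ACK, URG and RST on a zero window is deliberately not modelled:

```c
#include <stdbool.h>
#include <stdint.h>

/* True if seq lies in [rcv_nxt, rcv_nxt + rcv_wnd), using modulo-2^32
 * arithmetic so the test also works across sequence-number wraparound. */
static bool seq_in_window(uint32_t seq, uint32_t rcv_nxt, uint32_t rcv_wnd)
{
    return (uint32_t)(seq - rcv_nxt) < rcv_wnd;
}

/* RFC 793 segment acceptability test (hypothetical helper name).
 * Does NOT model the special allowance for ACK/URG/RST when RCV.WND == 0. */
static bool segment_acceptable(uint32_t rcv_nxt, uint32_t rcv_wnd,
                               uint32_t seg_seq, uint32_t seg_len)
{
    if (rcv_wnd == 0)
        /* Zero window: only an empty segment at exactly RCV.NXT passes. */
        return seg_len == 0 && seg_seq == rcv_nxt;
    if (seg_len == 0)
        return seq_in_window(seg_seq, rcv_nxt, rcv_wnd);
    /* Data segment: acceptable if its first or last octet is in the window. */
    return seq_in_window(seg_seq, rcv_nxt, rcv_wnd) ||
           seq_in_window(seg_seq + seg_len - 1, rcv_nxt, rcv_wnd);
}
```

Under this test a data-bearing segment sent into a zero window is never acceptable; only its ACK/URG/RST content may be honoured through the special allowance.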

Re: Linux 2.2.16 through 2.2.18preX TCP hang bug triggered by rsync

2001-01-23 Thread Manfred Spraul
I checked RFC793, and AFAICS Solaris is the culprit: it sends out invalid packets, Linux ignores them and thus Linux doesn't receive acks. Which Solaris version do you use? * The last valid ack from the Solaris computer is for byte 1583721, win 8760 (line 2078) * No packet after line 2078 from

Re: Linux 2.2.16 through 2.2.18preX TCP hang bug triggered by rsync

2001-01-23 Thread Manfred Spraul
I read through the tcpdump, and it seems that Linux completely ignores packets with out-of-window sequence numbers: * the Solaris computer (dynamic...) sends further data although the Linux box (static) says 'win 0'. See lines 2067, 2069, 2076, ... 2066 16:31:43.108759 eth0 > static.8664 > dyn

Re: [PATCH] Re: Q: natsemi.c spinlocks

2001-01-22 Thread Manfred Spraul
Donald Becker wrote: > > > > However, natsemi.c's spinlock needs to be retained, and > > > extended into start_tx(), because this driver has > > > a race which has cropped up in a few others: > > > ... > > > if (np->cur_tx - np->dirty_tx >= TX_QUEUE_LEN - 1) { > > > /* WIN

Re: [PATCH] - filesystem corruption on soft RAID5 in 2.4.0+

2001-01-21 Thread Manfred Spraul
I've attached Holger's testcase (ext2, SMP, raid5): boot with "mem=64M" and run the attached script. The script creates and deletes 9 directories with 10,000 files in each directory. Neil, could you run it? I don't have a RAID5 array - SMP+ext2 without raid5 is ok. Holger, what's your ext2 block size, and d

Re: Inefficient PCI DMA usage (was: [experimental patch] UHCI updates)

2001-01-21 Thread Manfred Spraul
Russell King wrote: > > Manfred Spraul writes: > > Not yet, but that would be a 2 line patch (currently it's hardcoded to > > BYTES_PER_WORD align or L1_CACHE_BYTES, depending on the HWCACHE_ALIGN > > flag). > > I don't think there's a problem then.

Re: Inefficient PCI DMA usage (was: [experimental patch] UHCI updates)

2001-01-21 Thread Manfred Spraul
Russell King wrote: > > Johannes Erdfelt writes: > > They need to be visible via DMA. They need to be 16 byte aligned. We > > also have QH's which have similar requirements, but we don't use as many > > of them. > > Can we get away from the "16 byte aligned" and make it "n byte aligned"? > I bel

Re: Inefficient PCI DMA usage (was: [experimental patch] UHCI updates)

2001-01-20 Thread Manfred Spraul
> > TD's are around 32 bytes big (actually, they may be 48 or even 64 now, I > haven't checked recently). That's a waste of space for an entire page. > > However, having every driver implement it's own slab cache seems a > complete waste of time when we already have the code to do so in > mm
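The arithmetic behind the objection is simple: one page per descriptor wastes almost the whole page on a 32-byte TD, while a cache that rounds each object up to the required alignment packs many per page. A sketch, assuming a 4096-byte page and ignoring any per-slab bookkeeping (the helper names are hypothetical):

```c
#include <stddef.h>

/* Round size up to a multiple of align; align must be a power of two
 * (e.g. the 16-byte alignment the UHCI descriptors need). */
static size_t align_up(size_t size, size_t align)
{
    return (size + align - 1) & ~(align - 1);
}

/* Number of align-aligned objects of obj_size bytes that fit in one page,
 * ignoring the bookkeeping a real slab allocator keeps. */
static size_t objs_per_page(size_t obj_size, size_t align, size_t page_size)
{
    return page_size / align_up(obj_size, align);
}
```

With 32-, 48- and 64-byte TDs and 16-byte alignment this gives 128, 85 and 64 descriptors per page rather than one.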

[2.4.1-pre8] MPP related OPPS

2001-01-19 Thread Manfred Spraul
[Paul Mackerras and linux-ppp added to the cc list] It seems that the MPPP reconstruction queue got corrupted: ppp_mp_reconstruct() called kfree_skb(), and within kfree_skb() the call to skb->destructor() crashed: skb->destructor was 0x01010101. > > > I reported this a few months ago without

Re: More filesystem corruption under 2.4.1-pre8 and SW Raid5

2001-01-19 Thread Manfred Spraul
Holger Kiehl wrote: > > I'm running your test with 48 MB ram, 12500 files, 9 processes in a 156 > > MB partition (swapoff, here is the test partition ;-). > > With 192MB Ram I don't see the corruption. > > > I am not sure if I understand you correctly: with 48MB you do get > corruption and with 1

More filesystem corruption under 2.4.1-pre8 and SW Raid5

2001-01-19 Thread Manfred Spraul
> Another thing I notice is that the responsiveness of the machine > decreases dramatically as the test progresses until it is nearly > useless. After the test is done everything is back to normal. > The same behavior was observed under 2.2.18. That's expected: ext2 performs linear searches throu

2.4.0-ac9: bug in drivers/message/fusion/mptctl.c

2001-01-17 Thread Manfred Spraul
AFAICS mptctl_lock() and mptctl_unlock() are just buggy implementations of down() and up(). At least the 'current->state = TASK_UNINTERRUPTIBLE' must be moved into the while(1) loop, or both functions could be replaced with a semaphore. -- Manfred
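A semaphore initialised to 1 already provides the semantics such a hand-rolled sleeping lock tries to implement. As a userspace illustration only, with POSIX sem_t standing in for the kernel's down()/up() and hypothetical wrapper names (not the mptctl API):

```c
#include <errno.h>
#include <semaphore.h>

/* Binary semaphore used as a sleeping lock: value 1 = free, 0 = held. */
static sem_t example_ctl_sem;

static int example_ctl_init(void)
{
    return sem_init(&example_ctl_sem, 0, 1);  /* 0: not shared across processes */
}

/* Counterpart of down(): sleeps until the semaphore is available, then
 * takes it. A wait interrupted by a signal (EINTR) is simply retried. */
static void example_ctl_lock(void)
{
    while (sem_wait(&example_ctl_sem) == -1 && errno == EINTR)
        ;
}

/* Counterpart of up(): releases the lock and wakes one waiter, if any. */
static void example_ctl_unlock(void)
{
    sem_post(&example_ctl_sem);
}
```

Inside a 2.4 driver the equivalent would be a semaphore declared with DECLARE_MUTEX() and taken with down()/up(), which removes the need to manage current->state by hand.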

Re: Oops in rtl8139, and more

2001-01-15 Thread Manfred Spraul
The problem is clear: rtl8139_resume() unconditionally restarts the hardware, even if the network was not yet started. The hardware immediately notices something, and sends an interrupt. The oops happens during rtl8139_open(): the function calls request_irq(), but assumes that the interrupts are

Re: Question regarding driver developement

2001-01-14 Thread Manfred Spraul
> The only way I have found so far is to write have two FIFO buffers in the > driver (in and out) and use a daemon running in user space to manage the > disk access. Have you thought about using mmap and raw-io? * the kernel driver allocates a fifo (probably a ring?) buffer. The driver implemen
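The snippet is cut off before the driver details, so the following only illustrates the index bookkeeping such a shared ring typically needs, not the scheme the mail goes on to describe. It assumes a single producer and a single consumer, free-running 32-bit counters and a power-of-two size; all names are hypothetical, and a ring really shared between a driver and an mmap()ing process would also need memory barriers around the index updates:

```c
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8u  /* must be a power of two */

/* Hypothetical single-producer/single-consumer ring. */
struct ring {
    uint32_t head;  /* free-running count of bytes written; producer only */
    uint32_t tail;  /* free-running count of bytes read; consumer only */
    unsigned char buf[RING_SIZE];
};

/* Bytes currently queued; unsigned subtraction handles counter wraparound. */
static size_t ring_used(const struct ring *r)
{
    return (uint32_t)(r->head - r->tail);
}

/* Producer: append one byte, or return -1 if the ring is full. */
static int ring_put(struct ring *r, unsigned char c)
{
    if (ring_used(r) == RING_SIZE)
        return -1;
    r->buf[r->head & (RING_SIZE - 1)] = c;
    r->head++;
    return 0;
}

/* Consumer: remove one byte, or return -1 if the ring is empty. */
static int ring_get(struct ring *r, unsigned char *c)
{
    if (ring_used(r) == 0)
        return -1;
    *c = r->buf[r->tail & (RING_SIZE - 1)];
    r->tail++;
    return 0;
}
```

Because each index is written by only one side, the driver and the user-space daemon never need a shared lock, only ordered updates.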

Re: Call for testers: ne2k-pci and io apic (was: Re: QUESTION: Network hangs with BP6...)

2001-01-13 Thread Manfred Spraul
It seems that no one uses an NE2000-compatible PCI NIC with a newer motherboard (every K7 board, Intel 8xx boards, VIA Apollo Pro 133), but I've set up a tiny web site that describes my problem: colorfullife.com/~manfred/io_apic -- Manfred

Call for testers: ne2k-pci and io apic (was: Re: QUESTION: Network hangs with BP6...)

2001-01-13 Thread Manfred Spraul
Russell King wrote: > > Doesn't the NCR53C9x SCSI drivers use disable_irq() a lot? Do they have > any problems? > It seems that a certain timing is necessary: one flood ping or a single ncp usually doesn't trigger any problems, but 2 concurrent flood pings hang the network after 5-10 seconds. I

Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware

2001-01-12 Thread Manfred Spraul
Frank de Lange wrote: > > It could be that people using those cards are not the ones who tend > to go for the (somewhat tricky) BP6 board... > I doubt that it's BP6 specific: I have the problem with a Gigabyte BXD board and I doubt that Ingo used a BP6. Perhaps 82093AA specific (the IO APIC ch

Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware

2001-01-12 Thread Manfred Spraul
Linus Torvalds wrote: > > It may well not be disable_irq() that is buggy. In fact, there's good > reason to believe that it's a hardware problem. > Perhaps a problem with the 82093AA external IO APIC used on 440BX boards? I haven't seen any reports from newer Intel boards (the ICH2 includes an I

Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware

2001-01-12 Thread Manfred Spraul
Alan Cox wrote: > > > Could you disable both bandaids? I disabled them, no problems so far. > > Now back to the disable_irq_nosync(). > > Ok so it looks like the disable_irq code is buggy. Unfortunately its not > just used for these drivers they are just the heaviest users. > > Given that we ca

Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardwarerelated?

2001-01-12 Thread Manfred Spraul
Frank de Lange wrote: > > On Fri, Jan 12, 2001 at 09:54:31PM +0100, Manfred Spraul wrote: > > I have found one combination that doesn't hang with the unpatched > > 8390.c, but network throughput is down to 1/2. I hope that's due to the > > debugging changes.

Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardwarerelated?

2001-01-12 Thread Manfred Spraul
Ingo Molnar wrote: > > > okay - i just wanted to hear a definitive word from you that this fixes > your problem, because this is what we'll have to do as a final solution. > (barring any other solution.) > Ingo, is that possible? The current fix is "disable_irq_nosync() and enable_irq() cause

Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardwarerelated?

2001-01-12 Thread Manfred Spraul
Linus Torvalds wrote: > > > I'd like to know _which_ of the two makes a difference (or does it only > trigger with both of them enabled)? And even then I'm not sure that it is > "the" solution - both changes to io-apic handling had some reason for > them. Ingo, what was the focus-cpu thing? >

Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?

2001-01-12 Thread Manfred Spraul
Frank de Lange wrote: > > On Fri, Jan 12, 2001 at 08:04:24PM +0100, Manfred Spraul wrote: > > I removed the disable_irq lines from 8390.c, and that fixed the problem: > > no hang within 2 minutes - the test is still running. > > > > Frank, could you double check it?

Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?

2001-01-12 Thread Manfred Spraul
Linus wrote: > Does this seem to happen mainly with drivers that use "disable_irq()" > and "enable_irq()"? I know the ne drivers do (through the 8390 module), > and some others do too (3c59x). I removed the disable_irq lines from 8390.c, and that fixed the problem: no hang within 2 minutes - t

Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware

2001-01-12 Thread Manfred Spraul
Ingo Molnar wrote: > > we *already* reorder vector numbers and spread them out as much as > possible. We do this in 2.2 as well. We did this almost from day 1 of > IO-APIC support. If any manually allocated IRQ vector creates a '3 vectors > in the same 16-vector region' situation then thats a bug

Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware

2001-01-12 Thread Manfred Spraul
Alan Cox wrote: > > > Frank, could you try what happens with the NMI oopser disabled? > > > > The second major difference I'm immediately aware of is the number of > > the reschedule/tlb flush/etc interrupt: 2.2 uses the lowest priority, > > 2.4 the highest priority. > > Im trying to remember wh

Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?

2001-01-12 Thread Manfred Spraul
Frank de Lange wrote: > > On Fri, Jan 12, 2001 at 06:16:36PM +0100, Manfred Spraul wrote: > > I would first concentrate on the differences between 2.2 and 2.4: > > > > Frank, could you try what happens with the NMI oopser disabled? > > Here's the result

Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?

2001-01-12 Thread Manfred Spraul
> > [EMAIL PROTECTED] said: > > IRR for interrupt 19 is set, that means the IO APIC has sent the > > interrupt to a cpu but not yet received the corresponding EOI. > > OK, but couldn't we reset it by sending an extra EOI when the drivers > decide that they've missed interrupts? How? You se

Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?

2001-01-12 Thread Manfred Spraul
Let's decode it:
> IO APIC #2..
> NR  Log  Phy  Mask  Trig  IRR  Pol  Stat  Dest  Deli  Vect:
> 12  0FF  0F   0     1     0    1    0     1     1     91
> 13  0FF  0F   0     1     1    1    0     1     1     99
IRR for interrupt 19 is set, that means the IO APIC has sent the interrupt to a cpu but not yet received the corresponding EOI. That bit is read
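The dump columns can be decoded mechanically. A sketch, assuming (as the rows quoted above suggest) that NR, Log, Phy and Vect are printed in hex and the single-bit flags in decimal; the struct and function names are hypothetical:

```c
#include <stdio.h>

/* One redirection-entry row of the IO APIC dump (hypothetical layout). */
struct ioapic_row {
    unsigned int nr;        /* entry number, printed in hex: 0x13 = 19 */
    unsigned int log, phy;  /* logical / physical destination, hex */
    unsigned int mask, trig, irr, pol, stat, dest, deli;  /* flag columns */
    unsigned int vect;      /* interrupt vector, printed in hex */
};

/* Parse "NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect".
 * Returns 0 on success, -1 if the line does not have all 11 fields. */
static int parse_ioapic_row(const char *line, struct ioapic_row *r)
{
    int n = sscanf(line, "%x %x %x %u %u %u %u %u %u %u %x",
                   &r->nr, &r->log, &r->phy, &r->mask, &r->trig, &r->irr,
                   &r->pol, &r->stat, &r->dest, &r->deli, &r->vect);
    return n == 11 ? 0 : -1;
}
```

Parsing the second row yields entry 0x13 = 19 with Trig = 1 (level-triggered) and IRR = 1: the IO APIC is still waiting for an EOI for that level-triggered source.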

Apology for duplicates (was Re: Compatibility issue with 2.2.19pre7 (fwd))

2001-01-11 Thread Manfred Spraul
d 6 receivers, and one of them was unreachable for 50 minutes. It seems that sendmail resent the message to all receivers, although delivery to the first 5 had succeeded. One retry every 10 minutes --> 6 duplicates. Sorry, Manfred Spraul

Re: Compatibility issue with 2.2.19pre7

2001-01-11 Thread Manfred Spraul
Trond Myklebust wrote: > > > As for the issue of casting 'fh->data' as a 'struct knfsd' then that > is a perfectly valid operation. > No it isn't. fh->data is an array of characters, thus without any alignment restrictions. 'struct knfsd' begins with a pointer, thus it must be 4 or 8 byte align
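The portable way to read a structure out of a byte array with no alignment guarantee is to copy it into a properly aligned object rather than cast the pointer. A sketch with hypothetical stand-in types (the real knfsd file-handle layout is not reproduced here):

```c
#include <string.h>

/* Hypothetical stand-in for a file handle whose payload is a plain byte
 * array (like fh->data): it starts at offset 1, so it is not pointer-aligned. */
struct example_fh {
    unsigned char size;
    unsigned char data[32];
};

/* Hypothetical stand-in for a structure that begins with a pointer and
 * therefore needs 4- or 8-byte alignment. */
struct example_handle {
    void *ptr;
    unsigned int ino;
};

/* Copy the payload into an aligned object instead of casting fh->data to
 * (struct example_handle *), which may fault on strict-alignment CPUs. */
static int example_fh_load(const struct example_fh *fh,
                           struct example_handle *out)
{
    if (sizeof *out > sizeof fh->data)
        return -1;
    memcpy(out, fh->data, sizeof *out);
    return 0;
}

/* Inverse direction: serialize the structure into the byte array. */
static int example_fh_store(struct example_fh *fh,
                            const struct example_handle *in)
{
    if (sizeof *in > sizeof fh->data)
        return -1;
    memcpy(fh->data, in, sizeof *in);
    fh->size = (unsigned char)sizeof *in;
    return 0;
}
```

On x86 the cast usually happens to work because unaligned loads are permitted, which is how such bugs go unnoticed until the code runs on a strict-alignment architecture.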

Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?

2001-01-10 Thread Manfred Spraul
Frank de Lange wrote: > > Hi'all, > > Ever since I put two ethernet-cards (cheap Winbond W89C940 based PCI NE2K > clones) in my BP-6 system, I've been experiencing intermittent network hangs. > Which driver do you use? The driver in 2.4.0 contains several bugfixes. If that driver still hangs th

Re: Porting network driver to 2.4.0

2001-01-10 Thread Manfred Spraul
Andi Kleen wrote: > > On Wed, Jan 10, 2001 at 03:40:50PM -0500, Jonathan Earle wrote: > > Where do I go from here? Is there info somewhere to help with this? Is > > this a bigger job than it looks on the surface? > > Try http://www.firstfloor.org/~andi/softnet > I would ask someone from znxz

Re: [PLEASE-TESTME] Zerocopy networking patch, 2.4.0-1

2001-01-10 Thread Manfred Spraul
Ingo Molnar wrote: > > On Wed, 10 Jan 2001, Manfred Spraul wrote: > > > That means sendmsg() changes the page tables? I measured > > smp_call_function on my Dual Pentium 350, and it took around 1950 cpu > > ticks. > > well, this is a performance problem if

Re: [PLEASE-TESTME] Zerocopy networking patch, 2.4.0-1

2001-01-10 Thread Manfred Spraul
> > In user space, how do you know when its safe to reuse the buffer that > > was handed to sendmsg() with the MSG_NOCOPY flag? Or does sendmsg() > > with that flag block until the buffer isn't needed by the kernel any > > more? If it does block, doesn't that defeat the use of non-blocking > >

Re: [PLEASE-TESTME] Zerocopy networking patch, 2.4.0-1

2001-01-09 Thread Manfred Spraul
sct wrote: > We've already got measurements showing how insane this is. Raw IO > requests, plus internal pagebuf contiguous requests from XFS, have to > get broken down into page-sized chunks by the current ll_rw_block() > API, only to get reassembled by the make_request code. It's > *enormous

[OT] Re: WaitForSingleObject in linux????..

2001-01-08 Thread Manfred Spraul
I would try to: * implement foo_poll() in the kernel driver. * the user space app calls select() or poll(). WaitForSingleObject should be easy to replace. WaitForMultipleObjects could be tricky if you wait for different events (e.g. wait until either the kernel driver has new data, or another p
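Following that suggestion, a hypothetical WaitForMultipleObjects-style helper can be built on poll(): each "object" is a file descriptor (for example the driver's device node, whose foo_poll() reports readiness, or a pipe written by another thread), and the return value says which one became readable. A sketch; the helper name, the 16-descriptor limit and the return codes are assumptions:

```c
#include <errno.h>
#include <poll.h>

#define WAIT_MAX_FDS 16

/* Wait until one of fds[0..n-1] is readable (or reports hangup/error).
 * Returns the index of the first such fd, -1 on timeout, -2 on error
 * (including an invalid fd). timeout_ms < 0 waits forever, 0 just polls.
 * After EINTR the full timeout restarts, which is acceptable for a sketch. */
static int wait_any_readable(const int *fds, int n, int timeout_ms)
{
    struct pollfd pfd[WAIT_MAX_FDS];
    int i, rc;

    if (n <= 0 || n > WAIT_MAX_FDS)
        return -2;
    for (i = 0; i < n; i++) {
        pfd[i].fd = fds[i];
        pfd[i].events = POLLIN;
        pfd[i].revents = 0;
    }
    do {
        rc = poll(pfd, (nfds_t)n, timeout_ms);
    } while (rc < 0 && errno == EINTR);
    if (rc < 0)
        return -2;
    if (rc == 0)
        return -1;
    for (i = 0; i < n; i++)
        if (pfd[i].revents & (POLLIN | POLLHUP | POLLERR))
            return i;
    return -2;
}
```

WaitForSingleObject corresponds to calling it with n == 1.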

RE: usb-uhci forgets to destroy kmem entries

2000-09-15 Thread Manfred Spraul
> > +#ifdef DEBUG_SLAB > + if (retval < 0 ) { > + if(kmem_cache_destroy(uhci_desc_kmem)) Why only #ifdef DEBUG_SLAB? AFAICS the driver should always destroy its slab cache. Please cc, I'm not subscribed to linux-kernel. -- Manfred
