Omnibus reply here to some half-dozen of the replies; I'll try to get all the attributions right.
First, a big thank you to everyone who replied.  I was not expecting
this many, nor this helpful, responses to something about NetBSD as old
as 5.2!

[me - all the double-quoted text here is mine]
>> db{0}> tr
>> breakpoint() at netbsd:breakpoint+0x5
>> comintr() at netbsd:comintr+0x53a
>> Xintr_ioapic_edge7() at netbsd:Xintr_ioapic_edge7+0xeb
>> --- interrupt ---
>> x86_pause() at netbsd:x86_pause
>> intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x16
>> Xintr_ioapic_level5() at netbsd:Xintr_ioapic_level5+0xf3
>> --- interrupt ---
>> x86_pause() at netbsd:x86_pause+0x2
>> cdev_poll() at netbsd:cdev_poll+0x6d

[Taylor R Campbell]
> This stack trace suggests that you're in the middle of a busy-wait
> loop somewhere inside your d_poll routine

I didn't mention that because I knew it was false (and neglected to say
anything about it - my fault).  Why am I sure?  Because the driver in
question uses nopoll as its poll routine. :)

> It may be helpful to build with all debug options enabled.

DIAGNOSTIC and DEBUG are both on already.  None of the LOCK* options
are, though - see Paul Goyette's response, below.

>> On reflection, I think I know why.  Userland's syscall handler took
>> the mutex in preparation for cv_wait_sig(), the interrupt happens,
>> my code is called (verified with a printf), and it tries to take the
>> same mutex so it can cv_broadcast().

> cv_wait_sig releases the lock while it waits.

I'm positing an interrupt that strikes after taking the mutex and
before entering cv_wait_sig.  (See also your other message, below.)

>> I can of course provide more information if it would help, but I'm
>> not sure what would be useful to mention here.

> Can you provide the code you're using?  Hard to guess otherwise.

Heh.  Fair enough.  First, a brief sketch of my intent....
What I was/am trying to do was to take the lpt driver and give it a
mode in which it uses the four output and five input bits in the
control and status registers for low-data-rate parallel digital input
and output.  The design is that, on output, userland write()s a buffer
of even length and the kernel shoves alternate bytes out the data and
control ports, with no particular timing constraints (in the
anticipated use case there will be only one pair of bytes per write,
and writes are only occasional); for input, I have a callout that, once
per hz, checks to see if the status bits have changed, and, if so,
saves them in a buffer for read().  (The input data rate is low - in
the anticipated application, those input bits are driven off mechanical
switches operated by human actions - and only changes matter.)

I actually may change this to provide, somehow, different interfaces
for the control bits and the data bits; I'm not sure the "alternate
bytes" paradigm is all that good a one for my anticipated use.

Rather than replace the lpt driver entirely, I had it grow another flag
bit in the device minor number: 0x100 indicates this `raw' mode.

As for the code....

ftp.rodents-montreal.org:/mouse/misc/lpt/ holds the code.  base/ has
the code I started with (which is stock 5.2, for these files); new/ has
my current version.  lptreg.h is identical; lptvar.h and lpt.c have
changes.  diffs is output from "diff -u -r base new".  (All these are
relative to sys/dev/ic/.)

>> So I wrote some code using a condvar and a mutex, and the system
>> promptly deadlocked.  [...]

[Joerg Sonnenberger]
> Did you set the IPL for your mutex correctly?

I don't know.  I tried to...

> Adaptive mutexes must not be shared with interrupts.

...I used IPL_VM, because splvm() is the same as spllpt(), and because
the manpage says that results in a spin mutex, which I thought was what
I needed.

Is there some way to tell what IPL the hardware interrupt for the
device comes in on?
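
For readers who want the raw-mode design described earlier in this
message in concrete form, here is a minimal userland sketch.  It is not
the code on the ftp site and it does not touch real hardware or any
kernel interface.  Only the 0x100 minor-number flag, the even-length
write rule, and the "record status only when it changes" rule come
from the description above.  Everything else is an assumption made for
illustration: the names (fake_port, sampler, is_raw_minor, raw_write,
raw_sample), the data-byte-first ordering within each pair, the two
bit masks (taken from the usual PC parallel-port layout), and the
choice to drop samples when the buffer is full.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* From the post: minor-number bit 0x100 selects raw mode. */
#define LPT_RAW_FLAG 0x100u

/* Assumption: usual PC parallel-port layout, with the four output bits
 * in the low nibble of the control register and the five input bits in
 * the top five bits of the status register. */
#define CTL_OUT_MASK 0x0fu
#define STAT_IN_MASK 0xf8u

/* Hypothetical stand-in for the three hardware registers. */
struct fake_port {
	uint8_t data;
	uint8_t control;
	uint8_t status;
};

/* Hypothetical input state: last status seen plus a small buffer of
 * recorded changes waiting for read(). */
struct sampler {
	uint8_t last;
	uint8_t buf[16];
	size_t n;
};

/* Nonzero if the minor number asks for raw mode. */
static int
is_raw_minor(unsigned minor)
{
	return (minor & LPT_RAW_FLAG) != 0;
}

/* Output path: the buffer must have even length.  Assumed ordering:
 * first byte of each pair to the data port, second to the control
 * port.  Returns the number of bytes consumed, or -1 on odd length. */
static long
raw_write(struct fake_port *p, const uint8_t *buf, size_t len)
{
	size_t i;

	assert(p != NULL && buf != NULL);
	if (len % 2 != 0)
		return -1;
	for (i = 0; i < len; i += 2) {
		p->data = buf[i];
		p->control = buf[i + 1] & CTL_OUT_MASK;
	}
	return (long)len;
}

/* Input path, meant to be called once per periodic tick: record the
 * status bits only if they differ from the previous sample.  Returns 1
 * if a change was seen, 0 otherwise.  A change seen while the buffer
 * is full is not stored (an arbitrary choice for this sketch). */
static int
raw_sample(struct sampler *s, const struct fake_port *p)
{
	uint8_t now;

	assert(s != NULL && p != NULL);
	now = p->status & STAT_IN_MASK;
	if (now == s->last)
		return 0;
	s->last = now;
	if (s->n < sizeof(s->buf))
		s->buf[s->n++] = now;
	return 1;
}
```

A caller would check is_raw_minor() at open time, route write() through
raw_write(), and call raw_sample() from whatever periodic mechanism is
in use.  Note that a real driver would presumably have to preserve the
control-register bits outside the four output bits, and would need the
locking discussed in the rest of this thread around the sample buffer;
the sketch ignores both.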
[Taylor R Campbell again, different email]
> Note that if you use a mutex in a thread _and_ an interrupt handler,
> you must initialize the mutex with the ipl at which the interrupt
> handler runs.  That way, while you hold the lock, it will block the
> interrupt handler too, which avoids the scenario you described.

Oh!  That's very important; it was not clear to me from the manpage
that mutex_enter() implies spl*().  Yes, then, as you say, my scenario
is impossible if the mutex is correctly set up.

[Brian Buhrow]
> 1.  [...].  Mutexes that use spin locks can't be used in interrupt
> context.

Sure you don't have that backwards?  I _think_ mutex(9) says that spin
mutexes are the only ones that _can_ be used from an interrupt.

> 2.  Initialize your convar with cv_init().

Done.  (I think.)

> 4.  If you run into lock contention when debugging your code, pay
> careful attention to who holds the lock at the time of the panic.

Or, in my case, hang?  I don't know who holds it; I'm not sure how to
find out.  I should probably go peek under the hood.

[me]
>> [...my scenario outline...]

[Brad Spencer]
> I recently worked with this for a driver I have written to provide
> entropy to the kernel random number generator subsystem from quantum
> event sources.  [...]

It sounds as though you're doing something similar enough that the
locking issues should be identical - and, looking at your code, it
looks as though you're doing basically the same thing I am.  This seems
to me to indicate that my problem most likely is with how I'm
initializing something - based on what Taylor R Campbell said, quite
likely the mutex.

[Paul Goyette]
> You might well find LOCKDEBUG to be your friend here!

Is there some list of such options?
sys/arch/amd64/conf/GENERIC has only two lines containing "LOCK",
neither of which looks relevant here:

# options 	INTEL_ONDEMAND_CLOCKMOD
#options 	IPFILTER_DEFAULT_BLOCK	# block all packets by default

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mo...@rodents-montreal.org
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B