I've been spending a LOT of time investigating various device driver sources to answer some questions I've had, like:
- Why does NetBSD/arm have no bus_space_mmap(9)?
- Why is tty locking so messy?
- Why does sys/dev/wscons have so many #ifdef's? (Module-unfriendly!)
- How is dk(4) enumerated?
:

After immersing myself in this for 3 days now, I think I've figured out almost all of the problems I've had and how I can fix them. Before going directly to the answer, let me summarize the problems I've found:

*

a) Device enumeration is unstable / unpredictable

dk(4) is a pseudo device, and its instances are numbered in the order they're created. This is fine when you manually / explicitly add wedges(4) by using "dkctl addwedge". It's not fine if I have a gpt(4) disk label with ordered partitions: I expect the disks to be created in the order I wrote them in the gpt(4) disk label. It's annoying that the numbering changes when I add a new disk. Same for raidframe(4).

b) Consistent device topology management is missing

The reason NetBSD/arm has no bus_space_mmap(9) has turned out to be that we have no consistent (MI) way to manage the physical address space of devices. NetBSD/mips has a working bus_space_mmap(9) in sys/arch/mips/mips/bus_space_alignstride_chipdep.c. It defines address windows and manages them by itself. Who wants to reimplement that on all CPUs/ports/platforms? The physical address space is a pretty simple concept: a single linear address space. And we already manage (a kind of) tree of devices in autoconf(9). Do we want to manage such a topology in many places? No.

c) Control / data flow is unclear

I can never remember which wscons command/device to use to add a screen, load a font, or change the encoding. It's a total mess. I don't know how the ioctl I send via a wscons command is delivered to the device, even after reading sys/dev/wscons. Same for data. Why is it so complicated?

Our tty locking code has so many hacks; see "grep XXX sys/kern/tty*". And we have to fix all serial devices. How should serial devices deal with the tty lock? How do ioctls work? How are their callbacks called, and when?
How do we avoid deadlock? This is almost hopeless. Same for network devices' ioctl handling.

d) Abstraction of combined/aggregated devices is inconsistent

We have some *special* devices that combine/aggregate multiple devices and make them look like a single device, for example wsmux(4), ccd(4), raidframe(4), lvm(4), bridge(4), agr(4), ... Currently each of these manages its components in its own, almost random way, and their behaviour is hard to guess. For each combining device you have to learn how to add/delete components, its limitations, etc. The enumeration of these is also hard to predict.

e) Random way of abstraction

We have many non-real devices used to abstract real devices, for example audio, tty, wsdisplay, network interfaces, wedges, scsipi, com and friends, usb, pseudo devices, ... We have to learn how to use each of them and its behavior separately. Developers have to decide how their device is represented to users. If you write a serial driver, you have to implement all the syscall knobs, buffer management, and tty interaction. You'll surely end up with a big modified copy of com.c, which is almost impossible to maintain.

*

I want to fix all of these.

Goals:

- Intuitiveness
  Behavior should be simple enough for users to guess without looking into the code.

- Predictability / stability
  Device numbers don't change surprisingly. When you plug devices A and B into slots 1 and 2, they should be shown in that order. When you add disk B @ slot 2, the number of disk A @ slot 1 must not change.

- Simplicity, clarity, consistency
  Common code is concentrated in a single place. Each driver implements only its hardware accessors. No scattered ioctl handling.

*

A possible solution I'm thinking of is:

1) Introduce devfs
2) Natural device numbering
3) "Functional" device instance abstraction
4) "Real" and "pseudo" device trees

*

1) Introducing devfs

devfs is a pseudo filesystem which shows the device topology under a mount point. There's an (unfinished) branch, mjf-devfs. devfs helps to identify devices uniquely.
For example, wd0 on my DELL OptiPlex 745 looks like:

  /dev/mainbus0/pci0/ppb0/piixide0/atabus0/wd0

2) Natural device numbering

Device numbers in devfs are enumerated locally within each attachment. Numbers are assigned *naturally*; they should match physical bus/slot numbers so that users can be sure which is which. This is especially important for block devices: think of plugging in a USB floppy and running newfs on it. I *believe* *all* *real* devices can be represented by this scheme.

3) "Functional" device instance abstraction

This is the way audio(4) and video(4) already work. These are non-real devices, but "functional" in that they provide a predefined function. A topology like:

  /dev/mainbus0/.../pci0/azalia0/audio0

is intuitive when viewed as:

- azalia(4) implements the function audio(4)
- audio(4) is the "abstracted" function presented to users

This also helps users understand how its internals work. Users basically access the device via the audio(4) interface. audio(4) is responsible for interacting with the real hardware driver. Both control and data are transferred via audio(4). It's also easy to guess that if a user forcibly controls the real device (azalia(4)), audio(4) would be *surprised*, and some inconsistency can be expected.

wscons(4)/tty(4) could be abstracted like:

  /dev/mainbus0/.../pci0/vga0/display0/screen0/vt100emul0/termios0/tty0

This might look redundant, but each device instance's *responsibility* is very clear. tty(4) is *the* device you interact with when you use it as a tty. Pretty straightforward. When you send a tty ioctl, it goes to tty(4), which may deliver it to upper layers. To add a new screen, it's obvious that the device to ask is display(4). We can guess how control/data is delivered. We can also guess that forcibly deleting a screen causes problems for its child devices, because the topology is visible.
wscons(4) without emulation would look like:

  /dev/mainbus0/.../pci0/vga0/display0/screen1

We don't need a detailed manual page on how screen0 / screen1 are interfaced, because it's obvious.

Other possible examples:

  /dev/mainbus0/.../wd0/disk0/diskpart0
  /dev/mainbus0/.../fxp0/ether0/net0
  /dev/mainbus0/.../com0/serial0/termios0/tty0

4) "Real" and "pseudo" device trees

Real devices stem from mainbus0 and one of the real bus roots under it, like /mainbus0/pci0, /mainbus0/obio0 or /mainbus0/acpi0. The non-real, "functional" devices described above stem from one of the leaf "real" devices, like vga0 or azalia0. These can co-exist in one tree because functional devices don't break the tree.

Pseudo devices don't have a parent, because their creation is arbitrary: they're created when you want them. Their device numbers don't matter much; you don't usually need to care which pty(4) device you're using.

There are cases where the device numbers of pseudo devices do matter: disks and aggregated devices. You don't want raidframe(4)'s partitions to change after a reboot. Same for the bridge/tap configuration used for Xen after you add a new NIC to your machine. This should be addressed by representing a pseudo device tree, like:

  /dev/pseudobus0/raid0/disk0/diskpart0
                       /component0
                       /component1
                       :

where the component devices are symlinks to the real instances:

  /dev/pseudobus0/raid0/component0 -> /dev/mainbus0/pci0/ppb0/piixide0/atabus0/wd1/disk0/diskpart0

and the reverse:

  /dev/mainbus0/pci0/ppb0/piixide0/atabus0/wd1/disk0/diskpart0/raid0 -> /dev/pseudobus0/raid0

so that you can uniquely identify the pseudo device by looking it up from the real device:

  /dev/mainbus0/pci0/ppb0/piixide0/atabus0/wd1/disk0/diskpart0/raid0/disk0/diskpart0

Other examples:

  /dev/pseudobus0/bridge0/net0
  /dev/pseudobus0/ppp0/pppldisc0/tty0

*

The last one might need more thought, but the point is that most things can be represented with 2 trees.
I don't think we need netgraph(4) once the device topology, including network interfaces, is visible and their hooks are made *a little* more flexible. A tree is the best structure, because everyone is familiar with it.

Masao

P.S. I've read 0 lines of devfs code so far.

-- 
Masao Uebayashi / Tombi Inc. / Tel: +81-90-9141-4635