> I wrote the following rant some time ago and posted it somewhere
> I'll throw it in here for some more fuel....
>
> NO MORE "undefined behaviour"!!!  Pick something sane and stick to it!
>
> The problem with modern "Standard" C is that instead of refining
> the definition of the abstract machine to match the most common
> and/or logical behaviour of existing implementations, the standards
> committee chose to throw the baby out with the bath water and make
> whole swaths of conditions into so-called "undefined behaviour"
> conditions.

Unfortunately for your argument, they did this because there are
"existing implementations" that disagree severely over the points in
question. A spec that mandated such things as the "pointers are really
just memory addresses" model you sketch below would, at best, simply
get ignored by implementors on machines that don't match it. Perhaps
that's what you'd want. Personally, I prefer the actual choice.

> An excellent example are the data-flow optimizations that are now
> commonly abused to elide security/safety-sensitive code:
>
>     int
>     foo(struct bar *p)
>     {
>         char *lp = p->s;
>
>         if (p == NULL || lp == NULL) {
>             return -1;
>         }

This code is, and always has been, broken; it is accessing p->s before
it knows that p isn't nil. If you're really unlucky you'll be on a
machine where there are device registers at address 0 and you'll poke a
device register with that read. If you're less unlucky you'll be on
MS-DOS or a PDP-11 or some such and silently and harmlessly get a
meaningless value for lp. If you're lucky you'll get a segfault or
moral equivalent. Anyone who thinks this sort of sloppiness is
appropriate in security/safety-sensitive code please stay far, far away
from anything that might run on my machines.

Yes, an optimizer _might_ defer the fetch of lp, but it also might not,
for any of many reasons; relying on its doing so is extremely brittle,
most definitely not appropriate for anything security/safety-sensitive.
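
For concreteness, a minimal sketch of the ordering I mean: test p first,
and fetch p->s only once p is known to be non-nil. The layout of struct
bar is an assumption (the quoted fragment never shows it), and so is the
0 returned on success, since the quoted function is cut off before its
end.

```c
#include <stddef.h>

/* Hypothetical layout: the quoted fragment never defines struct bar. */
struct bar {
    char *s;
};

/*
 * Same two checks as the quoted foo(), but p is tested before p->s is
 * read, so no path through the function dereferences a nil pointer.
 * Returns -1 for a nil p or a nil p->s; the 0 on success is assumed.
 */
int
foo(struct bar *p)
{
    char *lp;

    if (p == NULL) {
        return -1;
    }
    lp = p->s;          /* safe: p is known to be non-nil here */
    if (lp == NULL) {
        return -1;
    }
    return 0;
}
```

Written this way there is no undefined behaviour for an optimizer to
exploit, so neither check can legitimately be dropped.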

That said, I do agree that simply dropping the p==NULL check but
preserving the fetch of lp is, if anything, even more broken; it is
gross abuse of the latitude permitted by the undefined-behaviour rules.
But that is a quality-of-implementation issue.

> Worse yet this example stems from actual Linux kernel code [...]

Good gods. I'm gladder than ever I don't run Linux.

> [...], yet again any programmer worth their salt knows that the
> address of an field in a struct is simply the sum of the struct's
> base address and the offset of the field, [...]

That's what a mediocre C programmer thinks. A good one knows there is
a difference between the abstract machine and the implementation and
realizes that, while that is a common implementation, it is far from
the only possible one, and it is inappropriate to rely on it being an
accurate description (except in code not intended to be portable, like
a kernel's pmap layer).

> Worst of all consider this example:
>
>     size_t o = offsetof(p, s);
>
> And then consider an extremely common example of "offsetof()" [...]

Such an implementation of offsetof() is nonportable, exactly because it
assumes things like your sketch based on the "pointers are just memory
addresses" model. Providing it in application code constitutes
nonportable code, just as much as assuming shorts are 18 bits does.
(What, you mean you're not on a 36-bit machine? What sort of weird
hardware are you using?)

An implementation may provide it, yes, if - IF! - it knows the
associated compiler handles that code such that offsetof() returns the
correct result. But what would you expect it to do in, say, Zeta-C?
Or do you think Zeta-C should not exist?

> or possibly (for those who know that pointers are not always "just"
> integers):
>
>     #define offsetof(type, member) ((size_t)(unsigned long)((&((type *)0)->member) - (type *)0))

That has never worked in C since, oh, I dunno, V7? and probably never
will; it tries to subtract pointers that point to different types.
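
Going back to hand-rolled offsetof() in application code for a moment,
here is a sketch of the portable route: take offsetof() from <stddef.h>,
where the implementation supplies a definition it knows its own compiler
handles correctly. The struct and the function name below are
hypothetical, made up purely for illustration.

```c
#include <stddef.h>

/* Hypothetical struct, for illustration only. */
struct bar {
    int   n;
    char *s;
};

/*
 * Offset of member s within struct bar, as reported by the
 * implementation's own offsetof().  The first argument is a type,
 * not an object.
 */
size_t
bar_s_offset(void)
{
    return offsetof(struct bar, s);
}
```

The value is whatever the implementation says it is; portable code can
count on s coming after n and on the result fitting in a size_t, and on
not much else.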

What I think of as the usual implementation along the lines of that
macro would be something like

    ((size_t)((char *)&((type *)0)->member - (char *)0))

(note the lack of an intermediate cast to unsigned long; size_t may be
wider than unsigned long, though admittedly it's unlikely offsetof()
will need to return a value greater than the largest unsigned long).

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mo...@rodents-montreal.org
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B