On Fri, 18 Mar 2016 16:33:56 -0600
Scott Robison <scott at casaderobison.com> wrote:

> I'd rather have code that might use some "undefined behavior" and
> generates the right answer than code that always conformed to defined
> behavior yet was logically flawed. 

Code that falls under undefined behavior *is* logically flawed, by
definition.  Whether or not it works, it's not specified to.  The
compiler may have generated perfectly correct machine code, but another
compiler or some future version of your present compiler may not.  

You might share my beef with the compiler writers, though: lots of
things that are left undefined shouldn't be.  Because hardware
architecture varies, some practices that do work, have worked, and are
expected to work on a wide variety of machines are UB.  A recent thread
on using void* for a function pointer is an example: dlsym(3) returns a
function pointer as a void*, but the C standard says a void* may only
point to data, not to functions!  
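
For anyone following along, the dlsym(3) manual itself has long shown a
workaround: assign through an object-pointer lvalue so the source never
spells out a data-to-function cast.  A minimal sketch, with the library
and symbol names picked just for illustration (link with -ldl):

        #include <dlfcn.h>
        #include <stdio.h>

        int main(void)
        {
                void *handle = dlopen("libm.so.6", RTLD_LAZY);
                double (*cosine)(double);

                if (handle == NULL)
                        return 1;

                /* ISO C leaves void*-to-function-pointer conversion
                   undefined; POSIX requires it to work, and its manual
                   shows this cast through an object-pointer lvalue. */
                *(void **)(&cosine) = dlsym(handle, "cos");

                if (cosine != NULL)
                        printf("cos(0) = %f\n", cosine(0.0));

                dlclose(handle);
                return 0;
        }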

Machines exist for which the size of a function pointer is not 
sizeof(void*).  Source code that assumes they are the same size is not
portable to those architectures.  Fine.  But a particular compiler
generates code for a particular architecture.  On x86 hardware, all
pointers have always been and will always be the same size.  All
Linux/Posix code relies on that, too, along with a host of other
assumptions. If that ever changed, a boat load of code would have to be
changed.  Why does the compiler writer feel it's in his interest or
mine to warn me about that not-happening eventuality?  For the machine
I'm compiling for, the code is *not* in error.  For some future machine,
maybe it will be; let's leave that until then.  
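
If you want to see what your own toolchain thinks, a two-line check is
enough.  On every x86 and x86-64 Linux compiler I'm aware of the two
sizes come out equal, though nothing in the standard promises it:

        #include <stdio.h>

        int main(void)
        {
                /* Compare the size of a data pointer with the size of
                   a function pointer on this particular target. */
                printf("sizeof(void *)         = %zu\n", sizeof(void *));
                printf("sizeof(void (*)(void)) = %zu\n",
                       sizeof(void (*)(void)));
                return 0;
        }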

I was looking at John Regehr's blog the other day.  I think it was
there that I learned the practice of dropping UB code on the floor has
been going on longer than I'd realized; it's just that gcc has been
more aggressive about it in recent years.  It's also where I saw this
construction:

        if (p + n < p)
                error();

where p is a pointer.  On lots of architectures, for large n, p + n can
wrap around and compare less than p, so the test catches the overflow.
It works.  Or did.  The C standard says pointer overflow is UB, though.
It doesn't promise the pointer will wrap.  It doesn't promise it won't.
It doesn't promise not to tell your mother about it.  And one recent
compiler release doesn't compile it.  Warning?  No.  Error?  No.
Machine code?  No!  It's UB, so no code is generated for the check
(ergo, no error handling)!  Even though the hardware instructions that
would be -- that used to be -- generated work just as the code implies.
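
For what it's worth, the check can be written without relying on
wraparound at all: compare lengths instead of wrapped pointers.  A
sketch, with names of my own choosing, assuming p and buf_end point
into the same buffer and p <= buf_end:

        #include <stddef.h>

        /* Well-defined replacement for the wraparound test: instead of
           hoping p + n overflows, compare n against the space actually
           left in the buffer. */
        static int n_fits(const char *p, size_t n, const char *buf_end)
        {
                return n <= (size_t)(buf_end - p);
        }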

Postel's Law is to be liberal in what you accept and conservative in
what you emit.  The compilers have been practicing the opposite,
thwarting common longstanding practice just because they "can".  

Dan Bernstein is calling for a new C compiler that is 100%
deterministic: no UB.  All UB per the standard would be defined by the
compiler.  And maybe a few goodies, like zero-initialized automatic
(stack) variables.  
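
To make that goodie concrete, here's roughly what zero-initialized
automatics would mean; this is only my illustration of the idea, not
anything DJB has specified:

        void example(void)
        {
                int n;      /* ISO C: indeterminate until assigned     */
                int m = 0;  /* the proposed compiler would, in effect,
                               do this for n as well */
                (void)n;
                (void)m;
        }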

Such a compiler would enjoy great popularity, even if it imposed, say,
a 5% performance penalty, because C programmers would have greater
confidence in their code working as expected.  They'd have some
assurance that the compiler wouldn't cut them off at the knees in its
next release.  As he says, there's no real choice between fast and
correct.  If the "always defined behavior" compiler got off the ground,
maybe it would finally drive gcc & friends in the direction of working
with their users for a change.  Or make them irrelevant.  

--jkl

