John Cowan <jcowan at reutershealth dot com> wrote:

> Most languages other than C define a string as a sequence of
> characters rather than a sequence of non-null characters.  The
> repertoire of characters that can exist in strings usually has a lower
> bound, but its full magnitude is implementation-specific.  In Java,
> exceptionally, the repertoire is defined by the standard rather than
> the implementation, and it includes U+0000.  In any case, I can think
> of no language other than C which does not support strings containing
> U+0000 in most implementations.

In Pascal, which I learned before C, strings were implemented as a count
of characters followed by the characters themselves.  Unfortunately, the
count was a single byte, and the resulting maximum string length of 255
was a much greater inconvenience in real life than C's prohibition
against a string containing 0x00.  I don't know if modern Pascal
implementations are the same way.
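
To make the length-byte limit concrete, here is a minimal C sketch of that kind of layout. The names ShortString and short_string_set are hypothetical, made up for illustration, and this is not meant to reproduce any particular Pascal compiler's representation. The point is only that a one-byte count cannot exceed 255, while a byte of 0x00 inside the data is no problem at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: one count byte, then up to 255 bytes of data. */
typedef struct {
    uint8_t len;               /* 0..255, the whole range of one byte */
    unsigned char data[255];   /* any byte values, including 0x00 */
} ShortString;

/* Copy n bytes from src into s and return how many were stored.
   Input beyond 255 bytes is dropped, because the count byte cannot
   represent a larger length. */
size_t short_string_set(ShortString *s, const void *src, size_t n)
{
    assert(s != NULL && (src != NULL || n == 0));
    size_t stored = n > UINT8_MAX ? UINT8_MAX : n;
    if (stored > 0) {
        memcpy(s->data, src, stored);
    }
    s->len = (uint8_t)stored;
    return stored;
}
```

Passing a 300-byte buffer stores only the first 255 bytes, while a 3-byte input such as "a\0b" is stored whole, with the embedded zero byte intact.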

A 32-bit length count, followed by an array of N arbitrary Unicode
characters, would probably be the best implementation today.
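
One possible shape for such a counted string, sketched in C99 under some assumptions of my own: each character is stored as a 32-bit code point, the count is the number of code points, and the names CountedString and counted_string_new are hypothetical. Other designs (UTF-16 units, a separate pointer to the data) would serve equally well.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counted string: a 32-bit length followed by that many
   code points.  Nothing terminates the array, so U+0000 is just
   another element. */
typedef struct {
    uint32_t len;       /* number of code points that follow */
    uint32_t chars[];   /* C99 flexible array member */
} CountedString;

/* Allocate a CountedString holding a copy of n code points from src.
   Returns NULL if the size would overflow or allocation fails.
   The caller releases the result with free(). */
CountedString *counted_string_new(const uint32_t *src, uint32_t n)
{
    assert(src != NULL || n == 0);
    if (n > (SIZE_MAX - sizeof(CountedString)) / sizeof(uint32_t)) {
        return NULL;    /* byte count would not fit in size_t */
    }
    size_t bytes = (size_t)n * sizeof(uint32_t);
    CountedString *s = malloc(sizeof(CountedString) + bytes);
    if (s == NULL) {
        return NULL;
    }
    s->len = n;
    if (n > 0) {
        memcpy(s->chars, src, bytes);
    }
    return s;
}
```

Built from the three code points U+0041, U+0000, U+1F600, the result has len equal to 3 and keeps all three elements, which a null-terminated representation could not do.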

I'd still like to know what practical, real-world TEXT-related benefits
would derive from allowing U+0000 in strings of TEXT in a C program.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/


