On Mon, 25 Apr 2016 02:31:25 +0100 Simon Slavin <slavins at bigfraud.org> wrote:
> > These are different concerns, and they don't really pose any
> > difficulty.  Given an encoding, a column of N characters can take
> > up to x * N bytes.  Back in the day, "x" was 1.  Now it's something
> > else.  No big deal.
>
> No.  Unicode uses different numbers of bytes to store different
> characters.  You cannot tell from the number of bytes in a string how
> many characters it encodes, and the programming required to work out
> the string length is complicated.

"up to", I said.  You're right that you can't know the byte-offset
for a letter in a UTF-8 string.  What I'm saying is that given an
encoding and a string, you *do* know the maximum number of bytes
required.  From the DBMS's point of view, a string of known size and
encoding can be managed with a fixed-length buffer.

> I would definitely be reading the documentation for the SQL engine I
> was using.

Well, yeah.  :-)  It's well to know how the software you're using
works, whether it's the DBMS or something else.

Although I have to say I've never had to worry about the size of my
database as a function of string size.  When size matters, rows
dominate, and large numbers of rows never seem to come with big
strings.

--jkl
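
A quick sketch of the point, in Python for concreteness (the names and
the per-encoding table here are mine, not from any particular DBMS):
the actual byte length of a string varies character by character, but
for a known encoding it never exceeds N characters times that
encoding's maximum bytes per character, so a fixed-length buffer of
that size always suffices.

```python
# Illustrative only: upper bound on storage for N characters in a
# known encoding.  "Character" here means a Unicode code point.
MAX_BYTES_PER_CHAR = {"ascii": 1, "utf-8": 4, "utf-16": 4, "utf-32": 4}

def max_buffer_size(n_chars, encoding):
    """Worst-case bytes needed for n_chars in the given encoding."""
    return n_chars * MAX_BYTES_PER_CHAR[encoding]

for s in ("abc", "caf\u00e9", "\u65e5\u672c\u8a9e"):
    data = s.encode("utf-8")
    bound = max_buffer_size(len(s), "utf-8")
    # Byte length differs per string, but never exceeds the bound.
    assert len(data) <= bound
    print(f"{s!r}: {len(s)} chars, {len(data)} UTF-8 bytes, bound {bound}")
```

So "abc" is 3 characters and 3 UTF-8 bytes, while three CJK characters
are 9 bytes; both fit comfortably inside the 12-byte worst-case buffer
for a 3-character column.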