> On May 15, 2:03 pm, leoboiko <leobo...@gmail.com> wrote:
> while one with 71 UTF-8 bytes might not (if they’re all non-GSM,
> say, ‘ç’ repeated 71 times).

Sorry, that was a bad example: 71 ‘ç’s take up 142 bytes in UTF-8, not 71. Consider instead 71 ‘^’ (or ‘\’, ‘[’ &c.). Each of these takes one byte in UTF-8, but its shortest encoding in SMS (in GSM) is two-byte. So the 71-byte UTF-8 string would take more than 140 bytes as SMS and would not fit in a single SMS.

Why does that matter? Consider a Twitter update like this:

@d00d: in the console, type "cat ~/file.sql | tr [:upper:] [:lower:] | less". then you cand read the sql commands without the annoying caps

That looks like a perfectly reasonable 140-character UTF-8 string, so Twitter won't truncate it or warn about sending a short version. But its SMS encoding would take some 147 bytes, so the last words would be truncated.

-- 
Leonardo Boiko
http://namakajiri.net
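
For anyone who wants to check the arithmetic on their own strings, here is a rough Python sketch. The helper names (`utf8_bytes`, `gsm_units`) are made up for illustration, and the character tables are my reading of the GSM 03.38 default alphabet and its extension table, so treat them as an assumption. It counts each escaped character as two GSM code units, as described above, and does not model how those units are packed into the SMS payload, so the limit to compare against is left to the caller.

```python
# Assumption: these tables follow the GSM 03.38 default alphabet and its
# extension table; verify against the spec before relying on them.
GSM_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Characters that need an escape prefix, so they cost two units each.
GSM_ESCAPED = set("^{}\\[~]|€\f")


def utf8_bytes(text):
    """Number of bytes the string occupies in UTF-8."""
    return len(text.encode("utf-8"))


def gsm_units(text):
    """Number of GSM code units, or None if some character has no GSM encoding."""
    total = 0
    for ch in text:
        if ch in GSM_BASIC:
            total += 1
        elif ch in GSM_ESCAPED:
            total += 2  # escape prefix + the character itself
        else:
            return None  # non-GSM character; a different SMS encoding is needed
    return total


if __name__ == "__main__":
    for sample in ("ç" * 71, "^" * 71):
        print(repr(sample[0]), utf8_bytes(sample), gsm_units(sample))
```

For the two samples this reports 142 UTF-8 bytes and no GSM encoding for ‘ç’, and 71 UTF-8 bytes but 142 GSM units for ‘^’.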