I'd be interested to see a document that details the standards for this as well.
On May 15, 12:01 pm, leoboiko <leobo...@gmail.com> wrote:
>
> On May 15, 2:03 pm, leoboiko <leobo...@gmail.com> wrote:
> > while one with 71 UTF-8
> > bytes might not (if they’re all non-GSM, say, ‘ç’ repeated 71 times).
>
> Sorry, that was a bad example: 71 ‘ç’s take up 142 bytes in UTF-8, not
> 71.
>
> Consider instead 71 ‘^’ (or ‘\’, ‘[’ &c.). These take one byte in
> UTF-8, but their shortest encoding in SMS is two-byte (in GSM). So
> the 71-byte UTF-8 string would take more than 140 bytes as SMS and not
> fit an SMS.
>
> Why that matters? Consider a twitter update like this:
>
> @d00d: in the console, type "cat ~/file.sql | tr [:upper:]
> [:lower:] | less". then you cand read the sql commands without the
> annoying caps
>
> That looks like a perfectly reasonable 140-character UTF-8 string, so
> Twitter won't truncate it or warn about sending a short version. But
> its SMS encoding would take some 147 bytes, so the last words would be
> truncated.
>
> --
> Leonardo Boiko
> http://namakajiri.net
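To make the UTF-8 vs. SMS comparison above concrete, here is a rough Python sketch. It assumes the GSM 03.38 default alphabet plus its extension table (where characters such as `^`, `\`, `[`, `|` and `~` cost an escape septet plus the character), a single non-concatenated SMS with a 140-octet payload (160 packed 7-bit septets), and a fallback to 16-bit UCS-2 (approximated here with UTF-16) whenever any character is outside the GSM alphabet. It ignores national language shift tables and concatenation headers. The helper names `gsm_septets`, `sms_octets` and `fits_single_sms` are hypothetical, not part of any Twitter or carrier API.

```python
from typing import Optional

# GSM 03.38 default alphabet, one septet each (ESC at 0x1B omitted).
# Assumption: 0x09 is mapped to capital 'Ç', as in the Unicode mapping table.
GSM_BASIC = frozenset(
    "@£$¥èéùìòÇ\nØø\rÅå"
    "Δ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./"
    "0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extension table: each character costs two septets (ESC + code).
GSM_EXTENDED = frozenset("\f^{}\\[~]|€")

SMS_OCTETS = 140  # payload of one non-concatenated SMS (= 160 septets)


def gsm_septets(text: str) -> Optional[int]:
    """Septets needed in GSM 7-bit, or None if a character is not encodable."""
    total = 0
    for ch in text:
        if ch in GSM_BASIC:
            total += 1
        elif ch in GSM_EXTENDED:
            total += 2  # escape (0x1B) plus the character itself
        else:
            return None
    return total


def sms_octets(text: str) -> int:
    """Octets on the air: packed 7-bit if possible, else 16-bit UCS-2/UTF-16."""
    septets = gsm_septets(text)
    if septets is not None:
        return (septets * 7 + 7) // 8  # 8 septets pack into 7 octets, round up
    return len(text.encode("utf-16-be"))


def fits_single_sms(text: str) -> bool:
    return sms_octets(text) <= SMS_OCTETS


for sample in ["^" * 71, "ç" * 71]:
    print(len(sample.encode("utf-8")), gsm_septets(sample),
          sms_octets(sample), fits_single_sms(sample))
# prints:
# 71 142 125 True
# 142 None 142 False
```

Under these assumptions, 71 ‘^’ take 71 bytes in UTF-8 but 142 septets (125 packed octets), while 71 ‘ç’ force UCS-2 and need 142 octets, one more than a single message carries. Either way the UTF-8 byte count is a poor predictor of SMS size, which is why a check like `fits_single_sms` has to work from the SMS encoding rather than from the UTF-8 length.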