[twitter-dev] Re: How to count to 140: or, characters vs. bytes vs. entities, third strike

Eric Martin Fri, 15 May 2009 14:08:17 -0700

I'd be interested to see a document that details the standards for
this as well.


On May 15, 12:01 pm, leoboiko <leobo...@gmail.com> wrote:
> > On May 15, 2:03 pm, leoboiko <leobo...@gmail.com> wrote:
> > while one with 71 UTF-8
> > bytes might not (if they’re all non-GSM, say, ‘ç’ repeated 71 times).
>
> Sorry, that was a bad example: 71 ‘ç’s take up 142 bytes in UTF-8, not
> 71.
>
> Consider instead 71 ‘^’ (or ‘\’, ‘[’ &c.).  These take one byte in
> UTF-8, but their shortest encoding in SMS is two-byte (in GSM).  So
> the 71-byte UTF-8 string would take more than 140 bytes as SMS and not
> fit an SMS.
>
> Why does that matter? Consider a twitter update like this:
>
>     @d00d: in the console, type "cat ~/file.sql | tr [:upper:]
> [:lower:] | less".  then you cand read the sql commands without the
> annoying caps
>
> That looks like a perfectly reasonable 140-character UTF-8 string, so
> Twitter won't truncate it or warn about sending a short version.  But
> its SMS encoding would take some 147 bytes, so the last words would be
> truncated.
>
> --
> Leonardo Boiko http://namakajiri.net
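
The counts in the quoted message can be checked with a small Python
sketch. It assumes the GSM 03.38 default alphabet and its extension
table as the SMS encoding; the names GSM_BASIC, GSM_EXTENDED,
gsm_septets and utf8_bytes are made up for illustration and are not
part of any Twitter or carrier API. One caveat on units: GSM counts
7-bit septets, and the 142 and 147 figures above match septet counts.
A single SMS payload is 140 octets, which is 160 septets, so whether a
given message is actually cut also depends on how much of that the
gateway reserves for a sender prefix or other overhead. That part is
not modeled here.

```python
# Basic GSM 03.38 default alphabet: one septet per character.
# (The ESC code itself is left out; it only introduces extension chars.)
GSM_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)

# Extension table: each character costs an escape septet plus one more.
GSM_EXTENDED = set("^{}\\[~]|€\f")


def utf8_bytes(text):
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def gsm_septets(text):
    """Septets needed in the GSM 7-bit alphabet, or None if some
    character is outside it (the message would then need UCS-2)."""
    total = 0
    for ch in text:
        if ch in GSM_BASIC:
            total += 1
        elif ch in GSM_EXTENDED:
            total += 2
        else:
            return None
    return total


# The example update, copied verbatim from the quoted message
# (typo and double space included, since they affect the count).
TWEET = (
    '@d00d: in the console, type "cat ~/file.sql | tr [:upper:] '
    '[:lower:] | less".  then you cand read the sql commands without '
    'the annoying caps'
)

print(utf8_bytes("ç" * 71))   # 142
print(utf8_bytes("^" * 71))   # 71
print(gsm_septets("^" * 71))  # 142
print(gsm_septets("ç" * 71))  # None: lowercase ç is not in the alphabet
print(len(TWEET))             # 140
print(gsm_septets(TWEET))     # 147
```

The extra 7 units in the last line come from the seven extension
characters in the update: one '~', two '|', two '[' and two ']'.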
