> 2) 140 Unicode _multi-byte_ characters:
> <http://twitter.com/atebits/status/1286199010>
> 
> What's curious is that Loren's example with 140 characters uses the
> U+27A1 glyph, which takes 3 bytes in UTF-8. Why didn't it get
> truncated? This seems to contradict Alex's statement in the thread
> mentioned above.
> 
> As people start to use things like Emoji, tinyarro.ws and generally
> figure out that Unicode (UTF-8) is a valid type of data on Twitter,
> our clients should adapt and display more accurate "characters
> remaining" counts. I can count bytes instead of characters, but I'm
> not sure if I should or not.

FWIW, I had a number of users complain about truncated UTF-8 sequences a
while back, which may be a symptom of this same problem. TTYtter now counts
bytes explicitly, and this seems to have dealt with the issue.
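
To make the difference concrete, here is a minimal Python sketch (not
TTYtter's actual code, which is Perl). The function names are hypothetical,
the 140 limit is taken from the thread, and whether the server enforces it
in bytes or in characters is exactly the open question, so both counts are
shown. The truncation helper cuts on a byte budget without leaving a
partial UTF-8 sequence at the end.

```python
LIMIT = 140  # limit discussed in the thread; the unit (bytes vs. characters) is an assumption


def utf8_len(text: str) -> int:
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def remaining(text: str, limit: int = LIMIT, count_bytes: bool = True) -> int:
    """Budget left, counting UTF-8 bytes or Unicode code points."""
    used = utf8_len(text) if count_bytes else len(text)
    return limit - used


def truncate_utf8(text: str, limit: int = LIMIT) -> str:
    """Cut text to at most `limit` UTF-8 bytes, dropping any split trailing sequence."""
    raw = text.encode("utf-8")
    if len(raw) <= limit:
        return text
    # errors="ignore" discards the incomplete multi-byte sequence left at the cut point
    return raw[:limit].decode("utf-8", errors="ignore")


arrow = "\u27a1"       # the glyph from Loren's example, 3 bytes in UTF-8
tweet = arrow * 140    # 140 characters, 420 bytes
print(remaining(tweet, count_bytes=False))  # character budget left
print(remaining(tweet, count_bytes=True))   # byte budget left (negative here)
print(len(truncate_utf8(tweet)))            # whole characters that fit in 140 bytes
```

For that 140-arrow message the character count is exactly at the limit,
while the byte count is 280 over it, which is the kind of gap a "characters
remaining" display would need to account for if bytes are what matter.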

-- 
------------------------------------ personal: http://www.cameronkaiser.com/ --
  Cameron Kaiser * Floodgap Systems * www.floodgap.com * ckai...@floodgap.com
-- Make welfare as hard to get as building permits. ---------------------------
