> 2) 140 Unicode _multi-byte_ characters:
> <http://twitter.com/atebits/status/1286199010>
>
> What's curious is that Loren's example with 140 characters uses the
> Unicode 27A1 glyph. It uses 3 bytes in UTF-8. Why didn't it get
> truncated? This seems to contradict Alex's statement in the thread
> mentioned above.
>
> As people start to use things like Emoji, tinyarro.ws and generally
> figure out that Unicode (UTF-8) is a valid type of data on Twitter,
> our clients should adapt and display more accurate "characters
> remaining" counts. I can count bytes instead of characters, but I'm
> not sure if I should or not.
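
To make the byte-versus-character difference concrete, here is a minimal Python sketch. It is only an illustration, not TTYtter's actual code, and the helper names utf8_len and truncate_utf8 are made up for this example. It shows that U+27A1 is one character but three UTF-8 bytes, and one way to cut a string to a byte budget without leaving a partial UTF-8 sequence at the end. Whether the server-side limit is really measured in bytes or in characters is exactly the open question above, so the limit used here is an assumption.

```python
def utf8_len(text: str) -> int:
    """Return the number of bytes text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def truncate_utf8(text: str, max_bytes: int) -> str:
    """Cut text to at most max_bytes of UTF-8 without splitting a character.

    Slicing the encoded bytes can leave an incomplete multi-byte sequence
    at the end; decoding with errors="ignore" drops those dangling bytes.
    """
    data = text.encode("utf-8")[:max_bytes]
    return data.decode("utf-8", errors="ignore")


arrow = "\u27a1"            # the glyph from the example status
status = arrow * 140        # 140 characters, all multi-byte

print(len(status))          # count in characters (code points)
print(utf8_len(status))     # count in UTF-8 bytes

# ASSUMPTION: a 140-byte limit, purely to show the truncation behaviour.
clipped = truncate_utf8(status, 140)
print(len(clipped), utf8_len(clipped))
```

A client that counts characters would report this status as exactly at the limit, while one that counts bytes would report it as three times over, which is why the two approaches give different "characters remaining" numbers.
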
FWIW, I had a number of users complain about truncated UTF-8 sequences a
while back, which may be a symptom of this same problem. TTYtter now counts
bytes explicitly, and this seems to have dealt with the issue.

-- 
------------------------------------ personal: http://www.cameronkaiser.com/ --
  Cameron Kaiser * Floodgap Systems * www.floodgap.com * ckai...@floodgap.com
-- Make welfare as hard to get as building permits. ---------------------------