It's been nearly 6 months. Has this question been answered? If so, I missed it.

On Tue, Mar 24, 2009 at 9:36 PM, Alex Payne <a...@twitter.com> wrote:
>
> Unfortunately, nothing definitive. We're still looking into this.
>
> On Tue, Mar 24, 2009 at 07:56, Craig Hockenberry
> <craig.hockenbe...@gmail.com> wrote:
>>
>> Any news from the Service Team? I'd really like to get the counters
>> right in an upcoming release...
>>
>> -ch
>>
>> On Mar 6, 12:18 pm, Alex Payne <a...@twitter.com> wrote:
>>> I'm taking this email to our Service Team, the folks who work on the
>>> back-end of the service. The whole "message body changing as it moves
>>> from cache to backing store" thing is totally unacceptable. Answers
>>> soon.
>>>
>>> On Fri, Mar 6, 2009 at 09:43, Craig Hockenberry
>>> <craig.hockenbe...@gmail.com> wrote:
>>>
>>> > Some discussion about this thread popped up on Twitter yesterday:
>>> > <http://groups.google.com/group/twitter-development-talk/browse_thread/thread/44be91d5ec5850fa>
>>>
>>> > Alex states that it's 140 bytes per tweet. So, of course, Loren
>>> > Brichter and I tried to prove that. With the following results:
>>>
>>> > 1) 140 characters, including ones that become HTML entities:
>>> > <http://twitter.com/gnitset/status/1286202252>
>>>
>>> > At the time of posting, this tweet showed up on the site and in feeds
>>> > with all 140 characters. After a few hours, the "<" was converted to
>>> > "&lt;", increasing the count per character from one to four bytes and
>>> > decreasing the tweet length from 140 characters to 69. (You can see
>>> > this truncation at the end of the tweet: the "&" is from "&lt;")
>>>
>>> > Presumably, this happens as tweets in the memcache are written through
>>> > to the backing store.
>>>
>>> > I also see a lot of Twitter clients that don't realize how special the
>>> > &lt; and &gt; entities are. It took me a LONG time to figure out what
>>> > was going on here.
>>>
>>> > 2) 140 Unicode _multi-byte_ characters:
>>> > <http://twitter.com/atebits/status/1286199010>
>>>
>>> > What's curious is that Loren's example with 140 characters uses the
>>> > Unicode 27A1 glyph. It uses 3 bytes in UTF-8. Why didn't it get
>>> > truncated? This seems to contradict Alex's statement in the thread
>>> > mentioned above.
>>>
>>> > As people start to use things like Emoji, tinyarro.ws and generally
>>> > figure out that Unicode (UTF-8) is a valid type of data on Twitter,
>>> > our clients should adapt and display more accurate "characters
>>> > remaining" counts. I can count bytes instead of characters, but I'm
>>> > not sure if I should or not.
>>>
>>> > No one likes a truncated tweet: we need an explicit statement on how
>>> > to count and submit multi-byte characters and entities.
>>>
>>> > -ch
>>>
>>> --
>>> Alex Payne - API Lead, Twitter, Inc.
>>> http://twitter.com/al3x
>>
>
> --
> Alex Payne - API Lead, Twitter, Inc.
> http://twitter.com/al3x
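
For anyone comparing the counting schemes discussed in the quoted thread, here is a rough sketch of the three numbers a client could show for the same tweet body: characters, UTF-8 bytes, and length after entity escaping. The function name `tweet_lengths` is made up for illustration, and the code assumes that only "<" and ">" get escaped. The thread reports that behaviour, but Twitter has not confirmed it, so treat this as a way to reproduce the two test cases rather than a statement of how the service counts.

```python
def tweet_lengths(text: str) -> dict:
    """Return three candidate lengths for a tweet body (hypothetical helper)."""
    # Assumption: the service escapes only "<" and ">", as observed in the thread.
    escaped = text.replace("<", "&lt;").replace(">", "&gt;")
    return {
        "characters": len(text),                  # Unicode code points
        "utf8_bytes": len(text.encode("utf-8")),  # size when encoded as UTF-8
        "escaped_chars": len(escaped),            # length after &lt; / &gt; expansion
    }


# Case 2 from the thread: 140 copies of U+27A1, which is 3 bytes in UTF-8.
print(tweet_lengths("\u27a1" * 140))

# Case 1 from the thread, simplified to 140 "<" characters.
print(tweet_lengths("<" * 140))
```

The first case gives 140 characters but 420 UTF-8 bytes, so a strict 140-byte limit would have truncated it. The second gives 140 characters and 140 bytes before escaping but 560 characters after, which is consistent with the delayed truncation described in case 1.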