I have been wondering too. If it's a character, it should count as a
character, whether it's an 'A', 'À' or '的'.
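
For what it's worth, here is a rough Python sketch of the three counts being discussed in this thread: characters (code points), UTF-8 bytes, and UTF-8 bytes once "<", ">" and "&" are turned into entities. The name tweet_lengths is made up for illustration, and using html.escape to stand in for the entity conversion is my assumption; nothing here is confirmed to match what Twitter's back end actually does.

```python
import html


def tweet_lengths(text):
    """Return (code points, UTF-8 bytes, UTF-8 bytes after entity escaping).

    Hypothetical helper: the escaping step is only a guess at the
    "<" -> "&lt;" conversion described in the quoted message.
    """
    # quote=False escapes only &, < and >, leaving quotes untouched.
    escaped = html.escape(text, quote=False)
    return (
        len(text),                     # one per Unicode code point
        len(text.encode("utf-8")),     # bytes on the wire as UTF-8
        len(escaped.encode("utf-8")),  # bytes if entities are stored
    )


# 'A', 'À' (U+00C0), '的' (U+7684), the U+27A1 arrow, and '<'
for sample in ["A", "\u00c0", "\u7684", "\u27a1", "<"]:
    print(repr(sample), tweet_lengths(sample))
```

By this count, 140 copies of the U+27A1 arrow are 140 characters but 420 bytes in UTF-8, and a single "<" goes from one byte to four once it is stored as an entity.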

On Mar 24, 9:36 pm, Alex Payne <[email protected]> wrote:
> Unfortunately, nothing definitive. We're still looking into this.
>
> On Tue, Mar 24, 2009 at 07:56, Craig Hockenberry <[email protected]> wrote:
>
> > Any news from the Service Team? I'd really like to get the counters
> > right in an upcoming release...
>
> > -ch
>
> > On Mar 6, 12:18 pm, Alex Payne <[email protected]> wrote:
> >> I'm taking this email to our Service Team, the folks who work on the
> >> back-end of the service. The whole "message body changing as it moves
> >> from cache to backing store" thing is totally unacceptable. Answers
> >> soon.
>
> >> On Fri, Mar 6, 2009 at 09:43, Craig Hockenberry <[email protected]> wrote:
>
> >> > Some discussion about this thread popped up on Twitter yesterday:
>
> >> > <http://groups.google.com/group/twitter-development-talk/browse_thread/thread/44be91d5ec5850fa>
>
> >> > Alex states that it's 140 bytes per tweet. So, of course, Loren
> >> > Brichter and I tried to prove that. With the following results:
>
> >> > 1) 140 characters, including ones that get encoded as HTML entities:
> >> > <http://twitter.com/gnitset/status/1286202252>
>
> >> > At the time of posting, this tweet showed up on the site and in feeds
> >> > with all 140 characters. After a few hours, the "<" was converted to
> >> > "&lt;", increasing the count per character from one to four bytes and
> >> > decreasing the tweet length from 140 characters to 69. (You can see
> >> > this truncation at the end of the tweet: the "&" is from "&lt;")
>
> >> > Presumably, this happens as tweets in the memcache are written through
> >> > to the backing store.
>
> >> > I also see a lot of Twitter clients that don't realize how special the
> >> > &lt; and &gt; entities are. It took me a LONG time to figure out what
> >> > was going on here.
>
> >> > 2) 140 Unicode _multi-byte_ characters:
> >> > <http://twitter.com/atebits/status/1286199010>
>
> >> > What's curious is that Loren's example with 140 characters uses the
> >> > Unicode 27A1 glyph. It uses 3 bytes in UTF-8. Why didn't it get
> >> > truncated? This seems to contradict Alex's statement in the thread
> >> > mentioned above.
>
> >> > As people start to use things like Emoji, tinyarro.ws and generally
> >> > figure out that Unicode (UTF-8) is a valid type of data on Twitter,
> >> > our clients should adapt and display more accurate "characters
> >> > remaining" counts. I can count bytes instead of characters, but I'm
> >> > not sure if I should or not.
>
> >> > No one likes a truncated tweet: we need an explicit statement on how
> >> > to count and submit multi-byte characters and entities.
>
> >> > -ch
>
> >> --
> >> Alex Payne - API Lead, Twitter, Inc. http://twitter.com/al3x
>
> --
> Alex Payne - API Lead, Twitter, Inc. http://twitter.com/al3x
