This isn't a Streaming API issue. That encoding is almost
certainly what was presented to Twitter, probably exactly as encoded
by the client. In this case, I'd complain to the client's developer:
http://mobileways.de/products/gravity/gravity/

If you request the Tweet via the REST API, you'll see the same data
and the same encoding error.
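
For consumers that have to cope with such tweets no matter where the bad
character originated, one option is to scrub Unicode noncharacters after
parsing. The following is a minimal sketch in Python, assuming the offending
code point is the noncharacter U+FFFF seen in the quoted payloads. The helper
names `is_noncharacter` and `scrub` are hypothetical, the sample payload is
abridged from the quoted tweet, and substituting U+FFFD is just one possible
policy (dropping the character or rejecting the tweet are others).

```python
import json


def is_noncharacter(ch: str) -> bool:
    """True for Unicode noncharacters: U+FDD0..U+FDEF and U+nFFFE/U+nFFFF."""
    cp = ord(ch)
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def scrub(value):
    """Recursively replace noncharacters in every string with U+FFFD."""
    if isinstance(value, str):
        return "".join("\ufffd" if is_noncharacter(c) else c for c in value)
    if isinstance(value, list):
        return [scrub(v) for v in value]
    if isinstance(value, dict):
        return {scrub(k): scrub(v) for k, v in value.items()}
    return value  # numbers, booleans and None pass through unchanged


# Abridged from the quoted tweet; the raw string keeps the \uffff escape
# as literal JSON text so that json.loads is what decodes it.
raw = r'{"text":"RT @TheLakersNation \uffffArtest looked great.","id":5357163705}'
tweet = scrub(json.loads(raw))
```

Python's `json` module decodes the `\uffff` escape without complaint, so the
scrub has to happen explicitly before the text reaches a stricter consumer.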

-John Kalucki
http://twitter.com/jkalucki
Services, Twitter Inc.


On Dec 1, 7:22 pm, braver <delivera...@gmail.com> wrote:
> Gardenhose apparently returns illegal Unicode, as confirmed by
> PostgreSQL and Perl's Encode, very trusted, high-mileage code.  We
> can certainly trap illegal Unicode errors, but we need to know whether
> you're aware of it, the rationale, and the plan of action, if any. -- Alexy
>
> On Nov 21, 5:10 pm, braver <delivera...@gmail.com> wrote:
>
> > I've tried loading the gardenhose via Perl's JSON, and it fails on
> > quite a few Asian tweets with \uffff in them, e.g. the tweet with id
> > 5277460813:
>
> > {"text":"RT @RealLamarOdom \uffffIf you haven't heard it, go to
> > www.richsoilclothing.com and look under \"updates\". Tell me what you
> > think. It's hot!",...}
>
> > Is it an artifact of downloading, or does Twitter serve illegal UTF-8?
> > Here's an example of what Perl says about it, for another tweet:
>
> > *** json ENCODING error: malformed or illegal unicode character in
> > string [ Artest l], cannot convert to JSON at /home/alexyk/twitter/
> > loader/jwilter.pl line 30, <> line 44817003.
>
> >  {"in_reply_to_screen_name":null,"text":"RT @TheLakersNation
> > \uffffArtest looked great. Lamar dominated the boards. Kobe is Kobe.
> > And most importantly, the Lakers take the WIN!","source":"<a href=
> > \"http://mobileways.de/gravity\" rel=\"nofollow\">Gravity</
> > a>","in_reply_to_user_id":null,"in_reply_to_status_id":null,
> > "truncated":false,"geo":null,"created_at":"Mon
> > Nov 02 05:55:49 +0000 2009","user":
> > {"profile_background_tile":false,"profile_sidebar_border_color":"BDDCAD",
> > "following":null,"statuses_count":
> > 243,"followers_count":33,"profile_image_url":"http://a3.twimg.com/
> > profile_images/406146987/Real_Force_normal.jpg","friends_count":
> > 93,"description":"My Love:Kobe Bryant,Los Angeles
> > Lakers,NBA,Twitter,Music,Movie.I Love This Game.Determination:Let's
> > again!","location":"CN","geo_enabled":false,
> > "profile_background_color":"9AE4E8","screen_name":"Real_Force",
> > "favourites_count":4,"verified":false,"notifications":null,
> > "profile_text_color":"333333","time_zone":"Beijing",
> > "protected":false,"url":"http://
> > hi.baidu.com/real_force/","created_at":"Wed Sep 09 12:41:22 +0000
> > 2009","profile_link_color":"0084B4","name":"Zhang
> > Yuhao","profile_background_image_url":"http://a1.twimg.com/
> > profile_background_images/36003404/
> > photo_manipulation_photo_art_the_mansion.jpg","id":
> > 72842359,"utc_offset":
> > 28800,"profile_sidebar_fill_color":"DDFFCC"},"favorited":false,"id":
> > 5357163705}
>
> > PostgreSQL raises a similar complaint on its UTF8 text field.  Please
> > clarify what you do to Unicode here!
> > Cheers,
> > Alexy
