This isn't a Streaming API problem. That encoding is almost certainly what was presented to Twitter, probably exactly as encoded by the client. In this case, I'd complain to: http://mobileways.de/products/gravity/gravity/
If you request the Tweet via the REST API, you'll see the same data and the same encoding error.

-John Kalucki
http://twitter.com/jkalucki
Services, Twitter Inc.

On Dec 1, 7:22 pm, braver <delivera...@gmail.com> wrote:
> Gardenhose apparently returns illegal Unicode, as confirmed by
> PostgreSQL and Perl's Encode, a very trusted, high-mileage code. We
> surely can trap illegal Unicode errors but need to know whether you're
> aware of it, the rationale, and plan of action, if any. -- Alexy
>
> On Nov 21, 5:10 pm, braver <delivera...@gmail.com> wrote:
>
> > I've tried loading the gardenhose via Perl's JSON, and it fails on
> > quite a few Asian ones with \uffff in them, e.g. the tweet if
> > 5277460813:
>
> > {"text":"RT @RealLamarOdom \uffffIf you haven't heard it, go
> > to www.richsoilclothing.com and look under \"updates\". Tell me what you
> > think. It's hot!",...}
>
> > Is it the artifact of downloading, or Twitter serves illegal UTF8?
> > Here's an example of what Perl says about it, for another tweet:
>
> > *** json ENCODING error: malformed or illegal unicode character in
> > string [ Artest l], cannot convert to JSON at /home/alexyk/twitter/
> > loader/jwilter.pl line 30, <> line 44817003.
>
> > {"in_reply_to_screen_name":null,"text":"RT @TheLakersNation
> > \uffffArtest looked great. Lamar dominated the boards. Kobe is Kobe.
> > And most importantly, the Lakers take the WIN!","source":"<a href=
> > \"http://mobileways.de/gravity\" rel=\"nofollow\">Gravity</
> > a>","in_reply_to_user_id":null,"in_reply_to_status_id":null,"truncated":fal
> > se,"geo":null,"created_at":"Mon
> > Nov 02 05:55:49 +0000 2009","user":
> > {"profile_background_tile":false,"profile_sidebar_border_color":"BDDCAD","f
> > ollowing":null,"statuses_count":
> > 243,"followers_count":33,"profile_image_url":"http://a3.twimg.com/
> > profile_images/406146987/Real_Force_normal.jpg","friends_count":
> > 93,"description":"My Love:Kobe Bryant,Los Angeles
> > Lakers,NBA,Twitter,Music,Movie.I Love This Game.Determination:Let's
> > again!","location":"CN","geo_enabled":false,"profile_background_color":"9AE
> > 4E8","screen_name":"Real_Force","favourites_count":
> > 4,"verified":false,"notifications":null,"profile_text_color":"333333","time
> > _zone":"Beijing","protected":false,"url":"http://
> > hi.baidu.com/real_force/","created_at":"Wed Sep 09 12:41:22 +0000
> > 2009","profile_link_color":"0084B4","name":"Zhang
> > Yuhao","profile_background_image_url":"http://a1.twimg.com/
> > profile_background_images/36003404/
> > photo_manipulation_photo_art_the_mansion.jpg","id":
> > 72842359,"utc_offset":
> > 28800,"profile_sidebar_fill_color":"DDFFCC"},"favorited":false,"id":
> > 5357163705}
>
> > PostgreSQL shows similar annoyance on its text field in UTF8. Pls
> > clarify what do you do to unicode here!
> > Cheers,
> > Alexy
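
For anyone who needs to trap these on the consuming side: U+FFFF is a Unicode noncharacter, which is presumably why strict decoders reject the `\uffff` escape quoted above. Below is a minimal Python sketch, not anything Twitter or Gravity provides, that parses the JSON and then swaps noncharacters for U+FFFD before the text goes on to a stricter consumer. The helper names `is_noncharacter` and `scrub` are made up for illustration, the sample tweet is shortened from the one quoted above, and using U+FFFD as the replacement is an assumption; dropping the character may suit you better.

```python
import json


def is_noncharacter(cp: int) -> bool:
    """True for Unicode noncharacters: U+FDD0..U+FDEF and the last
    two code points of every plane (U+xxFFFE and U+xxFFFF)."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def scrub(text: str, replacement: str = "\ufffd") -> str:
    """Replace every noncharacter in text with `replacement`."""
    return "".join(
        replacement if is_noncharacter(ord(ch)) else ch for ch in text
    )


# Shortened version of the tweet quoted in the thread. The raw string
# keeps the \uffff escape literal, so the JSON parser is what decodes it.
raw = r'{"text": "RT @TheLakersNation \uffffArtest looked great."}'

tweet = json.loads(raw)          # Python's json accepts the escape
clean_text = scrub(tweet["text"])  # U+FFFF becomes U+FFFD
```

The scrub happens after parsing rather than on the raw bytes, so escaped and literal occurrences are handled the same way.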