Most of the tweets here are spam: http://twitturk.com/tweet/search?q=lol

On Sat, Nov 27, 2010 at 3:33 PM, Adam Green <140...@gmail.com> wrote:
> All of your sample spam tweets are from suspended accounts, yet the
> tweets were only sent yesterday. That means that the spammers' behavior
> was so aggressive that they were suspended quickly by a Twitter
> algorithm. I doubt that a human at Twitter read your email and went
> through each tweet suspending the accounts. Have you checked to see
> how quickly these spam accounts get canceled for other spam tweets?
> You could hold back tweets from unknown users for 24 hours, and then
> check all new users through the API to see if they are suspended. If
> they aren't suspended, you can whitelist them in your system.
>
> What is really weird is that I also checked the URLs in these tweets
> and they resolve to an empty page. They return a header with an HTTP
> code of 200, and no content at all. That can't be an accident. Either
> they are sending empty responses to everyone, or they could tell from
> my IP that they didn't want to send anything to me. Why would a
> spammer do that? They only benefit if someone clicks on their links
> and buys something, or gets infected somehow. Could you be the subject
> of some kind of attack? You use the word "community." Would anyone
> want to disrupt your community? Is this a community that is in one
> geographic area that can be detected by IP? Very interesting...
>
> Anyway, you can use URL resolution to test new users. When you get a
> tweet from a new user with a URL, check the URL, and blacklist them if
> it resolves to an empty page. If you only have to do this for new
> users, it won't be too processor intensive.
>
>
> On Sat, Nov 27, 2010 at 5:20 AM, Furkan Kuru <furkank...@gmail.com> wrote:
> > The text in these spam tweets is not easy to recognize.
> > The tweets do not repeat. They are a mix of different words and they
> > contain a link.
> > They seem to be sent via the web.
> >
> > Ranking users and discarding some mentions will not completely resolve
> > the problem, because both our mention data and our trending-words data
> > were affected. We do not want to eliminate tweets from innocent people
> > who have few followers.
> >
> > The simplest way seems to be just ignoring the tweets coming from
> > outside of the community.
> > But those tweets were helping us to extend our network.
> >
> >
> >
> > On Fri, Nov 26, 2010 at 6:42 PM, Adam Green <140...@gmail.com> wrote:
> >>
> >> As long as you aren't trying to capture and deliver *all* tweets,
> >> there are a couple of good ways to cut out spammers. One thing I do is
> >> save all mentions for all users in a database of tweets. When a tweet
> >> comes in from the streaming API, I collect @mentions, and store them
> >> with the screen name of the tweet's author and the screen name
> >> mentioned. Then I can rank users based on the number of different
> >> accounts that mention them. If you only use the tweets from the top N%
> >> of users, the quality improves a lot. I find that the top 80% is
> >> usually enough of a screen to get good quality.
> >>
> >> Another trick is blocking duplicates from each user. The API only
> >> blocks duplicates that repeat immediately, but if a spammer has a list
> >> of tweets, and cycles through them, all the tweets get through. I
> >> compare all new tweets with the other tweets from that user. This is
> >> very expensive if you have a big database. This can be made less
> >> intensive by limiting the comparison to just the tweets from that user
> >> in the last few days. You can also run this with a separate process
> >> that doesn't slow down your main tweet parsing loop. Most spammers are
> >> so simplistic that they just repeat the same tweet over and over. In a
> >> really spammy set of keywords, if I find more than a few duplicates
> >> from a user, I just stop saving their tweets.
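
[Illustration] The two filters described in the message above (ranking
users by the number of distinct accounts that mention them, and blocking
repeated tweets per user within a recent window) could be sketched roughly
as follows. This is only a minimal in-memory sketch, not the code used by
anyone in this thread: the class names, the 3-day window and the limit of 3
duplicates are assumptions chosen for illustration, and a real system would
keep this state in a database.

```python
from collections import defaultdict
from datetime import datetime, timedelta


class MentionRanker:
    """Ranks users by how many distinct accounts have mentioned them."""

    def __init__(self):
        # mentioned screen name -> set of authors who mentioned that name
        self._mentioners = defaultdict(set)

    def record(self, author, mentioned):
        """Store one @mention (screen names are compared case-insensitively)."""
        self._mentioners[mentioned.lower()].add(author.lower())

    def top_users(self, fraction):
        """Return the top `fraction` (0..1) of mentioned users, best first."""
        ranked = sorted(
            self._mentioners,
            key=lambda name: (-len(self._mentioners[name]), name),
        )
        return ranked[: int(len(ranked) * fraction)]


class DuplicateFilter:
    """Rejects tweets a user has already sent within a recent window."""

    def __init__(self, window=timedelta(days=3), max_duplicates=3):
        self.window = window                  # assumed value for "last few days"
        self.max_duplicates = max_duplicates  # assumed value for "a few"
        self._recent = defaultdict(list)      # user -> [(timestamp, text)]
        self._dup_counts = defaultdict(int)   # user -> duplicates seen so far
        self.blocked = set()                  # users we stopped saving

    def accept(self, user, text, when):
        """Return True if the tweet should be saved, False otherwise."""
        if user in self.blocked:
            return False
        cutoff = when - self.window
        # Only compare against this user's tweets inside the window.
        recent = [(ts, t) for ts, t in self._recent[user] if ts >= cutoff]
        normalized = " ".join(text.lower().split())
        is_duplicate = any(t == normalized for _, t in recent)
        if is_duplicate:
            self._dup_counts[user] += 1
            if self._dup_counts[user] > self.max_duplicates:
                self.blocked.add(user)
        else:
            recent.append((when, normalized))
        self._recent[user] = recent
        return not is_duplicate
```

Calling ranker.top_users(0.8) would correspond to the "top 80%" screen
mentioned above. The duplicate check only catches exact repeats after
whitespace and case normalization, so it would not help with tweets built
from shuffled words.
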
> >>
> >>
> >> On Fri, Nov 26, 2010 at 11:26 AM, Furkan Kuru <furkank...@gmail.com>
> >> wrote:
> >> >
> >> > The word "lol" is the most common one in these spam tweets. We now
> >> > receive 400 spam tweets per hour while tracking 100K people.
> >> >
> >> > We plan to delete all of the tweets containing the word "lol". It is
> >> > also used by our users (Turkish people) writing in English, though.
> >> >
> >> > Any better suggestions?
> >> >
> >>
> >> --
> >> Adam Green
> >> Twitter API Consultant and Trainer
> >> http://140dev.com
> >> @140dev
> >>
> >> --
> >> Twitter developer documentation and resources: http://dev.twitter.com/doc
> >> API updates via Twitter: http://twitter.com/twitterapi
> >> Issues/Enhancements Tracker:
> >> http://code.google.com/p/twitter-api/issues/list
> >> Change your membership to this group:
> >> http://groups.google.com/group/twitter-development-talk
> >
> >
> > --
> > Furkan Kuru
>
>
> --
> Adam Green

--
Furkan Kuru
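
[Illustration] The URL test suggested in the Nov 27 message (blacklist a
new user if a tweeted link answers with HTTP 200 and no content) might look
roughly like the sketch below. It uses only the Python standard library.
The function names, the User-Agent header, the 10-second timeout and the
4096-byte read limit are assumptions made for illustration. As noted in
that message, the server may answer differently depending on the caller's
IP, so a non-empty response does not prove that a link is harmless.

```python
import urllib.error
import urllib.request


def fetch_status_and_size(url, timeout=10):
    """Follow redirects and return (HTTP status, number of body bytes read)."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Reading a few KB is enough to tell an empty page from a real one.
        body = response.read(4096)
        return response.status, len(body)


def resolves_to_empty_page(url, fetch=fetch_status_and_size):
    """True if the URL answers 200 with no content at all."""
    try:
        status, size = fetch(url)
    except (urllib.error.URLError, ValueError, OSError):
        # Unreachable or malformed is not the same as "200 and empty".
        return False
    return status == 200 and size == 0


def screen_new_user(user, urls, blacklist, fetch=fetch_status_and_size):
    """Add `user` to `blacklist` if any tweeted URL resolves to an empty page.

    Returns True if the user was blacklisted by this call.
    """
    if any(resolves_to_empty_page(url, fetch) for url in urls):
        blacklist.add(user)
        return True
    return False
```

The `fetch` parameter exists so that the check can be exercised without
network access, and so that a different HTTP client can be swapped in.
Running this only for users who are not yet whitelisted keeps the number of
outbound requests small.
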