Re: [twitter-dev] Trying to get rid of twitter spammers

Furkan Kuru Sat, 27 Nov 2010 05:57:25 -0800

Most of the tweets here are spam:

http://twitturk.com/tweet/search?q=lol




On Sat, Nov 27, 2010 at 3:33 PM, Adam Green <140...@gmail.com> wrote:

> All of your sample spam tweets are from suspended accounts, yet the
> tweets were only sent yesterday. That means that the spammers' behavior
> was so aggressive that they were suspended quickly by a Twitter
> algorithm. I doubt that a human at Twitter read your email and went
> through each tweet suspending the accounts. Have you checked to see
> how quickly these spam accounts get canceled for other spam tweets?
> You could hold back tweets from unknown users for 24 hours, and then
> check all new users through the API to see if they are suspended. If
> they aren't suspended, you can whitelist them in your system.
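
A minimal sketch of the hold-back idea above, in Python. The class name, the
method names, and the `is_suspended` callable are all hypothetical;
`is_suspended` stands in for whatever API lookup you use to test an account,
and the 24-hour default is the figure suggested above.

```python
from datetime import timedelta

HOLD_PERIOD = timedelta(hours=24)  # hold-back time suggested in the thread


class Quarantine:
    """Hold tweets from unknown users, then whitelist or block the user."""

    def __init__(self, is_suspended, hold_period=HOLD_PERIOD):
        # is_suspended: callable(user_id) -> bool. Assumption: you supply
        # this yourself, for example as a wrapper around a user lookup call.
        self.is_suspended = is_suspended
        self.hold_period = hold_period
        self.whitelist = set()
        self.blocked = set()
        self.held = {}  # user_id -> (first_seen, [tweets])

    def submit(self, user_id, tweet, now):
        """Return the tweets that can be published immediately."""
        if user_id in self.blocked:
            return []
        if user_id in self.whitelist:
            return [tweet]
        # Unknown user: park the tweet until the hold period has passed.
        _, tweets = self.held.setdefault(user_id, (now, []))
        tweets.append(tweet)
        return []

    def release_due(self, now):
        """Check users whose hold period is over; return released tweets."""
        released = []
        for user_id in list(self.held):
            first_seen, tweets = self.held[user_id]
            if now - first_seen < self.hold_period:
                continue
            del self.held[user_id]
            if self.is_suspended(user_id):
                self.blocked.add(user_id)    # drop the held tweets
            else:
                self.whitelist.add(user_id)  # trusted from now on
                released.extend(tweets)
        return released
```

Call `release_due` periodically (a cron job would do). The cost is a 24-hour
delay on the first tweets from every new user, so this only fits if you do not
need real-time delivery for unknown accounts.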
>
> What is really weird is that I also checked the URLs in these tweets
> and they resolve to an empty page. They return a header with an HTTP
> code of 200, and no content at all. That can't be an accident. Either
> they are sending empty responses to everyone, or they could tell from
> my IP that they didn't want to send anything to me. Why would a
> spammer do that? They only benefit if someone clicks on their links
> and buys something, or gets infected somehow. Could you be the subject
> of some kind of attack? You use the word "community." Would anyone
> want to disrupt your community? Is this a community that is in one
> geographic area that can be detected by IP? Very interesting...
>
> Anyway, you can use URL resolution to test new users. When you get a
> tweet from a new user with a URL, check the URL, and blacklist them if
> it resolves to an empty page. If you only have to do this for new
> users, it won't be too processor intensive.
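
The URL test above could be sketched like this in Python, using only the
standard library. The function names are made up for illustration, and
treating a whitespace-only body as "empty" is my assumption.

```python
import urllib.error
import urllib.request


def looks_empty(status, body):
    """True for the pattern described above: HTTP 200 with no content."""
    return status == 200 and len(body.strip()) == 0


def fetch(url, timeout=10):
    """Fetch a URL (redirects are followed) and return (status, body)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status, resp.read()


def should_blacklist(url, fetcher=fetch):
    """Decide whether a new user's link matches the empty-page pattern.

    Network errors are NOT treated as spam here (an assumption): a site
    that is down is a different signal from one that answers with 200
    and an empty body.
    """
    try:
        status, body = fetcher(url)
    except (urllib.error.URLError, OSError, ValueError):
        return False
    return looks_empty(status, body)
```

The `fetcher` parameter is only there so the check can be tested without
network access. Keep in mind the caveat above: if the server chooses its
response by IP, your crawler may not see the same page a victim would.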
>
>
> On Sat, Nov 27, 2010 at 5:20 AM, Furkan Kuru <furkank...@gmail.com> wrote:
> > The text in these spam tweets is not easy to recognize.
> > It does not repeat. Each tweet is a mix of different words and contains a
> > link.
> > They seem to be sent via the web.
> >
> > Ranking users and discarding some mentions will not completely resolve the
> > problem,
> > because both our mention data and our trending-words data were affected. We
> > do not want to eliminate tweets from innocent people who have few
> > followers.
> >
> > The simplest way seems to be to ignore the tweets coming from outside
> > the community.
> > But those tweets were helping us to extend our network.
> >
> >
> >
> > On Fri, Nov 26, 2010 at 6:42 PM, Adam Green <140...@gmail.com> wrote:
> >>
> >> As long as you aren't trying to capture and deliver *all* tweets,
> >> there are a couple of good ways to cut out spammers. One thing I do is
> >> save all mentions for all users in a database of tweets. When a tweet
> >> comes in from the streaming API, I collect @mentions, and store them
> >> with the screen name of the tweet's author and the screen name
> >> mentioned. Then I can rank users based on the number of different
> >> accounts that mention them. If you only use the tweets from the top N%
> >> of users, the quality improves a lot. I find that the top 80% is
> >> usually enough of a screen to get good quality.
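
A rough Python sketch of the mention-ranking idea above. The function names
are hypothetical, the in-memory dictionary stands in for the database, and
ignoring self-mentions is my assumption, not something stated above.

```python
import re
from collections import defaultdict

# "@" followed by word characters, but not inside something like an email
MENTION_RE = re.compile(r"(?<!\w)@(\w+)")


def extract_mentions(author, text):
    """Return (author, mentioned_screen_name) pairs for one tweet."""
    return [(author, name) for name in MENTION_RE.findall(text)]


def rank_by_distinct_mentioners(mentions):
    """Rank screen names by how many different accounts mention them."""
    mentioners = defaultdict(set)
    for author, mentioned in mentions:
        author, mentioned = author.lower(), mentioned.lower()
        if author != mentioned:  # assumption: self-mentions do not count
            mentioners[mentioned].add(author)
    # Most distinct mentioners first; ties broken by name for stable output.
    return sorted(mentioners, key=lambda u: (-len(mentioners[u]), u))


def top_fraction(ranked, fraction):
    """Keep the top `fraction` (0.8 for the top 80%) of the ranked users."""
    keep = int(len(ranked) * fraction)
    return set(ranked[:keep])
```

Users who are never mentioned do not appear in the ranking at all, so they
fall below any cutoff. That is the weak point raised later in the thread:
innocent accounts with little interaction get filtered along with spammers.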
> >>
> >> Another trick is blocking duplicates from each user. The API only
> >> blocks duplicates that repeat immediately, but if a spammer has a list
> >> of tweets, and cycles through them, all the tweets get through. I
> >> compare all new tweets with the other tweets from that user. This is
> >> very expensive if you have a big database. This can be made less
> >> intensive by limiting the comparison to just the tweets from that user
> >> in the last few days. You can also run this with a separate process
> >> that doesn't slow down your main tweet parsing loop. Most spammers are
> >> so simplistic that they just repeat the same tweet over and over. In a
> >> real spammy set of keywords, if I find more than a few duplicates from
> >> a user, I just stop saving their tweets.
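
The per-user duplicate check above might look something like this in Python.
The class is hypothetical; the three-day window and the limit of three
duplicates are placeholder values for "the last few days" and "more than a
few", and the case/whitespace normalization is my assumption.

```python
from collections import defaultdict, deque
from datetime import timedelta


class DuplicateFilter:
    """Reject tweets a user has already sent within a recent time window."""

    def __init__(self, window=timedelta(days=3), max_duplicates=3):
        self.window = window                  # placeholder for "a few days"
        self.max_duplicates = max_duplicates  # placeholder for "a few"
        self.recent = defaultdict(deque)      # user -> deque of (time, text)
        self.dup_counts = defaultdict(int)
        self.blocked = set()

    @staticmethod
    def normalize(text):
        # Assumption: ignore case and whitespace differences when comparing.
        return " ".join(text.lower().split())

    def accept(self, user, text, now):
        """Return True if the tweet should be saved, False otherwise."""
        if user in self.blocked:
            return False
        history = self.recent[user]
        # Forget tweets older than the window so comparisons stay cheap.
        while history and now - history[0][0] > self.window:
            history.popleft()
        key = self.normalize(text)
        if any(seen == key for _, seen in history):
            self.dup_counts[user] += 1
            if self.dup_counts[user] > self.max_duplicates:
                self.blocked.add(user)  # stop saving this user's tweets
            return False
        history.append((now, key))
        return True
```

This keeps the history in memory and assumes timestamps arrive in order. With
a real database you would run the same comparison as a query limited to that
user and date range, in a separate process as described above.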
> >>
> >>
> >> On Fri, Nov 26, 2010 at 11:26 AM, Furkan Kuru <furkank...@gmail.com>
> >> wrote:
> >> >
> >> > The word "lol" is the most common in these spam tweets. We now receive
> >> > 400 spam tweets per hour while tracking 100K people.
> >> >
> >> > We plan to delete all of the tweets containing the word "lol". It is
> >> > also used by our users (Turkish people) writing in English, though.
> >> >
> >> > Any better suggestions?
> >> >
> >>
> >> --
> >> Adam Green
> >> Twitter API Consultant and Trainer
> >> http://140dev.com
> >> @140dev
> >>
> >> --
> >> Twitter developer documentation and resources: http://dev.twitter.com/doc
> >> API updates via Twitter: http://twitter.com/twitterapi
> >> Issues/Enhancements Tracker:
> >> http://code.google.com/p/twitter-api/issues/list
> >> Change your membership to this group:
> >> http://groups.google.com/group/twitter-development-talk
> >
> >
> >
> > --
> > Furkan Kuru
> >
> >
>
>
>
> --
> Adam Green
> Twitter API Consultant and Trainer
> http://140dev.com
> @140dev
>
>



-- 
Furkan Kuru

