How about a competition to develop spam-detection algorithms :)
Pascal
On Feb 24, 2011, at 10:38 PM, Dewald Pretorius wrote:
> Apart from implementing reCAPTCHA on tweet submission, follow, and
> unfollow, I can't see what Twitter can do to prevent that kind of
> abuse (can you imagine the revol
I'm no OAuth expert, but did you make sure your system time is properly
synchronized with a regional NTP server?
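If you want to check that from code, here is a minimal sketch using the
third-party ntplib package (the package choice and server are just
assumptions, any NTP client works):

    import ntplib  # third-party package: pip install ntplib

    # Compare the local clock against a public NTP pool. OAuth requests are
    # typically rejected once the timestamp drifts more than a few minutes.
    offset = ntplib.NTPClient().request("pool.ntp.org", version=3).offset
    print("clock offset: %.2f seconds" % offset)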
Pascal
On Feb 12, 2011, at 3:13 AM, Winson wrote:
> Hi there.
>
> Using WP 3.0.5 and WP to Twitter 2.2.6 on a CentOS server, which I
> don't manage at all. I've got the error "OAuth
Hello noname,
the Search API is rate limited and only allows an undisclosed number of
queries per hour. You will need to look into the Streaming API: consume the
sample stream and extract the geodata. This also gives you tweets from all
over the world.
Have a look at
http://dev.twitter.com/pages
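A rough sketch of that approach, assuming basic-auth access to the sample
stream endpoint (credentials are placeholders):

    import json
    import requests  # third-party HTTP library

    stream = requests.get(
        "http://stream.twitter.com/1/statuses/sample.json",
        auth=("user", "pass"),  # placeholder credentials
        stream=True,
    )
    for line in stream.iter_lines():
        if not line:
            continue  # skip keep-alive newlines
        tweet = json.loads(line)
        if tweet.get("geo"):  # keep only tweets that carry coordinates
            print(tweet["geo"]["coordinates"], tweet.get("text"))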
Just curious:
the limit is on sending, not receiving. Why exactly would one want to send more
than 250 tweets for one incident? Wouldn't that many messages overwhelm any
helpful agency and actually have a detrimental effect?
Pascal
On Jul 29, 2010, at 2:06 AM, Bess wrote:
> There is no way t
http://countdowntooauth.com/
On Jul 29, 2010, at 1:22 AM, chinaski007 wrote:
>
> Any word on if this is still planned?
>
> Any further extensions?
>
> Or is the drop-dead deadline still August 16th?
Ben,
did you account for UTC time?
http://apiwiki.twitter.com/Return-Values
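The created_at field, for example, always comes back in UTC and has to be
converted explicitly; a minimal Python 3 sketch (the timestamp value is
illustrative):

    from datetime import datetime

    created_at = "Mon Jul 26 18:21:00 +0000 2010"  # format the API returns
    utc = datetime.strptime(created_at, "%a %b %d %H:%M:%S %z %Y")
    local = utc.astimezone()  # convert to the machine's local time zone
    print(local.isoformat())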
Pascal
On 26.Jul2010, at 18:21, Ben Juneau wrote:
> The dates are incorrect on my website... http://www.bjuneau.com
>
> I'm not sure what I'm doing wrong here? Any help is much appreciated.
Yes. You can't trust anything on twitter. Hope for good, valid results, prepare
for anything else.
Pascal
On 23.Jul2010, at 15:03, luisg wrote:
> This means that the count property is not something that you can
> trust, right?
>
> Luis
Hi Luis,
yes, that's what I mean. You can either get the second page, or just request
some more, as in:
> http://api.twitter.com/1/statuses/home_timeline.xml?count=25
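If you always need 20 usable tweets, the over-request-and-trim version of
this could look like the following (JSON variant of the same call;
credentials are placeholders):

    import requests

    r = requests.get(
        "http://api.twitter.com/1/statuses/home_timeline.json",
        params={"count": 25},  # ask for a few more than you need
        auth=("user", "pass"),  # placeholder basic-auth credentials
    )
    tweets = r.json()[:20]  # trim back down after filtered ones are gone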
Pascal
On 23.Jul2010, at 11:40, luisg wrote:
> Hi Pascal,
>
> Thanks for your reply.
> What you mean with cursors?
>
> I ha
Hi Luis,
I might be wrong there, but I think this is the way it works because of
twitter's caching and distribution architecture. You can never assume you will
get the full number of tweets or users - some might be filtered, deleted, or
whatnot. If you need more, just get the next page/set using curso
Stale caches.
Pascal
On 22.Jul2010, at 23:38, soung3 wrote:
> It's not only suspended users, but also users that are no longer
> found. Why does Twitter return ids of users that no longer exist?
Hi Terrence,
if you use the tracking stream, you will get limit notices telling you how many
tweets you missed (because the stream didn't provide the volume). This should
be a way to determine absolute N. Notice that this still means that spam tweets
might be filtered.
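The notices arrive interleaved with the tweets; a sketch of tallying them
(assuming the usual JSON shape of streaming limit messages):

    import json

    received = 0
    missed = 0

    def handle_line(line):
        global received, missed
        msg = json.loads(line)
        if "limit" in msg:
            # Cumulative count of tweets withheld since the connection began.
            missed = max(missed, msg["limit"]["track"])
        else:
            received += 1

    # Absolute N is then roughly received + missed.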
Pascal
On 22.Jul2010, a
Patrick,
given this explanation, you will need a lot of streams.
60k users with an average of 100 friends (low estimate).
Let's guess that every user shares 50% of their friends with others.
This gives us: 60,000 * 100 * 0.5 = 3 million
In order to track 3 million users with the follow stream, you ne
Hello Rostand,
I did my master's thesis using twitter data. My recommendation:
- You cannot, will not, and should not get DMs. They are *private*. Even if you
do a closed study with 300 consenting people, it's unethical. If you're in the
US, the ethics committee of your university will have you
Thomas,
last time I heard from the project, they were busy sorting the technical
details out and still not sure who would even get access. It'll probably be
open to a selected group of researchers first.
Pascal
On Jul 18, 2010, at 8:16 PM, Tomas Roggero wrote:
> Hi Pascal
>
> What I'm doing
Tom,
at least you know that the Library of Congress has a backup :)
Pascal
On Jul 18, 2010, at 7:07 , Tom Roggero wrote:
> I've tried your script on Mac, it only works for first 3 pages that's
> weird (i'm running darwin ports for xml functions)...
> Anyway, tried manually do it through firefox
In addition to the note from Taylor, I think it's a good idea to remind people
that the stream contents are identical - it's of absolutely no use and a waste of
resources to consume more than one sample stream. Just pick the largest one -
that will contain all messages you can get.
Pascal
On Jul 16,
Tomo,
John replied on another thread just minutes after you:
> I hoped we'd have an email out on Thursday about this, but I'd imagine it'll
> go out on Friday. There isn't a problem with your client.
Pascal
On Jul 16, 2010, at 6:14 , Tomo Osumi wrote:
> Dear John,
>
> Could I have any update
# Idle musing
Inflation adjustment?
# end
Pascal
On Jul 15, 2010, at 17:14 , John Kalucki wrote:
> This is a known issue. We'll have an email about the Gardenhose and Spritzer
> later today.
>
> -John Kalucki
> http://twitter.com/jkalucki
> Infrastructure, Twitter Inc.
Nicholas,
Did you just publish your account credentials?
Pascal
On Jul 15, 2010, at 14:02 , Nicholas Kingsley wrote:
> INC send$, "Authorization: Basic FishyMcFlipFlop:burpmachine\r\n"
Hi,
those are probably accounts which twitter filtered out. The docs are pretty
clear on this and give practical advice:
http://dev.twitter.com/pages/streaming_api_concepts
> Both the Streaming API and the Search API filter statuses created by a small
> proportion of accounts based upon status
Michael,
you can find out how to check here:
http://help.twitter.com/entries/15790-how-to-contest-account-suspension
Pascal
On Jul 12, 2010, at 10:32 , microcosmic wrote:
> Hello Pascal.
>
> It's not the case that our account is disabled. Or is there a "hidden"
> message that is saying "accou
Hello Michael,
just an idea:
try to log into the twitter website with your account and see whether it was
disabled for spam.
Pascal
On Jul 11, 2010, at 17:34 , microcosmic wrote:
> Hello there.
>
> Since Friday I am not able to send tweets to e.g. #Studentenjob or
> #Nebenjob anymore. I found
Larry,
moreover, I assume you checked I/O and CPU load. But even if that's not the
issue, you should absolutely check whether you have simplejson with its C
extension installed. The version bundled with Python is 1.9, which is decidedly
slower than the new 2.x branch. You might see JSON decoding load drop by
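A quick way to check both points (the c_scanstring attribute is an
implementation detail of simplejson, so treat this as a heuristic):

    import simplejson

    print(simplejson.__version__)
    # c_scanstring falls back to None when the C speedups are not installed.
    print("C extension active:", simplejson.decoder.c_scanstring is not None)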
Larry,
have you decoupled the processing code from tweepy's StreamListener, for
example using a Queue.Queue or some message queue server?
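A minimal sketch of that decoupling, assuming tweepy's classic StreamListener
interface (Queue.Queue is the Python 2 name; in Python 3 it is queue.Queue):

    import queue
    import threading
    import tweepy

    q = queue.Queue(maxsize=10000)

    class QueueListener(tweepy.StreamListener):
        def on_data(self, data):
            q.put(data)  # enqueue only; never block the stream reader
            return True

    def worker():
        while True:
            data = q.get()
            process(data)  # hypothetical: your actual (slow) processing
            q.task_done()

    threading.Thread(target=worker, daemon=True).start()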
Pascal
On Jul 8, 2010, at 17:31 , Larry Zhang wrote:
> Hi everyone,
>
> I have a program calling the statuses/sample method of a garden hose
> of the Str
Shan,
as far as I know twitter has been reluctant to state definite numbers, so
you'll have to experiment and implement a backoff mechanism in your app (see
the sketch after the quote). Here is the relevant part of the docs:
> Search API Rate Limiting
> The Search API is rate limited by IP address. The number of search request
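A sketch of such a backoff loop (all numbers are guesses you would have to
tune, precisely because Twitter doesn't state the limits):

    import time
    import requests

    def search_with_backoff(url, params, max_delay=320):
        delay = 5
        while True:
            r = requests.get(url, params=params)
            if r.status_code == 200:
                return r.json()
            # Rate limited or erroring: wait, then retry with doubled delay.
            time.sleep(delay)
            delay = min(delay * 2, max_delay)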
Just wanted to add,
it's a sad thing etags see hardly any use today. Back when the graph methods
weren't paginated, you could just send a request with the etag header set and
it would come back not modified, a very efficient thing to do. It won't give
you the difference between arbitrary points
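For reference, the pattern was plain HTTP conditional requests; a sketch
(the endpoint and the etag persistence helpers are illustrative):

    import requests

    etag = load_saved_etag()  # hypothetical: etag kept from the last fetch

    headers = {"If-None-Match": etag} if etag else {}
    r = requests.get("http://api.twitter.com/1/friends/ids.json",
                     headers=headers)
    if r.status_code == 304:
        pass  # unchanged; cheap for both client and server
    else:
        save_etag(r.headers.get("ETag"))  # hypothetical persistence
        graph = r.json()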
Just a sidenote: This can be coincidental. Unless you try several dozen times
with each client, no valid inference can be drawn from the tests.
Pascal
On Jul 6, 2010, at 18:46 , Johnson wrote:
> I notice that the rate limit is application specific. I've tried a few
> clients, some of them goes t
With "multi-level loosely-coordinated best-effort distributed cache" you
certainly got the naming, all that's left is the cache invalidation. :)
Pascal
On Jul 6, 2010, at 18:10 , John Kalucki wrote:
> These lockouts are almost certainly due to a performance optimization
> intended to reduce ne
Quoting John Kalucki:
> We haven't announced our plans for streaming and oAuth, beyond stating that
> User Streams will only be on oAuth.
Right now, basic auth and oAuth both work on streaming, and that won't change
when basic for REST turns off. Since there's no set shutdown date yet for
bas
Interesting. Your method is similar to the breadth-first crawl that many people
do (for example, see the academic paper by Kwak et al. 2010).
You have to keep in mind, however, that you are only crawling the giant
component of the network, the connected part. If there are any Turkish users
who
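The crawl itself is a textbook breadth-first search; a sketch (get_followers
is a hypothetical wrapper around the follower API call):

    from collections import deque

    def bfs_crawl(seed, get_followers, limit=100000):
        # Only ever reaches the connected component containing the seed.
        seen = {seed}
        frontier = deque([seed])
        while frontier and len(seen) < limit:
            user = frontier.popleft()
            for follower in get_followers(user):
                if follower not in seen:
                    seen.add(follower)
                    frontier.append(follower)
        return seen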
Google Translate is easy, but *very* inaccurate. I tested it on a set of 30,000
tweets, and more than 60% were unreliably classified (Google will tell you the
confidence of the classification inline).
Don't rely on that for language detection unless you pretty much don't care!
On Jul 4, 2010, a
John,
yes, thanks a lot for the design proposal - that is what inspired my own
system. I am not primarily filtering by language, however, but by country, so
I'm using time zone and location data together with a list of cities from
http://www.geonames.org/
The manual cross-check in my thesis sh
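The matching can stay simple; a sketch of the idea (the city set and the time
zone value are illustrative):

    # Hypothetical: city names for the target country, e.g. from a
    # geonames.org dump, lower-cased for matching.
    cities = {"berlin", "hamburg", "munich"}

    def looks_domestic(user):
        location = (user.get("location") or "").lower()
        time_zone = (user.get("time_zone") or "").lower()
        return "berlin" in time_zone or any(c in location for c in cities)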
Hi Lucas,
as someone who approached a similar problem, my recommendation would be to
track users. In order to get results quickly (rather than every few hours via
user timeline calls), you need streaming access, which is a bit more
complicated. I implemented such a system in order to track the
http://status.twitter.com/post/750140886/site-tweaks
On Jul 1, 2010, at 9:49 , PiPS wrote:
> Hi.
>
> I am developing on twitter client.
>
> My client uses xAuth.
>
> But.. My Client API is 175
>
>
> That was before 350.
>
> Why was suddenly reduced by half?
>
Good to know. Did you mean to say "consume … streaming results"? I don't really
see where you use the stream here.
Also, please note that it's not a good idea to work with "since_id" and
"max_id" any more, because those will soon be (already are?) NON-SEQUENTIAL.
This means you will lose tweets
As stated in the API wiki, the number of search results you can get at any
given point in time for one search term is indeed ~1500.
(http://apiwiki.twitter.com/Twitter-Search-API-Method:-search)
There are several ways to go beyond that.
a) Do perpetual searches (say, one every day), and merge th
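For (a), merging boils down to de-duplicating on the tweet id; a sketch
against the search endpoint of the time (the storage is illustrative):

    import requests

    seen_ids = set()  # in practice: a database
    archive = []

    def merge_search(query):
        r = requests.get("http://search.twitter.com/search.json",
                         params={"q": query, "rpp": 100})
        for tweet in r.json()["results"]:
            if tweet["id"] not in seen_ids:
                seen_ids.add(tweet["id"])
                archive.append(tweet)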
Thanks for the notification.
What will this mean for etag checks?
I currently fetch a large number of graphs in regular intervals. Any
check that returns a 304 should incur little cost.
Will I need to crawl all the pages and check for their 304?
If I get a 304 on the first one, can I assume that