The Streaming API and the Search indexer both tee off the same point
in the new status event pipeline. New statuses are born in the web
containers and queued for a cluster of processes that begin the
offline processing pipeline. This first process does many things,
including routing statuses to various subsystems (timelines, SMS,
various backing stores, etc.). This process also determines whether
the updating user is public or protected and whether they are "filtered
from search". If the user is both public and unfiltered, the status is
enqueued to both Search and Streaming.
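
Roughly, in Ruby-flavored pseudocode, that fan-out decision looks
something like the sketch below. The queue names and predicate methods
are invented for illustration; they are not our actual internals:

  # Illustrative sketch only: enqueue, user.public? and
  # user.filtered_from_search? are made-up names.
  def route_status(status, user)
    enqueue(:timelines, status)
    enqueue(:sms, status)
    if user.public? && !user.filtered_from_search?
      enqueue(:search, status)     # visible to Search
      enqueue(:streaming, status)  # visible to Streaming
    end
  end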

See also: 
http://apiwiki.twitter.com/Streaming-API-Documentation#ResultQuality

The Streaming API then syndicates all these statuses. The Search
system may or may not sort, filter, and otherwise rank statuses for
relevance based on various heuristics, including, but not limited to:
phase of the moon, state of tides, the DJIA, etc.

Roughly:

Complete corpus search: Streaming
Low-latency results: Streaming
Accurate keyword counts: Streaming (tally both statuses and limit
messages; see the sketch after this list)
Complex queries: Search
Historical queries: Search
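
On accurate counts: when a track stream matches more statuses than your
connection is allowed to receive, the stream inserts limit messages
indicating how many statuses were withheld. A minimal tally sketch,
assuming one JSON object per line and a limit notice shaped like
{"limit":{"track":N}} (check the documentation linked above for the
exact semantics; "stream" here is a stand-in for your connection):

  require 'json'

  delivered = 0
  undelivered = 0
  stream.each_line do |line|
    msg = JSON.parse(line)
    if msg["limit"]
      undelivered = msg["limit"]["track"]  # running total of withheld statuses
    elsif msg["text"]
      delivered += 1
    end
  end
  total = delivered + undelivered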

-John Kalucki
http://twitter.com/jkalucki
Services, Twitter Inc.



On Nov 3, 11:02 am, Jeffrey Greenberg <jeffreygreenb...@gmail.com>
wrote:
> It would help if John Kalucki (hello) would clarify the difference
> between what is visible via streaming as opposed to what is visible
> via search.
>
> I've been operating under the assumption that streaming is warranted
> when an app needs a different or more powerful search than the current
> one (e.g. nested boolean expressions), or is interested in seeing
> tweets before they are filtered out by Twitter's spam detection
> (dealing with the tweet removal protocol, etc.). As developers, it
> would help us if you could sketch out what the Twitter data pipeline
> looks like, and where the various APIs plug in, so that we know what
> we get when we plug in there. I assume, for instance, that Search is
> farther downstream than the various firehose/stream APIs, but I have
> little idea (and no documentation) of what steps the data goes through
> as it moves down the pipe....
>
> Would Twitter be open to shedding some light?
> jeffrey greenberg
> tweettronics.com
>
> On Nov 3, 9:59 am, Fabien Penso <fabienpe...@gmail.com> wrote:
>
> > I agree; however, it would help a lot, because instead of doing:
>
> > for keyword in all_keywords
> >   if tweet.match(keyword)
> >     # matched, notify users
> >   end
> > end
>
> > we could do
>
> > for keyword in keywords_matched
> >   # same as above
> > end
>
> > For matching 5,000 keywords, it would bring the first loop from 5,000
> > iterations down to probably 1 or 2.
> > You know what you matched, so it's quite easy for you to just include
> > the raw data of matched keywords; I don't need anything fancy. Just
> > space-separated keywords would help _so much_.
>
> > On Tue, Nov 3, 2009 at 3:15 PM, John Kalucki <jkalu...@gmail.com> wrote:
>
> > > The assumption is that client services will, in any case, have to
> > > parse and route statuses to potentially multiple end-users. Providing
> > > this sort of hint wouldn't eliminate the need to parse the status and
> > > would likely result in duplicate effort. We're aware that we are, in
> > > some use cases, externalizing development effort, but the use cases
> > > for the Streaming API are so many that it's hard to define exactly
> > > how much this feature would help and therefore how much we're
> > > externalizing.
>
> > > -John Kalucki
> > > http://twitter.com/jkalucki
> > > Services, Twitter Inc.
>
> > > On Nov 3, 1:53 am, Fabien Penso <fabienpe...@gmail.com> wrote:
> > >> Hi.
>
> > >> Would it be possible to include the matched keywords in another field
> > >> within the result from the streaming/keyword API?
>
> > >> It would save me from matching those myself when matching for multiple
> > >> internal users in order to route the tweets to the right users, which
> > >> can be time-consuming and tough to do with lots of users/keywords.
>
> > >> Thanks.
