There are people here at Twitter who know this stuff inside and out. I just
haven't roped them in for a fix yet. Once we have a fix in hand, we'll
publish recommendations for everyone. Whatever our streaming servers have to
do, your streaming clients have to do, and we might as well pool our
efforts.

-John Kalucki
http://twitter.com/jkalucki
Infrastructure, Twitter Inc.



On Wed, Apr 7, 2010 at 12:08 PM, <zn...@comcast.net> wrote:

>
> ----- "John Kalucki" <j...@twitter.com> wrote:
>
> > We break the status text into tokens by whitespace and punctuation,
> > then apply the tokens to a hashmap of tracked terms. If the language
> > doesn't have whitespace, the only thing that will match is the entire
> > Tweet.
> >
> > I know that Search has struggled with this as well. I take it that the
> > solutions aren't easy. At some point we'll have to figure something
> > similar out for Streaming. I've filed a story to add support for these
> > languages in Track.
> >
> > -John Kalucki
> > http://twitter.com/jkalucki
> > Infrastructure, Twitter Inc.
>
> Thanks! I was just about to add CJK (Chinese - Japanese - Korean) regular
> expressions to my list of research topics! ;-) There must be something in
> the open source world we can (to use the tired old cliché) "leverage off
> of." ;-) Oniguruma?? Namazu?
>
> I suppose we need to look at Cyrillic and right-to-left (Arabic and Hebrew)
> too?
>
> --
> M. Edward (Ed) Borasky
> http://borasky-research.net/smart-at-znmeb
>
> "A mathematician is a device for turning coffee into theorems." ~ Paul
> Erdős
>
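
For anyone following along, here is a rough sketch of the matching described
in the quoted message: split the status text on whitespace and punctuation,
then look each token up in a hashmap of tracked terms. This is only an
illustration, not Twitter's actual code. The names tokenize, match_tracked
and the client ids are made up, lowercasing is an assumption, and
"punctuation" is assumed to mean ASCII punctuation only.

```python
import re
import string

# Assumption: separators are whitespace plus ASCII punctuation only.
_SEPARATORS = re.compile("[\\s" + re.escape(string.punctuation) + "]+")


def tokenize(text):
    """Split status text into lowercase tokens on whitespace/punctuation."""
    return [tok for tok in _SEPARATORS.split(text.lower()) if tok]


def match_tracked(text, tracked):
    """Return the tracked terms that appear as whole tokens in the text.

    `tracked` is a dict (hashmap) from term to hypothetical client ids.
    """
    return {tok: tracked[tok] for tok in tokenize(text) if tok in tracked}


# Hypothetical tracked terms and client ids.
tracked = {"twitter": ["client-1"], "東京": ["client-2"]}

# English text splits into words, so the tracked term is found.
english_hits = match_tracked("I love Twitter, really!", tracked)

# This Japanese text has no whitespace or ASCII punctuation, so the whole
# status is a single token and the tracked term inside it never matches.
japanese_tokens = tokenize("東京に行きたい")
japanese_hits = match_tracked("東京に行きたい", tracked)
```

With these inputs the English status matches "twitter", while the Japanese
status comes back as one token and matches nothing, which is the behavior
the quoted message describes for languages without whitespace.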

