Hey Twitter folks,
I *love* that you guys index messages by smiley vs. frowny emoticons!  It
looks like you normalize a wide range of "happy" and "sad" emoticons
together.  I'm running some searches for [[ :) ]] and then trying to identify
which smiley each matching message originally contained, which is a little
tricky.  For example, for "happy", I've seen all of these:

:-)  :D  ^_^  =)  :)  : )  ;)

I have some regexes to extract them -- attached if anyone's curious --
but I'm sure I'm missing plenty, and I sometimes match things that don't
belong.
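
To give a flavor of the approach, here's a minimal sketch covering only the
"happy" examples listed above.  It is not the attached emoticons.py, and it's
certainly not whatever Twitter does at index time; the names HAPPY_RE and
find_happy are made up for illustration.

```python
import re

# Hypothetical pattern for the "happy" examples above only:
#   :-)  :D  ^_^  =)  :)  : )  ;)
HAPPY_RE = re.compile(
    r"""
    (?<!\w)          # don't start in the middle of a word (e.g. "3:D")
    (?:
        [:;=]        # eyes
        [ ]?         # optional space, as in ": )"
        -?           # optional nose
        [)D]         # smiling mouth
      |
        \^_\^        # ^_^
    )
    (?!\w)           # don't run into a following word (e.g. ":Dog")
    """,
    re.VERBOSE,
)


def find_happy(text):
    """Return the happy emoticons found in text, in order, exactly as written."""
    return HAPPY_RE.findall(text)


if __name__ == "__main__":
    print(find_happy("great day :-) so fun :D ^_^ =) :) : ) ;)"))
```

The word-boundary checks on either side are one guess at cutting down the
false matches; the optional space will still pick up things like a stray
colon followed by a closing paren.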

Are you guys using a regex or something similar at index time?  Does the
pattern change much over time?  Would you mind sharing it?

cheers,  :)  ... or should I say ^_^

Brendan
[ anyall.org ]

Attachment: emoticons.py
