Hey Twitter folks,

I *love* that you guys index messages by smiley vs. frowny emoticons! It looks like you normalize a wide range of "happy" and "sad" emoticons together. I'm doing some searches in [[ :) ]] and then trying to identify what the original smiley in each message was, and it's a little tricky. For example, for "happy", I've seen all of these:

:-) :D ^_^ =) :) : ) ;)

I have some regexes to extract them -- attached if anyone's curious -- but I'm sure I'm missing plenty, and I sometimes pick up some that don't belong. Are you guys using a regex or something at index time? Does it change much? Would you mind sharing?

cheers, :) ... or should I say ^_^

Brendan [ anyall.org ]
emoticons.py
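
To give a rough idea of the regex approach described in the message, here is a minimal Python sketch that pulls out the "happy" variants listed above exactly as they were typed. It is not the contents of emoticons.py and says nothing about what Twitter does at index time; the pattern, the `HAPPY` name, and the `find_happy` helper are all assumptions made up for illustration.

```python
import re

# Illustrative pattern only (an assumption): it covers the seven "happy"
# variants quoted in the message and nothing more.
HAPPY = re.compile(
    r"""
    (?<!\w)                  # not glued to a preceding word, e.g. "3:D"
    (?:
        [:;=] [ ]? -? [)D]   # eyes, optional space, optional nose, mouth
      | \^_\^                # the ^_^ face
    )
    (?!\w)                   # not the start of a word, e.g. ":Dog"
    """,
    re.VERBOSE,
)


def find_happy(text):
    """Return the happy emoticons found in text, exactly as written."""
    return [m.group(0) for m in HAPPY.finditer(text)]


if __name__ == "__main__":
    print(find_happy("great day :-) see you ^_^"))
```

The two word-boundary checks are there to cut down on the "some that don't belong" problem, at the cost of missing a smiley typed right up against a word, such as `:)hello`.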