Hi, I wanted to pick people's brains a little bit on the subject of determining importance. This isn't necessarily Mahout-related, although I think we have some tools that help in the area.
One of the emerging trends these days, with all our connectivity and content, seems to be a notion of importance/priority. Some examples:

1. Google now has "Priority Inbox", and I think most would agree that for things like Twitter and Facebook it would be really nice if you could separate the important updates/people from the less important ones.
2. Identifying important phrases, etc. in text across a corpus.
3. One of the things I think most researchers do when exploring a new topic is to identify the one or two seminal papers in the field, read them, and then read the ones that cite those papers, and so on.
4. Taking in all the day's news and figuring out what the key articles to read are (in some sense this is picking the most representative document in a cluster), or deciding that the article about raising Federal income taxes is likely more important than the one about raising local sales tax (or vice versa!).
5. PageRank, TextRank, etc. and other approaches to calculating authority.

What I'm looking for is help in researching this area. Is there a name for this (sub-)field (importance theory? prioritization theory?), particularly within machine learning and NLP? I realize some (most) of these problems can be solved with classifiers, among other things like graph algorithms (particularly ones that use the social graph), but it also seems like the area is bigger than any particular implementation, so I wanted to hear what others thought.

How would you go about solving these problems? Do you have any pointers to useful references on the subject (theoretical or practical)? What other examples have you run up against?

Thanks,
Grant
