On Mon, Mar 14, 2011 at 8:05 PM, Andrew Look <al...@shopzilla.com> wrote:
> This way coherent responses could be chained together in order to aggregate
> more useful information, while people replying on tangents or spamming would
> tend to get left out.

Interesting point.

> Thoughts on how mahout might help create such an adjacency matrix?

Yes. There are cooccurrence counters in Mahout that can help a lot with this.
I will be visiting your facility on Wednesday if you would like to talk about
this more.

> Obviously cosine similarity would still form the distances between each
> reply in a given thread, but it seems like having some way of weighting each
> term’s specificity would help too – i.e. SGD or SVM are more specific than
> classifier, and classifier is more specific than Mahout since we’re looking
> at the mahout mailing list...

Yes. Good idea.
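
As a rough illustration of the weighting idea, here is a minimal Python sketch that builds a reply-to-reply adjacency matrix from cosine similarity over weighted term vectors. It assumes inverse document frequency (IDF) as the measure of term specificity, which is one common choice and not something the thread settles on. It does not use Mahout's API: the function names (`tfidf_vectors`, `cosine`, `adjacency_matrix`) and the toy replies are hypothetical, and tokenization is assumed to have happened already.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn token lists into sparse {term: weight} dicts using TF * IDF.

    Assumption: IDF stands in for "term specificity". A term that occurs in
    every reply (e.g. "mahout" on the Mahout list) gets the minimum weight.
    """
    n = len(docs)
    # Document frequency: in how many replies does each term occur?
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF: equals 1.0 for terms present in every reply, larger for rarer terms.
    idf = {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}
    return [
        {term: tf * idf[term] for term, tf in Counter(doc).items()}
        for doc in docs
    ]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def adjacency_matrix(docs):
    """Pairwise similarity between replies; the diagonal is left at 0.0."""
    vecs = tfidf_vectors(docs)
    n = len(vecs)
    return [
        [cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]


# Toy thread: two on-topic replies and one tangent (hypothetical data).
replies = [
    ["mahout", "classifier", "sgd"],
    ["mahout", "sgd", "svm"],
    ["mahout", "unsubscribe"],
]
adj = adjacency_matrix(replies)
```

In this toy thread the two on-topic replies end up more strongly connected to each other than either is to the tangent, because the term they share with the tangent ("mahout") appears everywhere and so carries the least weight. Thresholding the matrix would give the kind of graph in which coherent replies chain together and tangents drop out.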