Re: Clustering raw articles vs clustering (Stanford's) NER output

Ted Dunning Mon, 12 May 2014 14:26:45 -0700

Clustering with higher-level data available for the distance computation is a
fine thing.

The tuning will be very different, but the results can be very good when the
named entity resolver gets a good hit. Since named entities tend to be
relatively rare, they get high IDF scores, and other terms recede a bit as a
result of normalization.
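The IDF and normalization effect described above can be seen in a tiny sketch. It assumes plain `idf(t) = log(N / df(t))` followed by L2 (unit-length) normalization. Mahout's own TF-IDF vectorizer may use a different IDF formula, so treat this as an illustration rather than what Mahout computes. The function `tfidf_vectors`, the toy corpus and the `ent:` prefix marking an NER-emitted entity token are all hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one L2-normalized TF-IDF vector (term -> weight) per document.

    Assumption: idf(t) = log(N / df(t)). Other tools (Mahout included) may
    use different IDF/normalization variants.
    """
    n = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        weights = {t: count * math.log(n / df[t]) for t, count in tf.items()}
        # L2 normalization: divide by the vector length so that every
        # document vector has unit length.
        norm = math.sqrt(sum(w * w for w in weights.values()))
        if norm > 0:
            weights = {t: w / norm for t, w in weights.items()}
        vectors.append(weights)
    return vectors

# Hypothetical tokenized articles; "ent:Acme_Corp" stands in for a
# named entity emitted by an NER tagger.
docs = [
    ["stocks", "fell", "today", "ent:Acme_Corp"],
    ["stocks", "rose", "today"],
    ["stocks", "fell", "today"],
    ["rain", "fell", "today"],
]

vectors = tfidf_vectors(docs)
for term in ("ent:Acme_Corp", "stocks", "today"):
    print(term, round(vectors[0][term], 3))
print("stocks without the entity:", round(vectors[2]["stocks"], 3))
```

Because the entity token appears in only one of the four articles, it carries most of the weight in the first vector (about 0.96). After normalization, "stocks" drops to about 0.20 there, compared with about 0.71 in the otherwise similar third article that has no entity. The term "today", which appears in every article, gets zero weight.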

> On May 12, 2014, at 6:29, David Noel <david.i.n...@gmail.com> wrote:
> 
> I've spent a few weeks tuning Mahout to cluster news articles and have
> had decent results. Decent, but still not perfect. In trying to think
> of ways to improve my results I had the idea of running Mahout on
> output from Stanford's Named Entity Recognizer (NER) instead of the
> articles themselves, and seeing how that compared. Has anyone tried
> this? Did it generate more cohesive clusters?