Re: Suggestions Needed : Developing application using Mahout

2012-01-24 Thread Ted Dunning
There are a bunch of papers on this. Search for "named entity recognizer CRF" on Google. The basic idea is that an HMM or CRF has internal state that can be used to mark named entities. We don't have to define what the hidden states mean, just help the HMM or CRF find an internal representation that
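
To illustrate the idea of hidden states marking named entities, here is a minimal sketch of Viterbi decoding over a toy two-state HMM. It is not taken from the thread and does not use Mahout's HMM classes. The state names ("O" for outside an entity, "E" for inside one), every probability, and the capitalization-based emission rule are hypothetical values chosen by hand for illustration; a real model would learn them from data.

```python
import math

# Hypothetical two-state HMM: "O" = outside an entity, "E" = inside an entity.
# All probabilities below are made-up illustration values, not trained ones.
STATES = ["O", "E"]
START = {"O": 0.8, "E": 0.2}
TRANS = {
    "O": {"O": 0.8, "E": 0.2},
    "E": {"O": 0.4, "E": 0.6},
}


def emission(state, token):
    """Toy emission model: capitalized tokens are likelier inside entities."""
    capitalized = token[:1].isupper()
    if state == "E":
        return 0.9 if capitalized else 0.1
    return 0.2 if capitalized else 0.8


def viterbi(tokens):
    """Return the most probable hidden-state sequence for the tokens."""
    if not tokens:
        return []
    # scores[s] = best log-probability of any state path ending in s
    scores = {
        s: math.log(START[s]) + math.log(emission(s, tokens[0]))
        for s in STATES
    }
    backpointers = []
    for token in tokens[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            # Pick the predecessor state that maximizes the path score into s
            best_prev = max(
                STATES, key=lambda p: scores[p] + math.log(TRANS[p][s])
            )
            new_scores[s] = (
                scores[best_prev]
                + math.log(TRANS[best_prev][s])
                + math.log(emission(s, token))
            )
            pointers[s] = best_prev
        scores = new_scores
        backpointers.append(pointers)
    # Start from the best final state and walk the backpointers in reverse
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    sentence = ["we", "met", "Ted", "Dunning", "yesterday"]
    for token, tag in zip(sentence, viterbi(sentence)):
        print(token, tag)
```

The decoder only ever sees tokens; the entity boundaries come out of the decoded state sequence. A CRF would replace the hand-set probabilities with learned feature weights but can be decoded with the same kind of dynamic program.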

RE: Suggestions Needed : Developing application using Mahout

2012-01-24 Thread Paritosh Ranjan
] on behalf of Dhruv Kumar [dku...@ecs.umass.edu] Sent: Wednesday, January 25, 2012 1:17 AM To: user@mahout.apache.org Subject: Re: Suggestions Needed : Developing application using Mahout HMMs seem to be a good fit for this problem. They are used ubiquitously for pattern detection. If you

Re: Suggestions Needed : Developing application using Mahout

2012-01-23 Thread Ted Dunning
The HMM implementations might be of help, but I think that a small CRF implementation that is oriented around string transduction would be more helpful. The Stanford Named Entity Recognizer (NER) has such an implementation. I think NLTK has one. I think GATE has one as well. The basic