Hello Burcu,

UIMA actually has an entirely different purpose: it does not do classification or clustering. Instead, you would use UIMA to enrich each document individually through text analysis, and then use the results to build better feature vectors for Solr, Mahout, etc.

We typically use UIMA for named entity recognition, sentiment analysis, chunking, etc., and then index the results in Solr. From there you can either use them for retrieval (i.e. use the enriched representation to get a better document similarity measure) or extract the vectors for use with Mahout/Weka/Cluto/...
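
To make the idea concrete, here is a minimal Python sketch of "enrich first, then build feature vectors". It does not use the UIMA API at all: the gazetteer lookup is a hypothetical stand-in for a real named entity annotator, and the names GAZETTEER, annotate, to_features and cosine are made up for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical gazetteer standing in for a real named entity recognizer.
GAZETTEER = {
    "berlin": "LOCATION",
    "paris": "LOCATION",
    "apache": "ORGANIZATION",
}


def annotate(text):
    """Return (token, entity_type_or_None) pairs for one document."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [(tok, GAZETTEER.get(tok)) for tok in tokens]


def to_features(annotations, use_entities=True):
    """Build a sparse feature vector from tokens plus optional entity types."""
    features = Counter()
    for token, entity_type in annotations:
        features["tok=" + token] += 1
        if use_entities and entity_type is not None:
            features["ent=" + entity_type] += 1
    return features


def cosine(a, b):
    """Cosine similarity between two sparse feature vectors."""
    dot = sum(value * b[key] for key, value in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


doc_a = annotate("Offices in Berlin")
doc_b = annotate("Offices in Paris")

# Similarity on plain tokens versus on the enriched representation.
plain = cosine(to_features(doc_a, False), to_features(doc_b, False))
enriched = cosine(to_features(doc_a), to_features(doc_b))
print(plain, enriched)
```

In this toy case the two documents come out more similar once the shared LOCATION annotation is part of the vector. In a real setup the annotations would come from the analysis pipeline, and the resulting vectors would be indexed in Solr or handed to a clustering tool.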

HTH,
Jens

On 14/01/14 16:25, Burcu B wrote:
Hi,

I'd like to know why someone should prefer UIMA when developing an
application for end users to classify and cluster general purpose
documents?

I have two options:
1- integrating Mahout, Solr, R, Hadoop and other file sources such as
  document management systems or the file system.
2- or doing these using UIMA.

Intuitively, I think that UIMA should be preferred, but I could not justify
my feeling. I need a list of pros and cons.

If you could suggest some resources, that would be great.

Thank you.


