Hello Burcu,
UIMA actually has an entirely different purpose: it doesn't do
classification or clustering itself. Instead, you would use UIMA to
enrich individual documents through text analysis and then use the
results to build better feature vectors for Solr, Mahout, etc.
We typically use UIMA to do named entity recognition, sentiment
analysis, chunking, etc. and then index the result in Solr. From there
you can either use it for retrieval (i.e. use the enriched
representation to get a better document similarity measure) or extract
the vectors to use with Mahout/Weka/Cluto/...
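To make the idea concrete, here is a minimal Python sketch. It is not
UIMA code: the tiny gazetteer stands in for a real named entity
recognizer, and the names ENTITY_TYPES, bag_of_words, enrich and cosine
are made up for illustration. It only shows how annotations can be
folded into a feature vector and change a similarity measure.

```python
import math
from collections import Counter

# Hypothetical gazetteer standing in for a real named entity annotator.
ENTITY_TYPES = {
    "berlin": "LOCATION",
    "paris": "LOCATION",
    "ibm": "ORGANIZATION",
}


def bag_of_words(text):
    """Plain term-frequency vector: lowercased tokens, punctuation stripped."""
    tokens = (t.strip(".,;:!?").lower() for t in text.split())
    return Counter(t for t in tokens if t)


def enrich(text):
    """Term-frequency vector plus one extra feature per recognized entity type."""
    features = bag_of_words(text)
    for token, count in list(features.items()):
        entity_type = ENTITY_TYPES.get(token)
        if entity_type is not None:
            features["ENT=" + entity_type] += count
    return features


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(value * b[key] for key, value in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


doc1 = "Offices in Berlin."
doc2 = "Offices in Paris."

plain_similarity = cosine(bag_of_words(doc1), bag_of_words(doc2))
enriched_similarity = cosine(enrich(doc1), enrich(doc2))
print(plain_similarity, enriched_similarity)
```

For these two toy documents the similarity rises from roughly 0.67 to
0.75, because both now share a LOCATION feature even though the city
names differ. In a real setup the enriched fields would be indexed in
Solr or exported as vectors rather than computed by hand like this.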
HTH,
Jens
On 14/01/14 16:25, Burcu B wrote:
Hi,
I'd like to know why someone should prefer UIMA when developing an
application for end users to classify and cluster general purpose
documents?
I have two options:
1- integrating Mahout, Solr, R, Hadoop and other file sources such as
document management systems or the file system.
2- or doing all of this using UIMA.
Intuitively, I think that UIMA should be preferred, but I cannot justify
my feeling. I need a list of pros and cons.
If you could suggest some resources, that would be great.
Thank you.