On 07/09/2012 05:56 AM, Lance Norskog wrote:
> Would it make sense to join OpenNLP, UIMA, and Open Relevance into one
> top-level "Text Analysis" project? There are already cross-project
> connections between UIMA and OpenNLP. ORP seems dormant. It also seems
> a more natural place than OpenNLP for a database of tagged text.
OpenNLP and UIMA align nicely, in my opinion. OpenNLP just implements
engines for various NLP tasks, without any further support.
UIMA, on the other hand, provides many of the additional things you
need to run OpenNLP in a production system, e.g. scaling the engines
to many machines, workflow support, resource loading and management,
etc. So there is not really an overlap between the two.
UIMA has some NLP-related add-ons in its sandbox. Some of them
duplicate functionality that OpenNLP also provides, e.g. POS tagging
or the dictionary annotator, but that does not seem to amount to much.
Lucene contains a lot of NLP code for stemming and word segmentation
in different languages. That's probably the biggest NLP-related code
base at Apache next to OpenNLP.
Jörn