Hi all,

I am working on integrating Apache UIMA as an UpdateRequestProcessor for Apache Solr, and I now have a first working snapshot. I put the code on Google Code [1], and you can take a look at the tutorial [2].
I would be glad to donate it to the Apache Solr project, as I think it could be a useful module for triggering automatic content extraction while indexing documents. At the moment, the base UIMAUpdateRequestProcessor implementation can automatically extract a document's sentences, language, keywords, concepts and named entities using Apache UIMA's HMMTagger, OpenCalaisAnnotator and AlchemyAPIAnnotator components (and it can easily be extended).

Any feedback is welcome.

Have a nice day,
Tommaso

[1] : http://code.google.com/p/solr-uima/
[2] : http://code.google.com/p/solr-uima/wiki/5MinutesTutorial