LUCENE-2899 integrates OpenNLP into the Lucene/Solr project. It includes sentence tokenization, term tokenization, part-of-speech tagging, chunking, and NER tagging. I aimed for a basic port covering most of the word-by-word tools in the OpenNLP toolkit. I believe I have finished every fiddly bit except for release engineering and IntelliJ support. If you use Lucene/Solr and OpenNLP, please try it out and vote for it:
http://wiki.apache.org/solr/OpenNLP
https://issues.apache.org/jira/browse/LUCENE-2899

Cheers!

--
Lance Norskog
[email protected]
