At Findwise we actively use a number of OpenNLP components with both Hydra and OpenPipeline when indexing with Solr.
I look forward to seeing the result of the patch!

Best,
Svetoslav

On 2012-05-31 23:10, "Lance Norskog" <[email protected]> wrote:

>Thanks. I have looked at UIMA several times and it seemed very
>complex. It has a lot of features, is mature, has an Eclipse app
>builder, etc. I could not keep it all in my head at once. The
>Solr/Lucene document pipeline features give little space for NLP
>features. Hydra or OpenPipeline give UIMA and OpenNLP "room to
>breathe".
>
>Are there free annotated text databases for UIMA? OpenNLP does not use
>any with open licences. It has binary models made from copyrighted
>annotations and so they cannot be checked into Apache.
>
>On Wed, May 30, 2012 at 6:11 PM, Christian Moen <[email protected]> wrote:
>> Hello Lance,
>>
>> This is very cool! I'm looking forward to having a look at this.
>>
>>
>> Christian Moen
>> http://atilika.com
>>
>> On May 31, 2012, at 9:54 AM, Lance Norskog wrote:
>>
>>> I'm creating a patch to integrate OpenNLP into the Lucene/Solr
>>> project. The SentenceDetector, Tokenizer, POS tagger, Chunker, and NER
>>> tools are included. The SentenceDetector and Tokenizer are a Lucene
>>> Tokenizer, and a Lucene TokenFilter takes this stream and runs
>>> POS/Chunking/NER on it, saving the tags as upper-case payloads. The
>>> patch includes a couple of handy combinations. For example, make a
>>> more focused search index by only indexing the nouns & verbs.
>>>
>>> Do you have any hints on how to package it? The documentation should
>>> include how to download and install the models.
>>>
>>> --
>>> Lance Norskog
>>> [email protected]
>>
>
>--
>Lance Norskog
>[email protected]
