How about taking the longest NP chunks? There are NP chunker models for English. The output of the English NP chunker is quite granular, so the length you are aiming for (your sentences are about 30 words) should probably steer which chunks you keep.
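To make the chunk idea concrete, here is a rough sketch in plain Python. It does not call OpenNLP. It assumes you already have the tokens and the chunker's BIO-style tags (B-NP, I-NP, ...) as two parallel lists. The function names and the min_words parameter are made up for illustration.

```python
def np_chunks(tokens, chunk_tags):
    """Group tokens into NP chunks: B-NP starts a chunk, I-NP continues it."""
    chunks, current = [], []
    for token, tag in zip(tokens, chunk_tags):
        if tag == "B-NP":
            # A new NP begins; close any NP that was still open.
            if current:
                chunks.append(current)
            current = [token]
        elif tag == "I-NP" and current:
            current.append(token)
        else:
            # Any other tag ends the current NP.
            if current:
                chunks.append(current)
            current = []
    if current:
        chunks.append(current)
    return chunks


def longest_np_chunks(tokens, chunk_tags, min_words=2):
    """Return NP chunks with at least min_words tokens, longest first."""
    chunks = [c for c in np_chunks(tokens, chunk_tags) if len(c) >= min_words]
    return sorted(chunks, key=len, reverse=True)


# Hand-written tags, only for illustration.
tokens = ["The", "quick", "brown", "fox", "jumped", "over", "the", "lazy", "dog"]
tags = ["B-NP", "I-NP", "I-NP", "I-NP", "B-VP", "B-PP", "B-NP", "I-NP", "I-NP"]
for chunk in longest_np_chunks(tokens, tags):
    print(" ".join(chunk))
```

Raising min_words is one way to let the length steer the selection when the chunker output is too fine-grained.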
Alternatively, you can use the parser and take the longest NPs that are children of a VP. You could also start with the very basic NP VP NP construction from the parse tree. This should, hopefully, give meaningful clauses.

Another, probably weird, idea is to mimic a NER system: take the output of a POS tagger and run a regex-based NER finder over it. Your regex would then work on POS tag sequences (e.g. DT JJ* NP).

Hope this helps.

Best,
Svetoslav

On 2012-05-23 05:29, "Lance Norskog" <[email protected]> wrote:

>I would like to take a long sentence, let's say 30 words, and find
>clauses maybe 10-20 words long that are somewhat self-contained blocks
>of text; complete sentences or nearly. These clauses can be
>overlapping. What is a good way to use OpenNLP's tools?
>
>The application is for document summarization via LSA. This technique
>needs to operate on coherent statements rather than very long
>sentences. Some of my test data is riddled with 30-50-word sentences,
>and they have overlapping clauses which are coherent statements of the
>document themes.
>
>--
>Lance Norskog
>[email protected]
>
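The regex-over-POS-tags idea from the reply could be sketched roughly as follows. This is plain Python, not OpenNLP code, and it assumes the POS tags are already available as a list parallel to the tokens. The pattern uses Penn Treebank style tags, so the noun slot is written as NN, NNS, NNP or NNPS. That tag set, the pattern and the function name are assumptions made for illustration.

```python
import re

# Assumed pattern: optional determiner, any number of adjectives, one or more nouns.
# Every tag is followed by a single space so the regex can walk tag by tag.
NP_PATTERN = re.compile(r"\b(?:DT )?(?:JJ )*(?:NNP?S? )+")


def find_pos_spans(tokens, pos_tags, pattern=NP_PATTERN):
    """Return the token spans whose POS tag sequence matches the pattern."""
    tag_string = "".join(tag + " " for tag in pos_tags)
    spans = []
    for match in pattern.finditer(tag_string):
        # Spaces before the match = number of tags (and tokens) preceding it.
        first = tag_string.count(" ", 0, match.start())
        # Spaces inside the match = number of tags it covers.
        length = match.group().count(" ")
        spans.append(tokens[first:first + length])
    return spans


# Hand-written tags, only for illustration.
tokens = ["The", "old", "man", "saw", "a", "big", "red", "boat"]
pos_tags = ["DT", "JJ", "NN", "VBD", "DT", "JJ", "JJ", "NN"]
for span in find_pos_spans(tokens, pos_tags):
    print(" ".join(span))
```

The same approach should extend to clause-like patterns (for example a noun phrase, a verb group and another noun phrase) by writing a longer regex over the tag string.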
