I would like to take a long sentence, say 30 words, and find clauses of roughly 10-20 words that are reasonably self-contained blocks of text: complete sentences, or nearly so. The clauses may overlap. What is a good way to do this with OpenNLP's tools?
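To make the goal concrete, here is a rough sketch of the kind of output I am after. It assumes the sentence has already been run through a constituency parser and that the result is available as a Penn Treebank style bracketed string, the format I understand OpenNLP's parser to print. The class `ClauseCandidates` and its methods are hypothetical helpers of mine, not part of OpenNLP. Treating every `S` or `SBAR` node within a word-count range as a candidate clause is also only an assumption about what might work, and the reader assumes well-formed input.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not an OpenNLP class: reads a bracketed parse string
// and returns the text of every S/SBAR constituent within a word-count range.
final class ClauseCandidates {

    /** Minimal tree node for a bracketed parse such as "(S (NP (PRP We)) ...)". */
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();
        String word; // non-null only for leaves like "(PRP We)"
        Node(String label) { this.label = label; }
    }

    private final String s;
    private int pos = 0;

    private ClauseCandidates(String s) { this.s = s; }

    /** Parses a well-formed bracketed string into a tree (no error recovery). */
    static Node parse(String bracketed) {
        return new ClauseCandidates(bracketed).readNode();
    }

    private void skipSpaces() {
        while (pos < s.length() && Character.isWhitespace(s.charAt(pos))) pos++;
    }

    private String readToken() {
        int start = pos;
        while (pos < s.length()) {
            char c = s.charAt(pos);
            if (Character.isWhitespace(c) || c == '(' || c == ')') break;
            pos++;
        }
        return s.substring(start, pos);
    }

    private Node readNode() {
        skipSpaces();
        pos++; // consume '('
        Node node = new Node(readToken());
        skipSpaces();
        while (s.charAt(pos) != ')') {
            if (s.charAt(pos) == '(') node.children.add(readNode());
            else node.word = readToken();
            skipSpaces();
        }
        pos++; // consume ')'
        return node;
    }

    /** Appends the words under a node, in sentence order. */
    static void words(Node n, List<String> out) {
        if (n.word != null) out.add(n.word);
        for (Node c : n.children) words(c, out);
    }

    /** Returns clause candidates, outer clauses before the clauses nested in them. */
    static List<String> extract(String bracketed, int minWords, int maxWords) {
        List<String> result = new ArrayList<>();
        collect(parse(bracketed), minWords, maxWords, result);
        return result;
    }

    private static void collect(Node n, int min, int max, List<String> result) {
        if (n.label.equals("S") || n.label.equals("SBAR")) {
            List<String> w = new ArrayList<>();
            words(n, w);
            if (w.size() >= min && w.size() <= max) result.add(String.join(" ", w));
        }
        // Keep descending so nested (overlapping) clauses are also reported.
        for (Node c : n.children) collect(c, min, max, result);
    }
}
```

For a real 30-word sentence I would call something like `ClauseCandidates.extract(parseString, 10, 20)`. Because nested clauses are reported along with the clauses that contain them, the results overlap, which is what I want. I do not know whether `S` and `SBAR` are the right labels to select on, or whether the parser is accurate enough on very long sentences for this to work.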
The application is document summarization via LSA. This technique needs to operate on coherent statements rather than very long sentences. Some of my test data is riddled with 30-50-word sentences, which contain overlapping clauses that are coherent statements of the document's themes.

-- Lance Norskog
