Sahar,

You could also try weeding out sentences to which the sentence detector assigns a low probability. For the language part, are there multiple languages in one chunk of text? If not, you could use Tika or Google LangDetect to detect the language.
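
A minimal sketch of the low-probability filter, in Python for brevity. It assumes you already have one probability per detected sentence (in OpenNLP, I believe SentenceDetectorME.getSentenceProbabilities() returns these after sentDetect(), but treat that as an assumption and check your version). The function name, the sample values, and the 0.9 threshold are made up for illustration. Keep in mind that the probability reflects how confident the detector is about the sentence boundaries, not whether the sentence is grammatical, so this is only a rough filter.

```python
from typing import List, Sequence


def keep_confident_sentences(
    sentences: Sequence[str],
    probabilities: Sequence[float],
    threshold: float = 0.9,  # arbitrary cut-off; tune it on your own data
) -> List[str]:
    """Return the sentences whose detector probability is at least `threshold`."""
    if len(sentences) != len(probabilities):
        raise ValueError("need exactly one probability per sentence")
    return [s for s, p in zip(sentences, probabilities) if p >= threshold]


# Made-up values for illustration; real ones would come from the sentence detector.
sentences = ["This is a complete sentence.", "fig 3 .", "Results were stable."]
probabilities = [0.98, 0.41, 0.95]
print(keep_confident_sentences(sentences, probabilities))
# prints ['This is a complete sentence.', 'Results were stable.']
```

A sensible threshold depends on the model and the text, so it is worth looking at a sample of what gets dropped before settling on a value.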
Ryan

On Apr 30, 2013, at 6:58, William Colen <[email protected]> wrote:

> Hi, Sahar,
>
> I don't know of an established approach that solves your problem, but there
> are a few things you could try. For example, you could check if the sentence
> is parseable. If a parser can figure out a tree for the sentence, it might
> mean that its structure is known. I don't know if it would work with a
> statistical parser like the one in OpenNLP, but it works at least for
> rule-based parsers, where you have fine-grained control over the structures.
>
> Regards,
> William
>
> On Tue, Apr 30, 2013 at 10:43 AM, Sahar Ebadi
> <[email protected]> wrote:
>
>> Hi all,
>>
>> Let's say I have a text and I would like to detect only "good sentences". By
>> "good sentences" I mean sentences that 1) are grammatically complete,
>> 2) have meaning, and 3) are in English.
>>
>> As far as I can tell, the OpenNLP sentence detector only detects sentences
>> according to punctuation (and a list of acronyms it has), so there is
>> no guarantee that the sentences are real, complete, and meaningful
>> sentences.
>>
>> Now my question is: is there any process in NLP that can help me to
>>
>> 1) find grammatically complete sentences?
>> 2) determine whether a sentence has meaning or not?
>> 3) filter out non-English texts?
>>
>> Any suggestions or useful resources would be highly appreciated!
>>
>> Thanks.
