Hi,

Thanks for the replies! :)

Lance:
1) Yes, I use the sentence detector only to split the text into sentences; I am not treating its output as valid sentences.
2) Watson goes beyond what I need. I only need to find good/valid sentences in the text (NLP only, without the reasoning and information retrieval that Watson does).
3) I know there should be some semi-effective solutions, but I am not able to find them. Can you give me some keywords or a short explanation of some of them? That would be a great help!

What I have done so far: the only solution I found was to parse each sentence and then check whether it follows the standard grammatical pattern of a sentence. If it does, it is a valid sentence; otherwise it is not. So far I have parsed the sentences using OpenNLP, whose output is tagged with the Penn Treebank tag set. Now I need to know: is there any standard sentence pattern based on the Penn Treebank?

Ryan: The result will not be accurate enough.

Willian: Can you pass me the names of some rule-based parsers you have in mind (especially ones compatible with OpenNLP)?

I really appreciate any suggestions on this.

Thank you all so much!

On Wed, May 1, 2013 at 5:34 PM, Lance Norskog <[email protected]> wrote:

> The "sentence detector" is for tokenizing (breaking text into words), not
> analysis.
>
> The 'brute force' approach for removing non-english texts is to search for
> higher-page Unicode. If it's over 255, it's not english. (Except maybe for
> currency.)
>
> What you're talking about are semantically deep problems that have a lot
> of semi-effective solutions. How deep do you want this analysis to be? How
> close to IBM Watson do you expect to get?
>
> On 04/30/2013 06:43 AM, Sahar Ebadi wrote:
>
>> Hi all,
>>
>> lets say I have a text and I would like to detect only "good sentences".
>> by "good sentences" I mean sentences that are 1)complete( grammatically
>> 2)have meaning 3)are in English language.
>>
>> As far as I found Open NLP sentence detector only detects sentences
>> according to punctuation(and a list of acronyms it has), so there is
>> no guarantee that the sentences are real, complete and meaningful
>> sentences.
>>
>> Now my question is is there any process in NLP that can help me to :
>>
>> 1)find grammatically complete sentences?
>> 2)find if a sentence has meaning or no?
>> 3)filter non-english texts?
>>
>> any suggestions or sharing useful resources is highly appreciated!
>>
>> Thanks.
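
A minimal sketch of the parse-then-check idea described at the top of this message, in plain Python with no NLP library. It assumes the parser output is already available as a Penn Treebank style bracketed string, and that the tree may be wrapped in a TOP or ROOT node. The function names and the rule itself (root clause labeled S with both an NP and a VP child) are illustrative assumptions, not a standard pattern taken from the Penn Treebank.

```python
import re


def parse_tree(text):
    """Parse a bracketed tree string into nested (label, children) tuples.

    Leaves (the words) are kept as plain strings.
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '('")
        pos += 1
        label = ""
        # A node may have an empty label, as in "( (S ...))".
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])  # a word
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    return read()


def looks_like_sentence(bracketed):
    """Hypothetical rule: the root clause is S and has NP and VP children."""
    label, children = parse_tree(bracketed)
    # Skip wrapper nodes (assumed to be TOP, ROOT, or an empty label).
    while (label in ("TOP", "ROOT", "")
           and len(children) == 1
           and isinstance(children[0], tuple)):
        label, children = children[0]
    if label != "S":
        return False
    child_labels = [c[0] for c in children if isinstance(c, tuple)]
    return "NP" in child_labels and "VP" in child_labels


print(looks_like_sentence(
    "(TOP (S (NP (DT The) (NN dog)) (VP (VBZ barks)) (. .)))"))
print(looks_like_sentence("(TOP (FRAG (NP (JJ Big) (NN dog))))"))
```

This rule is deliberately narrow: it rejects imperatives such as "(S (VP ...))" and questions labeled SBARQ or SQ, so those labels would need their own rules. A statistical parser will usually produce some tree even for ungrammatical input, so a check like this is at best a rough filter.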
