I'm new to NLP, but I think OpenNLP will solve my problem. I'm trying to
classify user-entered sentences about a prescribed situation into two sets
of categories. One set relates to the content of the sentence (for
instance, AboutBob, AboutWeather, etc.). The other set relates to the
likely emotion and nature of the sentence (for instance, IsPraise,
IsAssertion, IsInsult, etc.).

I plan on using the Document Categorizer; however, I have no idea how much
training data I'll need, and I'll have to write the training data myself.
Can you give me a ballpark range for the number of training sentences per
category I should aim for in each corpus (in other words, what are the
usual ranges for this kind of project)? Also, should I aim to include as
many variations on the training sentences as possible? Right now I'm trying
to estimate the amount of work required so I can roughly estimate the time
I'll need to complete the project.
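
To make the question concrete, here is a rough sketch of how I picture the
two corpora. It assumes the Document Categorizer's training format of one
sample per line (category label, whitespace, then the sentence). The example
sentences and the helper name to_doccat_lines are made up for illustration,
and I'm assuming one separately trained model per category set.

```python
# Hypothetical labeled sentences: (content category, tone category, text).
SAMPLES = [
    ("AboutBob", "IsPraise", "Bob did a great job on the report."),
    ("AboutBob", "IsInsult", "Bob is useless at his job."),
    ("AboutWeather", "IsAssertion", "It is going to rain tomorrow."),
]


def to_doccat_lines(samples, label_index):
    """Return one 'Category text' line per sample for the chosen label set."""
    lines = []
    for sample in samples:
        label, text = sample[label_index], sample[2]
        # Collapse internal whitespace so each sample stays on a single line.
        lines.append(f"{label} {' '.join(text.split())}")
    return lines


# Assumption: each category set gets its own corpus (and its own model).
content_lines = to_doccat_lines(SAMPLES, 0)
tone_lines = to_doccat_lines(SAMPLES, 1)

print("\n".join(content_lines))
print("\n".join(tone_lines))
```

Each sentence I write would then count toward both corpora, once with its
content label and once with its tone label.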

-- Jonathan
