On 4/17/2013 12:15 AM, Richard Head Jr. wrote:

> --- On Mon, 4/15/13, Jörn Kottmann <[email protected]> wrote:
>> Yes, the NER should be capable of detecting the terms, but
>> you could also try to use a dictionary.
> Are you referring to a POS dictionary? I would have just two parts of speech: the
> terms and the other words, correct? What's the advantage of using NER over POS?
No, we are talking about the DictionaryNameFinder component.
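For reference, a minimal DictionaryNameFinder sketch (the multi-token term entries below are made-up examples, not from this thread; the API is the one-argument constructor from the OpenNLP 1.5-era tools):

```java
import opennlp.tools.dictionary.Dictionary;
import opennlp.tools.namefind.DictionaryNameFinder;
import opennlp.tools.util.Span;
import opennlp.tools.util.StringList;

public class DictionaryFinderSketch {

    // Build a small dictionary of known terms; an entry may span several tokens.
    // The entries here are placeholder examples.
    static Dictionary buildTermDictionary() {
        Dictionary terms = new Dictionary();
        terms.put(new StringList("acetyl", "chloride"));
        terms.put(new StringList("benzene"));
        return terms;
    }

    public static Span[] findTerms(String[] tokens) {
        DictionaryNameFinder finder = new DictionaryNameFinder(buildTermDictionary());
        return finder.find(tokens);
    }

    public static void main(String[] args) {
        String[] tokens = {"Mix", "the", "acetyl", "chloride", "with", "benzene", "."};
        for (Span s : findTerms(tokens)) {
            // Span end index is exclusive.
            System.out.println("match at tokens " + s.getStart() + ".." + s.getEnd());
        }
    }
}
```

Unlike a trained model, the dictionary finder only matches exact entries, so it needs no training data at all, which is relevant given the small corpus discussed below.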

>> Your training data is too small, especially when you train
>> with a cutoff of 5 and the maxent model;
>> the perceptron will work better.
> So the perceptron is good for a small set of training data? Is maxent even
> necessary when words are not composed of other words?
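Switching the trainer from maxent to the perceptron and lowering the cutoff can be sketched like this (against the later OpenNLP 1.6-era NameFinderME API; the file name train.txt, the type name "term", and the iteration count are placeholder choices):

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.NameSample;
import opennlp.tools.namefind.NameSampleDataStream;
import opennlp.tools.namefind.TokenNameFinderFactory;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.MarkableFileInputStreamFactory;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;
import opennlp.tools.util.TrainingParameters;

public class PerceptronTrainingSketch {

    public static TrainingParameters perceptronParams() {
        // Replace the default maxent trainer with the perceptron and
        // drop the feature cutoff so rare features are kept, which
        // matters on a small corpus.
        TrainingParameters params = TrainingParameters.defaultParams();
        params.put(TrainingParameters.ALGORITHM_PARAM, "PERCEPTRON");
        params.put(TrainingParameters.CUTOFF_PARAM, "0");
        params.put(TrainingParameters.ITERATIONS_PARAM, "300");
        return params;
    }

    public static void main(String[] args) throws IOException {
        // "train.txt" is a placeholder: one sentence per line, entities
        // marked with <START:term> ... <END> in the OpenNLP training format.
        ObjectStream<String> lines = new PlainTextByLineStream(
                new MarkableFileInputStreamFactory(new File("train.txt")),
                StandardCharsets.UTF_8);
        ObjectStream<NameSample> samples = new NameSampleDataStream(lines);

        TokenNameFinderModel model = NameFinderME.train(
                "en", "term", samples, perceptronParams(),
                new TokenNameFinderFactory());
        samples.close();
    }
}
```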

>> Label more data until you have a few thousand sentences.
> Yes, this is my problem. I don't have thousands of sentences, and I'm afraid to
> take the time to label the 100 or so that I have only for it to fail.
>
> Is there a (dis)advantage to training with 1,000 long sentences over, say,
> 2,500 short ones?
>
> Thanks!
Train with sentences from your domain. If all the sentences you are parsing are short, then train on short ones. The disadvantage is really more about getting a good sample space to train on: if you don't have many sentences, the model is trained on just those few, which means it won't work well.
