I have a bunch of sentences like the following: 

Guacamole Dip: 5 Hass Avocados, Jalapeno Puree with Salt and BHT (preservative).

They are standalone, i.e., they are not contained within a larger 
paragraph/document structure.

I want to tag various words, creating the following: 

Guacamole Dip: 5 Hass <START:term>Avocados<END>, <START:term>Jalapeno<END> 
Puree with <START:term>Salt<END> and <START:term>BHT<END> (preservative).

Looking through the mailing list for guidance, I came across this: 

http://mail-archives.apache.org/mod_mbox/opennlp-users/201205.mbox/%3C4FA1EE7E.2080608%40gmail.com%3E

That made me think that, before going through 100 or so documents and tagging 
the words to create training data, I should get some clarification on the 
following:

1. Is NER the right tool for this?
2. My training data is somewhat small (~100 sentences). Will this stymie my goal 
above?
3. Were the poor results the gentleman had with Italian addresses due in part to 
a bug mentioned here:
http://mail-archives.apache.org/mod_mbox/opennlp-users/201205.mbox/%3C4FA1EF10.2020904%40gmail.com%3E
4. Is it possible to use a text file containing only terms, or a tab-delimited 
file like the ones the Stanford NER uses?
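
To make question 4 concrete, here is a rough Python sketch of what I have in 
mind: start from a plain list of terms and generate the tagged sentences from 
it instead of tagging by hand. The function name tag_terms and the term list 
are my own placeholders, not part of OpenNLP. It only handles single-word 
terms, and I am assuming the tagger wants whitespace-separated tokens with 
spaces around the START/END markers, as in the examples in the OpenNLP name 
finder documentation.

```python
import re


def tag_terms(sentence, terms, label="term"):
    """Wrap every token found in `terms` in START/END span markers.

    Hypothetical helper for illustration only: single-word terms,
    case-sensitive matching, and a very naive tokenizer.
    """
    # Split into runs of word characters and single punctuation marks.
    tokens = re.findall(r"\w+|[^\w\s]", sentence)
    tagged = []
    for token in tokens:
        if token in terms:
            tagged.append(f"<START:{label}> {token} <END>")
        else:
            tagged.append(token)
    # Emit whitespace-separated tokens (assumed training-data layout).
    return " ".join(tagged)


sentence = ("Guacamole Dip: 5 Hass Avocados, Jalapeno Puree "
            "with Salt and BHT (preservative).")
terms = {"Avocados", "Jalapeno", "Salt", "BHT"}  # placeholder term list
print(tag_terms(sentence, terms))
```

If something along these lines is a reasonable way to build the training 
file, I would only need to maintain the list of terms.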

Thanks in advance.
