I have a bunch of sentences like the following:

Guacamole Dip: 5 Hass Avocados, Jalapeno Puree with Salt and BHT (preservative).

They are standalone, i.e., they are not contained within a larger paragraph or document structure. I want to tag various words, creating the following:

Guacamole Dip: 5 Hass <START:term>Avocados<END>, <START:term>Jalapeno<END> Puree with <START:term>Salt<END> and <START:term>BHT<END> (preservative).

Looking through the mailing list for guidance, I came across this:

http://mail-archives.apache.org/mod_mbox/opennlp-users/201205.mbox/%3C4FA1EE7E.2080608%40gmail.com%3E

That made me think that, before going through 100 or so documents and tagging the words to create training data, I should get some clarification on the following:

1. Is NER the right tool for this?
2. My training data is somewhat small (~100 sentences). Will this stymie my goal above?
3. Were the poor results the gentleman had with Italian addresses due in part to the bug mentioned here?
   http://mail-archives.apache.org/mod_mbox/opennlp-users/201205.mbox/%3C4FA1EF10.2020904%40gmail.com%3E
4. Is it possible to use a text file containing only terms, or a tab-delimited file like the ones Stanford NER uses?

Thanks in advance.
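
P.S. To make question 4 concrete, here is a minimal sketch of what I have in mind: generating the tagged sentences from a plain list of terms instead of annotating by hand. It is only an illustration, not anything OpenNLP provides. The names TERMS, tokenize, and tag_terms are made up for this example, the tokenizer is a crude regex stand-in, and only single-word terms are handled. I am also assuming that the trainer wants whitespace-separated tokens with spaces around the <START:term> and <END> markers, which is my reading of the name finder training format.

```python
import re

# Hypothetical term list; in practice this would be read from a text file,
# one term per line.
TERMS = ["Avocados", "Jalapeno", "Salt", "BHT"]


def tokenize(sentence):
    """Split into words and single punctuation marks.

    A crude stand-in for a real tokenizer.
    """
    return re.findall(r"\w+|[^\w\s]", sentence)


def tag_terms(sentence, terms):
    """Wrap each token that appears in `terms` with <START:term> ... <END>.

    Tokens are joined with single spaces (assumed training format).
    Only single-token terms are matched.
    """
    wanted = set(terms)
    out = []
    for token in tokenize(sentence):
        if token in wanted:
            out.append("<START:term> " + token + " <END>")
        else:
            out.append(token)
    return " ".join(out)


sentence = ("Guacamole Dip: 5 Hass Avocados, Jalapeno Puree "
            "with Salt and BHT (preservative).")
tagged = tag_terms(sentence, TERMS)
print(tagged)
```

If something along these lines is a reasonable way to bootstrap training data, a multi-word term would need a span match over consecutive tokens rather than the per-token lookup used here.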
