Hi Joseph,
   I don’t remember exactly what features the NER uses, but a general rule of 
thumb is that you want the training data to resemble the unseen data. Think of 
the training data as a sampling experiment: the closer the sample gets to the 
population (the data not seen), the better the classifier will work. You certainly 
can use the presence of a word in a dictionary as a feature, and that will 
probably help with the classification. If you provide a little more detail about 
the problem, I could expand the answer a bit.
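
As a rough illustration (not tied to any particular NER library), a 
dictionary-presence feature is usually just a boolean alongside the other 
per-token features; the term list and feature names below are made up for 
the example:

```python
# Sketch: dictionary presence as one feature among several for each token.
# ENTITY_DICTIONARY is a hypothetical term list; substitute your own.
ENTITY_DICTIONARY = {"aspirin", "ibuprofen", "acetaminophen"}

def token_features(tokens, i):
    """Return a simple feature dict for the token at position i."""
    token = tokens[i]
    return {
        "token": token.lower(),
        # The dictionary feature: is this token a known entity term?
        "in_dictionary": token.lower() in ENTITY_DICTIONARY,
        # Neighboring-token context, with sentence-boundary placeholders.
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>",
    }

tokens = "She took ibuprofen yesterday".split()
feats = token_features(tokens, 2)
```

The classifier can then learn how much weight to give dictionary membership 
relative to the contextual features, rather than treating the dictionary as 
a hard rule.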
Daniel



> On Jul 9, 2017, at 9:38 AM, Joseph B. Ottinger <[email protected]> wrote:
> 
> I was planning on training my own model, but I wondered what kind of input
> data would give the best results; does the training data have to make
> sense, or be representative of common input? I have a dictionary of terms
> to mark as entities, and while I have a good bit of sensible data, I need
> to add entities to the model fairly often; typically I'll have the entity
> name and fairly little information to go with it, so it'd be easiest to use
> something like a Markov chain generator to generate content around the
> entity, or something. I could also generate fairly static content, but I'd
> prefer to train the system well, if possible.
