I am planning to train my own model, and I'm wondering what kind of input
data gives the best results. Does the training data have to make sense, or
at least be representative of the input the model will typically see? I
have a dictionary of terms to mark as entities and a good amount of
sensible data, but I need to add entities to the model fairly often.
Typically I only have the entity name and very little information to go
with it, so the easiest approach would be to use something like a Markov
chain generator to produce text around the entity. I could also generate
fairly static content, but I'd prefer to train the system well, if
possible.
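
To make the Markov chain idea concrete, here is a rough sketch of what I have in mind. Everything in it is an assumption for illustration: the function names, the tiny seed corpus, the entity "Acme Widget", and the token-span output format are all made up, and a real trainer would need its own annotation format.

```python
import random
from collections import defaultdict


def build_chain(sentences):
    """Map each word to the list of words that follow it in the seed text."""
    chain = defaultdict(list)
    for sentence in sentences:
        words = sentence.split()
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
    return chain


def walk(chain, start, max_len, rng):
    """Random walk through the chain, stopping at max_len words or a dead end."""
    words = [start]
    while len(words) < max_len and chain.get(words[-1]):
        words.append(rng.choice(chain[words[-1]]))
    return words


def generate_example(chain, entity, rng, context_len=4):
    """Return (tokens, (start, end)) with the entity at tokens[start:end]."""
    starts = sorted(chain)  # any word that has a known successor
    left = walk(chain, rng.choice(starts), context_len, rng)
    right = walk(chain, rng.choice(starts), context_len, rng)
    entity_tokens = entity.split()
    tokens = left + entity_tokens + right
    start = len(left)
    return tokens, (start, start + len(entity_tokens))


# Hypothetical seed corpus and entity name, for illustration only.
seed_corpus = [
    "the company announced a new product today",
    "customers said the new product works well",
    "a spokesperson said the company will ship today",
]
chain = build_chain(seed_corpus)
rng = random.Random(0)  # fixed seed so runs are repeatable
tokens, span = generate_example(chain, "Acme Widget", rng)
print(" ".join(tokens), span)
```

Each call yields one generated sentence plus the token offsets of the entity, so the entity label is known by construction. Whether text like this, which is locally plausible but not really meaningful, trains the model well is exactly what I am unsure about.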
