Hi,
I just want to use the existing name finder with custom features. With the cmd
line I can get the custom set of features running. Thanks for that. However, I
want to be able to retrain the model dynamically, i.e. via source code.
I am now using the XML file for defining the set of custom features instead of
instantiating it via the AdaptiveFeatureGenerator. I then use the method
openFeatureGeneratorBytes() from the TokenNameFinderTrainerTool to convert it
to a byte array which I can then pass to the TokenNameFinderFactory like this:
    TokenNameFinderFactory factory = new TokenNameFinderFactory(
        openFeatureGeneratorBytes(featureGenFile), null, codec);
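For reference, the byte-loading step can also be done with plain Java instead of going through the command line tool class. This is only a minimal sketch: the class name FeatureGenBytes, the helper readFeatureGeneratorBytes() and the path "featuregen.xml" are made up for illustration and are not part of OpenNLP. It assumes the factory just needs the unmodified bytes of the XML descriptor file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class FeatureGenBytes {

    // Hypothetical helper (not an OpenNLP method): returns the raw bytes of
    // the XML feature generator descriptor, unchanged.
    static byte[] readFeatureGeneratorBytes(Path descriptor) throws IOException {
        return Files.readAllBytes(descriptor);
    }

    public static void main(String[] args) throws IOException {
        // Placeholder path; pass the real descriptor as the first argument.
        Path descriptor = Paths.get(args.length > 0 ? args[0] : "featuregen.xml");
        if (!Files.isRegularFile(descriptor)) {
            System.err.println("Descriptor not found: " + descriptor);
            return;
        }
        byte[] bytes = readFeatureGeneratorBytes(descriptor);
        System.out.println("Read " + bytes.length + " bytes from " + descriptor);

        // With OpenNLP on the classpath, the bytes would then be handed to
        // the factory constructor as in the snippet above, e.g.:
        // TokenNameFinderFactory factory =
        //     new TokenNameFinderFactory(bytes, null, codec);
    }
}
```

The OpenNLP call is left as a comment so that the sketch compiles without the library; codec stands for whichever SequenceCodec instance is already in use.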
a) Is this approach alright or would you recommend something else?
b) Another question: Is it possible to somehow see the computed feature vector
for every token (during training and prediction)?
c) And out of curiosity: Is it possible to see how much a feature contributes
to the final decision? I want to identify features that are useless and those
which may lead to wrong predictions.
Thank you very much for your help again!
Best regards,
Markus
> Hello,
>
> it really depends on what you are trying to achieve.
>
> If you know exactly what you want, I would recommend sub-classing the
> TokenNameFinderFactory; there you can override the method that creates the
> feature generators. The default constructor is fine. The name finder
> supports different encodings, currently BIO and BILOU. You would need to
> pass an instance of one of those codec classes, or just use the default
> (which is BIO).
>
> If you just want to have the name finder with custom feature generation, I
> would suggest defining an XML descriptor for it and using our command line
> interface to build the model. The command line interface has the advantage
> that you can use all the tools without writing code yourself; evaluation
> and cross validation in particular should be interesting for you.
>
> TokenNameFinderFactory(byte[] featureGeneratorBytes,
> Map<String,Object> resources,
> SequenceCodec<String> seqCodec)
>
> The byte[] is supposed to contain the bytes of the feature generator XML
> descriptor.
>
> HTH,
> Jörn