Dear Daniel,
I updated https://issues.apache.org/jira/browse/OPENNLP-1223 and added
the updated model, now trained on all of the tiger data (50,472 sentences).
Evaluation is done only on sentences containing names (6,271 sentences).
For restrictions see "Further improvements" in the issue.
Best regards,
Johannes
On 09.10.2018 at 14:48, J. Fiala wrote:
Dear Daniel,
Sure, so all data = training basis, only data with persons =
evaluation basis.
However, it seems it is not possible to supply the evaluation data,
only the model. But I can supply the evaluation results as a QA basis.
We can also run an evaluation on other treebanks such as the Hamburg
Dependency Treebank (and spot some errors there).
Are there already any utility routines for extracting the data from
tiger etc., or should I supply some in Java (besides using the
Python-based nltk)?
I didn't see any tiger-based nltk / Java handling routines in the docs.
Best regards,
Johannes
On 09.10.2018 at 14:38, Dan Russ wrote:
Hi Jörn
Is it possible to train on all of the tiger data, but test on
Universal Dependencies?
Daniel
On Fri, Sep 28, 2018, 3:28 AM Joern Kottmann <[email protected]> wrote:
Hello,
we can only distribute artifacts at Apache which can be licensed under
the AL 2.0.
I am not sure what the situation with the tiger corpus is, but it
might have a clause in its license which would restrict this.
Anyway, +1 to release a model trained on the tiger corpus, and to add
support to train on it.
Jörn
On Wed, Sep 26, 2018 at 4:06 PM J. Fiala <[email protected]>
wrote:
Hi there,
I saw there is no Name Finder model for German.
Would you be interested in having one based on tiger, or is someone else
already working on that?
I could not find an issue for adding Name Finder models for other
languages. Should I create a new one?
Thanks & Best regards,
Johannes