Hi all,
I finally managed to get coref working (phew, that was tricky!)
but I'm slightly confused by the results, so I'd like to see if anyone
else has tried it out. Using the standard paragraph from the other
examples:
"Pierre Vinken, 61 years old, will join the board as a nonexecutive
director Nov. 29. Mr. Vinken is chairman of Elsevier N.V., the Dutch
publishing group. Rudolph Agnew, 55 years old and former chairman of
Consolidated Gold Fields PLC, was named a director of this British
industrial conglomerate."
I should note that I'm also trying to pass in the named entities (person).
I've confirmed that the spans are correctly identified (3 spans for
this particular example) and added to the parse tree via
opennlp.tools.parser.Parse.addNames("person", span,
parse.getTagNodes());
Deploying the coref component gives me the following:
[#<DiscourseEntity [ this British industrial conglomerate ]>,
#<DiscourseEntity [ a director of this British industrial conglomerate ]>,
#<DiscourseEntity [ Consolidated Gold Fields PLC ]>,
#<DiscourseEntity [ chairman of Elsevier N . V . , the Dutch
publishing group, former chairman of Consolidated Gold Fields PLC ]>,
#<DiscourseEntity [ 55 years ]>,
#<DiscourseEntity [ Rudolph Agnew , 55 years old and former chairman
of Consolidated Gold Fields PLC , was named a director of this British
industrial conglomerate . ]>,
#<DiscourseEntity [ Elsevier N . V . , the Dutch publishing group, the
Dutch publishing group ]>,
#<DiscourseEntity [ Mr . Vinken ]>,
#<DiscourseEntity [ a nonexecutive director Nov . 29 ]>,
#<DiscourseEntity [ the board ]>,
#<DiscourseEntity [ 61 years ]>,
#<DiscourseEntity [ Pierre Vinken , 61 years old ]>
]
*Filtering for entities with more than one mention (per Jorn's suggestion) gives back:*
[#<DiscourseEntity [ chairman of Elsevier N . V . , the Dutch publishing
group, former chairman of Consolidated Gold Fields PLC ]>
#<DiscourseEntity [ Elsevier N . V . , the Dutch publishing group, the
Dutch publishing group ]>
]
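To be clear about what "filtering" means here, this is roughly the step, written as a minimal Java sketch rather than my actual Clojure code. The names MentionFilter and withMultipleMentions are made up for illustration, and the entities are stood in for by plain lists of mention strings so the snippet runs without OpenNLP. With real DiscourseEntity objects I'd expect the count to come from something like getNumMentions(), but treat that as an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.ToIntFunction;

public class MentionFilter {

    // Keeps only the entities for which mentionCount reports more than one mention.
    // mentionCount is supplied by the caller, so this works for any entity type.
    public static <T> List<T> withMultipleMentions(List<T> entities,
                                                   ToIntFunction<? super T> mentionCount) {
        List<T> kept = new ArrayList<>();
        for (T entity : entities) {
            if (mentionCount.applyAsInt(entity) > 1) {
                kept.add(entity);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Stand-in data (not real coref output objects): each entity is
        // represented only by its list of mention strings.
        List<List<String>> entities = Arrays.asList(
            Arrays.asList("Mr . Vinken"),
            Arrays.asList("Elsevier N . V . , the Dutch publishing group",
                          "the Dutch publishing group"),
            Arrays.asList("the board"));

        // The mention count of a stand-in entity is simply the list size.
        for (List<String> entity : withMultipleMentions(entities, List::size)) {
            System.out.println(entity);
        }
    }
}
```

Single-mention entities are dropped because they carry no coreference link at all, which is why the filtered list is so much shorter than the full one.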
Assuming that this is what it's supposed to output, can someone explain
it? First of all, where are the named entities? Secondly, both of the 2
filtered DiscourseEntities seem plain wrong! Moreover, where is
#<DiscourseEntity [ Rudolph Agnew, former chairman of Consolidated Gold
Fields PLC, the Dutch publishing group, director of this British
industrial conglomerate ]> ???
Either I'm not understanding coreference, or I've coded the thing wrong,
or the model is not very good! Which one is it? Has anyone else
attempted this? Can we compare results on this particular paragraph?
thanks in advance :)
Jim
PS: my code is in Clojure, but it is based on a code snippet provided by
Jorn to someone on the mailing list last year. I can easily provide it,
but I don't think it will be of much help...