Jim,
From what I've been able to figure out and find by searching, the coref
component helps collect references to the same entity across a document.
Most writers start using personal pronouns and other key words to refer
back to a person or other people already named in a document. Jim and
Jane may be referred to as he and she throughout most, if not all, of
the rest of a document about them. Coref is supposed to group these
mentions under the person or object being discussed, so that one can
collect the facts or other information the document states about that
person.
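
To make the idea concrete, here is a toy sketch in Python of what a
coreference result looks like as data: mentions grouped by the entity
they refer to. It is not OpenNLP's API. The mention list, the entity ids
and the function name group_mentions are all hypothetical, and the ids
are assigned by hand where a real coref component would predict them.

```python
from collections import defaultdict

# Hypothetical, hand-labelled mentions: (mention text, entity id).
# A real coref component would predict the entity ids; here they are
# assigned by hand purely to show the shape of the result.
mentions = [
    ("Jim", "jim"),
    ("Jane", "jane"),
    ("he", "jim"),
    ("she", "jane"),
    ("his", "jim"),
]


def group_mentions(labelled):
    """Group mention strings by entity id, keeping document order."""
    chains = defaultdict(list)
    for text, entity_id in labelled:
        chains[entity_id].append(text)
    return dict(chains)


chains = group_mentions(mentions)
for entity_id, texts in chains.items():
    print(entity_id, texts)
```

Each list plays the role of one DiscourseEntity: every fact stated about
"he" or "his" can then be attached to Jim.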
It helps twofold: the mentions of Jim and Jane get collected within the
document, and in the long run one can determine the category of the
document. In document categorization, names like Andre Agassi are
strongly linked to tennis players, or Obama to being the current
president and a former Illinois senator. With information like this you
can do a lot of cross-linking of documents and comparing of references
to cross-check facts, etc.
It can also work with medicines, where generics that come out are
linked to their non-generic counterparts. You can glean other
information from such parsings and collections.
It is a powerful tool.
James
On 2/27/2013 1:26 PM, Jim - FooBar(); wrote:
Hmmm.... interesting! When I run it on these 2 simple sentences:
/"Mary likes pizza but she also likes kebabs. Knowing her, I'd give it
2 weeks before she turns massive!"/
I get perfect results!
#<DiscourseEntity [ Mary, she, her, she ]>
this demonstrates 3 things:
- my understanding of coref is indeed correct
- the coref component can link entities from separate sentences
- possibly that my code is fine
any thoughts?
Jim
On 27/02/13 18:14, Jim - FooBar(); wrote:
Hi all,
I finally managed to get coref working (phew, my god that was tricky),
but I'm slightly confused by the results, so I'd like to see if
anyone else has tried it out. Using the standard paragraph used in
the other examples:
/"Pierre Vinken, 61 years old, will join the board as a nonexecutive
director Nov. 29. Mr. Vinken is chairman of Elsevier N.V., the Dutch
publishing group. Rudolph Agnew, 55 years old and former chairman of
Consolidated Gold Fields PLC, was named a director of this British
industrial conglomerate."/
deploying the coref component gives me the following:
I must note that I'm trying to pass the named entities as well
(person). I've confirmed that the spans are correctly identified (3
spans for this particular example) and added to the parse tree via
/opennlp.tools.parser.Parse.addNames("person", span,
parse.getTagNodes());/
[#<DiscourseEntity [ this British industrial conglomerate ]>,
#<DiscourseEntity [ a director of this British industrial
conglomerate ]>,
#<DiscourseEntity [ Consolidated Gold Fields PLC ]>,
#<DiscourseEntity [ chairman of Elsevier N . V . , the Dutch
publishing group, former chairman of Consolidated Gold Fields PLC ]>,
#<DiscourseEntity [ 55 years ]>,
#<DiscourseEntity [ Rudolph Agnew , 55 years old and former chairman
of Consolidated Gold Fields PLC , was named a director of this
British industrial conglomerate . ]>,
#<DiscourseEntity [ Elsevier N . V . , the Dutch publishing group,
the Dutch publishing group ]>,
#<DiscourseEntity [ Mr . Vinken ]>,
#<DiscourseEntity [ a nonexecutive director Nov . 29 ]>,
#<DiscourseEntity [ the board ]>,
#<DiscourseEntity [ 61 years ]>,
#<DiscourseEntity [ Pierre Vinken , 61 years old ]>
]
*filtering for more than one mention (per Jorn's suggestion) gives back:*
[#<DiscourseEntity [ chairman of Elsevier N . V . , the Dutch
publishing group, former chairman of Consolidated Gold Fields PLC ]>
#<DiscourseEntity [ Elsevier N . V . , the Dutch publishing group,
the Dutch publishing group ]>
]
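
The filtering step can be sketched in Python as follows. This is a
minimal illustration and not the Clojure code or the OpenNLP API: the
entities are plain lists of mention strings copied from the unfiltered
output above (abbreviated to six of the twelve), the split of each
entity into separate mentions is an assumption read off the printed
output, and the name with_multiple_mentions is hypothetical.

```python
# Mentions per entity, copied from the unfiltered output in this thread
# (abbreviated). How each entity splits into mentions is an assumption
# based on how the output was printed.
entities = [
    ["this British industrial conglomerate"],
    ["Consolidated Gold Fields PLC"],
    ["chairman of Elsevier N . V . , the Dutch publishing group",
     "former chairman of Consolidated Gold Fields PLC"],
    ["Elsevier N . V . , the Dutch publishing group",
     "the Dutch publishing group"],
    ["Mr . Vinken"],
    ["Pierre Vinken , 61 years old"],
]


def with_multiple_mentions(entities):
    """Keep only entities that were linked to more than one mention."""
    return [mentions for mentions in entities if len(mentions) > 1]


filtered = with_multiple_mentions(entities)
for mentions in filtered:
    print(mentions)
```

Singleton entities are dropped because an entity with a single mention
has not been linked to anything else in the text.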
Assuming that this is what it's supposed to output, can someone
explain this? First of all, where are the named entities? Secondly,
both of the 2 filtered DiscourseEntities seem plain wrong!
Moreover, where is #<DiscourseEntity [ Rudolph Agnew, former
chairman of Consolidated Gold Fields PLC, the Dutch publishing
group, director of this British industrial conglomerate ]>?
Either I'm not understanding coreference, or I've coded the thing
wrong, or the model is not very good! Which one is it? Has anyone
else attempted this? Can we compare results on this particular sentence?
thanks in advance :)
Jim
PS: my code is in Clojure, but it is based on a code snippet that Jorn
provided to someone on the mailing list last year. I can easily
provide it, but I don't think it will be of much help...