Jim,

From what I've been able to figure out and search on, the coref component helps collect the references to the same person or thing across a document. Once a person has been introduced, most writers start using personal pronouns and other key words to talk about that person (or other people) in the rest of the document. Jim and Jane may be referred to as "he" and "she" throughout most, if not all, of the rest of a document about them. Coref is supposed to collect these references into properties of the person or object being discussed, so that you can gather the facts and other information the document gives about that person. It helps in two ways: the ideas about Jim and Jane in the document get collected together, and in the long run you can determine the category of the document. In document categorization, names like Andre Agassi are strongly linked to tennis players, and Obama to being the current president and a former Illinois senator. With information like this you can do a lot of cross-linking of documents and compare references to cross-check facts, etc.
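To make the idea concrete, here is a deliberately naive Java sketch. It is not how OpenNLP's coref component works: it simply links each he/she-type pronoun to the most recently seen name of the same gender, using a tiny hypothetical name-to-gender table (`NAME_GENDER`). The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/**
 * Toy coreference (hypothetical, for illustration only): each third-person
 * pronoun is linked to the most recently seen name of the same gender.
 * This is NOT the algorithm used by OpenNLP's coref linker.
 */
public class NaiveCoref {
    // Hypothetical name-to-gender table; a real system needs a model for this.
    static final Map<String, String> NAME_GENDER =
            Map.of("Mary", "F", "Jane", "F", "Jim", "M", "Pierre", "M");
    static final Set<String> FEMALE = Set.of("she", "her", "hers", "herself");
    static final Set<String> MALE = Set.of("he", "him", "his", "himself");

    /** Returns entities keyed by their name, each with its mentions in order. */
    static Map<String, List<String>> resolve(String text) {
        Map<String, List<String>> entities = new LinkedHashMap<>();
        Map<String, String> lastNameByGender = new HashMap<>();
        // Crude tokenization: split on anything that is not a letter or apostrophe.
        for (String token : text.split("[^A-Za-z']+")) {
            if (token.isEmpty()) {
                continue;
            }
            String nameGender = NAME_GENDER.get(token);
            if (nameGender != null) {
                // A name starts (or extends) an entity and becomes the latest antecedent.
                lastNameByGender.put(nameGender, token);
                entities.computeIfAbsent(token, k -> new ArrayList<>()).add(token);
                continue;
            }
            String lower = token.toLowerCase(Locale.ROOT);
            String pronounGender = FEMALE.contains(lower) ? "F"
                    : MALE.contains(lower) ? "M" : null;
            String antecedent =
                    pronounGender == null ? null : lastNameByGender.get(pronounGender);
            if (antecedent != null) {
                // Attach the pronoun to the entity of its antecedent.
                entities.get(antecedent).add(token);
            }
        }
        return entities;
    }

    public static void main(String[] args) {
        String text = "Mary likes pizza but she also likes kebabs. "
                + "Knowing her, I'd give it 2 weeks before she turns massive!";
        System.out.println(resolve(text)); // {Mary=[Mary, she, her, she]}
    }
}
```

On the Mary example from later in this thread it yields the same chain the real component produced, but a gender-plus-recency rule like this breaks down quickly on longer text, which is why a trained model is used instead.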

It can also work with medicines: when generics come out, they can be linked to their non-generic counterparts. You can glean other information from such parsing and collection as well.

It is a powerful tool.

James

On 2/27/2013 1:26 PM, Jim - FooBar(); wrote:
Hmmm.... interesting! When I run it on these 2 simple sentences:

/"Mary likes pizza but she also likes kebabs. Knowing her, I'd give it 2 weeks before she turns massive!"/

I get perfect results!

#<DiscourseEntity [ Mary, she, her, she ]>

This demonstrates 3 things:
- my understanding of coref is indeed correct
- the coref component can link entities from separate sentences
- possibly that my code is fine

Any thoughts?

Jim



On 27/02/13 18:14, Jim - FooBar(); wrote:
Hi all,

I finally managed to get coref working (phew! My god, that was tricky), but I'm slightly confused by the results, so I'd like to see whether anyone else has tried it out. Using the standard paragraph from the other examples:

/"Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29. Mr. Vinken is chairman of Elsevier N.V., the Dutch publishing group. Rudolph Agnew, 55 years old and former chairman of Consolidated Gold Fields PLC, was named a director of this British industrial conglomerate."/

I should note that I'm also passing in the named entities (person). I've confirmed that the spans are correctly identified (3 spans for this particular example) and added to the parse tree via /opennlp.tools.parser.Parse.addNames("person", span, parse.getTagNodes());/

Deploying the coref component gives me the following:


[#<DiscourseEntity [ this British industrial conglomerate ]>,
 #<DiscourseEntity [ a director of this British industrial conglomerate ]>,
 #<DiscourseEntity [ Consolidated Gold Fields PLC ]>,
 #<DiscourseEntity [ chairman of Elsevier N . V . , the Dutch publishing group, former chairman of Consolidated Gold Fields PLC ]>,
 #<DiscourseEntity [ 55 years ]>,
 #<DiscourseEntity [ Rudolph Agnew , 55 years old and former chairman of Consolidated Gold Fields PLC , was named a director of this British industrial conglomerate . ]>,
 #<DiscourseEntity [ Elsevier N . V . , the Dutch publishing group, the Dutch publishing group ]>,
 #<DiscourseEntity [ Mr . Vinken ]>,
 #<DiscourseEntity [ a nonexecutive director Nov . 29 ]>,
 #<DiscourseEntity [ the board ]>,
 #<DiscourseEntity [ 61 years ]>,
 #<DiscourseEntity [ Pierre Vinken , 61 years old ]>
]

*Filtering for entities with more than 1 mention (per Jorn's suggestion) gives back:*

[#<DiscourseEntity [ chairman of Elsevier N . V . , the Dutch publishing group, former chairman of Consolidated Gold Fields PLC ]>
 #<DiscourseEntity [ Elsevier N . V . , the Dutch publishing group, the Dutch publishing group ]>
]
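For anyone reproducing the filtering step, here is a minimal Java sketch (Java 16+ for the record type) of "keep only entities with more than one mention". `Entity` and `filterByMentions` are hypothetical names: `Entity` is a stand-in for OpenNLP's `DiscourseEntity` reduced to its mention strings, so in real code you would read the mention count from the actual class instead (check its Javadoc). The sample data is taken from the unfiltered output above, splitting mentions on the ", " separator rather than on the tokenized " , ".

```java
import java.util.List;
import java.util.stream.Collectors;

public class MentionFilter {
    // Hypothetical stand-in for OpenNLP's DiscourseEntity: just its mention strings.
    record Entity(List<String> mentions) {}

    /** Keeps only entities that have at least minMentions mentions. */
    static List<Entity> filterByMentions(List<Entity> entities, int minMentions) {
        return entities.stream()
                .filter(e -> e.mentions().size() >= minMentions)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // A few entities copied from the unfiltered output above.
        List<Entity> entities = List.of(
                new Entity(List.of("the board")),
                new Entity(List.of("Elsevier N . V . , the Dutch publishing group",
                                   "the Dutch publishing group")),
                new Entity(List.of("Mr . Vinken")));
        // "More than 1 mention" means a threshold of 2.
        for (Entity e : filterByMentions(entities, 2)) {
            System.out.println(e.mentions());
        }
    }
}
```

Split that way, the Elsevier entity counts as two mentions, which is why it survives the filter while the single-mention entities do not.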

Assuming that this is what it's supposed to output, can someone explain it? First of all, where are the named entities? Secondly, both of the 2 filtered DiscourseEntities seem plain wrong! Moreover, where is #<DiscourseEntity [ Rudolph Agnew, former chairman of Consolidated Gold Fields PLC, the Dutch publishing group, director of this British industrial conglomerate ]>?

Either I'm not understanding coreference, I've coded the thing wrong, or the model is not very good! Which one is it? Has anyone else attempted this? Can we compare results on this particular paragraph?

thanks in advance :)

Jim

PS: My code is in Clojure, but it is based on a code snippet Jorn provided to someone on the mailing list last year. I can easily provide it, but I don't think it will be of much help...




