Re: NER of fictional characters. Can I incorporate token "concreteness" into my custom model?

Jörn Kottmann Thu, 26 Jun 2014 08:05:07 -0700

On 06/26/2014 03:56 PM, John Miedema wrote:

First post. I'm working on NER in the domain of literature.


Using standard NER I can pull out People names, authors like "Robert Louis
Stevenson" and character names like "Long John Silver". But of course there
is no distinction between real-life authors and fictional characters.

I've built my first custom model to identify Book Titles. It's just a quick
implementation for test purposes but it works quite well.

I'm considering building a custom model to identify Characters. What I know
now is that the model trainer uses tokens, POS, and proximity of words to
establish features. I can also add dictionaries and such. But I think one
key distinguishing feature of characters (vs People) is the "colorfulness",
or concrete imagery associated with character names:

Long John Silver
Tin Tin
Sherlock Holmes
Gandalf
Nigel Molesworth

By colourful, I mean that the names are more likely to use concrete imagery
(long, tin, mole) or have unique phonetic qualities (Sher Lock, Gan Dalf).
Sure, many characters have common names, but I think I can use these
properties to help identify Character entities. I can come up with a
measure of concreteness, at least.

*My question is, if I knew the concreteness of tokens, is there any way I
can incorporate this measure into my custom model?*

I would prefer to avoid resorting to a dictionary. I think this would work
just like other word attributes, such as frequency, e.g., "home" is a more
frequently used word than "dwelling." Do models ever incorporate attributes
like token frequency? If yes, I could work from that.

*How about the use of phonetics?*

You can define your own feature generators and combine them with the existing feature generators. Right now the features are binary: they are either set or not. If you have a strength/weight, you might be able to translate that into binary features, e.g. by using a mapping function.
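A minimal sketch of the mapping-function idea, assuming a hypothetical lexicon that gives each lower-cased token a concreteness score on a 1 to 5 scale (the scores and the 2.5/4.0 thresholds below are made up). The method shape mirrors `createFeatures(List<String> features, String[] tokens, int index, String[] previousOutcomes)` from OpenNLP's `AdaptiveFeatureGenerator` interface, but the class does not declare `implements` so it compiles without OpenNLP on the classpath. To use it in training you would implement that interface (the remaining methods can be no-ops) and combine it with the default generators, e.g. via `AggregatedFeatureGenerator`; check the details against the OpenNLP version you use.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical generator that turns a real-valued concreteness score into
 * binary (string) features by binning it. Each emitted string is one feature
 * that is "set" for the token; everything not emitted is "not set".
 */
public class ConcretenessFeatureGenerator {

    // Assumed lexicon: lower-cased token -> concreteness score (1..5).
    private final Map<String, Double> concreteness;

    public ConcretenessFeatureGenerator(Map<String, Double> concreteness) {
        this.concreteness = concreteness;
    }

    // The mapping function: real-valued score -> one of a few discrete bins.
    static String bucket(Double score) {
        if (score == null) return "unk";   // token not in the lexicon
        if (score >= 4.0) return "high";
        if (score >= 2.5) return "mid";
        return "low";
    }

    // Same parameters as OpenNLP's AdaptiveFeatureGenerator.createFeatures.
    public void createFeatures(List<String> features, String[] tokens, int index,
                               String[] previousOutcomes) {
        String word = tokens[index].toLowerCase(Locale.ROOT);
        features.add("conc=" + bucket(concreteness.get(word)));
        // Also expose the previous token's bin, so multi-token names
        // like "Long John Silver" can share the signal.
        if (index > 0) {
            String prev = tokens[index - 1].toLowerCase(Locale.ROOT);
            features.add("prevconc=" + bucket(concreteness.get(prev)));
        }
    }

    public static void main(String[] args) {
        Map<String, Double> lexicon = new HashMap<>();
        lexicon.put("long", 3.0);    // made-up scores for illustration
        lexicon.put("silver", 4.5);
        lexicon.put("tin", 4.8);
        ConcretenessFeatureGenerator gen = new ConcretenessFeatureGenerator(lexicon);

        String[] tokens = {"Long", "John", "Silver"};
        for (int i = 0; i < tokens.length; i++) {
            List<String> feats = new ArrayList<>();
            gen.createFeatures(feats, tokens, i, null);
            System.out.println(tokens[i] + " " + feats);
        }
    }
}
```

Running `main` prints `Long [conc=mid]`, `John [conc=unk, prevconc=mid]` and `Silver [conc=high, prevconc=unk]`. The same binning would work for token frequency (e.g. bins over log-frequency); the trainer then learns a separate weight for each bin rather than for the raw number.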

If you decide to use a dictionary, have a look at Wikipedia; maybe you are able to link the entities to Wikipedia entries. They probably have some properties which indicate whether an entity is fictional or not. Wikipedia is hard to use, but projects like DBpedia make these kinds of lookups possible.
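A rough sketch of such a lookup, assuming the public DBpedia SPARQL endpoint at `https://dbpedia.org/sparql` and the ontology class `http://dbpedia.org/ontology/FictionalCharacter`; verify both against current DBpedia before relying on them. Turning a name into a resource IRI by replacing spaces with underscores is a naive, hypothetical heuristic: real titles can differ (disambiguation suffixes, redirects, special characters). The code only builds the request URL using the standard SPARQL protocol `query` parameter; it needs Java 10+ for `URLEncoder.encode(String, Charset)`.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Sketch: ask DBpedia whether a linked entity is typed as a fictional character. */
public class DbpediaFictionCheck {

    // Assumed endpoint and ontology class; check both against current DBpedia.
    static final String ENDPOINT = "https://dbpedia.org/sparql";
    static final String FICTIONAL_CHARACTER = "http://dbpedia.org/ontology/FictionalCharacter";

    // Naive (hypothetical) name -> resource IRI mapping: spaces become underscores.
    static String resourceIri(String entityName) {
        return "http://dbpedia.org/resource/" + entityName.trim().replace(' ', '_');
    }

    // SPARQL ASK query: the answer is a single boolean.
    static String askQuery(String entityName) {
        return "ASK { <" + resourceIri(entityName) + "> a <" + FICTIONAL_CHARACTER + "> }";
    }

    // SPARQL protocol GET request: the query goes in the URL-encoded "query" parameter.
    static String requestUrl(String entityName) {
        return ENDPOINT + "?query=" + URLEncoder.encode(askQuery(entityName), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(askQuery("Long John Silver"));
        System.out.println(requestUrl("Long John Silver"));
    }
}
```

Fetching the URL (for example with `java.net.http.HttpClient`) and parsing the boolean in the response is left out, since the result format depends on the endpoint's content negotiation.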


HTH,
Jörn
