Re: Handling of Quotes

James Kosin Thu, 28 Mar 2013 19:13:43 -0700

On 3/28/2013 9:54 AM, Ian Jackson wrote:

I used the prebuilt models for the SentenceModel (en-sent.bin), TokenizerModel 
(en-token.bin), and ParserModel (en-parser-chunker.bin) with the following 
sentence:
    The "quick" brown fox jumps over the lazy dog.


The result tags the quotes with the wrong part of speech, JJ for the opening 
quote and NN for the closing quote, as follows:
(TOP (NP (NP (DT The) (JJ ") (JJ quick) (NN ") (JJ brown) (NN fox) (NNS jumps)) 
(PP (IN over) (NP (DT the) (JJ lazy) (NN dog))) (. .)))

If I alter the sentence as follows, replacing the opening double quote with 
two backquotes and the closing double quote with two single quotes 
[http://www.cis.upenn.edu/~treebank/tokenization.html]:
    The `` quick '' brown fox jumps over the lazy dog

The results are as follows:
(TOP (NP (NP (DT The) (`` ``) (JJ quick) ('' '') (JJ brown) (NN fox) (NNS 
jumps)) (PP (IN over) (NP (DT the) (JJ lazy) (NN dog))) (. .)))

Does a method exist to configure the tokenizer to handle quotes within a 
sentence?

Training the models with the double quotes instead of the single 
forward/backward quotes would do the trick. That would also explain why the 
tokenizer model doesn't do well with my sentences... I've had to train my own 
models for a lot of the stuff I'm doing these days.
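
If retraining is not an option, the other route is to rewrite the quotes 
before parsing, as was done by hand in the second example above. Here is a 
minimal sketch in plain Java. TreebankQuotes and normalize are hypothetical 
names, not part of OpenNLP, and the code assumes straight double quotes that 
are balanced and not nested, so they simply alternate between opening and 
closing.

```java
// Hypothetical helper (not part of OpenNLP): rewrites straight double quotes
// as Penn Treebank-style quote tokens before the text is handed to a parser.
final class TreebankQuotes {

    private TreebankQuotes() {}

    /**
     * Replaces each straight double quote with `` (opening) or '' (closing),
     * emitted as a separate space-delimited token.
     * Assumption: quotes are balanced and not nested, so they alternate.
     */
    static String normalize(String sentence) {
        StringBuilder out = new StringBuilder();
        boolean open = true; // the next quote seen is treated as an opening quote
        for (int i = 0; i < sentence.length(); i++) {
            char c = sentence.charAt(i);
            if (c == '"') {
                out.append(' ').append(open ? "``" : "''").append(' ');
                open = !open;
            } else {
                out.append(c);
            }
        }
        // Collapse the extra spaces added around the quote tokens.
        // Note: this also collapses any other runs of spaces in the input.
        return out.toString().trim().replaceAll(" {2,}", " ");
    }

    public static void main(String[] args) {
        System.out.println(
            normalize("The \"quick\" brown fox jumps over the lazy dog."));
    }
}
```

For the sentence in question this produces the same text as the hand-edited 
version, with the quotes as standalone `` and '' tokens, which the prebuilt 
models tagged correctly in the second parse. Typographic (curly) quotes and 
quoted text that spans sentences are not handled here.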


Thanks,
James
