I'm using the OpenNLP Clojure interface (https://github.com/dakrone/clojure-opennlp).
In my first attempt at parsing a sentence with the Treebank model, I tried the following:

    (treebank-parser ["What can happen in a second ."])

I got the following answer:

    (TOP (SBARQ (WHNP (WP What)) (SQ (VP (MD can) (VP (VB happen) (PP (IN in) (NP (DT a) (JJ second)))))) (. .)))

For the most part that seems OK, except that "second" is tagged as an adjective (JJ) rather than as a noun (NN). [I'm certainly no linguist, but is it even meaningful to talk about an NP without a noun in it?]

Anyway, at a technical level, I wonder how I can get the parser (or tagger) to notice and show me the alternative possibilities (i.e. where "second" is understood as a noun)? From looking around online, I'm pretty sure this is possible, though I don't know whether it's directly supported by the Clojure interface! I'd also appreciate any pointers on how to do it directly in Java, so I know what sorts of questions to ask next.

Many thanks,
Joe

PS. The issue of indeterminacy is described in "Building a large annotated corpus of English: the Penn Treebank" as follows:

«Since a major concern of the Treebank is to avoid requiring annotators to make arbitrary decisions, we allow words to be associated with more than one POS tag. Such multiple tagging indicates either that the word's part of speech simply cannot be decided or that the annotator is unsure which of the alternative tags is the correct one. In principle, annotators can tag a word with any number of tags, but in practice, multiple tags are restricted to a small number of recurring two-tag combinations: JJNN (adjective or noun as prenominal modifier), JJVBG (adjective or gerund/present participle), JJVBN (adjective or past participle), NNVBG (noun or gerund), and RBRP (adverb or particle).»
- https://catalog.ldc.upenn.edu/docs/LDC95T7/cl93.html
