Hi all,

I have a text file that was tokenized with an OpenNLP tokenizer. I am not sure which tokenizer was used (probably TokenizerME). Is it possible to detokenize the file? I noticed there is a Detokenizer class, but can anyone provide a usable detokenizer dictionary? https://opennlp.apache.org/documentation/1.5.2-incubating/manual/opennlp.html#tools.tokenizer.detokenizing
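As far as I understand the manual, the detokenizer works from a dictionary that maps individual tokens to attachment operations (for example, "." attaches to the token on its left, "(" to the token on its right, and a double quote alternates between opening and closing). Below is a minimal standalone sketch of that idea, not the OpenNLP API. The operation names are modelled on OpenNLP's detokenizer dictionary, but `Op`, `RULES`, `detokenize` and `detokenize_line` are hypothetical names, and the rule set is an illustrative English guess rather than a dictionary that ships with OpenNLP. It also assumes each line is an image ID followed by whitespace and the space-separated caption tokens.

```python
from enum import Enum, auto


class Op(Enum):
    """Attachment operations, modelled on OpenNLP's detokenizer dictionary."""
    MOVE_LEFT = auto()            # glue token to the previous token, e.g. "." or ","
    MOVE_RIGHT = auto()           # glue token to the next token, e.g. "("
    MOVE_BOTH = auto()            # glue token to both neighbours, e.g. a split-off "-"
    RIGHT_LEFT_MATCHING = auto()  # quotes: opening glues right, closing glues left


# Hypothetical English rule set; this is NOT a dictionary shipped with OpenNLP.
RULES = {
    ".": Op.MOVE_LEFT, ",": Op.MOVE_LEFT, "!": Op.MOVE_LEFT, "?": Op.MOVE_LEFT,
    ":": Op.MOVE_LEFT, ";": Op.MOVE_LEFT, ")": Op.MOVE_LEFT,
    "'s": Op.MOVE_LEFT, "n't": Op.MOVE_LEFT,
    "(": Op.MOVE_RIGHT,
    "-": Op.MOVE_BOTH,
    '"': Op.RIGHT_LEFT_MATCHING,
}


def detokenize(tokens, rules=RULES):
    """Join tokens back into a sentence by applying each token's rule."""
    words = []
    glue_next = False    # True if the previous token asked to be glued to this one
    open_quotes = set()  # RIGHT_LEFT_MATCHING tokens that are currently open
    for tok in tokens:
        op = rules.get(tok)
        if op is Op.RIGHT_LEFT_MATCHING:
            # First occurrence opens (glue right), the next one closes (glue left).
            if tok in open_quotes:
                open_quotes.discard(tok)
                op = Op.MOVE_LEFT
            else:
                open_quotes.add(tok)
                op = Op.MOVE_RIGHT
        if words and (glue_next or op in (Op.MOVE_LEFT, Op.MOVE_BOTH)):
            words[-1] += tok  # attach to the previous word, no space
        else:
            words.append(tok)
        glue_next = op in (Op.MOVE_RIGHT, Op.MOVE_BOTH)
    return " ".join(words)


def detokenize_line(line):
    """Split '<image>#<n> <tokens...>' and detokenize the caption part.

    Assumes the ID and the caption are separated by whitespace (space or tab).
    """
    parts = line.strip().split(None, 1)
    image_id = parts[0] if parts else ""
    caption = parts[1] if len(parts) > 1 else ""
    return image_id, detokenize(caption.split())


print(detokenize_line(
    "1000268201_693b08cb0e.jpg#0 A child in a pink dress is climbing up "
    "a set of stairs in an entry way ."))
# ('1000268201_693b08cb0e.jpg#0', 'A child in a pink dress is climbing up a set of stairs in an entry way.')
```

Exact-match lookup per token keeps the rules easy to extend; anything not in the table simply keeps a space on both sides, so lines without punctuation pass through unchanged.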
The first ten lines of the file are as follows:

1000268201_693b08cb0e.jpg#0 A child in a pink dress is climbing up a set of stairs in an entry way .
1000268201_693b08cb0e.jpg#1 A girl going into a wooden building .
1000268201_693b08cb0e.jpg#2 A little girl climbing into a wooden playhouse .
1000268201_693b08cb0e.jpg#3 A little girl climbing the stairs to her playhouse .
1000268201_693b08cb0e.jpg#4 A little girl in a pink dress going into a wooden cabin .
1001773457_577c3a7d70.jpg#0 A black dog and a spotted dog are fighting
1001773457_577c3a7d70.jpg#1 A black dog and a tri-colored dog playing with each other on the road .
1001773457_577c3a7d70.jpg#2 A black dog and a white dog with brown spots are staring at each other in the street .
1001773457_577c3a7d70.jpg#3 Two dogs of different breeds looking at each other on the road .
1001773457_577c3a7d70.jpg#4 Two dogs on pavement moving toward each other .

Thanks very much,
--
Xingxing Zhang
Institute for Language, Cognition and Computation (ILCC)
The University of Edinburgh
