How to detokenize a file that was tokenized with the OpenNLP tokenizer?

Xingxing Zhang Thu, 12 Mar 2015 01:41:36 -0700

Hi All,

I have a text file that was tokenized with an OpenNLP tokenizer. I am not
sure which tokenizer was used (probably TokenizerME).
Is it possible to detokenize the file?
I noticed there is a Detokenizer class, but can anyone provide a usable
detokenizer dictionary?
https://opennlp.apache.org/documentation/1.5.2-incubating/manual/opennlp.html#tools.tokenizer.detokenizing



The first ten lines of the file are as follows:

1000268201_693b08cb0e.jpg#0 A child in a pink dress is climbing up a set of
stairs in an entry way .

1000268201_693b08cb0e.jpg#1 A girl going into a wooden building .

1000268201_693b08cb0e.jpg#2 A little girl climbing into a wooden playhouse .

1000268201_693b08cb0e.jpg#3 A little girl climbing the stairs to her
playhouse .

1000268201_693b08cb0e.jpg#4 A little girl in a pink dress going into a
wooden cabin .

1001773457_577c3a7d70.jpg#0 A black dog and a spotted dog are fighting

1001773457_577c3a7d70.jpg#1 A black dog and a tri-colored dog playing with
each other on the road .

1001773457_577c3a7d70.jpg#2 A black dog and a white dog with brown spots
are staring at each other in the street .

1001773457_577c3a7d70.jpg#3 Two dogs of different breeds looking at each
other on the road .

1001773457_577c3a7d70.jpg#4 Two dogs on pavement moving toward each other .
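
For lines like the ones above, a minimal rule-based sketch of the detokenization might look as follows. It does not use OpenNLP's Detokenizer or a detokenizer dictionary; it is a plain regular-expression approximation. It assumes that each caption sits on a single line in the actual file (the wrapping above comes from the email), that the image id is everything before the first space, and that the only tokenization artifact is a space before closing punctuation. The function names are hypothetical.

```python
import re

# Assumed rule set (not OpenNLP's): closing punctuation attaches to the
# token on its left, so the whitespace in front of it is dropped.
_ATTACH_LEFT = re.compile(r"\s+([.,!?;:])")


def detokenize(text: str) -> str:
    """Remove the whitespace that precedes common closing punctuation."""
    return _ATTACH_LEFT.sub(r"\1", text).strip()


def detokenize_caption_line(line: str) -> str:
    """Detokenize the caption part of a line, leaving the image id untouched."""
    image_id, sep, caption = line.strip().partition(" ")
    if not sep:
        # No caption on this line; return the id unchanged.
        return image_id
    return f"{image_id} {detokenize(caption)}"
```

Quotes, brackets, and contractions such as "do n't" would need extra rules; those are the cases a detokenizer dictionary is meant to describe.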

Thanks very much,
-- 
Xingxing Zhang

Institute for Language, Cognition and Computation (ILCC)
The University of Edinburgh
