Hello, we have a detokenizer dictionary for English. It can be found in opennlp-tools/lang/en/tokenizer.
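If you only need a quick result for captions like the ones quoted below, a small rule-based pass may be enough. The following is a minimal Python sketch, not the OpenNLP Detokenizer: the two rule sets are hypothetical and only loosely modeled on the move-left and move-right operations a detokenizer dictionary describes, and the function name `detokenize` is made up for illustration.

```python
# Hypothetical rule sets (assumptions, not the contents of the OpenNLP
# dictionary file): tokens that attach to the previous token, and tokens
# that attach to the following token.
MOVE_LEFT = {".", ",", "!", "?", ";", ":", ")", "]"}
MOVE_RIGHT = {"(", "["}


def detokenize(tokens):
    """Join tokens with single spaces, gluing punctuation to its neighbor."""
    out = []
    glue_next = False  # True when the previous token was a MOVE_RIGHT token
    for tok in tokens:
        if out and (tok in MOVE_LEFT or glue_next):
            # Attach to the previous piece without a space.
            out[-1] += tok
        else:
            out.append(tok)
        glue_next = tok in MOVE_RIGHT
    return " ".join(out)


if __name__ == "__main__":
    line = "1000268201_693b08cb0e.jpg#1 A girl going into a wooden building ."
    print(detokenize(line.split()))
```

To process a whole file, apply `detokenize(line.split())` to each line. Quotes and apostrophes are deliberately left out here because they need matching logic, which is what the dictionary-based Detokenizer is for.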

HTH,
Jörn

On Wed, 2015-03-11 at 14:44 +0000, Xingxing Zhang wrote:
> Hi All,
>
> I have a text file, which is tokenized with OpenNLP tokenizer. I am not
> sure which tokenizer did they use (probably TokenizerME).
> Is it possible to detokenize the file?
> I noticed there is a Detokenizer class, but can anyone provide a usable
> detokenize dictionary?
> https://opennlp.apache.org/documentation/1.5.2-incubating/manual/opennlp.html#tools.tokenizer.detokenizing
>
>
> the first ten lines of the file is as follows:
>
> 1000268201_693b08cb0e.jpg#0 A child in a pink dress is climbing up a set of
> stairs in an entry way .
>
> 1000268201_693b08cb0e.jpg#1 A girl going into a wooden building .
>
> 1000268201_693b08cb0e.jpg#2 A little girl climbing into a wooden playhouse .
>
> 1000268201_693b08cb0e.jpg#3 A little girl climbing the stairs to her
> playhouse .
>
> 1000268201_693b08cb0e.jpg#4 A little girl in a pink dress going into a
> wooden cabin .
>
> 1001773457_577c3a7d70.jpg#0 A black dog and a spotted dog are fighting
>
> 1001773457_577c3a7d70.jpg#1 A black dog and a tri-colored dog playing with
> each other on the road .
>
> 1001773457_577c3a7d70.jpg#2 A black dog and a white dog with brown spots
> are staring at each other in the street .
>
> 1001773457_577c3a7d70.jpg#3 Two dogs of different breeds looking at each
> other on the road .
>
> 1001773457_577c3a7d70.jpg#4 Two dogs on pavement moving toward each other .
>
> Thanks very much,
