Hello,

We have a detokenizer dictionary for English. It can be found in
opennlp-tools/lang/en/tokenizer
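
To illustrate what detokenizing does to input like the caption lines quoted below, here is a rough sketch in plain Java. It is not the OpenNLP API: the class name SimpleDetokenizer and its two hard-coded rule sets are hypothetical stand-ins, and the rules cover only a few punctuation marks. OpenNLP's Detokenizer takes its rules from the dictionary instead of hard-coding them, so treat this only as a picture of the idea.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of OpenNLP: joins whitespace-separated
// tokens back into text using two small, hard-coded rule sets.
final class SimpleDetokenizer {

    // Assumed rule set: tokens that attach to the token on their left.
    private static final Set<String> ATTACH_LEFT =
            new HashSet<>(Arrays.asList(".", ",", "!", "?", ";", ":", ")", "]", "}"));

    // Assumed rule set: tokens that attach to the token on their right.
    private static final Set<String> ATTACH_RIGHT =
            new HashSet<>(Arrays.asList("(", "[", "{"));

    static String detokenize(String tokenizedLine) {
        String trimmed = tokenizedLine.trim();
        if (trimmed.isEmpty()) {
            return "";
        }
        String[] tokens = trimmed.split("\\s+");
        StringBuilder out = new StringBuilder();
        // True when no space may precede the next token
        // (start of line, or right after an opening bracket).
        boolean suppressSpace = true;
        for (String token : tokens) {
            if (!suppressSpace && !ATTACH_LEFT.contains(token)) {
                out.append(' ');
            }
            out.append(token);
            suppressSpace = ATTACH_RIGHT.contains(token);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: A girl going into a wooden building.
        System.out.println(detokenize("A girl going into a wooden building ."));
    }
}
```

The image identifier at the start of each caption line is an ordinary token here, so it passes through unchanged. Anything not listed in the two sets (quotes, for example) would need rules of its own.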

HTH,
Jörn

On Wed, 2015-03-11 at 14:44 +0000, Xingxing Zhang wrote:
> Hi All,
> 
> I have a text file that was tokenized with the OpenNLP tokenizer.  I am not
> sure which tokenizer was used (probably TokenizerME).
> Is it possible to detokenize the file?
> I noticed there is a Detokenizer class, but can anyone provide a usable
> detokenizer dictionary?
> https://opennlp.apache.org/documentation/1.5.2-incubating/manual/opennlp.html#tools.tokenizer.detokenizing
> 
> 
> The first ten lines of the file are as follows:
> 
> 1000268201_693b08cb0e.jpg#0 A child in a pink dress is climbing up a set of
> stairs in an entry way .
> 
> 1000268201_693b08cb0e.jpg#1 A girl going into a wooden building .
> 
> 1000268201_693b08cb0e.jpg#2 A little girl climbing into a wooden playhouse .
> 
> 1000268201_693b08cb0e.jpg#3 A little girl climbing the stairs to her
> playhouse .
> 
> 1000268201_693b08cb0e.jpg#4 A little girl in a pink dress going into a
> wooden cabin .
> 
> 1001773457_577c3a7d70.jpg#0 A black dog and a spotted dog are fighting
> 
> 1001773457_577c3a7d70.jpg#1 A black dog and a tri-colored dog playing with
> each other on the road .
> 
> 1001773457_577c3a7d70.jpg#2 A black dog and a white dog with brown spots
> are staring at each other in the street .
> 
> 1001773457_577c3a7d70.jpg#3 Two dogs of different breeds looking at each
> other on the road .
> 
> 1001773457_577c3a7d70.jpg#4 Two dogs on pavement moving toward each other .
> 
> Thanks very much,
