On Thursday, February 28, 2013 at 5:26 PM, Jörn Kottmann wrote:
> Hmm, pretty sure there is an encoding mismatch, do you know which
> encoding is used by your JVM? I would guess that is not UTF-8.
> You can probably get around the issue by re-encoding the input
> file to the encoding the JVM is using.
>
> Have a look here:
> http://stackoverflow.com/questions/1749064/how-to-find-default-charset-encoding-in-java
>
> Would be nice if you can run the println statements there.
>
> Jörn

Wherever this comes from:

$ java CharsetTest
Default Charset=US-ASCII
file.encoding=Latin-1
Default Charset=US-ASCII
Default Charset in Use=ASCII

$ echo $JAVA_TOOL_OPTIONS
(empty)

$ export JAVA_TOOL_OPTIONS='-Dfile.encoding=UTF8'
$ java CharsetTest
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF8
Default Charset=UTF-8
file.encoding=Latin-1
Default Charset=UTF-8
Default Charset in Use=UTF8

But this change by itself didn't help; the output remains unchanged. So I took the road down to dirty-hack-land and applied the following change to bin/opennlp. It is certainly not how it should be done, but it works at least for the moment:

-$JAVACMD -Xmx1024m -jar $OPENNLP_HOME/lib/opennlp-tools-*.jar $@
+$JAVACMD -Xmx1024m -Dfile.encoding=UTF8 -jar $OPENNLP_HOME/lib/opennlp-tools-*.jar $@
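
For anyone who wants to repeat the check, here is a minimal sketch of what such a CharsetTest could look like. Only the class name and the printed labels are taken from the output above; the helper method names are made up for illustration, and the actual program from the linked answer may differ. In particular, the unchanged file.encoding=Latin-1 line suggests that the original program sets that property itself before printing it, which this sketch does not do; it only reads values.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;

public class CharsetTest {

    // Canonical name of the charset the JVM chose as its default.
    // (Hypothetical helper name, not from the original program.)
    static String defaultCharset() {
        return Charset.defaultCharset().name();
    }

    // Encoding name reported by a writer that was created without an
    // explicit charset, i.e. what default-encoded I/O would use.
    // getEncoding() may return a historical name such as "UTF8" or "ASCII".
    static String charsetInUse() {
        OutputStreamWriter writer = new OutputStreamWriter(new ByteArrayOutputStream());
        return writer.getEncoding();
    }

    public static void main(String[] args) {
        System.out.println("Default Charset=" + defaultCharset());
        // Read-only: prints whatever the JVM was started with (may be null).
        System.out.println("file.encoding=" + System.getProperty("file.encoding"));
        System.out.println("Default Charset in Use=" + charsetInUse());
    }
}
```

Running it once as plain `java CharsetTest` and once with `-Dfile.encoding=UTF8` on the command line shows whether the flag reaches the JVM, which is what the bin/opennlp change above relies on.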
