I realize I should provide the full picture here. The context consists of two routes. The first:

-----
<from uri="<my ftp including the binary mode and charset set">
<to uri="direct-vm:another-route-that-returns nothing?timeout=300000"/>
<!-- Needed for the splitter -->
<convertBodyTo type="java.lang.String"/>
<split streaming="true">
    <tokenize token="\n" group="5000"/>
    <wireTap uri="activemq:myQueue"/>
</split>
-----

And the second:

-----
<from uri="activemq:myQueue"/>
<unmarshal>
    <csv delimiter=";"/>
</unmarshal>
<bean ref="transformCSV" method="validateAndTransform"/>
-----

After a lot of troubleshooting, it seems that the splitter/tokenizer is what corrupts the data. The body looks correct after the convertBodyTo step but is no longer correct after the tokenize step. Is the tokenizer doing anything here that I should be aware of?

Thanks
/Gustav

-----Original Message-----
From: Gustav Sinder [mailto:gustav.sin...@ferrologic.se]
Sent: den 2 juli 2015 09:57
To: users@camel.apache.org
Subject: Wrong charset when using FTP2 component, locale issue?

Hi,

I've got an issue with files being parsed differently in different environments, specifically when handling Swedish characters.

The ftp endpoint is configured with:
- charset=iso-8859-1 (matches file format)
- binary=true

For debugging purposes, I'm writing the data (in UTF-8) from a Java bean. My local environment correctly outputs (hex) c3b6 for 'ö'. Our test environment outputs (hex) efbfbdefbfbd, which is clearly based on erroneously parsed data.

Since the deployed code and test files are identical, is this an issue with Camel and the underlying system/locale?

I'm using Apache Camel 2.12.0.redhat-610379 (as part of JBoss Fuse).

My local (Linux) environment uses a UTF-8 locale:

LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC=en_US.UTF-8
LC_TIME=en_US.UTF-8
LC_COLLATE="en_US.UTF-8"
LC_MONETARY=en_US.UTF-8
LC_MESSAGES="en_US.UTF-8"
LC_PAPER=en_US.UTF-8
LC_NAME=en_US.UTF-8
LC_ADDRESS=en_US.UTF-8
LC_TELEPHONE=en_US.UTF-8
LC_MEASUREMENT=en_US.UTF-8
LC_IDENTIFICATION=en_US.UTF-8
LC_ALL=

Our test (Linux) environment uses POSIX:

LANG=
LANGUAGE=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=

Thanks
/Gustav
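
PS: For reference, the two hex values quoted above are consistent with one specific failure mode, sketched here in plain Java with no Camel involved. It assumes that somewhere along the way the text exists as UTF-8 bytes and is decoded back with the JVM default charset, and that the JVM under the POSIX locale defaults to US-ASCII. Both points are assumptions; nothing in the routes confirms where (or whether) such a conversion happens. The class and method names are made up for the illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the routes or of Camel.
final class CharsetMismatchDemo {

    // Render bytes as lowercase hex, the same way the debug output is quoted.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Encode text as UTF-8, decode it with the given charset,
    // then write the result out as UTF-8 again (as the debug bean does).
    static String roundTrip(String text, Charset decodeWith) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        String decoded = new String(utf8, decodeWith);
        return hex(decoded.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Shows what this JVM picked up from its environment.
        System.out.println("default charset: " + Charset.defaultCharset());

        // 'ö' (U+00F6) decoded with the matching charset survives intact.
        System.out.println(roundTrip("\u00f6", StandardCharsets.UTF_8));    // c3b6

        // Decoded as US-ASCII, each of the two UTF-8 bytes is invalid and
        // becomes U+FFFD, which is ef bf bd when written as UTF-8.
        System.out.println(roundTrip("\u00f6", StandardCharsets.US_ASCII)); // efbfbdefbfbd
    }
}
```

Printing Charset.defaultCharset() in both environments should show whether the default charset really differs between them.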