I realize I should probably provide the full picture here:

The context consists of two routes. The first:
-----
<from uri="[my FTP endpoint, with binary=true and charset=iso-8859-1]"/>
<to uri="direct-vm:another-route-that-returns-nothing?timeout=300000"/>

<!-- Needed for the splitter -->
<convertBodyTo type="java.lang.String"/>
<split streaming="true">
        <tokenize token="\n" group="5000"/>
        <wireTap uri="activemq:myQueue"/>
</split>
-----
And the second:
-----
<from uri="activemq:myQueue"/>
<unmarshal>
        <csv delimiter=";"/>
</unmarshal>
<bean ref="transformCSV" method="validateAndTransform"/>
-----
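To make the intent of the two routes concrete, here is a rough plain-Java equivalent with no Camel involved. The class and method names (`RoutePipelineSketch`, `groupLines`, `parseSemicolonCsv`) are hypothetical, the sample data is made up, and the CSV step is simplified (no quoting or escaping). The only point it illustrates is that when the one byte-to-String conversion names its charset explicitly, the Swedish characters survive grouping and field splitting unchanged:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RoutePipelineSketch {

    // Roughly what <tokenize token="\n" group="5000"/> does: cut the body into
    // chunks of at most groupSize lines, joined back together with "\n".
    static List<String> groupLines(String body, int groupSize) {
        String[] lines = body.split("\n");
        List<String> groups = new ArrayList<>();
        for (int i = 0; i < lines.length; i += groupSize) {
            int end = Math.min(i + groupSize, lines.length);
            groups.add(String.join("\n", Arrays.copyOfRange(lines, i, end)));
        }
        return groups;
    }

    // Roughly what <csv delimiter=";"/> does for simple, unquoted input.
    static List<List<String>> parseSemicolonCsv(String group) {
        List<List<String>> rows = new ArrayList<>();
        for (String line : group.split("\n")) {
            rows.add(Arrays.asList(line.split(";", -1)));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up file bytes as they would arrive over FTP in binary mode (ISO-8859-1).
        byte[] fileBytes = "id;name\n1;Sm\u00f6rg\u00e5s\n2;\u00c5ke"
                .getBytes(StandardCharsets.ISO_8859_1);

        // The convertBodyTo step, with the charset stated explicitly.
        String body = new String(fileBytes, StandardCharsets.ISO_8859_1);

        for (String group : groupLines(body, 2)) {
            System.out.println(parseSemicolonCsv(group));
        }
    }
}
```

Note that `System.out` itself encodes with the console's charset, so what appears on screen can differ from what the Strings actually contain.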

After a lot of troubleshooting, it seems that the splitter/tokenizer is what corrupts the data: the body looks correct after the convertBodyTo, but not after the tokenizer.

Is the tokenizer doing anything here that I should be aware of?
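One way to narrow this down: splitting a `java.lang.String` only rearranges characters and cannot turn an o-umlaut into something else, so corruption at that point more likely comes from a byte/String conversion that uses a different charset than the one configured. The sketch below demonstrates that distinction in plain Java. The `reencode` step is a hypothetical stand-in for such a conversion (for example a component falling back to the JVM default charset); it is not taken from Camel's tokenizer code:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class TokenizerCharsetSketch {

    // Splitting a String on "\n" only moves chars around; the o-umlaut (U+00F6)
    // comes out exactly as it went in.
    static String[] tokenize(String body) {
        return body.split("\n");
    }

    // Hypothetical step: a chunk is turned into bytes with one charset and read
    // back with another. Bytes that are invalid in readAs become U+FFFD.
    static String reencode(String chunk, Charset writeAs, Charset readAs) {
        return new String(chunk.getBytes(writeAs), readAs);
    }

    public static void main(String[] args) {
        String body = "r\u00f6d\nbl\u00e5";
        String first = tokenize(body)[0];
        System.out.println(first.equals("r\u00f6d")); // true

        // UTF-8 bytes 72 C3 B6 64 read as US-ASCII: C3 and B6 each become U+FFFD.
        String broken = reencode(first, StandardCharsets.UTF_8, StandardCharsets.US_ASCII);
        System.out.println(broken.equals("r\uFFFD\uFFFDd")); // true
    }
}
```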

Thanks
/Gustav

-----Original Message-----
From: Gustav Sinder [mailto:gustav.sin...@ferrologic.se] 
Sent: 2 July 2015 09:57
To: users@camel.apache.org
Subject: Wrong charset when using FTP2 component, locale issue?

Hi,

I have an issue where files are parsed differently in different environments, specifically when handling Swedish characters.

The FTP endpoint is configured with:

- charset=iso-8859-1 (matches the file's encoding)
- binary=true
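Why the charset setting matters: in ISO-8859-1 each Swedish letter is a single byte (0xE5 for a-ring, 0xF6 for o-umlaut), and those bytes are only meaningful if they are decoded with that charset. The plain-Java sketch below uses made-up sample bytes and a hypothetical class name; it shows the same bytes decoded as ISO-8859-1 (correct) and as US-ASCII (each non-ASCII byte becomes U+FFFD):

```java
import java.nio.charset.StandardCharsets;

public class Latin1DecodeSketch {

    // Made-up file content "fran mork" with a-ring (0xE5) and o-umlaut (0xF6)
    // encoded as single ISO-8859-1 bytes.
    static final byte[] FILE_BYTES = { 'f', 'r', (byte) 0xE5, 'n', ' ', 'm', (byte) 0xF6, 'r', 'k' };

    // What charset=iso-8859-1 asks for: every byte maps to exactly one char.
    static String decodeLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Decoding with the wrong charset: bytes >= 0x80 are invalid in US-ASCII
    // and are replaced by U+FFFD.
    static String decodeAscii(byte[] raw) {
        return new String(raw, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(decodeLatin1(FILE_BYTES).equals("fr\u00e5n m\u00f6rk")); // true
        System.out.println(decodeAscii(FILE_BYTES).equals("fr\uFFFDn m\uFFFDrk"));  // true
    }
}
```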

For debugging purposes, I'm writing the data (as UTF-8) from a Java bean. My local environment correctly outputs (hex) c3b6 for 'ö', while our test environment outputs (hex) efbfbdefbfbd, which is clearly the result of incorrectly decoded data.
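Both hex strings can be reproduced in plain Java. efbfbd is the UTF-8 encoding of U+FFFD (REPLACEMENT CHARACTER), and getting two of them for a single 'ö' matches its two UTF-8 bytes c3 b6 being decoded one at a time with a charset that rejects them. A single misread ISO-8859-1 byte 0xF6 would give only one replacement character. Picking US-ASCII as the misreading charset is an assumption, chosen because it is what JDKs before 18 use by default under a POSIX locale; the class and method names are hypothetical:

```java
import java.nio.charset.StandardCharsets;

public class HexDumpSketch {

    // Lowercase hex, two digits per byte.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Hex of the String written out as UTF-8, like the debug bean does.
    static String utf8Hex(String s) {
        return hex(s.getBytes(StandardCharsets.UTF_8));
    }

    // Suspected broken step: UTF-8 bytes read back as US-ASCII, so every
    // byte >= 0x80 becomes U+FFFD.
    static String misreadAsAscii(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String oUmlaut = "\u00f6";
        System.out.println(utf8Hex(oUmlaut));                 // c3b6
        System.out.println(utf8Hex(misreadAsAscii(oUmlaut))); // efbfbdefbfbd
    }
}
```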

Since the deployed code and test files are identical, is this an issue with Camel and the underlying system locale?
I'm using Apache Camel 2.12.0.redhat-610379 (as part of JBoss Fuse).

My local (Linux) environment uses a UTF-8 locale:
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC=en_US.UTF-8
LC_TIME=en_US.UTF-8
LC_COLLATE="en_US.UTF-8"
LC_MONETARY=en_US.UTF-8
LC_MESSAGES="en_US.UTF-8"
LC_PAPER=en_US.UTF-8
LC_NAME=en_US.UTF-8
LC_ADDRESS=en_US.UTF-8
LC_TELEPHONE=en_US.UTF-8
LC_MEASUREMENT=en_US.UTF-8
LC_IDENTIFICATION=en_US.UTF-8
LC_ALL=

Our test (Linux) environment uses the POSIX locale:
LANG=
LANGUAGE=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=
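A quick way to test the locale theory is to ask each JVM which charset it falls back to when none is passed to `String.getBytes()` or `new String(byte[])`. The class name is hypothetical; run it with the same Java installation and environment that JBoss Fuse uses. Under a POSIX/C locale, JDKs before 18 typically report ANSI_X3.4-1968 (US-ASCII), while an en_US.UTF-8 locale gives UTF-8:

```java
import java.nio.charset.Charset;

public class DefaultCharsetCheck {

    // Charset used by String.getBytes() / new String(byte[]) when none is given.
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + defaultCharsetName());
    }
}
```

If the two environments disagree, a common workaround is to start the Fuse JVM with -Dfile.encoding=UTF-8 (or to set LANG=en_US.UTF-8 for the service); whether that fully fixes these routes is an assumption until tested.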

Thanks
/Gustav
