Hi,

I have a Camel application deployed on RHEL 5 with a default
encoding/locale of UTF-8.

I have to download data from a remote Windows server (CP1251 or
ISO-8859-1/Latin-1).

My route breaks the processing of the files into two steps (sketched
below):
1 - download
2 - consume and pass split/tokenized Strings/bytes to POJOs for
further processing.
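
For reference, a stripped-down sketch of what I mean (the endpoint
URIs and the POJO are placeholders, not my real config):

    import org.apache.camel.builder.RouteBuilder;

    public class EncodedFileRoutes extends RouteBuilder {

        // placeholder POJO; the real processors are more involved
        public static class LineHandler {
            public void handleLine(String line) {
                // further processing happens here
            }
        }

        @Override
        public void configure() {
            // step 1: download from the remote Windows server to a local dir
            from("ftp://user@winserver/outbound?password=secret")
                .to("file://target/encoded/");

            // step 2: consume the downloaded file, tokenize it line by line,
            // and pass each token to a POJO
            from("file://target/encoded/")
                .split(body().tokenize("\n"))
                .bean(LineHandler.class, "handleLine");
        }
    }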

My problem stems from the fact that I don't seem to have control over
the charset the FileConsumer uses when it converts the file into a
String.  The data contains non-ASCII characters that get corrupted if
the bytes are decoded as UTF-8 instead of as ISO-8859-1.

I have a simple test case: a file encoded as ISO-8859-1.  If I read
it with that specific charset I can process the data without
corruption; if I read it as UTF-8, the data is corrupted.
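
The test boils down to something like this (the file name is made up;
the behaviour is the same with my real data):

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.io.Reader;
    import java.nio.charset.Charset;

    public class CharsetReadTest {
        public static void main(String[] args) throws IOException {
            File f = new File("target/encoded/latin1-sample.txt");

            // decoding with the right charset keeps accented chars intact
            String good = read(f, Charset.forName("ISO-8859-1"));

            // decoding as UTF-8 mangles them: a byte like 0xE9 (e-acute
            // in Latin-1) is not a valid UTF-8 sequence and becomes U+FFFD
            String bad = read(f, Charset.forName("UTF-8"));

            System.out.println(good.equals(bad)); // false for my sample
        }

        private static String read(File f, Charset cs) throws IOException {
            StringBuilder sb = new StringBuilder();
            Reader r = new InputStreamReader(new FileInputStream(f), cs);
            try {
                int c;
                while ((c = r.read()) != -1) {
                    sb.append((char) c);
                }
            } finally {
                r.close();
            }
            return sb.toString();
        }
    }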

Is there any way I can instruct each of my FileConsumer endpoints to
consume its file using a specific charset/encoding?  I cannot change
the locale on the server to fix this, as other files must be read as
UTF-8, not ISO-8859-1.
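
The only DSL-level knob I've spotted so far is convertBodyTo(type,
charset), which I could force into the route myself, e.g.:

    from("file://target/encoded/")
        // force the byte[] -> String decode to use ISO-8859-1 instead
        // of the JVM default charset
        .convertBodyTo(String.class, "ISO-8859-1")
        .split(body().tokenize("\n"))
        .bean(LineHandler.class, "handleLine");

But that feels like it belongs on the endpoint, and I haven't verified
that it kicks in before any default conversion takes place.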

I've looked at the Camel source code, and the way Camel consumes
files seems to rely on some kind of type coercion:

in FileBinding:

    public void loadContent(Exchange exchange, GenericFile<File> file) throws IOException {
        try {
            content = exchange.getContext().getTypeConverter()
                    .mandatoryConvertTo(byte[].class, file.getFile());
        } catch (NoTypeConversionAvailableException e) {
            throw IOHelper.createIOException("Cannot load file content: "
                    + file.getAbsoluteFilePath(), e);
        }
    }
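
If I'm reading that right, the bytes themselves survive this step
fine; the corruption must happen later, when something decodes the
byte[] using the JVM default charset, i.e. the equivalent of:

    public class DefaultCharsetDemo {
        public static void main(String[] args) throws Exception {
            // 0xE9 is e-acute in ISO-8859-1 but not valid UTF-8
            byte[] raw = { (byte) 0xE9 };

            String corrupt = new String(raw);               // JVM default (UTF-8 here) -> U+FFFD
            String intact  = new String(raw, "ISO-8859-1"); // the char I actually want

            System.out.println(corrupt + " vs " + intact);
        }
    }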

Is this the code that actually consumes the file and creates the
message, or should I be looking elsewhere?  I'm trying to add a
property to GenericFileEndpoint that will let me set a parameter via
the URI:

file://target/encoded/?charsetEncoding=ISO-8859-1
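
Roughly what I'm imagining (hypothetical, nothing like this exists in
the current source) is a property that Camel's URI parameter binding
would populate through its setter:

    // hypothetical fragment of GenericFileEndpoint, not real code
    private String charsetEncoding; // e.g. "ISO-8859-1"

    // URI query parameters are injected via setters, so
    // file://target/encoded/?charsetEncoding=ISO-8859-1 would land here
    public void setCharsetEncoding(String charsetEncoding) {
        this.charsetEncoding = charsetEncoding;
    }

    public String getCharsetEncoding() {
        return charsetEncoding;
    }

The consumer/binding would then use getCharsetEncoding() when decoding
the file content instead of falling back to the JVM default.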

Thanks,
Kev
