Just to document this for others with the same problem: when the body is passed as a byte array, the bytes are correct.

    public String detectEncodingByBom(@Body byte[] body) {
        // Three bytes are enough to cover a UTF-8 BOM (3 bytes) or a UTF-16 BOM (2 bytes)
        byte[] firstThreeBytes = Arrays.copyOfRange(body, 0, 3);
        log.debug("3 Bytes as Hex: " + Hex.encodeHexString(firstThreeBytes));
        ...
    }

The log output for a UTF-16LE file is "fffe3c". This is the correct BOM (FF FE) followed by the first byte of the first character "<".

Stephan

-----Original Message-----
From: Burkard Stephan
Sent: Thursday, 4 May 2017 16:08
To: 'users@camel.apache.org'
Subject: RE: Charset on file poller endpoint

Yes, a bean is probably the best way to do the work.

However, I tried to inject the exchange, get the body as an InputStream and read the first 4 bytes from it (because an InputStream is a byte representation and therefore not encoded).

When I read a file that is UTF-16 (big endian) encoded, I get the output "Hex: efbfbdef":

    public void determineEncoding(Exchange exchange) throws Exception {
        InputStream is = exchange.getIn().getBody(InputStream.class);
        DataInputStream dis = new DataInputStream(is);
        // readInt() consumes the first four bytes as one big-endian int
        int fourBytes = dis.readInt();
        String hex = Integer.toHexString(fourBytes);
        log.info("Hex: " + hex);
    }

But when I read the file directly, I get the output "Hex: feff003c":

    public void testUtf16BeBom() throws Exception {
        InputStream utf16FileStream = this.getClass().getClassLoader().getResourceAsStream("testfiles/XmlUtf16Be.xml");
        DataInputStream dis = new DataInputStream(utf16FileStream);
        int fourBytes = dis.readInt();
        String hex = Integer.toHexString(fourBytes);
        log.info("Hex: " + hex);
    }

The output of the direct read is correct, since "feff" is the UTF-16 BE BOM, followed by "003c", which is the first character "<" in its 2-byte representation.

Any idea why the output through the Camel route/bean is wrong? Is it because the body has already been encoded (with a wrong encoding)?

Thanks
Stephan

-----Original Message-----
From: souciance [mailto:souciance.eqdam.ras...@gmail.com]
Sent: Thursday, 4 May 2017 12:13
To: users@camel.apache.org
Subject: Re: Charset on file poller endpoint

Probably the easiest way is to read the file and send the exchange to a bean. In the bean, try to read the body and determine the encoding and whether it has a BOM. Finally, do your conversion and put the body back on the exchange.

    from("file:/myDir")
        .bean(DetermineEncoding.class, "determineEncoding")
        .to("activemq:queue:myQueue");

On Thu, May 4, 2017 at 12:01 PM, Burkard Stephan [via Camel] <ml+s465427n5798625...@n5.nabble.com> wrote:

> Hi Camel users
>
> I read files with a Camel file poller and they can have different
> encodings (UTF-8 with or without BOM, UTF-16). Therefore I would like
> to determine the given encoding and convert the message body to UTF-8
> without BOM for further processing.
>
> How can I do this, and what exactly is in the message payload of the
> exchange? Is the payload an InputStream (just bytes, no encoding), or
> has it already been converted to a String or a Reader (already
> encoded)?
>
> And what does the "charset" option change? Does it override the
> default encoding of the operating system?
>
>     from("file:/myDir")
>         // can I read the first bytes of the file here?
>         .to("activemq:queue:myQueue");
>
> Thanks for any hints
> Stephan
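
For anyone who wants to go from the BOM check discussed in this thread to the "UTF-8 without BOM" conversion asked about in the original question, here is a minimal sketch in plain Java. It is not Camel API: the class name BomSniffer and all of its methods are made up for illustration. It assumes the whole body is available as a byte array, treats a missing BOM as UTF-8, and ignores UTF-32. The last lines of main also show one possible explanation for the "efbfbdef" output: if the raw UTF-16BE bytes are decoded as UTF-8 somewhere along the way and then re-encoded, each invalid BOM byte becomes the replacement character U+FFFD, which is EF BF BD in UTF-8. Whether that is what happened in the route above is an assumption, not something confirmed in the thread.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of Camel): sniffs a BOM and normalizes to UTF-8.
class BomSniffer {

    /** Returns the length of a recognized BOM at the start of body, or 0 if there is none. */
    static int bomLength(byte[] body) {
        if (startsWith(body, 0xEF, 0xBB, 0xBF)) return 3; // UTF-8 BOM
        if (startsWith(body, 0xFE, 0xFF)) return 2;       // UTF-16BE BOM
        if (startsWith(body, 0xFF, 0xFE)) return 2;       // UTF-16LE BOM
        return 0;
    }

    /** Picks a charset from the BOM. Assumption: a UTF-8 BOM or no BOM both mean UTF-8. */
    static Charset detectCharsetByBom(byte[] body) {
        if (startsWith(body, 0xFE, 0xFF)) return StandardCharsets.UTF_16BE;
        if (startsWith(body, 0xFF, 0xFE)) return StandardCharsets.UTF_16LE;
        return StandardCharsets.UTF_8;
    }

    /** Decodes the body with the detected charset, drops the BOM, and re-encodes as UTF-8. */
    static byte[] toUtf8WithoutBom(byte[] body) {
        int skip = bomLength(body);
        String text = new String(body, skip, body.length - skip, detectCharsetByBom(body));
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** True if data begins with the given unsigned byte values. */
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    /** Lower-case hex of the first count bytes (or fewer if data is shorter). */
    static String hex(byte[] data, int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < Math.min(count, data.length); i++) {
            sb.append(String.format("%02x", data[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // UTF-16BE BOM followed by "<", the same four bytes as in the direct read
        byte[] utf16be = {(byte) 0xFE, (byte) 0xFF, 0x00, 0x3C};
        System.out.println(detectCharsetByBom(utf16be));        // UTF-16BE
        System.out.println(hex(toUtf8WithoutBom(utf16be), 4));  // 3c

        // Decoding the same bytes as UTF-8 and re-encoding them mangles the BOM
        byte[] mangled = new String(utf16be, StandardCharsets.UTF_8)
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(hex(mangled, 4));                    // efbfbdef
    }
}
```

In a route, a bean method taking "@Body byte[] body" (as in the first message of this thread) could return the result of toUtf8WithoutBom(body) so that everything downstream sees UTF-8 without a BOM.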