Hi, I’m having a problem with the Bindy component and wonder if there is something I missed. Maybe someone can help me address it; I can hardly believe I’m the first to hit this.
I need to port an EAI application built with Bindy that reads a fixed-length file(*), converts it and sends the data somewhere else. Currently this file is in Latin-1 encoding, but we need to move it to Unicode, effectively UTF-8. An ugly but effectively unavoidable legacy application creates the file.

Unicode is a bit tricky when it comes to counting the length of a string, especially since Java internally uses UTF-16, which means a codepoint takes 1 to 2 (Java) chars. Bindy seems to use String.substring() internally for token selection and so counts chars the way Java does. This means the length of a string is the number of UTF-16 code units (surrogate pairs counted twice), not the number of codepoints, which is the common denominator (see e.g. the definition of string length in XML Schema). And when one takes combining characters into account (one “base char” plus 0 to n combining chars are perceived as one character by users), it becomes even more of a problem.

Is there a way to tell Bindy how it counts chars and selects the tokens in a given line? Any suggestions? Is there a related bug report or upcoming change that addresses this problem?

-- Mik

(*) Meaning that certain data starts at certain positions (columns, if you will).
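P.S. To make the difference concrete, here is a minimal, self-contained sketch (plain JDK, nothing Bindy-specific; the sample string is made up) showing the three notions of “length” I mean:

    import java.text.BreakIterator;

    public class LengthDemo {
        public static void main(String[] args) {
            // 'a', U+1D11E MUSICAL SYMBOL G CLEF (a surrogate pair in UTF-16),
            // then 'e' followed by U+0301 COMBINING ACUTE ACCENT.
            String s = "a\uD834\uDD1Ee\u0301";

            // UTF-16 code units: what String.length() and substring() count.
            System.out.println("chars:      " + s.length());                      // 5

            // Unicode codepoints: the XML Schema notion of string length.
            System.out.println("codepoints: " + s.codePointCount(0, s.length())); // 4

            // Grapheme clusters: what a user perceives as characters
            // (base char plus combining marks counted as one).
            BreakIterator it = BreakIterator.getCharacterInstance();
            it.setText(s);
            int graphemes = 0;
            while (it.next() != BreakIterator.DONE) {
                graphemes++;
            }
            System.out.println("graphemes:  " + graphemes);                       // 3
        }
    }

So the same string is 5 “chars” to substring(), 4 codepoints, and 3 user-perceived characters; a fixed-length record written by column position can mean any of the three.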