Hi Michael,

  Good to know that you sorted it out :) The compatibility between the
ICU4L and Apache License is not straightforward, we would need to look
closer.
Still creating a quick ticket and sharing a github project would make it
possible to save your work, and may be of interest later on to the
community.
  Would one provide a PR against 3.x, chances are that this could be
back-ported to 2.x. Please, keep time frame in mind as 2.x may close end of
this year.

Alex

On Fri, Jan 24, 2020 at 5:20 PM Michael Greulich <d...@greulich-online.eu>
wrote:

>
> Hi Alex,
>
> well, your comment was already very helpful. I created a custom DataFormat
> and ModelFactory from the default ones for FixedLength. Of course I obeyed
> the license terms of the Apache license ;-) For some aspect of recognizing
> chars, I used the ICU4J-lib, because the support for some things (e.g.
> emojis) in the Java runtime is not up to date. The license of ICU it quite
> permitting, too. I’ve no idea, if this is a problem for an Apache project...
>
> Well I think I’m not the only one, that has this use-case -- so I  think
> this can be useful for the community, too. Currently I’m under pressure,
> but I think I will create a JIRA ticket when the stress has become less. If
> the community is interested, I can provide the code of my solution and
> would be glad if this thing goes upstream (i.e. into the camel distro) some
> day.
>
> Currently we (the company I work for) are using Camel 2.2 and I guess this
> will be the case for some time. If this feature or bug (not very determined
> what it actually is, I will leave the decision to the community)  in which
> version will it be included? Only Camel 3.x or will it be backported to 2.2?
>
> -- Mik
>
> --------------------------------------------------------------------------
> Gesendet: Freitag, 24. Januar 2020 um 11:43 Uhr
> Von: "Alex Dettinger" <aldettin...@gmail.com>
> An: users@camel.apache.org
> Betreff: Re: Bindy plus Unicode
> Hi Michael,
>
> I was just looking at this component for another purpose and it looks
> to me that fixed length tokenzation occurs here:
>
>
> https://github.com/apache/camel/blob/master/components/camel-bindy/src/main/java/org/apache/camel/dataformat/bindy/BindyFixedLengthFactory.java#L212..L216
> So, It counts in java chars and not code points. You can maybe experiment
> injecting a custom BindyFixedLengthFactory, via
> dataFormat.setModelFactory(..).
>
> Would you feel that an extension point to customize count/selection of
> chars/codepoint/grapheme would be valuable to the community, feel free to
> raise a JIRA ticket.
>
> Alex
>
>
> On Fri, Jan 24, 2020 at 9:52 AM Michael Greulich <
> mich...@greulich-online.eu>
> wrote:
>
> > Hi,
> >
> > I’m having problems with the bindy component and wonder if there is
> > something I missed. Maybe one can help me addressing it. I cannot
> believe,
> > that I’m the first to hit this problem.
> >
> > I need to port an EAI application built using bindy, that reads a fixed
> > type file(*) converts it and sends the data somewhere else. Currently
> this
> > file is in Latin 1 encoding, but we need to take it to Unicode –
> > effectively UTF-8. We have an ugly, but effectively unavoidable legacy
> > application that creates the file.
> >
> > Unicode is a bit tricky, when it comes to counting the length of a string
> > specially since Java uses internally UTF-16, which means depending on the
> > codepoint 1 – 2 (Java-)chars. Bindy seems to use internally for selection
> > substring and counts chars like Java does. This means the length of a
> > string is the count of the chars, i.e. UTF-16 surrogates, but not
> > codepoints, which is the common denominator (e.g. see definition of
> string
> > length in XMLSchema). And when one takes combing chars into account (one
> > “base char” plus 0 – n combining chars are perceived as one “char” by
> > users) it becomes even more of a problem.
> >
> > Is there a possibility to tell bindy how it counts an and selects the
> > tokens based on char counts in a given line? Any suggestions? Is the are
> > related bug or change to come that addresses this problem?
> >
> > -- Mik
> >
> > (*) This means, that on certain positions there start certain data
> > (columns if you will).
> >
> >
>
>
>

Reply via email to