Hi Michael,

You would need to open a PR against master. Some helpful information on
contributing is available at
https://camel.apache.org/manual/latest/contributing.html

I'm sure ICU4J is functionally great. However, license compatibility is a
legal matter, so we don't really have a choice. Could you please point me to
the ICU4J license you've been using? I could try checking the compatibility.

Alex

On Sat, Jan 25, 2020 at 5:42 PM <d...@greulich-online.eu> wrote:

> Hi Alex,
>
> well, which would then be the appropriate branch? Master or 3.x?
> I guess if I create a ticket, I will be informed by e-mail about what
> happens to it, right?
> I think there could be a ticket + PR in the next two weeks.
>
> A word on ICU4J. Of course I understand that an Apache project has to be
> careful, but there are features, like splitting strings into graphemes,
> that need functionality the old logic in the JDK doesn't support. The lib
> is very common (e.g. LibreOffice uses it) and AFAIK the de facto standard
> for working with elaborate Unicode.
>
> -- Mik
>
> ----
> Sent: Friday, 24 January 2020, 19:15
> From: "Alex Dettinger" <aldettin...@gmail.com>
> To: users@camel.apache.org
> Subject: Re: Re: Bindy plus Unicode
>
> Hi Michael,
>
> Good to know that you sorted it out :) The compatibility between the
> ICU4J license and the Apache License is not straightforward; we would
> need to look closer.
> Still, creating a quick ticket and sharing a GitHub project would make it
> possible to save your work, and may be of interest to the community
> later on.
> If you provide a PR against 3.x, chances are that it could be back-ported
> to 2.x. Please keep the time frame in mind, as 2.x may close at the end
> of this year.
>
> Alex
>
> On Fri, Jan 24, 2020 at 5:20 PM Michael Greulich
> <d...@greulich-online.eu> wrote:
>
> > Hi Alex,
> >
> > well, your comment was already very helpful. I created a custom
> > DataFormat and ModelFactory from the default ones for FixedLength. Of
> > course I obeyed the license terms of the Apache license ;-) For some
> > aspects of recognizing chars, I used the ICU4J lib, because the support
> > for some things (e.g.
> > emojis) in the Java runtime is not up to date. The license of ICU is
> > quite permissive, too. I’ve no idea if this is a problem for an Apache
> > project...
> >
> > Well, I think I’m not the only one who has this use case, so I think
> > this can be useful for the community, too. Currently I’m under pressure,
> > but I think I will create a JIRA ticket when the stress has eased. If
> > the community is interested, I can provide the code of my solution and
> > would be glad if it goes upstream (i.e. into the Camel distro) some day.
> >
> > Currently we (the company I work for) are using Camel 2.2, and I guess
> > this will be the case for some time. If this feature or bug fix (I'm not
> > sure which it actually is; I will leave that decision to the community)
> > is accepted, in which version will it be included? Only Camel 3.x, or
> > will it be backported to 2.2?
> >
> > -- Mik
> >
> > --------------------------------------------------------------------------
> > Sent: Friday, 24 January 2020, 11:43
> > From: "Alex Dettinger" <aldettin...@gmail.com>
> > To: users@camel.apache.org
> > Subject: Re: Bindy plus Unicode
> >
> > Hi Michael,
> >
> > I was just looking at this component for another purpose, and it looks
> > to me as if fixed-length tokenization occurs here:
> >
> > https://github.com/apache/camel/blob/master/components/camel-bindy/src/main/java/org/apache/camel/dataformat/bindy/BindyFixedLengthFactory.java#L212..L216
> >
> > So it counts in Java chars and not code points. You could experiment
> > with injecting a custom BindyFixedLengthFactory via
> > dataFormat.setModelFactory(..).
> >
> > If you feel that an extension point to customize the count/selection of
> > chars/code points/graphemes would be valuable to the community, feel
> > free to raise a JIRA ticket.
> >
> > Alex
> >
> > On Fri, Jan 24, 2020 at 9:52 AM Michael Greulich
> > <mich...@greulich-online.eu> wrote:
> >
> > > Hi,
> > >
> > > I’m having problems with the bindy component and wonder if there is
> > > something I missed. Maybe someone can help me address it. I cannot
> > > believe that I’m the first to hit this problem.
> > >
> > > I need to port an EAI application built using bindy that reads a
> > > fixed-type file(*), converts it, and sends the data somewhere else.
> > > Currently this file is in Latin-1 encoding, but we need to move it to
> > > Unicode, effectively UTF-8. We have an ugly but effectively
> > > unavoidable legacy application that creates the file.
> > >
> > > Unicode is a bit tricky when it comes to counting the length of a
> > > string, especially since Java uses UTF-16 internally, which means 1 to
> > > 2 (Java) chars per code point, depending on the code point. Bindy
> > > seems to use substring internally for selection and counts chars like
> > > Java does. This means the length of a string is the count of chars,
> > > i.e. UTF-16 code units, not code points, which are the common
> > > denominator (e.g. see the definition of string length in XML Schema).
> > > And when one takes combining chars into account (one “base char” plus
> > > 0 to n combining chars are perceived as one “char” by users), it
> > > becomes even more of a problem.
> > >
> > > Is there a way to tell bindy how it counts and selects the tokens
> > > based on char counts in a given line? Any suggestions? Is there a
> > > related bug or upcoming change that addresses this problem?
> > >
> > > -- Mik
> > >
> > > (*) This means that certain data (columns, if you will) start at
> > > certain positions.
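
For readers following the thread, the difference between counting Java chars,
code points, and user-perceived characters can be illustrated with a minimal
Java sketch. This is not Bindy code and not the custom factory mentioned in
the thread: the class name FixedLengthUnicode and its helper methods are
hypothetical, and it relies only on JDK classes (String and
java.text.BreakIterator), not on ICU4J. How well BreakIterator groups newer
emoji sequences depends on the JDK version, which is the limitation raised in
the thread.

```java
import java.text.BreakIterator;

// Hypothetical helper class for illustration only; not part of Camel or Bindy.
final class FixedLengthUnicode {

    /** Number of UTF-16 code units, which is what String.length() reports. */
    static int charCount(String s) {
        return s.length();
    }

    /** Number of Unicode code points; a surrogate pair counts once. */
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    /** Number of character boundaries as seen by the JDK's BreakIterator. */
    static int graphemeCount(String s) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(s);
        int count = 0;
        while (it.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    /** Like String.substring, but begin and end are code point positions. */
    static String substringByCodePoints(String s, int begin, int end) {
        int from = s.offsetByCodePoints(0, begin);
        int to = s.offsetByCodePoints(from, end - begin);
        return s.substring(from, to);
    }

    public static void main(String[] args) {
        // 'A', U+1F600 (encoded as a surrogate pair), 'B'
        String field = "A\uD83D\uDE00B";
        System.out.println(charCount(field));      // 4
        System.out.println(codePointCount(field)); // 3

        // Char-based selection splits the pair and yields a lone surrogate.
        String byChars = field.substring(1, 2);
        System.out.println(Character.isHighSurrogate(byChars.charAt(0))); // true

        // Code-point-based selection returns the whole emoji.
        String byCodePoints = substringByCodePoints(field, 1, 2);
        System.out.println(byCodePoints.equals("\uD83D\uDE00")); // true
    }
}
```

A fixed-length parser that addresses fields by code point would use something
like substringByCodePoints in place of substring. Addressing by grapheme would
require iterating over BreakIterator boundaries (or the ICU4J equivalent)
instead.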