Re: Ruta 2.7.0 SeedLexer issue with special unicode characters

2019-09-26 Thread Mario Juric
Hi Peter, I opened this issue: https://issues.apache.org/jira/browse/UIMA-6131 Cheers, Mario > On 26 Sep 2019, at 16:15, Peter Klügl wrote: > > Hi, > > > there is no reason. > > > I think there are two, I remember. One in the test utils, which can be > replaced, and one in

Re: Ruta 2.7.0 SeedLexer issue with special unicode characters

2019-09-26 Thread Peter Klügl
Hi, there is no reason. As far as I remember, there are two occurrences: one in the test utils, which can be replaced, and one in the generated code of the JFlex lexer, where I do not know whether it can be replaced. Maybe a newer version of JFlex avoids this? Or, it could be caught and wrapped in an
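The catch-and-wrap idea mentioned in this message can be sketched roughly as follows. This is only an illustration under stated assumptions, not Ruta's or JFlex's actual code: the message is cut off before it names the wrapper type, so LexerException is a hypothetical name, and scanToken is a made-up stand-in for a generated scanner that signals unmatched input by throwing java.lang.Error.

```java
public class LexerErrorWrapping {

    /** Hypothetical wrapper exception; not part of Ruta, UIMA, or JFlex. */
    public static class LexerException extends RuntimeException {
        public LexerException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /**
     * Stand-in for a generated scanner (assumption for this sketch only).
     * It rejects input containing U+FFFF by throwing java.lang.Error.
     */
    public static String scanToken(String input) {
        if (input.indexOf('\uFFFF') >= 0) {
            throw new Error("Error: could not match input");
        }
        return input;
    }

    /** Calls the scanner and converts a plain Error into an exception. */
    public static String scanSafely(String input) {
        try {
            return scanToken(input);
        } catch (VirtualMachineError | LinkageError e) {
            // Genuine JVM-level failures (out of memory, linkage problems)
            // are rethrown unchanged.
            throw e;
        } catch (Error e) {
            // Any other Error is treated as an input problem and wrapped,
            // so callers can handle it like an ordinary exception.
            throw new LexerException("Lexer could not process the input", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(scanSafely("plain text"));
        try {
            scanSafely("bad\uFFFFchar");
        } catch (LexerException e) {
            System.out.println("Recovered: " + e.getMessage());
        }
    }
}
```

With a wrapper like this, a pipeline could skip or log a single problematic document instead of aborting on an Error, while real VM failures still propagate.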

Re: Ruta 2.7.0 SeedLexer issue with special unicode characters

2019-09-25 Thread Mario Juric
Hi Peter, Just one more thing that came to my mind. Is there a particular reason for throwing a java.lang.Error instead of an exception? Normally that is something only thrown by the JVM when it’s really impossible to continue the process, e.g. out of memory, linkage errors or fatal VM

Re: Ruta 2.7.0 SeedLexer issue with special unicode characters

2019-09-23 Thread Mario Juric
Thanks Peter, I will await your confirmation of the fix, but I guess we will then stick with 2.6.1 until the next Ruta release :) Cheers, Mario > On 20 Sep 2019, at 18:09, Peter Klügl wrote: > > Hi Mario, > > > I did not have the chance to have a look at your example yet... > >

Re: Ruta 2.7.0 SeedLexer issue with special unicode characters

2019-09-20 Thread Peter Klügl
Hi Mario, I did not have the chance to have a look at your example yet... Most likely, this problem is already fixed in the current trunk, but I was not able to find the time for a new release. In 2.7.0, there was a small modification in the lexer rules for the seeding, which had unfortunately

Ruta 2.7.0 SeedLexer issue with special unicode characters

2019-09-19 Thread Mario Juric
Hi Peter, After upgrading to Ruta 2.7.0 a while ago we started getting some errors from the SeedLexer, which we didn’t have before. They appear related to odd Unicode characters that we haven’t cleaned properly upstream, but these are consumed by the previous version 2.6.1, where our pipeline completes