Re: Migrating type system of form 6 compressed CAS binaries

2019-09-20 Thread Marshall Schor
In the test "OddDocumentText", this produces a "throw" due to an invalid XML character, which is the \u0002. This is in part because the XML version being used is XML 1.0. XML 1.1 expanded the set of valid characters to include \u0002. Here's a snip from the XmiCasSerializerTest class which
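
As a rough illustration of the XML 1.0 versus XML 1.1 point above, the following sketch checks code points against the Char productions of the two specifications. The class and method names (XmlCharCheck, isXml10Char, isXml11Char, firstInvalidXml10Offset) are hypothetical and are not taken from UIMA or from the XmiCasSerializerTest class. Note that XML 1.1 only accepts control characters such as \u0002 when they are written as character references.

```java
public class XmlCharCheck {

    // XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // XML 1.1 Char production:
    // [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    // (restricted control characters must still appear as character references)
    static boolean isXml11Char(int cp) {
        return (cp >= 0x1 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns the UTF-16 offset of the first code point that is not a legal
    // XML 1.0 character, or -1 if the whole string is acceptable.
    static int firstInvalidXml10Offset(String s) {
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            if (!isXml10Char(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    public static void main(String[] args) {
        String text = "odd\u0002text";
        System.out.println("first invalid XML 1.0 offset: " + firstInvalidXml10Offset(text));
        System.out.println("U+0002 allowed in XML 1.1: " + isXml11Char(0x2));
    }
}
```

A check like this could be run over document text before serialization to locate the offending characters. How to encode or replace them is a separate decision.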

Re: Migrating type system of form 6 compressed CAS binaries

2019-09-20 Thread Marshall Schor
Here's an idea. If you have a string with the surrogate pair at position 10, and you have some Java code which iterates through the string and gets the code point at each character offset, then that code will produce: at position 10: the code point 77987; at position 11: the
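
The behaviour described above can be reproduced with a small sketch using only the standard String API. The class name SurrogateOffsets and its helper methods are hypothetical. The sample string is built so that the supplementary code point 77987 (U+130A3) starts at character offset 10.

```java
public class SurrogateOffsets {

    // Ten ASCII characters followed by one supplementary code point
    // (decimal 77987, U+130A3), so the surrogate pair occupies offsets 10 and 11.
    static String sample() {
        return "0123456789" + new String(Character.toChars(77987));
    }

    // Code point reported when indexing by UTF-16 char offset.
    // At the offset of a high surrogate followed by a low surrogate this is the
    // full supplementary code point; at the low surrogate's offset it is the
    // low surrogate value itself.
    static int codePointAtOffset(String s, int offset) {
        return s.codePointAt(offset);
    }

    public static void main(String[] args) {
        String s = sample();
        // Stepping one char at a time visits both halves of the pair.
        for (int i = 10; i < s.length(); i++) {
            System.out.println("position " + i + ": " + codePointAtOffset(s, i));
        }
        // Stepping by Character.charCount visits each code point exactly once.
        for (int i = 0; i < s.length(); i += Character.charCount(s.codePointAt(i))) {
            System.out.println("code point at " + i + ": " + s.codePointAt(i));
        }
    }
}
```

Code that advances one char at a time therefore sees two values for a single supplementary character, whereas advancing by Character.charCount sees one.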

Re: a question about REGEXP

2019-09-20 Thread Peter Klügl
Hi, the REGEXP condition is only a boolean function without "side effects". You could solve your use case in Ruta with simple regex rules. You need to use some rules which do not depend on annotations for matching in order to create smaller annotations. Something like: DECLARE

Re: Ruta 2.7.0 SeedLexer issue with special unicode characters

2019-09-20 Thread Peter Klügl
Hi Mario, I did not have the chance to have a look at your example yet... Most likely, this problem is already fixed in the current trunk, but I was not able to find the time for a new release. In 2.7.0, there was a small modification in the lexer rules for the seeding, which had unfortunately

Re: Migrating type system of form 6 compressed CAS binaries

2019-09-20 Thread Mario Juric
Thanks Marshall, Encoding the characters like you suggest should work just fine for us as long as we can serialize and deserialize the XMI, so that we can open the content in a tool like the CVD or similar. These characters are just noise from the original content that happen to remain in the

a question about REGEXP

2019-09-20 Thread B. Li
Hi All, I have a question about REGEXP. I would like to extract a digit (e.g. the third one) in a number (NUM). Could I use REGEXP to get the result of a matched group (something like NUM{REGEXP("^\\d\\d(\\d)")})? Any hint would be greatly appreciated! Thanks in advance! Baoli
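
For reference, this is what the capturing group in that pattern would return in plain java.util.regex, outside of Ruta. It is only a sketch of the regex semantics and says nothing about what the Ruta REGEXP condition itself supports. The class name ThirdDigit and its method are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ThirdDigit {

    // Same pattern as in the question: two digits, then a captured third digit.
    private static final Pattern THIRD = Pattern.compile("^\\d\\d(\\d)");

    // Returns the third digit of the given number text, or null if the text
    // does not start with at least three digits.
    static String thirdDigit(String number) {
        Matcher m = THIRD.matcher(number);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(thirdDigit("12345"));
        System.out.println(thirdDigit("12"));
    }
}
```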