In the test "OddDocumentText", this throws an exception because of an invalid XML
character, \u0002.
This is partly because the XML version being used is XML 1.0.
XML 1.1 expanded the set of valid characters to include \u0002, although as a
"restricted" character it must be written as a character reference (&#x2;).
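To make the 1.0 versus 1.1 distinction concrete, here is a minimal Java sketch that checks code points against the `Char` production of each spec and finds the first character XML 1.0 would reject. The class and method names (`XmlChars`, `isXml10Char`, `isXml11Char`, `firstInvalidXml10`) are hypothetical; nothing here is taken from the UIMA XMI serializer.

```java
/** Hypothetical helper: checks code points against the XML 1.0 and XML 1.1 Char productions. */
final class XmlChars {

    // XML 1.0 Char: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // XML 1.1 Char: [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF].
    // Control characters such as #x2 are "restricted" in XML 1.1 and may only
    // appear as character references (for example &#x2;), not as raw text.
    static boolean isXml11Char(int cp) {
        return (cp >= 0x1 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns the char index of the first code point that is not a legal
    // XML 1.0 Char, or -1 if the whole string is legal. Unpaired surrogates
    // come back from codePointAt as values in 0xD800-0xDFFF and are rejected.
    static int firstInvalidXml10(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isXml10Char(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    public static void main(String[] args) {
        String text = "odd" + (char) 0x2 + "text"; // control char at index 3
        System.out.println(firstInvalidXml10(text)); // 3
        System.out.println(isXml10Char(0x2));        // false
        System.out.println(isXml11Char(0x2));        // true
    }
}
```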
Here's a snip from the XmiCasSerializerTest class which
Here's an idea.
If you have a string with a surrogate pair at position 10, and some Java
code that iterates through the string and gets the code point at each char
offset, then that code will produce:
at position 10: the code point 77987
at position 11: the
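The situation described above can be reproduced with a short Java sketch. The sample string and the names `CodePointWalk`, `perCharOffset`, and `perCodePoint` are hypothetical; the code point 77987 (U+130A3) is the one from the example. It contrasts calling `codePointAt` at every char offset with a walk that advances by `Character.charCount`.

```java
import java.util.Arrays;

/** Hypothetical demo: per-char versus per-code-point iteration over a String. */
final class CodePointWalk {

    // Ten BMP chars, then U+130A3 (decimal 77987) encoded as a surrogate
    // pair at char offsets 10 and 11, then three more BMP chars.
    static String sample() {
        return "0123456789" + new String(Character.toChars(77987)) + "xyz";
    }

    // Naive walk: calls codePointAt at every char offset. At the offset of a
    // low surrogate it returns that surrogate on its own.
    static int[] perCharOffset(String s) {
        int[] out = new int[s.length()];
        for (int i = 0; i < s.length(); i++) {
            out[i] = s.codePointAt(i);
        }
        return out;
    }

    // Surrogate-aware walk: advances by the width of each code point, so a
    // pair is visited once. Equivalent to s.codePoints().toArray().
    static int[] perCodePoint(String s) {
        int[] out = new int[s.codePointCount(0, s.length())];
        int n = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            out[n++] = cp;
            i += Character.charCount(cp);
        }
        return out;
    }

    public static void main(String[] args) {
        String s = sample();
        int[] naive = perCharOffset(s);
        System.out.println("at position 10: " + naive[10]); // 77987
        System.out.println("at position 11: " + naive[11]); // 56483 (low surrogate 0xDCA3)
        System.out.println(Arrays.equals(perCodePoint(s), s.codePoints().toArray())); // true
    }
}
```

In this sketch the naive walk yields 15 values for a 14-code-point string, because the supplementary character is visited once at each half of its pair.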
Hi,
the REGEXP condition is only a boolean function without side effects: it
only evaluates to true or false. You could solve your use case in Ruta with
simple regex rules. To create annotations smaller than the existing ones,
you need rules that do not depend on annotations for matching. Something like:
DECLARE
Hi Mario,
I have not had the chance to look at your example yet...
Most likely, this problem is already fixed in the current trunk, but I
have not found the time for a new release. In 2.7.0, there was a
small modification to the lexer rules for the seeding, which
unfortunately had
Thanks Marshall,
Encoding the characters as you suggest should work fine for us, as long
as we can serialize and deserialize the XMI so that we can open the content in
a tool like the CVD or similar. These characters are just noise from the
original content that happen to remain in the
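The excerpt does not show which encoding Marshall suggested. As a purely hypothetical example of an encoding that survives a serialize/deserialize round trip, the Java sketch below escapes every code point that is illegal in XML 1.0 (and the backslash itself), so decoding restores the original text exactly. The class name `InvalidCharEscaper` and the backslash-u escape format are assumptions, not UIMA features.

```java
/**
 * Hypothetical reversible escaping for text that must survive XML 1.0 serialization.
 * Scheme (an assumption, not a UIMA feature): a literal backslash becomes two
 * backslashes, and each code point that is not a legal XML 1.0 Char becomes a
 * backslash, the letter 'u', and four uppercase hex digits.
 */
final class InvalidCharEscaper {

    // XML 1.0 Char: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    static String encode(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (cp == '\\') {
                sb.append("\\\\");
            } else if (!isXml10Char(cp)) {
                // Every illegal code point is at most 0xFFFF (control chars,
                // unpaired surrogates, 0xFFFE, 0xFFFF), so four hex digits suffice.
                sb.append(String.format("\\u%04X", cp));
            } else {
                sb.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    static String decode(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                char next = s.charAt(i + 1);
                if (next == '\\') {          // doubled backslash: one literal backslash
                    sb.append('\\');
                    i++;
                    continue;
                }
                if (next == 'u' && i + 5 < s.length()) { // escaped code unit
                    sb.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                    i += 5;
                    continue;
                }
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```

Doubling literal backslashes is what keeps the scheme unambiguous: after encoding, a single backslash followed by `u` can only come from an escaped character.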
Hi All,
I have a question about REGEXP. I would like to extract a digit (e.g. the third
one) from a number (NUM). Could I use REGEXP to get the result of a matched group
(something like NUM{REGEXP("^\\d\\d(\\d)")})?
Any hint would be greatly appreciated! Thanks in advance!
Baoli
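For comparison, here is what the pattern in the question captures, sketched with plain `java.util.regex` rather than Ruta. The class and method names (`ThirdDigit`, `thirdDigit`, `thirdDigitSpan`) are hypothetical. The begin and end offsets of group 1 are the information a rule would need in order to create an annotation over just that digit.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration using plain java.util.regex, not Ruta. */
final class ThirdDigit {

    // The pattern from the question, written as a Java string literal:
    // two leading digits, then the third digit in capturing group 1.
    private static final Pattern THIRD = Pattern.compile("^\\d\\d(\\d)");

    // Returns the third digit, or null if the text does not start with at least three digits.
    static String thirdDigit(String text) {
        Matcher m = THIRD.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    // Returns {begin, end} char offsets of group 1 within the text, or null if there is no match.
    static int[] thirdDigitSpan(String text) {
        Matcher m = THIRD.matcher(text);
        return m.find() ? new int[] {m.start(1), m.end(1)} : null;
    }

    public static void main(String[] args) {
        System.out.println(thirdDigit("12345")); // 3
        int[] span = thirdDigitSpan("12345");
        System.out.println(span[0] + "-" + span[1]); // 2-3
        System.out.println(thirdDigit("12"));    // null
    }
}
```

Using `find()` with the `^` anchor, rather than `matches()`, lets the number be longer than three digits.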