So Owl has a ZipLayer - for messages that contain the bytes of a zip file. As part of exercising these new Daffodil Layer APIs, I have rewritten this Zip Layer.
I have written a fuzz tester whose objective is to crash the layers so that they throw bad exceptions like NPE or otherwise misbehave. Et voilà: I am able to get an SDE thrown by the InfosetWalker after the parse has completed. It took 100,000 trials to get it to happen (only a few seconds), but sure enough, it did. So fuzz testing is super helpful.

Turns out we throw a RemapPUACharDetected if the data contains Unicode PUA (Private Use Area) characters. Fuzz testing causes the Zip layer to read some data as Unicode PUA characters, and when the InfosetOutputter projects that string to XML, it catches this exception and escalates it to an SDE.

So bug 1 is that we shouldn't throw an exception when a PUA character is encountered. That's easily remedied by changing a flag already in the code, so that pre-existing PUA characters are just passed through. Changing this breaks no tests. Because we remap some XML-illegal characters into the PUA, there's some chance that some kinds of data won't round-trip (parse/unparse or unparse/parse), but that's just par for the course when dealing with XML as a data format. I will create a test for this and a separate ticket for it, and fix it separately from the Layer changes.

Bug 2 is how the String-as-XML feature works. Turns out the catch of exceptions in the InfosetOutputter gets exercised by negative tests of the Daffodil "String-as-XML" extension. When the string of data, which is supposed to be XML, turns out to be malformed, an exception such as WstxUnexpectedCharException gets thrown. The InfosetOutputter escalates that to an SDE, and a few tests check for this. It needs to be a ParseError instead: it's a problem with the data, not the schema, so an SDE is really incorrect here. I think we have to do this conversion in the parser, so that if the XML is malformed we can make it a ParseError. How to make this work is a bit tricky, and it is very much out of scope for the Layering changes.
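To make the bug 2 distinction concrete, here is a rough sketch of the behavior I'd want, written in Python only to keep it short. Everything in it is made up for illustration: `ParseError`, `parse_string_as_xml`, `parse_plain_string`, and `parse_choice` are hypothetical names, not Daffodil APIs, and `xml.etree.ElementTree` stands in for whatever XML reader the real feature uses. The idea is just that well-formedness gets checked during the parse, so malformed XML becomes a recoverable data error that a point of uncertainty can backtrack from.

```python
import xml.etree.ElementTree as ET


class ParseError(Exception):
    """Stand-in for a recoverable data error (hypothetical, not Daffodil's class)."""


def parse_string_as_xml(text):
    """Check well-formedness at parse time so bad data surfaces as a ParseError."""
    try:
        ET.fromstring(text)
    except ET.ParseError as e:
        # The data is at fault, not the schema, so raise the recoverable kind.
        raise ParseError(f"string is not well-formed XML: {e}") from e
    return ("xml", text)


def parse_plain_string(text):
    """Fallback alternative that accepts any string."""
    return ("text", text)


def parse_choice(text, alternatives):
    """Try each alternative in order, backtracking only on ParseError."""
    for alternative in alternatives:
        try:
            return alternative(text)
        except ParseError:
            continue  # backtrack and try the next alternative
    raise ParseError("no alternative matched")
```

With this shape, `parse_choice("<a><b></a>", [parse_string_as_xml, parse_plain_string])` falls through to the plain-string alternative instead of failing the whole parse. If the malformed XML were only detected later, when the infoset is output, there would be nothing left to backtrack to.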
There is still a real issue here, though: the String-as-XML feature cannot backtrack on a data error such as the XML string not being well-formed. I'll report that as a separate issue.
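Going back to bug 1, here is a small sketch of why passing pre-existing PUA characters through can cost us the round trip. It is a simplification under stated assumptions: it remaps only the C0 control characters that XML 1.0 disallows, and it assumes a fixed offset of 0xE000. The names `PUA_OFFSET`, `is_xml_illegal`, `to_xml_safe`, and `from_xml_safe` are made up for this example and the real remapping table may differ in its details.

```python
PUA_OFFSET = 0xE000  # assumed offset for this sketch


def is_xml_illegal(code_point):
    """C0 controls other than tab, LF, and CR are not allowed in XML 1.0."""
    return code_point < 0x20 and code_point not in (0x09, 0x0A, 0x0D)


def to_xml_safe(s):
    """Remap XML-illegal characters into the PUA; pass everything else through."""
    return "".join(
        chr(ord(c) + PUA_OFFSET) if is_xml_illegal(ord(c)) else c for c in s
    )


def from_xml_safe(s):
    """Undo the remapping. This cannot tell a remapped character from a PUA
    character that was already present in the original data."""
    def unmap(c):
        code_point = ord(c) - PUA_OFFSET
        if code_point >= 0 and is_xml_illegal(code_point):
            return chr(code_point)
        return c

    return "".join(unmap(c) for c in s)
```

A NUL in the data survives the trip: `from_xml_safe(to_xml_safe("a\x00b"))` gives back `"a\x00b"`. But data that already contains U+E001 passes through `to_xml_safe` unchanged and then comes back from `from_xml_safe` as U+0001, so it does not round-trip. That is the trade-off of passing PUA characters through instead of throwing.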