Thanks, Mike. So, it would seem that the backtracking code would need to be updated so that it does not remove SDWs. It seems like the backtracking code would only be active during parsing (runtime) - not compiling - so we could probably just alter the backtrack code to remove everything except SDWs from the PState. I'm not familiar with that code - would there be a structural issue with discarding everything except SDWs? Or would you recommend using another method of reporting the regex limit message?
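To make the "remove everything except SDWs" idea concrete, here is a minimal sketch of a state that, on backtracking, drops the parse errors recorded since the last point of uncertainty but keeps any warnings. This is a toy model only: ToyState, ParseError, RuntimeSDW, mark and resetTo are hypothetical names invented for illustration and are not Daffodil's PState API, which may well structure its diagnostics and marks differently.

```scala
// Toy model only: these types are hypothetical and are NOT Daffodil's PState API.
sealed trait Diagnostic { def msg: String }
final case class ParseError(msg: String) extends Diagnostic
final case class RuntimeSDW(msg: String) extends Diagnostic

final class ToyState {
  private var diags: Vector[Diagnostic] = Vector.empty

  def add(d: Diagnostic): Unit = diags = diags :+ d
  def diagnostics: Vector[Diagnostic] = diags

  /** A mark just remembers how many diagnostics existed at the point of uncertainty. */
  def mark(): Int = diags.length

  /** Backtrack: drop parse errors recorded after the mark, but keep every warning. */
  def resetTo(mark: Int): Unit = {
    val (before, after) = diags.splitAt(mark)
    diags = before ++ after.filter(_.isInstanceOf[RuntimeSDW])
  }
}

object BacktrackDemo {
  /** Simulates one failed branch that issued both a warning and a parse error. */
  def run(): Vector[Diagnostic] = {
    val st = new ToyState
    val m = st.mark()
    st.add(RuntimeSDW("regex match length limit reached"))
    st.add(ParseError("branch 1 did not match"))
    st.resetTo(m) // abandon branch 1 and try the next alternative
    st.diagnostics
  }

  def main(args: Array[String]): Unit = run().foreach(println)
}
```

After the reset, only the warning remains in the diagnostics, which is the behavior being asked about. Whether the real backtracking code can filter diagnostics this way without structural problems is exactly the open question above.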
-----Original Message-----
From: Beckerle, Mike [mailto:mbecke...@owlcyberdefense.com]
Sent: Monday, December 28, 2020 12:15 PM
To: dev@daffodil.apache.org
Subject: Re: How to add warnings that are not lost due to backtracking

So, I wanted to clarify a few things. Then I think I agree we want runtime-issued SDWs to not be lost when backtracking.

An SDE, or schema definition error, is most commonly the Daffodil schema compiler telling you your schema isn't meaningful, so parsing/unparsing cannot even be started. We divide up Daffodil into "compiling the schema" or "compile time" and runtime (parse/unparse time). Some SDEs cannot be detected until runtime, but SDEs are always fatal. I.e., there is never any backtracking from them, because they mean there is something wrong with your DFDL schema.

Processing errors (parse error or unparse error) are errors where your schema is meaningful but the data doesn't match the schema. Some parse errors are a normal part of parsing, as they are suppressed by backtracking to try other alternatives.

Schema-definition warnings (SDWs) are not parse errors but the warning version of an SDE. I.e., they suggest a possible error in the schema. SDWs detected at compile time are always output by the compiler. If an SDW is issued at runtime, there is an interesting question: should it be suppressed by backtracking?

I don't know of any runtime SDWs offhand. I searched the source for them and found only one place where an SDW could be issued at runtime, which is this code in DState.scala:

    private def isAnArray(): Boolean = {
      if (!currentNode.isInstanceOf[DIArray]) {
        Assert.invariant(errorOrWarn.isDefined)
        if (currentNode.isInstanceOf[DIElement]) {
          errorOrWarn.get.SDW(WarnID.PathNotToArray, "The specified path to element %s is not to an array. Suggest using fn:exists instead.", currentElement.name)
        } else {
          errorOrWarn.get.SDW(WarnID.PathNotToArray, "The specified path is not to an array. Suggest using fn:exists instead.")
        }
        false
      } else {
        true
      }
    }

This does get called at runtime. I just would expect path expressions to be compiled and this to have been checked already at compilation time, which should render this runtime check unnecessary, I think. I did not find a test that produces this warning message.

I think an SDW that is warning about an implementation limit being reached, like the regex match length limit, should not be suppressed by backtracking. As you pointed out, such a warning could be telling you about the reason for the backtracking, and suppressing the warning means you would not be able to diagnose why the backtracking is occurring.

Calling these implementation limit hits "schema definition" warnings is OK with me, because the schema goes along with the tunables like the max regex size limit. Both are static things that the data must comply with for parse/unparse to be successful.

I imagine that if you just add an SDW call at runtime, it will put the warning onto the diagnostics in the PState, and they will be discarded on backtracking, but probably that should not happen for runtime SDWs, only for parse errors.

-mikeb

________________________________
From: Larry Barber <larry.bar...@nteligen.com>
Sent: Friday, December 18, 2020 2:59 PM
To: dev@daffodil.apache.org <dev@daffodil.apache.org>
Subject: RE: How to add warnings that are not lost due to backtracking

I actually ran into this problem when parsing a large JPEG file. I thought that I had uncovered a bug because the file was not being parsed correctly. Once the cause was pointed out to me, the problem was solved by changing the tunable to increase the regex search length, and the file then parsed as expected. The regex search failure caused (erroneous) backtracking, so I need to see the information about the search failing. This is part of Daffodil-412, which required a two-part solution. The tunable was implemented for the first part, but the second part - the warning message - was not.
If SDW is not the way to go, I'd be happy to work with another suggestion.

From: Carlson, Ian [mailto:icarl...@owlcyberdefense.com]
Sent: Friday, December 18, 2020 2:37 PM
To: dev@daffodil.apache.org
Subject: RE: How to add warnings that are not lost due to backtracking

I'm still new at this - but I've found a great way to learn is to invite people to tell me I'm wrong, so here's my two cents.

SDE in particular is generally used to tell the parser that something has gone wrong. This invites the parser to either back up to the most recent point of uncertainty and try another path, or fail completely if none exists. That's how we select one branch over another in the cases where there are multiple possible paths. If we do select a path that turns out to be invalid, we generally don't want those errors to propagate back up the chain, since they are for a "path not taken", and failing in a way that leads us to the correct path is both expected and desired behavior. By extension, warnings encountered on our "path not taken" also get discarded. For instance, if we have a regex failure looking for the length of a discriminator that ultimately doesn't exist because this is an invalid path - that isn't really a failure at the top level.

So using SDW for a global "something weird you might want to examine" sort of warning is somewhat at odds with the way SDE and SDW are usually used. Our runtime does generate quite a bit of text - so a warning that is simply printed to the console is likely to be missed. If we want to have a sort of global log that doesn't get cleared, but also isn't mingled with the runtime console output - we may need a new facility for that.

Side note - there are certain classes of diagnostics around choice branches that don't get discarded currently, which may cause some warnings and errors to escape even though we output a successful infoset.
Ticket 2399 discusses this issue, and a partial attempt at a fix is languishing as a WIP at https://github.com/apache/incubator-daffodil/pull/444. The short version is that I wouldn't want to rely on any information from SDW or SDE escaping a "path not taken" once that fix is in place.

Ian Carlson | Software Engineer

From: Larry Barber <larry.bar...@nteligen.com>
Sent: Friday, December 18, 2020 12:49 PM
To: dev@daffodil.apache.org
Subject: How to add warnings that are not lost due to backtracking

I'm hoping someone could give me some pointers on adding a warning message to the Daffodil io code. I'm looking at Daffodil-412 and want to generate a warning message when the regex search gets expanded, and another if it exceeds the tunable for maximum length. I've located the code that does these expansions in io/InputSourceDataInputStream.scala, but I'm unsure how to generate the warning messages. I don't see any other warning messages being generated in the io code. I've seen several instances in core that just use SDW(...) and others in DSOM that use context.SDW(...), but I'm confused about this - I'm afraid that this method buffers warnings and throws them away in the case of backtracking. Since the regex search may be the cause of backtracking, I think these warnings need to be presented always. I'm just not sure of the proper way to access SDW in this situation, and I need to make sure that the messages will not be discarded.
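
For readers following the thread, the two warnings described above (one when the regex search is expanded, one when it reaches the maximum length) could be sketched roughly as follows. This is only an illustration built on java.util.regex and is not the code in the Daffodil io module: RegexLimitDemo, ScanResult, scan, initialWindow and maxWindow are hypothetical names, the doubling strategy is an assumption, and warnings are simply returned in a list because how they should be reported is the open question here.

```scala
import java.util.regex.Pattern
import scala.collection.mutable.ListBuffer

// Hypothetical sketch; not the Daffodil io implementation.
object RegexLimitDemo {
  final case class ScanResult(matched: Option[String], warnings: List[String])

  /** Matches at the start of `data`, looking at most `maxWindow` characters ahead.
    * The window starts at `initialWindow` and doubles whenever the matcher reports
    * that more input could have changed the outcome. */
  def scan(data: CharSequence, pattern: Pattern, initialWindow: Int, maxWindow: Int): ScanResult = {
    require(initialWindow > 0 && maxWindow > 0, "window sizes must be positive")
    val warnings = ListBuffer.empty[String]
    var window = math.min(initialWindow, maxWindow)
    var result: Option[String] = None
    var done = false
    while (!done) {
      val end = math.min(window, data.length)
      val m = pattern.matcher(data).region(0, end)
      val found = m.lookingAt()
      // hitEnd() is true when the matcher ran into the end of the region,
      // so data beyond the window might have produced a different result.
      val truncated = m.hitEnd() && end < data.length
      if (truncated && window < maxWindow) {
        window = math.min(window * 2, maxWindow)
        warnings += s"regex scan window expanded to $window characters"
      } else {
        if (truncated)
          warnings += s"regex scan stopped at the $maxWindow character limit; the match may be incomplete"
        result = if (found) Some(m.group()) else None
        done = true
      }
    }
    ScanResult(result, warnings.toList)
  }
}
```

A call such as RegexLimitDemo.scan("aaaaaaaab", Pattern.compile("a+"), 4, 8) yields a match that stops at the limit, together with one expansion warning and one limit warning. The remaining problem from the thread is unchanged: these warnings have to reach the user even if the surrounding parse backtracks.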