This email is FYI only. You can skip it unless you care about this specific development topic.
I have made quite a lot of progress this past week, so I thought it worth reporting, given that fixing these issues so that EDIFACT and TLOG can work has been delayed for so long. There are numerous JIRA tickets associated with this problem area: DAFFODIL-1080, DAFFODIL-1976, DAFFODIL-1886, DAFFODIL-1919, DAFFODIL-110

On my branch, separated sequences have been substantially revised, and I hope to get this into code review as a PR in a few days.

I added a new property, daf:emptyElementPolicy, which controls whether Daffodil implements the DFDL spec or bends the rules to be compatible with IBM DFDL. The goal is to let us run the DFDL schemas IBM has published on GitHub.

The status of my daffodil-1080-sep branch is that all tests in daffodil-test pass. There are exactly 3 errors in daffodil-test-ibm1:

* test_AX000
* test_ptLax1rt

The above fail because Daffodil doesn't implement the rule that empty strings are created for optional string elements only when there is some non-zero-length syntax, defined by dfdl:emptyValueDelimiterPolicy together with the initiator/terminator. Daffodil is incorrectly creating an empty string value here based on nothing more than the presence of a separator. When dfdl:separatorSuppressionPolicy is trailingEmpty (or trailingEmptyStrict), this should NOT create an empty string value. Under trailingEmpty it should just tolerate the separator, and under trailingEmptyStrict it should not tolerate it.

* test_ptg3_1p_ibm_daf

The above fails because, in the new daf:emptyElementPolicy="noEmptyElements" mode, Daffodil does not cause a processing error on a required string element (a scalar, or one of the first minOccurs occurrences of an array) whose value is the empty string. IBM DFDL gives a parse error in this case, and noEmptyElements is supposed to be compatible with that. (In addition, if a default value is specified, we need to produce a runtime SDE instead, so that the parser does not backtrack. This is also consistent with IBM DFDL behavior.) Right now Daffodil is creating empty-string elements here.
That is wrong in the noEmptyElements compatibility mode, although under the regular daf:emptyElementPolicy="emptyElements" it would be the correct behavior.

I believe fixing the above will also fix several of the regressions on published DFDL schemas.

This change set is extensive enough that I also ran all the published DFDL schemas from the DFDLSchemas site on GitHub (and iCalendar as well).

Published Schema Regressions:

* iCalendar - now gets an SDE: "implicit with unbounded maxOccurs only allowed on last declared element of sequence." This is not due to my changes but to a check that was added recently.
* mil-std-2045 - 2 tests fail. One is "Terminator 7F not found"; the other is related to empty children: "expected 5 children got 3". Probably the same issue as identified above for one of the daffodil-test-ibm1 tests.
* png - many tests fail, all for the same reason: "expected 1 child got 0". Probably the same issue as identified above for one of the daffodil-test-ibm1 tests.
* (Also bmp - fails with Java out of heap space, but that was also true of the released 2.3.0 version of Daffodil; see DAFFODIL-2118.)

Of course, the objective of these separated sequence changes is to get more published DFDL schemas to run, specifically EDIFACT and ibm4690-TLog (aka TLOG).

Progress on EDIFACT
* The one test fails for the same reason as test_ptg3_1p_ibm_daf, or at least that is the failure it currently hits. It runs and produces an infoset.
Note: EDIFACT takes over a minute to compile the schema. Ugh.

Progress on TLOG
* 2 of 5 tests pass.
* The other 3 fail, for reasons as yet unanalyzed. They run and produce infosets, but those infosets don't match what is expected.

Final point: performance. The unparser for separated sequences with separator suppression uses some pretty heavyweight techniques: it creates suspendable unparsers for the separators that might be suppressed. The performance implications of this have not yet been examined. I've been focused on getting the behavior right first.
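For reference, the two empty-string rules behind the daffodil-test-ibm1 failures can be summarized as a small decision model. This is a hypothetical Python sketch, not Daffodil code (Daffodil itself is written in Scala). Outcome, optional_empty_outcome, and required_empty_outcome are illustrative names only. The sketch models just the trailingEmpty/trailingEmptyStrict cases and the two daf:emptyElementPolicy values described above.

```python
from enum import Enum


class Outcome(Enum):
    # Hypothetical labels for what the parser should do (illustration only).
    EMPTY_STRING = "create an empty-string element"
    ABSENT = "element absent; separator tolerated"
    PARSE_ERROR = "processing error (parser may backtrack)"
    RUNTIME_SDE = "runtime SDE (no backtracking)"


def optional_empty_outcome(found_empty_value_syntax: bool, suppression: str) -> Outcome:
    """Optional string element with zero-length content in the trailing
    part of a separated sequence.

    found_empty_value_syntax: the data contained the non-zero-length
    initiator and/or terminator that dfdl:emptyValueDelimiterPolicy calls
    for when the value is empty, i.e. more than just a separator.
    """
    if found_empty_value_syntax:
        # Real syntax marks the value as present-but-empty.
        return Outcome.EMPTY_STRING
    if suppression == "trailingEmpty":
        # A separator alone does not create a value; just tolerate it.
        return Outcome.ABSENT
    if suppression == "trailingEmptyStrict":
        # The strict variant does not tolerate the extra separator.
        return Outcome.PARSE_ERROR
    raise ValueError(f"separatorSuppressionPolicy not modeled here: {suppression}")


def required_empty_outcome(has_default: bool, empty_element_policy: str) -> Outcome:
    """Required string element (scalar, or one of the first minOccurs
    occurrences of an array) whose parsed value is the empty string."""
    if empty_element_policy == "emptyElements":
        # Regular behavior: the empty-string value is accepted.
        return Outcome.EMPTY_STRING
    if empty_element_policy == "noEmptyElements":
        # IBM DFDL compatibility: a processing error, unless a default value
        # is specified, in which case a runtime SDE so there is no backtracking.
        return Outcome.RUNTIME_SDE if has_default else Outcome.PARSE_ERROR
    raise ValueError(f"unknown daf:emptyElementPolicy: {empty_element_policy}")


if __name__ == "__main__":
    print(optional_empty_outcome(False, "trailingEmpty"))     # Outcome.ABSENT
    print(required_empty_outcome(False, "noEmptyElements"))   # Outcome.PARSE_ERROR
```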