Re: Incorrect delimiter scanning when mixed encodings?

2020-04-30 Thread Beckerle, Mike
I have honestly never seen these mixed-encoding cases in real formats, so I have no actual use cases. I've seen delimited binary data, but never textual mixtures. There's a reason for this: writing software to parse such data would also be very hard. The only way such data could come up would b

Re: Incorrect delimiter scanning when mixed encodings?

2020-04-30 Thread Steve Lawrence
As Brandon points out, this might cause issues for non-byte-size encodings. I assume mandatory text alignment (MTA) applies to these as well, which I think deals with such issues. And I assume MTA still applies even with raw byte entities. Seems like the algorithm is something like: 1) Set a mark 2)

Re: Incorrect delimiter scanning when mixed encodings?

2020-04-30 Thread Beckerle, Mike
The encoding for the delimiter is the encoding in effect on the schema component carrying the property. Making them take on contextual encodings makes things much too complicated. So yeah, I think in your case, if we're scanning for that "§" but we're using a decoder for ASCII, that's incorrec
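
To make the point above concrete, here is a minimal Python sketch (not Daffodil code) of why the delimiter has to be matched using the encoding of the component that defines it. It assumes the delimiter is the section sign (U+00A7) and that the defining component uses ISO-8859-1; the helper names `delimiter_bytes` and `find_delimiter` are hypothetical and exist only for this illustration.

```python
# Illustrative only: a byte-level view of one delimiter under three encodings.
DELIMITER = "\u00a7"  # section sign, assumed to be the delimiter under discussion


def delimiter_bytes(encoding):
    """Return the delimiter's bytes in `encoding`, or None if it is unrepresentable."""
    try:
        return DELIMITER.encode(encoding)
    except UnicodeEncodeError:
        return None


def find_delimiter(data, encoding):
    """Return the byte offset of the delimiter in `data`, or -1 if it cannot match."""
    needle = delimiter_bytes(encoding)
    return -1 if needle is None else data.find(needle)


# Data written with the delimiter in ISO-8859-1 (the assumed defining encoding).
data = ("abc" + DELIMITER + "def").encode("iso-8859-1")

offset_latin1 = find_delimiter(data, "iso-8859-1")  # matches the single byte 0xA7
offset_utf8 = find_delimiter(data, "utf-8")         # looks for 0xC2 0xA7, no match
offset_ascii = find_delimiter(data, "ascii")        # delimiter has no ASCII form
```

Scanning with any encoding other than the one in effect where the delimiter is defined either looks for the wrong bytes or cannot represent the delimiter at all, which is the failure mode described in this thread.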

Re: Incorrect delimiter scanning when mixed encodings?

2020-04-30 Thread Sloane, Brandon
Without looking at the spec, I would expect delimiters to be defined by the encoding of the element that defines the delimiter; so Daffodil is buggy in the case you describe. However, there are a couple of complications we have to consider: 1) What if instead of a terminator, we had a separa

Incorrect delimiter scanning when mixed encodings?

2020-04-30 Thread Steve Lawrence
Say we have a schema like this: http://www.w3.org/2001/XMLSchema"; xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/";> http://www.ogf.org/dfdl/";> So we have a format that is all ISO-8859-1,