I've seen similar encoding issues when running on Windows. One potential
cause is the way you're getting the XML into a file (e.g. copy and paste,
or redirection in a shell): Windows might be changing the encoding and
producing XML that isn't encoded as UTF-8 but as something else. If the
XML is wrong, the unparsed output will be wrong too.
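
As a rough illustration of how that can happen, here is a small Python
sketch. It assumes the UTF-8 bytes get misread as Latin-1 somewhere along
the way and are then saved as UTF-8 again. The sample string and the
double_encode helper are made up for the example; they are not taken from
the actual data in this thread.

```python
def double_encode(text: str, wrong_encoding: str = "latin-1") -> bytes:
    """Simulate UTF-8 bytes being misread as `wrong_encoding`, then re-saved as UTF-8."""
    utf8_bytes = text.encode("utf-8")
    # Each byte of the UTF-8 data is misinterpreted as one separate character.
    misread = utf8_bytes.decode(wrong_encoding)
    # Saving those characters as UTF-8 turns each high byte into two bytes.
    return misread.encode("utf-8")


sample = "éè"  # hypothetical input, not the string from the original report
print(sample.encode("utf-8").hex(" "))  # c3 a9 c3 a8
print(double_encode(sample).hex(" "))   # c3 83 c2 a9 c3 83 c2 a8
```

Under that assumption, every two-byte character picks up an extra 83 C2 in
the middle, which at least resembles the pattern reported in the output.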

So in addition to the full schema, it might also be helpful to attach
the actual XML file that you are unparsing, so we can see what the
encoding of that file is.
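
If you want to check the file yourself first, something like the Python
sketch below might help. It only looks for a byte order mark and then
tests whether the bytes decode as UTF-8, so treat it as a rough check
rather than real encoding detection. The sniff_encoding name and the file
name in the comment are placeholders.

```python
import codecs


def sniff_encoding(data: bytes) -> str:
    """Make a rough guess at the encoding of raw file bytes."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8 with BOM"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "not utf-8"
    return "utf-8"


# Example usage with a placeholder file name:
#   with open("infoset.xml", "rb") as f:
#       print(sniff_encoding(f.read()))
```

A result of "utf-16" would be consistent with a shell redirection having
re-encoded the file, though it would not prove that is what happened.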

- Steve

On 5/13/19 5:15 PM, Sloane, Brandon wrote:
> Roger,
> 
> 
> I am unable to reproduce this. Can you post a complete schema?
> 
> 
> Looking at your output, the only thing that jumps out to me is that the
> problem is 83 C2 being inserted between each character. My guess is you are
> setting some property that changes how strings are encoded, but nothing jumps
> out at me as being able to cause this type of encoding behavior.
> 
> 
> Below is the schema I tried which does not reproduce this problem.
> 
> 
> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
>             xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
>             xmlns:tns="urn:a"
>             xmlns:ex="http://example.com"
>             xmlns:fn="http://www.w3.org/2005/xpath-functions"
>             targetNamespace="urn:a" >
>    <xs:include schemaLocation="org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd" />
> 
>    <xs:annotation>
>      <xs:appinfo source="http://www.ogf.org/dfdl/">
>        <dfdl:format ref="tns:GeneralFormat"/>
>      </xs:appinfo>
>    </xs:annotation>
> 
> 
> <xs:element name="UTF-8">
>      <xs:complexType>
>          <xs:sequence>
>              <xs:element name="string" type="xs:string" dfdl:encoding="utf-8"
>                          dfdl:lengthKind="pattern" dfdl:lengthPattern=".*" />
>              <xs:element name="length" type="xs:integer"
>                          dfdl:inputValueCalc="{ fn:string-length(../string) }" />
>          </xs:sequence>
>      </xs:complexType>
> </xs:element>
> 
> </xs:schema>
> 
> 
> --------------------------------------------------------------------------------
> *From:* Costello, Roger L. <[email protected]>
> *Sent:* Monday, May 13, 2019 2:38:03 PM
> *To:* [email protected]
> *Subject:* Strange behavior with dfdl:encoding
> 
> Hello DFDL community,
> 
> My input is a single UTF-8 string. Parsing the input generates the expected
> XML document, but unparsing the XML results in a totally different string.
> Below is a graphic showing the input, parsing results, and unparsing results.
> Under it are the actual hex bytes. Note how the bytes for the input are very
> different from the bytes for the unparse results. Why such differences between
> the input and the unparse output?  At the bottom is my DFDL schema. /Roger
> 
> <xs:element name="UTF-8">
>     <xs:complexType>
>         <xs:sequence>
>             <xs:element name="string" type="xs:string" dfdl:encoding="utf-8"/>
>             <xs:element name="length" type="xs:integer"
>                         dfdl:inputValueCalc="{ fn:string-length(../string) }"/>
>         </xs:sequence>
>     </xs:complexType>
> </xs:element>
> 
