I did some more tests:

  * Linux (CentOS 7) displays the ligature correctly in the diagnostic message.
  * On Windows, I piped the error output to a file and viewed the file in 
    Notepad. The ligature displayed properly.

This looks like a limitation of the Windows Console, which doesn’t fully 
support UTF-8 characters 
(https://devblogs.microsoft.com/commandline/windows-command-line-unicode-and-utf-8-output-text-buffer/).
I am on Windows 10 version 1809.

From: Beckerle, Mike <[email protected]>
Sent: Wednesday, September 18, 2019 4:48 PM
To: [email protected]
Subject: Re: Is an oe ligature okay in the value of dfdl:initiator?

That poor message is a bug then, unless there is some real reason Daffodil 
cannot find the intended charset.

________________________________
From: Ramaka, Shashi <[email protected]>
Sent: Wednesday, September 18, 2019 1:45:18 PM
To: [email protected]
Subject: RE: Is an oe ligature okay in the value of dfdl:initiator?

I changed the input so that it doesn’t contain the correct Initiator. The 
diagnostic message doesn’t display the character properly.

The default code page used by the Windows Command Prompt is 437. I expected the 
diagnostic message to display the character correctly after changing the code 
page to 65001 (UTF-8), but it did not. I am using a TrueType font (Lucida 
Console), so the font itself supports the ligature character.
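
To illustrate the code-page point, here is a minimal Python sketch. It is an assumption-laden stand-in, not how the Windows Console or Daffodil actually writes output: Python's cp437 and utf-8 codecs play the role of code pages 437 and 65001, and the helper name bytes_written is made up for this example. It shows one way a '?' can replace œ when a message is written through a stream whose encoding lacks that character.

```python
import io

# The diagnostic text as it should read, with the œ ligature (U+0153).
MESSAGE = "[error] Parse Error: Initiator 'Lec\u0153ur' not found"

def bytes_written(message: str, codec: str) -> bytes:
    """Write `message` through a text stream that uses `codec`, replacing
    characters the codec cannot represent, and return the raw bytes."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=codec, errors="replace")
    stream.write(message)
    stream.flush()
    data = raw.getvalue()
    stream.detach()  # keep the wrapper from closing `raw` behind our back
    return data

as_cp437 = bytes_written(MESSAGE, "cp437")  # stand-in for code page 437
as_utf8 = bytes_written(MESSAGE, "utf-8")   # stand-in for code page 65001

print(as_cp437)
print(as_utf8)
```

Under these assumptions the cp437 output contains a literal '?' where œ was, while the UTF-8 output carries the two bytes C5 93. A console that still shows a substitute after switching to 65001 would then point at the display side rather than at the bytes being written.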

From: Beckerle, Mike <[email protected]>
Sent: Wednesday, September 18, 2019 2:06 PM
To: [email protected]
Subject: Re: Is an oe ligature okay in the value of dfdl:initiator?

If you break the data so the initiator doesn't match, does the diagnostic 
message display the character properly?
Some code paths that don't have the charset info sometimes display data as 
iso8859-1 strings. If that is happening, the diagnostic message would not 
display this particular character but some substitute.
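
A minimal Python sketch of that effect, using only Python's standard codecs (this is not Daffodil's diagnostic code, and the variable names are illustrative): when the UTF-8 bytes of the initiator are interpreted as ISO-8859-1, the two bytes of œ turn into two unrelated characters.

```python
# Sketch only: plain Python codecs, not Daffodil's diagnostic code.
initiator = "Lec\u0153ur"          # "Lecœur"
data = initiator.encode("utf-8")   # the bytes as they appear in UTF-8 data

# ISO-8859-1 maps every byte to exactly one character, so decoding never
# fails, but the two UTF-8 bytes of œ (C5 93) become two separate characters.
shown = data.decode("iso-8859-1")

print(data.hex())    # 4c6563c5937572
print(repr(shown))   # 'LecÅ\x93ur'
```

The second of the two substitute characters is a C1 control character (U+0093), which many displays render as a box, a blank, or a '?'.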

________________________________
From: Ramaka, Shashi <[email protected]>
Sent: Wednesday, September 18, 2019 11:49:40 AM
To: [email protected]
Subject: RE: Is an oe ligature okay in the value of dfdl:initiator?

This worked for me with utf-8 encoding. With encoding set to utf-8 in the 
schema, and the input file using utf-8 (I used Notepad++), Daffodil parsed the 
file correctly.

From: Beckerle, Mike <[email protected]>
Sent: Wednesday, September 18, 2019 1:25 PM
To: [email protected]
Subject: Re: Is an oe ligature okay in the value of dfdl:initiator?

Could be a bug, but we do regression testing of UTF-8 that uses lots of 
multi-byte characters and such. So I'd be surprised.

We need to see the entire example including the data bytes you are parsing so 
we can reproduce.
________________________________
From: Costello, Roger L. <[email protected]>
Sent: Wednesday, September 18, 2019 1:16 PM
To: [email protected]
Subject: Re: Is an oe ligature okay in the value of dfdl:initiator?


Hi Mike,

I changed the encoding to utf-8:

<xs:element name="input"
            type="xs:string"
            dfdl:initiator="Lecœur"
            dfdl:encoding="utf-8"/>

I get the same error message:

[error] Parse Error: Initiator 'Lec?ur' not found

Bug in Daffodil?

/Roger



From: Beckerle, Mike <[email protected]>
Sent: Wednesday, September 18, 2019 1:02 PM
To: [email protected]
Subject: [EXT] Re: Is an oe ligature okay in the value of dfdl:initiator?



You have a mismatch between the character set encoding of your DFDL schema 
file and the character set encoding the schema says is in the data.

Is your DFDL schema in UTF-8?

The character œ doesn't exist in iso-8859-1.



If your data contains œ then the encoding must be iso-8859-15 or utf-8 or 
something that has the œ character.
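
The encodings named above can be compared with a short Python sketch. It relies only on Python's built-in codecs; the helper try_encode is a name invented for this illustration and has nothing to do with Daffodil.

```python
# Sketch only: byte representation of œ in the encodings mentioned above.
LIGATURE = "\u0153"  # œ, LATIN SMALL LIGATURE OE

def try_encode(ch, codec):
    """Return the encoded bytes, or None if `codec` cannot represent `ch`."""
    try:
        return ch.encode(codec)
    except UnicodeEncodeError:
        return None

results = {
    codec: try_encode(LIGATURE, codec)
    for codec in ("iso-8859-1", "iso-8859-15", "utf-8")
}

for codec, data in results.items():
    print(codec, "unmappable" if data is None else data.hex())
```

With Python's codecs, iso-8859-1 has no mapping for œ, iso-8859-15 encodes it as the single byte BD, and utf-8 encodes it as the two bytes C5 93.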



I think it is a Daffodil bug that you did not get a schema definition error 
when it read the string for your dfdl:initiator and was not able to translate 
it into the encoding because some characters are illegal/unmapped. Ideally you 
would have gotten "SDE: initiator contains characters undefined in encoding 
iso-8859-1: 'œ'".
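
The kind of check described here could look roughly like the following Python sketch. This is a hypothetical illustration, not Daffodil's implementation: the function names unmappable_chars and check_initiator are invented, ValueError merely stands in for a schema definition error, and the message text is the one suggested above.

```python
def unmappable_chars(text: str, encoding: str) -> list:
    """Return the distinct characters of `text` that `encoding` cannot
    represent, in order of first appearance."""
    bad = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            if ch not in bad:
                bad.append(ch)
    return bad

def check_initiator(initiator: str, encoding: str) -> None:
    """Raise ValueError (a stand-in for an SDE) if the initiator contains
    characters that are undefined in the declared encoding."""
    bad = unmappable_chars(initiator, encoding)
    if bad:
        listed = ", ".join("'%s'" % ch for ch in bad)
        raise ValueError(
            "SDE: initiator contains characters undefined in encoding %s: %s"
            % (encoding, listed)
        )

try:
    check_initiator("Lec\u0153ur", "iso-8859-1")
except ValueError as err:
    print(err)

# No error for encodings that do contain œ.
check_initiator("Lec\u0153ur", "utf-8")
```

Running such a check when the schema is compiled, rather than at parse time, would surface the problem before any data is read.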





________________________________

From: Costello, Roger L. <[email protected]>
Sent: Wednesday, September 18, 2019 12:43 PM
To: [email protected]
Subject: Is an oe ligature okay in the value of dfdl:initiator?



Hello DFDL community,

Here’s my input file (notice the œ ligature):

Lecœur Hello, world

Lecœur is the initiator.

Here’s my DFDL schema:

<xs:element name="input"
            type="xs:string"
            dfdl:initiator="Lecœur"
            dfdl:encoding="ISO-8859-1"/>

Running it yields this error message:

[error] Parse Error: Initiator 'Lec?ur' not found

Why am I getting this error message?

/Roger



