That poor message is a bug then, unless there is some real reason Daffodil cannot find the intended charset.
________________________________
From: Ramaka, Shashi <[email protected]>
Sent: Wednesday, September 18, 2019 1:45:18 PM
To: [email protected] <[email protected]>
Subject: RE: Is an oe ligature okay in the value of dfdl:initiator?

I changed the input so that it doesn’t contain the correct initiator. The diagnostic message doesn’t display the character properly.

[inline image]

The default code page used by the Windows Command Prompt is 437. I expected the diagnostic message to display the character correctly after changing the code page to 65001 (corresponding to UTF-8), but it did not. I am using a TrueType font (Lucida Console), so the font does support the ligature character.

From: Beckerle, Mike <[email protected]>
Sent: Wednesday, September 18, 2019 2:06 PM
To: [email protected]
Subject: Re: Is an oe ligature okay in the value of dfdl:initiator?

If you break the data so the initiator doesn't match, does the diagnostic message display the character properly?

Some code paths that don't have the charset info display data as strings of iso-8859-1, so if that is happening the diagnostic message would not display this particular character but some substitute.

________________________________
From: Ramaka, Shashi <[email protected]>
Sent: Wednesday, September 18, 2019 11:49:40 AM
To: [email protected] <[email protected]>
Subject: RE: Is an oe ligature okay in the value of dfdl:initiator?

This worked for me with utf-8 encoding. With encoding set to utf-8 in the schema, and the input file saved as utf-8 (I used Notepad++), Daffodil parsed the file correctly.

From: Beckerle, Mike <[email protected]>
Sent: Wednesday, September 18, 2019 1:25 PM
To: [email protected]
Subject: Re: Is an oe ligature okay in the value of dfdl:initiator?

Could be a bug, but we do regression testing of UTF-8 that uses lots of multi-byte characters and such, so I'd be surprised. We need to see the entire example, including the data bytes you are parsing, so we can reproduce it.

________________________________
From: Costello, Roger L. <[email protected]>
Sent: Wednesday, September 18, 2019 1:16 PM
To: [email protected] <[email protected]>
Subject: Re: Is an oe ligature okay in the value of dfdl:initiator?

Hi Mike,

I changed the encoding to utf-8:

<xs:element name="input" type="xs:string" dfdl:initiator="Lecœur" dfdl:encoding="utf-8"/>

I get the same error message:

[error] Parse Error: Initiator 'Lec?ur' not found

Bug in Daffodil?

/Roger

From: Beckerle, Mike <[email protected]>
Sent: Wednesday, September 18, 2019 1:02 PM
To: [email protected]
Subject: [EXT] Re: Is an oe ligature okay in the value of dfdl:initiator?

You have a mismatch between the character set encoding of your DFDL schema and the character set encoding it says is in the data.

Is your DFDL schema in UTF-8?

The character œ doesn't exist in iso-8859-1. If your data contains œ, then the encoding must be iso-8859-15 or utf-8 or something else that has the œ character.

I think it is a Daffodil bug that you did not get a schema definition error when it read the string for your dfdl:initiator and was not able to translate it into the encoding because some characters are illegal/unmapped. I would like you to have gotten "SDE: initiator contains characters undefined in encoding iso-8859-1: 'œ' ".

________________________________
From: Costello, Roger L. <[email protected]>
Sent: Wednesday, September 18, 2019 12:43 PM
To: [email protected] <[email protected]>
Subject: Is an oe ligature okay in the value of dfdl:initiator?

Hello DFDL community,

Here’s my input file (notice the œ ligature):

Lecœur Hello, world

Lecœur is the initiator. Here’s my DFDL schema:

<xs:element name="input" type="xs:string" dfdl:initiator="Lecœur" dfdl:encoding="ISO-8859-1"/>

Running it yields this error message:

[error] Parse Error: Initiator 'Lec?ur' not found

Why am I getting this error message?

/Roger
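
For anyone wanting to check the character-set claim in this thread outside of Daffodil, here is a minimal Python sketch. It only shows which of the encodings mentioned above can represent œ (U+0153) and what a '?' substitution looks like. The helper name try_encode is made up for illustration, and the idea that Daffodil produces 'Lec?ur' by a similar substitution is an assumption, not something confirmed in the thread.

```python
# Sketch: which encodings mentioned in the thread can represent œ (U+0153)?
# This is Python's codec behavior, not Daffodil's; it only illustrates the
# character-set mismatch being discussed.

INITIATOR = "Lecœur"


def try_encode(text, encoding):
    """Hypothetical helper: return the encoded bytes, or None if any
    character in `text` has no mapping in `encoding`."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


for enc in ("iso-8859-1", "iso-8859-15", "utf-8", "cp437"):
    encoded = try_encode(INITIATOR, enc)
    if encoded is None:
        print(enc, "-> cannot represent the initiator")
    else:
        print(enc, "->", encoded.hex())

# Replacing the unmappable character with '?' yields the same text that
# appears in the error message (assumed, not confirmed, to be how that
# message text arises).
print(INITIATOR.encode("iso-8859-1", errors="replace"))
```

If this runs as expected, iso-8859-1 and cp437 (the default Command Prompt code page mentioned above) both fail to encode the initiator, while iso-8859-15 encodes œ as one byte and utf-8 as two. That is consistent with the schema parsing correctly once both the schema and the data use utf-8.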
