This limitation of the Windows command line is something we ran into a long time back, and I recall we have instructions:

"Microsoft Windows Users: Daffodil depends on Unicode support. Some MS-Windows versions do not come with Unicode support by default. Daffodil has tests that make use of Japanese Kanji characters, so requires the Japanese Language Pack be installed."

These are on this wiki page, which is no longer the front page of the wiki. This page seems to be lost to the ages.

https://cwiki.apache.org/confluence/display/DAFFODIL/For+Contributors

I created ticket https://issues.apache.org/jira/browse/DAFFODIL-2206, which is to put this info onto the user web site, not buried in the wiki.

________________________________
From: Ramaka, Shashi <[email protected]>
Sent: Thursday, September 19, 2019 1:41 PM
To: [email protected]
Subject: RE: Is an oe ligature okay in the value of dfdl:initiator?

I did some more tests:

* Linux (CentOS 7) displays the ligature correctly in the diagnostic message.
* On Windows, I piped the error output to a file and viewed the file using Notepad. The ligature displayed properly.

It looks like a limitation of the Windows Console, which doesn't fully support utf-8 characters (https://devblogs.microsoft.com/commandline/windows-command-line-unicode-and-utf-8-output-text-buffer/).

I am on Windows 10 build 1809.

From: Beckerle, Mike <[email protected]>
Sent: Wednesday, September 18, 2019 4:48 PM
To: [email protected]
Subject: Re: Is an oe ligature okay in the value of dfdl:initiator?

That poor message is a bug then, unless there is some real reason Daffodil cannot find the intended charset.

________________________________
From: Ramaka, Shashi <[email protected]>
Sent: Wednesday, September 18, 2019 1:45:18 PM
To: [email protected]
Subject: RE: Is an oe ligature okay in the value of dfdl:initiator?

I changed the input so that it doesn't contain the correct initiator. The diagnostic message doesn't display the character properly.

The default code page used by the Windows Command Prompt is 437. I expected the diagnostic message to display the character correctly after changing the code page to 65001 (corresponding to UTF-8), but it did not.

I am using a TrueType font (Lucida Console), so the font I am using does support the ligature character.

From: Beckerle, Mike <[email protected]>
Sent: Wednesday, September 18, 2019 2:06 PM
To: [email protected]
Subject: Re: Is an oe ligature okay in the value of dfdl:initiator?

If you break the data so the initiator doesn't match, does the diagnostic message display the character properly?

Some things that don't have the charset info sometimes display data as strings of iso-8859-1, so if that is happening the diagnostic message would not display this particular character but some substitute.

________________________________
From: Ramaka, Shashi <[email protected]>
Sent: Wednesday, September 18, 2019 11:49:40 AM
To: [email protected]
Subject: RE: Is an oe ligature okay in the value of dfdl:initiator?

This worked for me with utf-8 encoding.

With encoding set to utf-8 in the schema, and the input file using utf-8 (I used Notepad++), Daffodil parsed the file correctly.

From: Beckerle, Mike <[email protected]>
Sent: Wednesday, September 18, 2019 1:25 PM
To: [email protected]
Subject: Re: Is an oe ligature okay in the value of dfdl:initiator?

Could be a bug, but we do regression testing of UTF-8 that uses lots of multi-byte characters and such. So I'd be surprised.

We need to see the entire example, including the data bytes you are parsing, so we can reproduce.

________________________________
From: Costello, Roger L. <[email protected]>
Sent: Wednesday, September 18, 2019 1:16 PM
To: [email protected]
Subject: Re: Is an oe ligature okay in the value of dfdl:initiator?

Hi Mike,

I changed the encoding to utf-8:

    <xs:element name="input" type="xs:string" dfdl:initiator="Lecœur" dfdl:encoding="utf-8"/>

I get the same error message:

    [error] Parse Error: Initiator 'Lec?ur' not found

Bug in Daffodil?

/Roger

From: Beckerle, Mike <[email protected]>
Sent: Wednesday, September 18, 2019 1:02 PM
To: [email protected]
Subject: [EXT] Re: Is an oe ligature okay in the value of dfdl:initiator?

You have a mismatch between the character set encoding of your DFDL schema and the character set encoding it says is in the data.

Is your DFDL schema in UTF-8?

The character œ doesn't exist in iso-8859-1.

If your data contains œ then the encoding must be iso-8859-15 or utf-8 or something that has the œ character.

I think it is a Daffodil bug that you did not get a schema definition error when it read the string for your dfdl:initiator but was not able to translate it into the encoding because some characters are illegal/unmapped.

I would like you to have gotten "SDE: initiator contains characters undefined in encoding iso-8859-1: 'œ'".

________________________________
From: Costello, Roger L. <[email protected]>
Sent: Wednesday, September 18, 2019 12:43 PM
To: [email protected]
Subject: Is an oe ligature okay in the value of dfdl:initiator?

Hello DFDL community,

Here's my input file (notice the œ ligature):

    Lecœur Hello, world

Lecœur is the initiator.

Here's my DFDL schema:

    <xs:element name="input" type="xs:string" dfdl:initiator="Lecœur" dfdl:encoding="ISO-8859-1"/>

Running it yields this error message:

    [error] Parse Error: Initiator 'Lec?ur' not found

Why am I getting this error message?

/Roger
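
The encoding point made in the thread (œ has no mapping in iso-8859-1 but does in iso-8859-15 and utf-8) can be checked independently of Daffodil. Below is a minimal sketch in Python, used here only as a stand-in for generic charset handling; the helper names `try_encode` and `encode_with_substitute` are hypothetical and are not part of Daffodil or any DFDL tool. The last call shows one common way an unmappable character turns into '?'. Whether that is what produced 'Lec?ur' in the diagnostic above is an assumption; the thread itself points to the Windows Console as the cause of the display problem.

```python
# Illustrative only: Python's codecs stand in for generic charset handling.
# This is not Daffodil code, and the helper names are made up for the sketch.
INITIATOR = "Lec\u0153ur"  # "Lecœur"; U+0153 is the oe ligature


def try_encode(text, encoding):
    """Return the encoded bytes, or None if some character is unmappable."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


def encode_with_substitute(text, encoding):
    """Encode, replacing each unmappable character with '?'."""
    return text.encode(encoding, errors="replace")


if __name__ == "__main__":
    # iso-8859-1 has no code point for U+0153; iso-8859-15 and utf-8 do.
    for enc in ("iso-8859-1", "iso-8859-15", "utf-8"):
        print(enc, try_encode(INITIATOR, enc))
    # A lossy encode substitutes '?' for the ligature.
    print(encode_with_substitute(INITIATOR, "iso-8859-1"))
```

Running the same check against the bytes of the actual input file (for example, reading it in binary mode and looking for the utf-8 sequence C5 93 or the iso-8859-15 byte BD) is one way to confirm which encoding the data really uses before setting dfdl:encoding.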
