Unfortunately, PonyMail doesn't render HTML-formatted emails well when they contain inline images and formatting. This morning's response is attached in HTML format: "how to incorporate file terminator into generic CSV schema .html"
---------- Forwarded message ---------
From: attila horvath <[email protected]>
Date: Tue, Jun 22, 2021 at 5:53 AM
Subject: Re: how to incorporate file terminator into generic CSV schema?
To: <[email protected]>, Beckerle, Mike <[email protected]>

I looked at and considered TDML (link) <https://daffodil.apache.org/tdml/> but could not find a convenient tool to facilitate the generation of a TDML file from inputs so instead...

I've attached the relevant ~.dfdl.xsd and input/output ~.csv files.

After adjusting for respective file paths, below are the results of: parse, unparse, diff, and xmllint respectively where 'diff' [without the '-qs' options] indicates 'No newline at end of file' error...

---------------------------------------------------
+ daffodil parse --validate=on -s csv-version4.dfdl.xsd -r csv-version4.dfdl.xsd -Dheader=present -o out/_-_home_-_attila_-_CDES_-_trunk_-_perstempo_-_data_-_JITC_-_CT_-_USSOCOM_PERSTEMPO_CT1.xml /home/attila/CDES/trunk/perstempo/data/JITC/CT/USSOCOM_PERSTEMPO_CT1.csv
+ parse_exit_code=0
+ daffodil unparse --validate=on -s csv-version4.dfdl.xsd -r csv-version4.dfdl.xsd '-DSeparator=|' header=present -o out/_-_home_-_attila_-_CDES_-_trunk_-_perstempo_-_data_-_JITC_-_CT_-_USSOCOM_PERSTEMPO_CT1.csv out/_-_home_-_attila_-_CDES_-_trunk_-_perstempo_-_data_-_JITC_-_CT_-_USSOCOM_PERSTEMPO_CT1.xml
+ unparse_exit_code=0
+ diff -qs /home/attila/CDES/trunk/perstempo/data/JITC/CT/USSOCOM_PERSTEMPO_CT1.csv out/_-_home_-_attila_-_CDES_-_trunk_-_perstempo_-_data_-_JITC_-_CT_-_USSOCOM_PERSTEMPO_CT1.csv
+ diff_exit_code=1
+ xmllint --schema csv-version4.dfdl.xsd out/_-_home_-_attila_-_CDES_-_trunk_-_perstempo_-_data_-_JITC_-_CT_-_USSOCOM_PERSTEMPO_CT1.xml --noout
out/_-_home_-_attila_-_CDES_-_trunk_-_perstempo_-_data_-_JITC_-_CT_-_USSOCOM_PERSTEMPO_CT1.xml validates
+ xmllint_exit_code=0
---------------------------------------------------

thx in advance - attila

On Mon, Jun 21, 2021 at 3:20 PM Beckerle, Mike <[email protected]>
wrote:
> Attila,
>
> It took me a bit to spot this, and I'm really not sure I am correct here.
>
> I think you need one more sequence. If you insert another element of type
> "TailType-perstempo" at the end, it doesn't want to be inside the sequence
> with NL infix separators. It wants to be after that sequence has ended, but
> inside a surrounding sequence that is the model group of the complexType of
> the csv-version4... element, but which has no separators.
>
> Given the images you provided, I can't cut/paste to try this theory out.
>
> I would like to make self-contained TDML files be the way we all exchange
> examples/bug-reports.
>
> Could you make a TDML file? (See https://daffodil.apache.org/tdml/)
>
> Their beauty is that they can be fully self-contained, i.e., contain
> schema, data, and expected results all together. Everything to reproduce
> can be in the same file.
>
> -mikeb
>
> ------------------------------
> *From:* Attila Horvath <[email protected]>
> *Sent:* Monday, June 21, 2021 1:27 PM
> *To:* [email protected] <[email protected]>
> *Subject:* how to incorporate file terminator into generic CSV schema?
>
> I have following generic variable field/record length schema which
> daffodil 2.4.0 parses/unparses verbatim except "No newline at end of file"
> error when I diff original CSV against reconstituted CSV. Otherwise
> reconstituted CSV appears to match original CSV:...
> [image: image.png]
>
> To get around this I've tried/failed to incorporate code block in *RED*
> into code block in *YELLOW* (see code block image). I've used code block
> in *RED* successfully but not w/ variable fields/records CSV via
> "...fn:count...".
> Can someone pls suggest how to correctly integrate unknown file terminator
> into code block in *YELLOW* in this schema?
> [image: image.png]
>
> Thx in advance,
>
> Attila
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
           elementFormDefault="qualified">

  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <dfdl:defineFormat name="default-dfdl-properties">
        <dfdl:format
          alignment="1"
          alignmentUnits="bytes"
          binaryFloatRep="ieee"
          binaryNumberRep="binary"
          bitOrder="mostSignificantBitFirst"
          byteOrder="bigEndian"
          calendarPatternKind="implicit"
          choiceLengthKind="implicit"
          documentFinalTerminatorCanBeMissing="yes"
          emptyValueDelimiterPolicy="none"
          encoding="ISO-8859-1"
          encodingErrorPolicy="replace"
          escapeSchemeRef=""
          fillByte="f"
          floating="no"
          ignoreCase="no"
          initiator=""
          initiatedContent="no"
          leadingSkip="0"
          lengthKind="delimited"
          lengthUnits="bytes"
          nilKind="literalValue"
          nilValueDelimiterPolicy="none"
          occursCountKind="implicit"
          outputNewLine="%CR;%LF;"
          representation="text"
          separator=""
          separatorPosition="infix"
          separatorSuppressionPolicy="anyEmpty"
          sequenceKind="ordered"
          terminator=""
          textBidi="no"
          textNumberCheckPolicy="strict"
          textNumberPattern="#,##0.###;-#,##0.###"
          textNumberRep="standard"
          textNumberRounding="explicit"
          textNumberRoundingIncrement="0"
          textNumberRoundingMode="roundUnnecessary"
          textOutputMinLength="0"
          textPadKind="none"
          textStandardBase="10"
          textStandardDecimalSeparator="."
          textStandardExponentRep="E"
          textStandardInfinityRep="Inf"
          textStandardNaNRep="NaN"
          textStandardZeroRep="0"
          textStandardGroupingSeparator=","
          textTrimKind="none"
          trailingSkip="0"
          truncateSpecifiedLengthString="no"
          utf16Width="fixed" />
      </dfdl:defineFormat>
    </xs:appinfo>
  </xs:annotation>

</xs:schema>
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
           xmlns:fn="http://www.w3.org/2005/xpath-functions"
           xmlns:math="http://www.w3.org/2005/xpath-functions/math"
           elementFormDefault="qualified">

  <xs:include schemaLocation="defaults.dfdl.xsd" />

  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <dfdl:defineEscapeScheme name="Quotes">
        <dfdl:escapeScheme escapeKind="escapeBlock" escapeBlockStart='"' escapeBlockEnd='"' escapeEscapeCharacter="\" extraEscapedCharacters="" generateEscapeBlock="whenNeeded"/>
      </dfdl:defineEscapeScheme>
      <dfdl:format ref="default-dfdl-properties"/>
      <dfdl:defineVariable name="Separator" type="xs:string" external="true">,</dfdl:defineVariable>
      <dfdl:defineVariable name="header" type="xs:string" external="true">present</dfdl:defineVariable>
      <dfdl:defineFormat name="fieldSeparator">
        <dfdl:format separator="{ $Separator }" separatorPosition="infix"/>
      </dfdl:defineFormat>
    </xs:appinfo>
  </xs:annotation>

  <xs:element name="csv-version4.dfdl.xsd">
    <xs:complexType>
      <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition="infix" dfdl:separatorSuppressionPolicy="trailingEmpty">
        <xs:element name="header">
          <xs:complexType>
            <xs:sequence dfdl:separator="," dfdl:separatorPosition="infix">
              <xs:element name="title" maxOccurs="unbounded" type="xs:string" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="record" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence dfdl:separator="," dfdl:separatorPosition="infix">
              <xs:element name="field" maxOccurs="unbounded" type="xs:string"
                          dfdl:occursCount="{ fn:count(../../header/title) }"
                          dfdl:occursCountKind="expression" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:complexType name="TailType-perstempo">
    <xs:sequence>
      <xs:element name="contents" type="xs:string" dfdl:lengthKind="pattern" dfdl:lengthPattern=".*?(?=\r\n|\n|\z)"/>
      <xs:choice>
        <xs:element name="CRLF" type="xs:string" dfdl:lengthKind="explicit" dfdl:length="0" dfdl:initiator="%CR;%LF;"/>
        <xs:element name="LF" type="xs:string" dfdl:lengthKind="explicit" dfdl:length="0" dfdl:initiator="%LF;"/>
        <xs:element name="NIL" type="xs:string" dfdl:lengthKind="explicit" dfdl:length="0"/>
      </xs:choice>
    </xs:sequence>
  </xs:complexType>

</xs:schema>
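
One possible reading of the restructuring Mike describes in his quoted reply, sketched against the root element of the schema above. This is an untested sketch, not a confirmed fix for daffodil 2.4.0. The element name "tail" is a hypothetical choice made for illustration; the header and record declarations are copied unchanged from the schema. The fragment is meant to replace the existing root element declaration, so it relies on the xs, dfdl, and fn prefixes and the "TailType-perstempo" type that the schema already declares.

```xml
<xs:element name="csv-version4.dfdl.xsd">
  <xs:complexType>
    <!-- Outer sequence: declares no separator of its own, so it picks up
         separator="" from default-dfdl-properties. -->
    <xs:sequence>

      <!-- Inner sequence: the original NL-separated header and records. -->
      <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition="infix" dfdl:separatorSuppressionPolicy="trailingEmpty">
        <xs:element name="header">
          <xs:complexType>
            <xs:sequence dfdl:separator="," dfdl:separatorPosition="infix">
              <xs:element name="title" maxOccurs="unbounded" type="xs:string" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="record" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence dfdl:separator="," dfdl:separatorPosition="infix">
              <xs:element name="field" maxOccurs="unbounded" type="xs:string"
                          dfdl:occursCount="{ fn:count(../../header/title) }"
                          dfdl:occursCountKind="expression" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>

      <!-- Hypothetical element name "tail": placed after the NL-separated
           sequence has ended, but still inside the unseparated outer sequence. -->
      <xs:element name="tail" type="TailType-perstempo" />

    </xs:sequence>
  </xs:complexType>
</xs:element>
```

The idea is that the tail element is no longer a child of the sequence with NL infix separators, so no separator is expected before it. Whether this actually preserves the file's final terminator on unparse would need to be checked by rerunning the parse, unparse, and diff steps from the message above.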
"Ssn","Asvab Test Date","Gct Total"
"0000000057","2004-01-15 12:00 AM","124"
"0000000174","1996-10-08 12:00 AM","105"
"0000700161","2008-06-30 12:00 AM","136"
"0130000155","1996-08-25 12:00 AM","118"
"0100000175","1993-12-07 12:00 AM","117"
"0000000738","1995-11-07 12:00 AM","125"
"0000000070","","112"
"0000000895","1989-08-20 12:00 AM","108"
"0000000217","1998-09-03 12:00 AM","108"
"0200000961","1994-09-27 12:00 AM","117"
"0330000160","1997-07-01 12:00 AM","132"
"0000001861","2004-06-24 12:00 AM","114"
"0000000596","2003-12-09 12:00 AM","120"
"0000000009","2000-04-20 12:00 AM","99"
"0400000000","2009-08-14 12:00 AM","107"
"0470000000","1994-11-16 12:00 AM","106"
"0400000000","2002-02-19 12:00 AM","111"
"0000000000","","0"
"0400000000","2008-08-15 12:00 AM","123"
"0400000000","2004-09-13 12:00 AM","127"
"0400000006","2000-04-28 12:00 AM","135"
"0500000004","2007-11-06 12:00 AM","125"
"0500000001","2010-02-22 12:00 AM","102"
"0500000001","2005-06-07 12:00 AM","108"
"0500000007","2004-09-30 12:00 AM","116"
"0570000013","1998-11-04 12:00 AM","121"
"0400000084","1998-11-30 12:00 AM","118"
"0500000047","1999-10-25 12:00 AM","102"
"0000000085","","130"
"0500000007","2002-06-10 12:00 AM","111"
"0000000000","2008-02-01 12:00 AM","95"
"0000000000","2004-04-01 12:00 AM","115"
"0100000005","","0"
"0200000005","2009-05-14 12:00 AM","103"
"0009600002","2004-09-13 12:00 AM","138"

"Ssn","Asvab Test Date","Gct Total"
"0000000057","2004-01-15 12:00 AM","124"
"0000000174","1996-10-08 12:00 AM","105"
"0000700161","2008-06-30 12:00 AM","136"
"0130000155","1996-08-25 12:00 AM","118"
"0100000175","1993-12-07 12:00 AM","117"
"0000000738","1995-11-07 12:00 AM","125"
"0000000070","","112"
"0000000895","1989-08-20 12:00 AM","108"
"0000000217","1998-09-03 12:00 AM","108"
"0200000961","1994-09-27 12:00 AM","117"
"0330000160","1997-07-01 12:00 AM","132"
"0000001861","2004-06-24 12:00 AM","114"
"0000000596","2003-12-09 12:00 AM","120"
"0000000009","2000-04-20 12:00 AM","99"
"0400000000","2009-08-14 12:00 AM","107"
"0470000000","1994-11-16 12:00 AM","106"
"0400000000","2002-02-19 12:00 AM","111"
"0000000000","","0"
"0400000000","2008-08-15 12:00 AM","123"
"0400000000","2004-09-13 12:00 AM","127"
"0400000006","2000-04-28 12:00 AM","135"
"0500000004","2007-11-06 12:00 AM","125"
"0500000001","2010-02-22 12:00 AM","102"
"0500000001","2005-06-07 12:00 AM","108"
"0500000007","2004-09-30 12:00 AM","116"
"0570000013","1998-11-04 12:00 AM","121"
"0400000084","1998-11-30 12:00 AM","118"
"0500000047","1999-10-25 12:00 AM","102"
"0000000085","","130"
"0500000007","2002-06-10 12:00 AM","111"
"0000000000","2008-02-01 12:00 AM","95"
"0000000000","2004-04-01 12:00 AM","115"
"0100000005","","0"
"0200000005","2009-05-14 12:00 AM","103"
"0009600002","2004-09-13 12:00 AM","138"
