I have seen commercial data sets, literally data you buy and pay big bucks for 
from financial data companies, that looked exactly like a bunch of Perl log 
output (all text, lots of semicolon separators) concatenated with 
COBOL-oriented binary mainframe data (packed decimal, EBCDIC characters) in 
each record of the data set.

Hence each record has a part that is dfdl:representation="text" and another 
part that is dfdl:representation="binary". A single record did not even use 
the same character set encoding throughout.

This is one of the reasons that DFDL has a composition principle that "if you 
can describe A, and you can describe B, you can describe A concatenated to B."
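
As a rough sketch of what that composition can look like, here is a schema 
fragment for one such record: two semicolon-terminated text fields followed by 
a packed decimal number and an EBCDIC string. The element names, lengths, and 
encodings are hypothetical, invented for illustration and not taken from any 
real data set. It is deliberately incomplete: a working schema would also need 
the remaining required properties (byte order, alignment, and so on), which 
normally come from a dfdl:format block that is omitted here.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">

  <!-- Hypothetical record: text part (A) concatenated with binary part (B). -->
  <xs:element name="record">
    <xs:complexType>
      <xs:sequence>

        <!-- Part A: log-style text fields, each terminated by a semicolon. -->
        <xs:element name="timestamp" type="xs:string"
                    dfdl:representation="text" dfdl:encoding="US-ASCII"
                    dfdl:lengthKind="delimited" dfdl:terminator=";"/>
        <xs:element name="symbol" type="xs:string"
                    dfdl:representation="text" dfdl:encoding="US-ASCII"
                    dfdl:lengthKind="delimited" dfdl:terminator=";"/>

        <!-- Part B: mainframe-style data. A 4-byte packed decimal with two
             implied decimal places (lengths are assumptions). -->
        <xs:element name="price" type="xs:decimal"
                    dfdl:representation="binary" dfdl:binaryNumberRep="packed"
                    dfdl:lengthKind="explicit" dfdl:length="4"
                    dfdl:lengthUnits="bytes"
                    dfdl:binaryDecimalVirtualPoint="2"
                    dfdl:binaryPackedSignCodes="C D F C"/>

        <!-- Still within part B: a fixed-length string in an EBCDIC code
             page, so the encoding differs from the text fields above. -->
        <xs:element name="accountName" type="xs:string"
                    dfdl:representation="text" dfdl:encoding="IBM037"
                    dfdl:lengthKind="explicit" dfdl:length="10"
                    dfdl:lengthUnits="bytes"/>

      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

The point of the sketch is only that dfdl:representation and dfdl:encoding are 
set per element, so neighbouring elements in one sequence are free to differ.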



________________________________
From: Roger L Costello <[email protected]>
Sent: Tuesday, September 15, 2020 6:40 AM
To: [email protected] <[email protected]>
Subject: Have you ever used more than one dfdl:representation in a DFDL schema?

Hi Folks,

A file contains a long series of text data and at the end is binary data. The 
binary data is not encoded as base64 text or anything like that. It is raw, 
unfiltered, unencoded binary data.

Is it a text file or a binary file?

Should the DFDL schema specify representation="text" or representation="binary"?

Or, should the DFDL schema specify representation="text" for the text part and 
then switch to representation="binary" for the binary part?

All this time I have been thinking that a DFDL schema will have one 
dfdl:representation. But perhaps I am wrong. Have you ever used more than one 
dfdl:representation in a DFDL schema? If yes, then are the files it specifies 
some kind of a hybrid between text and binary?

/Roger
