It is possible to have a dynamic length using dfdl:lengthKind="explicit"
and setting dfdl:length to an expression that reaches into the
field-descriptor-array to get the length.
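
As a rough sketch of what I mean (not tested against real dBase data): the
relative path and the child element name "length" are assumptions based on
the header layout you described, and the occurrence properties are carried
over from your message with the count written as an fn:count expression.
dfdl:occursIndex() gives the 1-based position of the current field, which
selects the matching header entry.

```xml
<!-- Sketch only: path and element names are assumed from the quoted
     message; encoding, lengthUnits, and other required properties are
     assumed to come from a default dfdl:format not shown here. -->
<xs:element name="field" type="xs:string"
            minOccurs="0" maxOccurs="unbounded"
            dfdl:occursCountKind="expression"
            dfdl:occursCount="{ fn:count(../../Field-descriptor-array/Field) }"
            dfdl:lengthKind="explicit"
            dfdl:length="{ ../../Field-descriptor-array/Field[dfdl:occursIndex()]/length }" />
```

Note that every field is still declared as xs:string here; only the length
varies per occurrence.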

However, there's no way to set the type dynamically at parse time.
Element types must be statically defined in the DFDL schema, just like
their names. You could perhaps parse each field as an xs:hexBinary type
and then use XSLT to transform that hex binary based on the type, but
then you lose a lot of the benefits that DFDL/Daffodil provides.

To me, this sounds like a self-describing format: the specification
for the data is within the data itself. DFDL/Daffodil does not usually
handle these types of formats very well. It can be done with a two-pass
solution, though. The first pass uses a schema that
describes and parses only the header of the data. The resulting XML
infoset is then transformed into another DFDL schema based on the
self-description. The remaining data can then be parsed with that
generated schema.
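
To illustrate, for the railway example the generated second-pass schema
might contain something like the fragment below. This is a hypothetical
output of the transform, not something Daffodil produces for you. The
text boolean representation with "T"/"F" is my assumption based on the
sample record, and encoding, lengthUnits, and the other required
properties are assumed to come from a default dfdl:format that is not
shown.

```xml
<!-- Hypothetical generated schema fragment for the railway example.
     Names, lengths, and types come from the header in the quoted
     message; the T/F boolean representation is an assumption. -->
<xs:element name="record">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="station-name" type="xs:string"
                  dfdl:lengthKind="explicit" dfdl:length="254" />
      <xs:element name="line" type="xs:string"
                  dfdl:lengthKind="explicit" dfdl:length="100" />
      <xs:element name="isActive" type="xs:boolean"
                  dfdl:representation="text"
                  dfdl:textBooleanTrueRep="T"
                  dfdl:textBooleanFalseRep="F"
                  dfdl:lengthKind="explicit" dfdl:length="1" />
    </xs:sequence>
  </xs:complexType>
</xs:element>
```

Because the names and types are now static in the generated schema, the
second pass can produce the <station-name>/<line>/<isActive> infoset you
were originally hoping for.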

This has clear performance implications since you need to perform a
transform and compile a new DFDL schema for every new piece of data, but
it is really the only way to handle these self-describing formats.

- Steve

On 10/8/18 7:55 AM, Costello, Roger L. wrote:
> Hello DFDL community!
> 
> I am creating a DFDL schema to parse dBase files.
> 
> A dBase file consists of a list of records. Each record consists of a list of 
> fields. Prior to the list of records is a header which describes each record 
> field: the field's name, the length of the field's value, and its datatype 
> (string, date, numeric, boolean, etc.). For example, I have a dBase file 
> containing railway data and the file looks like this (albeit in binary):
> 
> Field-descriptor-array
>     Field
>         name: station-name
>         length: 254
>         datatype: string
>     Field
>         name: line
>         length: 100
>         datatype: string
>     Field
>         name: isActive
>         length: 1
>         datatype: boolean
> 
> Here is a record:
> 
>     Van Dorn Street
>     blue
>     T
> 
> Ideally, parsing the dBase file would yield this XML:
> 
>     <record>
>         <station-name>Van Dorn Street</station-name>
>         <line>blue</line>
>         <isActive>true</isActive>
>     </record>
>     
> However, that requires element names be dynamically generated, which is not 
> currently supported. So, instead I can design the DFDL schema to generate 
> this XML:
> 
>     <record>
>         <field>Van Dorn Street</field>
>         <field>blue</field>
>         <field>true</field>
>     </record>
> 
> That will require the DFDL schema to calculate the number of <field> elements:
> 
>     <xs:element name="field" 
>               minOccurs="0" 
>               maxOccurs="unbounded" 
>               dfdl:occursCountKind="expression" 
>               dfdl:occursCount="{ fn:count(../../Field-descriptor-array/Field) }" 
>               ...
> 
> Does this seem reasonable thus far?
> 
> Now I am stuck: how to specify the length and the datatype of each field 
> element? The i'th <field> element must have a length and datatype as 
> specified in the i'th Field (which are in the header section). For the 
> example above, the first <field> element must be a string with length 254 
> characters, the second <field> element must be a string with length 100 
> characters, and the third <field> element must be a boolean with length 1 
> byte. How do I dynamically specify length and datatype?
> 
> /Roger
> 

