Hi Steve,

> It can be done with a two pass solution, though.

Okay, I'll give this 2-pass approach a try. However, I've never done this 
before. Does the first pass generate XML? And then the second pass (somehow) 
uses the XML to parse the remainder of the dBase file? I don't have any idea 
how to do this. Would you sketch out how to do 2-passes in Daffodil, please?

/Roger

-----Original Message-----
From: Steve Lawrence <[email protected]> 
Sent: Monday, October 8, 2018 8:19 AM
To: [email protected]; Costello, Roger L. <[email protected]>
Subject: Re: How to dynamically specify the length and datatype of an element?

It is possible to have a dynamic length using dfdl:lengthKind="explicit"
and setting dfdl:length to an expression that reaches into the 
field-descriptor-array to get the length.
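
As a rough sketch of that dynamic length (a fragment, not a complete schema), the declaration might look something like the following. It assumes the xs, dfdl, and fn prefixes are bound as usual, that each header Field has a child element named "length", that the relative path from your occursCount expression is correct for your schema, and that every field is parsed as xs:hexBinary because the type cannot vary:

```xml
<!-- Sketch only: the element names and the relative path are assumptions
     based on the example header in the quoted message. -->
<xs:element name="field" type="xs:hexBinary"
            minOccurs="0" maxOccurs="unbounded"
            dfdl:occursCountKind="expression"
            dfdl:occursCount="{ fn:count(../../Field-descriptor-array/Field) }"
            dfdl:lengthKind="explicit"
            dfdl:lengthUnits="bytes"
            dfdl:length="{ ../../Field-descriptor-array/Field[dfdl:occursIndex()]/length }" />
```

Here dfdl:occursIndex() gives the position of the current field occurrence, which is used to select the matching Field descriptor in the header.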

However, there's no way to set the type dynamically at parse time.
Element types must be statically defined in the DFDL schema, just like their 
names. You could perhaps parse each field as an xs:hexBinary type and then use 
XSLT to transform that hex binary based on the type, but then you lose a lot of 
the benefits that DFDL/Daffodil provides.

To me, this sounds like a format that is self-descriptive--the specification 
for that data is within the data itself. DFDL/Daffodil does not usually handle 
these types of formats very well. It can be done with a two pass solution, 
though. The first pass uses a schema that describes and parses only the header 
of the data. The resulting XML infoset is then transformed into another DFDL 
schema based on the self-description. The remaining data can then be parsed 
with that generated schema.
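
To make the second pass more concrete, here is a hedged sketch of what the generated schema might contain for the railway example in the quoted message. The header infoset from the first pass would be transformed (with XSLT or any other tool) into one statically typed element per Field descriptor. The property values below are assumptions for illustration; a real schema would also need a dfdl:format annotation supplying defaults such as encoding and padding, and would need to account for the header bytes that precede the records:

```xml
<!-- Sketch only: a fragment of a generated second-pass schema.
     Names, lengths, and types come from the example header; the text
     boolean representations are assumptions about the data. -->
<xs:element name="record">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="station-name" type="xs:string"
                  dfdl:representation="text"
                  dfdl:lengthKind="explicit" dfdl:lengthUnits="bytes"
                  dfdl:length="254" />
      <xs:element name="line" type="xs:string"
                  dfdl:representation="text"
                  dfdl:lengthKind="explicit" dfdl:lengthUnits="bytes"
                  dfdl:length="100" />
      <xs:element name="isActive" type="xs:boolean"
                  dfdl:representation="text"
                  dfdl:lengthKind="explicit" dfdl:lengthUnits="bytes"
                  dfdl:length="1"
                  dfdl:textBooleanTrueRep="T"
                  dfdl:textBooleanFalseRep="F" />
    </xs:sequence>
  </xs:complexType>
</xs:element>
```

Because the element names and types are now fixed in the generated schema, this gives the named, typed output that a single static schema cannot.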

This has clear performance implications since you need to perform a transform 
and compile a new DFDL schema for every new piece of data, but it is really the 
only way to handle these self-describing formats.

- Steve

On 10/8/18 7:55 AM, Costello, Roger L. wrote:
> Hello DFDL community!
> 
> I am creating a DFDL schema to parse dBase files.
> 
> A dBase file consists of a list of records. Each record consists of a list of 
> fields. Prior to the list of records is a header which describes each record 
> field: the field's name, the length of the field's value, and its datatype 
> (string, date, numeric, boolean, etc.). For example, I have a dBase file 
> containing railway data and the file looks like this (albeit in binary):
> 
> Field-descriptor-array
>     Field
>         name: station-name
>         length: 254
>         datatype: string
>     Field
>         name: line
>         length: 100
>         datatype: string
>     Field
>         name: isActive
>         length: 1
>         datatype: boolean
> 
> Here is a record:
> 
>     Van Dorn Street
>     blue
>     T
> 
> Ideally, parsing the dBase file would yield this XML:
> 
>     <record>
>         <station-name>Van Dorn Street</station-name>
>         <line>blue</line>
>         <isActive>true</isActive>
>     </record>
>     
> However, that requires element names be dynamically generated, which is not 
> currently supported. So, instead I can design the DFDL schema to generate 
> this XML:
> 
>     <record>
>         <field>Van Dorn Street</field>
>         <field>blue</field>
>         <field>true</field>
>     </record>
> 
> That will require the DFDL schema to calculate the number of <field> elements:
> 
>     <xs:element name="field" 
>               minOccurs="0" 
>               maxOccurs="unbounded" 
>               dfdl:occursCountKind="expression" 
>               dfdl:occursCount="{ fn:count(../../Field-descriptor-array/Field) }" 
>               ...
> 
> Does this seem reasonable thus far?
> 
> Now I am stuck: how to specify the length and the datatype of each field 
> element? The i'th <field> element must have a length and datatype as 
> specified in the i'th Field (which are in the header section). For the 
> example above, the first <field> element must be a string with length 254 
> characters, the second <field> element must be a string with length 100 
> characters, and the third <field> element must be a boolean with length 1 
> byte. How do I dynamically specify length and datatype?
> 
> /Roger
> 

