Re: How are decimal (number with decimal point) values expressed in a binary file?

Beckerle, Mike Mon, 30 Sep 2019 06:05:34 -0700

My experience is that decimal numbers are largely used by financial 
applications. The usage is almost always fixed point, not floating point.

Those applications are defined in terms of base-10 arithmetic, for example
rounding to some fixed precision using base-10 rounding rules.

The representations used are usually either packed decimal or zoned decimal.

Packed decimal uses a 4-bit nibble to represent each digit 0 to 9, so there
are two digits per byte.

So 19.95, stored as the digits 1995, would be hex 01995C. The leading 0 is
padding because these are always stored in complete bytes. The last nibble is
the sign: C or F is a plus sign, D is a minus sign. The decimal point is
implied, not stored. This would be described in Cobol as PIC S99V99 Comp-3,
where S means signed, 99V99 are the digits with the V showing the location of
the implied decimal point, and Comp-3 means packed decimal.
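As an illustration of the packed layout just described, here is a minimal Python sketch (the function names `unpack_packed` and `pack_packed` are hypothetical, not from DFDL or Daffodil). It assumes exactly the conventions above: one digit per nibble, a trailing sign nibble of C/F for plus or D for minus, a leading 0 nibble as padding, and a scale (2 for PIC S99V99) that is known from outside the data.

```python
from decimal import Decimal

# Sign nibbles as described above: C or F means plus, D means minus.
PLUS_SIGNS = {0xC, 0xF}
MINUS_SIGNS = {0xD}


def unpack_packed(data: bytes, scale: int) -> Decimal:
    """Decode packed decimal (Cobol Comp-3) bytes.

    `scale` is the number of digits after the implied decimal point
    (2 for PIC S99V99); the point itself is not stored in the data.
    """
    nibbles = []
    for byte in data:
        nibbles += [byte >> 4, byte & 0x0F]
    *digits, sign = nibbles
    if any(d > 9 for d in digits):
        raise ValueError("invalid digit nibble")
    if sign in PLUS_SIGNS:
        sign_bit = 0
    elif sign in MINUS_SIGNS:
        sign_bit = 1
    else:
        raise ValueError(f"invalid sign nibble {sign:X}")
    # An exponent of -scale places the implied decimal point.
    return Decimal((sign_bit, tuple(digits), -scale))


def pack_packed(value: Decimal, scale: int, num_digits: int) -> bytes:
    """Encode `value` with `num_digits` digits, `scale` of them after the point."""
    scaled = value.scaleb(scale)  # 19.95 with scale 2 -> 1995
    if scaled != scaled.to_integral_value():
        raise ValueError("value has more fraction digits than scale allows")
    n = int(scaled)
    text = str(abs(n)).rjust(num_digits, "0")
    if len(text) > num_digits:
        raise ValueError("value does not fit in num_digits digits")
    nibbles = [int(c) for c in text] + [0xD if n < 0 else 0xC]
    if len(nibbles) % 2:
        nibbles.insert(0, 0)  # pad to whole bytes with a leading 0 nibble
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[::2], nibbles[1::2]))
```

For example, `pack_packed(Decimal("19.95"), 2, 4).hex().upper()` gives `01995C`, matching the bytes above. Nothing in those three bytes says where the decimal point goes; that is the job of the V in the picture.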

DFDL supports this notion of an implied decimal point, so you can write
dfdl:textNumberPattern="99V99" just like Cobol. But Daffodil doesn't
support this "V" character in textNumberPatterns yet.

There is also Cobol "DISPLAY" decimal, aka "zoned" decimal. This is almost
text characters. Your 19.95 would be this in hex, using the ASCII charset:
0x31393935. If the value were -19.95, the hex would be 0x31393975 in standard
ASCII zoned; the sign is folded into the last digit, turning 0x35 into 0x75.
This way of storing decimals dates back to punched cards, and that sign
convention is called an "overpunched trailing sign". IBM S/390 and its
predecessors have had actual CPU instructions that do math directly on these
representations (zoned and packed) since the 1960s.

This is still, today, the most common representation for fixed-point decimal, 
as used in financial applications.
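A minimal Python sketch of the ASCII zoned form with an overpunched trailing sign follows. It assumes only the convention described above: every byte but the last is an ASCII digit 0x30-0x39, and the last byte carries the digit in its low nibble with a zone of 0x3_ for plus or 0x7_ for minus. Other overpunch mappings are in use and are not handled here. The names `decode_zoned` and `encode_zoned` are hypothetical.

```python
from decimal import Decimal


def decode_zoned(data: bytes, scale: int) -> Decimal:
    """Decode ASCII zoned decimal with an overpunched trailing sign.

    Assumes zone 0x30 = plus and zone 0x70 = minus on the last byte only;
    `scale` is the number of implied digits after the decimal point.
    """
    if not data:
        raise ValueError("empty field")
    *leading, last = data
    digits = []
    for b in leading:
        if not 0x30 <= b <= 0x39:
            raise ValueError(f"invalid zoned digit byte {b:#04x}")
        digits.append(b - 0x30)
    zone, digit = last & 0xF0, last & 0x0F
    if digit > 9 or zone not in (0x30, 0x70):
        raise ValueError(f"invalid sign byte {last:#04x}")
    digits.append(digit)
    sign_bit = 1 if zone == 0x70 else 0
    return Decimal((sign_bit, tuple(digits), -scale))


def encode_zoned(value: Decimal, scale: int, num_digits: int) -> bytes:
    """Encode `value` as ASCII zoned decimal, overpunching the sign if negative."""
    scaled = value.scaleb(scale)
    if scaled != scaled.to_integral_value():
        raise ValueError("value has more fraction digits than scale allows")
    n = int(scaled)
    text = str(abs(n)).rjust(num_digits, "0")
    if len(text) > num_digits:
        raise ValueError("value does not fit in num_digits digits")
    out = bytearray(text.encode("ascii"))
    if n < 0:
        out[-1] = 0x70 | (out[-1] & 0x0F)  # e.g. '5' (0x35) -> 'u' (0x75)
    return bytes(out)
```

With scale 2, `decode_zoned(b"1995", 2)` gives 19.95 and `decode_zoned(b"199u", 2)` gives -19.95. The bytes themselves are plain ASCII, which is why such fields are so easy to mistake for text.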

Because this is "almost" textual, this kind of DISPLAY decimal is often
mistaken for textual data. 19.95 would appear in a file viewed textually as
"1995", and -19.95 would be "199u". (The u is ASCII 0x75.)

...mikeb

________________________________
From: Steve Lawrence <[email protected]>
Sent: Monday, September 30, 2019 8:19 AM
To: [email protected] <[email protected]>
Subject: Re: How are decimal (number with decimal point) values expressed in a 
binary file?

Yeah, this system really only works when the scale factor is a constant,
which might be common in formats dealing with currencies, for example. Such
formats generally represent values as integers in number of cents, so there
aren't any issues related to floating point precision loss; the int is then
divided by 100 to get the actual dollar value when needed.
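To make the cents-plus-constant-scale idea concrete, here is a small Python sketch built on the 0x07CB example quoted further down the thread. It assumes a 2-byte big-endian two's complement field (the byte order is an assumption; in DFDL it would come from dfdl:byteOrder). Python's `Decimal.scaleb` stands in for moving the decimal point, the way dfdl:binaryDecimalVirtualPoint="2" does.

```python
from decimal import Decimal

# Hypothetical price field: 2 bytes, big-endian two's complement, in cents.
raw = bytes.fromhex("07CB")
cents = int.from_bytes(raw, byteorder="big", signed=True)  # 1995

# Apply the constant scale factor exactly: 1995 * 10**-2.
dollars = Decimal(cents).scaleb(-2)  # Decimal('19.95')

# Arithmetic on integer cents is exact; the same sums on binary floats may not be.
prices_in_cents = [10, 20]
total = Decimal(sum(prices_in_cents)).scaleb(-2)  # Decimal('0.30')
float_total = 0.10 + 0.20  # 0.30000000000000004, not 0.3

print(cents, dollars, total, float_total)
```

Note that the scale (-2 here) appears only in the code, never in the data. That is the "out-of-band knowledge" Roger asks about below.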

But if the scale factor isn't constant, you would need to use
dfdl:inputValueCalc, e.g.:

  <xs:element name="scaleFactor" type="xs:int" ... />
  <xs:element name="intValue" type="xs:int" ... />
  <xs:element name="decValue" type="xs:decimal" dfdl:inputValueCalc="{
    ../intValue div math:pow(10, ../scaleFactor)
  }" />
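For comparison, here is the same computation as the inputValueCalc example outside DFDL, sketched in Python. It assumes a hypothetical record in which scaleFactor and intValue are each 4-byte big-endian two's complement integers (xs:int), in that order. Instead of dividing by a computed power of ten, it uses `Decimal.scaleb`, so moving the decimal point stays exact in base 10.

```python
import struct
from decimal import Decimal


def parse_scaled_record(data: bytes) -> Decimal:
    """Parse a hypothetical record: scaleFactor then intValue, both xs:int.

    Equivalent in intent to intValue div 10^scaleFactor, computed exactly.
    """
    scale_factor, int_value = struct.unpack(">ii", data[:8])
    return Decimal(int_value).scaleb(-scale_factor)


record = struct.pack(">ii", 2, 1995)  # scaleFactor=2, intValue=1995
print(parse_scaled_record(record))
```

Because the scale is read from the data, each record can carry a different number of fractional digits. A constant dfdl:binaryDecimalVirtualPoint cannot express that.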

Though, it might be a nice extension of Daffodil to allow
dfdl:binaryDecimalVirtualPoint to accept a DFDL expression. Then this
could just look like:

  <xs:element name="scaleFactor" type="xs:int" ... />
  <xs:element name="decValue" type="xs:decimal"
    dfdl:binaryDecimalVirtualPoint="{ ../scaleFactor }" />

From an implementation perspective, I don't think it would be difficult
to add.

Though, I think the real-world binary formats I've seen that carry decimal
values just use the standard IEEE float or double representations. I think
integer + scale for decimals is relatively uncommon, and when it does happen,
the scale tends to be constant.
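The trade-off between the two approaches shows up quickly in a short Python sketch, which is an illustration rather than part of the original message. It round-trips 19.95 through the 4-byte and 8-byte IEEE 754 encodings (Python's `struct` formats `>f` and `>d`; big-endian is assumed). It then prints the exact binary value each one stores, which is why the financial formats described earlier prefer fixed-point decimal.

```python
import struct
from decimal import Decimal

price = 19.95  # a Python float is an IEEE 754 binary64 (double)

# Round-trip through the 4-byte encoding used for xs:float.
as_float32 = struct.unpack(">f", struct.pack(">f", price))[0]
# Round-trip through the 8-byte encoding used for xs:double.
as_float64 = struct.unpack(">d", struct.pack(">d", price))[0]

# Decimal(x) converts a float exactly, exposing the stored binary value.
print(Decimal(as_float32))  # close to, but not exactly, 19.95
print(Decimal(as_float64))  # also not exactly 19.95
```

Neither encoding holds 19.95 exactly, because 19.95 has no finite base-2 representation. The integer 1995 with a scale of 2 has no such problem.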

- Steve

On 9/30/19 7:56 AM, Costello, Roger L. wrote:
> Thank you Steve! One follow-up question, please.
>
>> In order to get a value of 19.95, your data
>> would contain the bytes 0x07CB (1995 in
>> two's complement binary) and you'd have
>> dfdl:binaryDecimalVirtualPoint="2" to
>> move the decimal point two places to the left.
>
> Is that how most real world data formats express 19.95? If I understand 
> correctly, the 19.95 is treated as the integer 1995 and then the integer 1995 
> is expressed in binary. Then, the data format must, by some means, indicate 
> that there is a decimal point between 19 and 95. Wouldn't that information 
> about the location of the decimal point either require some additional 
> information within the binary file, or some out-of-band knowledge by the 
> applications that process the binary file? It seems, based on your 
> description, you are assuming the latter.
>
> /Roger
>
> -----Original Message-----
> From: Steve Lawrence <[email protected]>
> Sent: Monday, September 30, 2019 7:40 AM
> To: [email protected]
> Subject: [EXT] Re: How are decimal (number with decimal point) values 
> expressed in a binary file?
>
> If the type were xs:float or xs:double, then the dfdl:binaryFloatRep property 
> defines how binary is converted to a number. Daffodil currently only supports 
> "ieee", which is IEEE 754-1985 floating point representation. And byte 
> lengths must be 4 for xs:float or 8 for xs:double.
>
> Things are a bit different for xs:decimal. In that case, we parse the bits 
> as an integer (based on the dfdl:binaryNumberRep property), and then move 
> the decimal point of that integer based on the value of 
> dfdl:binaryDecimalVirtualPoint.
>
> So in your example, let's say your field length was 2 bytes and 
> dfdl:binaryNumberRep="binary" (i.e. two's complement). In order to get a 
> value of 19.95, your data would contain the bytes 0x07CB (1995 in two's 
> complement binary) and you'd have dfdl:binaryDecimalVirtualPoint="2" to move 
> the decimal point two places to the left.
>
>
> On 9/30/19 7:19 AM, Costello, Roger L. wrote:
>> Hello DFDL community,
>>
>> Scenario: The book cost is: 19.95
>>
>> The book data is in binary.
>>
>> How is 19.95 expressed in binary?
>>
>> How would the cost be expressed in a DFDL schema? Simply this:
>>
>> <xs:element name="cost" type="xs:decimal" />
>>
>> /Roger
>>
>
