Forgive me if this is a rookie user error, but I ran across an odd
situation with the libxml2 parser. When it parses a series of numerical
values separated by whitespace (e.g. "... 293.18 218.92 289.13 ..."), it
can hand my code just a portion of the series, so that a number is
separated from the rest of its digits.
For example, in the series above which is the data portion of a table
(rows, columns, data values - see attached file), the parser may grab
everything from the beginning of the list through 293.18 and 218 (but
not ".92") and pass it in through the 'cdata' argument to
characterHandler(). My code then extracts each of the values with "218"
being the last one. Then libxml2 grabs the rest of the series, beginning
with ".92 289.13 ..." through the end, passes it again to
characterHandler() where my code again extracts each value, starting
with ".92". This results in an extra value being parsed from the series
("218" and ".92" instead of "218.92") and an error in my code (# of data
values doesn't match the product of rows and columns).
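To make the symptom concrete, here is a minimal sketch in plain C (no libxml2 calls) of what I believe is happening. The function names count_tokens() and count_split() are hypothetical and exist only for this illustration. It counts whitespace-separated values chunk by chunk, the way a handler does if it treats every callback as a complete piece of text:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Count whitespace-separated tokens in one chunk of `len` bytes,
 * treating the chunk as if it were a complete piece of text. */
int count_tokens(const char *chunk, size_t len)
{
    int count = 0;
    int in_token = 0;

    for (size_t i = 0; i < len; i++) {
        if (isspace((unsigned char)chunk[i])) {
            in_token = 0;
        } else if (!in_token) {
            in_token = 1;   /* first character of a new token */
            count++;
        }
    }
    return count;
}

/* Hypothetical helper: total token count when `text` is delivered as
 * two chunks, the first one ending just before byte offset `split`. */
int count_split(const char *text, size_t split)
{
    size_t len = strlen(text);

    assert(split <= len);
    return count_tokens(text, split) + count_tokens(text + split, len - split);
}
```

With the series "293.18 218.92 289.13", a split that falls on whitespace (offset 7) still gives three values, while a split after "218" (offset 10) gives four, which is the kind of mismatch I am seeing.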
Whether the split lands inside a value appears to be random, except that
the chunk passed to characterHandler() ends at character number 250
(suspiciously close to 256). The more whitespace characters, the
less likely it is to happen within a value, so I can reformat the
numbers and make the error go away. However, this doesn't leave me with
a very comfortable feeling. Plus, I have a hard time believing that we
wouldn't have seen this a long time ago if it was truly this random.
Is this something with which you are familiar? I searched the web but
didn't see any results that seem to address this situation. I'm
wondering if there's a way to instruct libxml2 to not separate
contiguous non-whitespace characters. I hope I don't have to try to
reassemble separated values myself.
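If reassembly does turn out to be necessary, the following is a minimal sketch of how it could look, assuming the character handler may be called more than once for the text of a single element. TextBuf and the textbuf_*() functions are hypothetical names, not libxml2 API. The idea is to append every chunk in the character handler and only extract values once the element's end is reported:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical accumulator for element text; not part of libxml2. */
typedef struct {
    char  *data;  /* NUL-terminated text collected so far */
    size_t len;   /* bytes used, excluding the NUL */
    size_t cap;   /* bytes allocated */
} TextBuf;

/* Append one chunk of `n` bytes (call from the character handler).
 * Returns 0 on success, -1 if memory could not be allocated. */
int textbuf_append(TextBuf *b, const char *chunk, size_t n)
{
    assert(b != NULL);
    if (b->len + n + 1 > b->cap) {
        size_t cap = b->cap ? b->cap : 256;
        while (cap < b->len + n + 1)
            cap *= 2;
        char *p = realloc(b->data, cap);
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = cap;
    }
    memcpy(b->data + b->len, chunk, n);
    b->len += n;
    b->data[b->len] = '\0';
    return 0;
}

/* Extract up to `max` numbers from the collected text (call from the
 * end-element handler). Returns how many values were stored in `out`. */
size_t textbuf_parse(const TextBuf *b, double *out, size_t max)
{
    size_t n = 0;
    const char *p = b->data;

    if (p == NULL)
        return 0;
    while (n < max) {
        char *end;
        double v = strtod(p, &end);  /* skips leading whitespace */
        if (end == p)
            break;                   /* no further number found */
        out[n++] = v;
        p = end;
    }
    return n;
}

/* Empty the buffer for the next element, keeping the allocation. */
void textbuf_reset(TextBuf *b)
{
    b->len = 0;
    if (b->data != NULL)
        b->data[0] = '\0';
}
```

Appending "293.18 218" and then ".92 289.13" and parsing afterwards yields three values, with 218.92 intact. The cost is holding the whole text of the element in memory until its end tag arrives.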
Any help is appreciated.
Thanks,
-Dan
--
Dan McRae
Software Engineering Manager
Comet Solutions, Inc.
505.323.2525
505.353.2635
<?xml version="1.0" ?>
... much removed ...
<Table>
<RowSet> <!-- 1 row -->
</RowSet>
<ColumnSet name="Time"> <!-- 37 columns -->
<Scalar value="0"/>
<Scalar value="600"/>
<Scalar value="1200"/>
<Scalar value="1800"/>
<Scalar value="2400"/>
<Scalar value="3000"/>
<Scalar value="3600"/>
<Scalar value="4200"/>
<Scalar value="4800"/>
<Scalar value="5400"/>
<Scalar value="6000"/>
<Scalar value="6600"/>
<Scalar value="7200"/>
<Scalar value="7800"/>
<Scalar value="8400"/>
<Scalar value="9000"/>
<Scalar value="9600"/>
<Scalar value="10200"/>
<Scalar value="10800"/>
<Scalar value="11400"/>
<Scalar value="12000"/>
<Scalar value="12600"/>
<Scalar value="13200"/>
<Scalar value="13800"/>
<Scalar value="14400"/>
<Scalar value="15000"/>
<Scalar value="15600"/>
<Scalar value="16200"/>
<Scalar value="16800"/>
<Scalar value="17400"/>
<Scalar value="18000"/>
<Scalar value="18600"/>
<Scalar value="19200"/>
<Scalar value="19800"/>
<Scalar value="20400"/>
<Scalar value="21000"/>
<Scalar value="21600"/>
</ColumnSet>
<DataSet name="Temperature"> <!-- 37 values, but one gets separated into two => 38 => error! -->
293.15 291.15 288.15 283.15 274.65 273.15 274.15 277.15 281.15 285.15 290.15 295.15 301.15 307.15 312.15 315.15 317.15 318.15 318.15 317.15 315.15 312.15 308.15 303.15 297.15 292.15 290.15 289.15 289.15 290.15 291.15 292.15 292.65 293.15 293.35 293.25 293.15
</DataSet>
</Table>
... much more removed ...