Hi everyone,

I was hoping for some advice regarding a problem my team is facing related to SAX parsing in Xerces-C++. I'm new to Xerces, and SAX in general, so please forgive any stupidity!

The application we're developing is processing /very/ large XML files that contain time-series data looking something like this:

<Root>
  <Header>
    <SomeMetaData>
    <SomeMoreMetaData>
    ...
    ...
  </Header>

  <Frame id="1">
    <LotsOfData>
    <LotsMoreData>
    <YetMoreData>
    ...
  </Frame>
  <Frame id="2">
    ...
  </Frame>
  <Frame id="3">
    ...
  </Frame>
  ...
  ...
  ...
</Root>

We've been using progressive SAX parsing to read the <Frame> data from these XML files, which works great because we can deal with the data as a stream without having to read the entire file up front.

We've also been using the MSXML DOM implementation to read <Header> data with the same Schema as the <Header> element in the time-series files, but from other, small files.

The problem now is that we wish to access the <Header> data in these extremely large files. We don't want to use DOM to parse the entire file (for efficiency reasons), but we'd like to re-use the existing DOM-based implementation that we have for reading the <Header> schema (rather than implementing a new SAX parser for the <Header>).

So, I guess my question is: is there a way to discover the exact file location (i.e. the absolute offset within the file) of an Element as it's encountered during a SAX parse? If we could get the location, we could manually read the entire <Header> section into a string and DOM-parse the string. We'd also like to be able to access file location information for other reasons, such as pre-parsing the files to build a lookup table for the XML file, so that a particular section of the time series can be read in on demand with the help of a custom LocalFileSource.

The closest thing I've found is Locator, but that doesn't help because it gives you a line and column, rather than an absolute location within the file. I looked into peeking at the BinInputStream that the SAX2XMLReader is using, but that doesn't work because the stream is read in chunks, so calling BinInputStream::curPos() when the Header element is encountered doesn't supply the exact location either. I know that would have been a kludgy solution anyway, but it would have served our purposes.

Any suggestions on how best to solve this one?

Many Thanks,

Pete Hodgson
