Hi everyone,
I was hoping for some advice regarding a problem my team is facing related to SAX parsing in Xerces-C++. I'm new to Xerces, and SAX in general, so please forgive any stupidity!
The application we're developing is processing /very/ large XML files that contain time-series data looking something like this:
<Root>
  <Header>
    <SomeMetaData>
    <SomeMoreMetaData>
    ...
    ...
  </Header>
  <Frame id="1">
    <LotsOfData>
    <LotsMoreData>
    <YetMoreData>
    ...
  </Frame>
  <Frame id="2"> ... </Frame>
  <Frame id="3"> ... </Frame>
  ...
  ...
  ...
</Root>
We've been using SAX progressive parsing to read the <Frame> data from these XML files, which works great because we can deal with it as a stream without having to read the entire file up front.
We've also been using the MSXML DOM implementation to read <Header> data from other, small files. Those files use the same schema as the <Header> element in the time-series files.
The problem now is that we wish to access the <Header> data in these extremely large files. We don't want to use DOM to parse the entire file (for efficiency reasons), but we'd like to re-use the existing DOM-based implementation that we have for reading the <Header> schema (rather than implementing a new SAX parser for the <Header>).
So, I guess my question is, is there a way to discover the exact file location of an Element as it's encountered during a SAX parse? If we could get the location we could manually read the entire <Header> section into a string and DOM-parse the string. We'd also like to be able to access file location information for other reasons, such as to pre-parse the files and build a 'look up table' for the XML file, so that a particular section of the time series can be read in on demand with the help of a custom LocalFileSource.
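To make the idea concrete, here is a rough sketch of what we'd like to do once we have absolute byte offsets. ByteRange, FrameIndex and readByteRange are names I've made up for illustration, and the offsets themselves are exactly the part we don't know how to obtain from the SAX parse:

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical index entry: where an element's text starts in the file
// and how many bytes it spans (start tag through end tag).
struct ByteRange {
    std::uint64_t offset;  // absolute byte offset of the element's '<'
    std::uint64_t length;  // number of bytes, including the end tag
};

// Hypothetical look-up table: frame id -> byte range within the file.
using FrameIndex = std::map<int, ByteRange>;

// Read exactly range.length bytes starting at range.offset and return
// them as a string that could then be handed to a DOM parser.
std::string readByteRange(const std::string& path, const ByteRange& range) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }
    in.seekg(static_cast<std::streamoff>(range.offset));
    std::string buffer(static_cast<std::size_t>(range.length), '\0');
    in.read(&buffer[0], static_cast<std::streamsize>(buffer.size()));
    if (static_cast<std::uint64_t>(in.gcount()) != range.length) {
        throw std::runtime_error("short read from " + path);
    }
    assert(buffer.size() == range.length);
    return buffer;
}
```

With something like this, the <Header> would be a single readByteRange call followed by our existing DOM code, and the pre-parse step would just fill a FrameIndex with one entry per <Frame>.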
The closest thing I've found is Locator, but that doesn't help because it gives you a line and column, rather than an absolute location within the file. I looked into peeking at the BinInputStream that the SAX2XMLReader is using, but that doesn't work because the stream is read in chunks, so calling BinInputStream::curPos() when the Header element is encountered doesn't supply the exact location either. I know that would have been a kludgy solution anyway, but it would have served our purposes.
Any suggestions on how best to solve this one?
Many Thanks,
Pete Hodgson