Hi Pete,
For SAX2 try using setFeature(XMLUni::fgXercesCalculateSrcOfs, true).
Regards,
David A. Cargill
XML Parser Development
IBM Toronto Lab
(905) 413-2371, tie 969
[EMAIL PROTECTED]
From: Pete Hodgson <[EMAIL PROTECTED]>
Sent: 11/16/2004 12:52 PM
To: [EMAIL PROTECTED]
Subject: Re: Accessing file position information during a SAX parse
Please respond to xerces-c-dev

I've tried using SAX2XMLReader::getSrcOffset(), but
XMLReader::getSrcOffset() throws a Reader_SrcOfsNotSupported exception.
Do I need to explicitly tell the parser to maintain source offset
information? I noticed that SAXParser has a setCalculateSrcOfs() method,
but I can't find an equivalent for SAX2XMLReader. Do I need to choose a
specific scanner maybe?
Any help would be greatly appreciated!
Cheers,
Pete
Erik Rydgren wrote:
> Try this path: SAXParser().getScanner().getSrcOffset()
>
> The problem is that the getScanner method is protected. You might
> derive your own class from SAXParser to get access.
>
> But it should give you the number of characters eaten by the XMLReader.
> That is the current file position.
>
> / Erik
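
The subclassing idea above can be sketched as follows. Parser and Scanner are hypothetical stand-ins written for this example, not the Xerces classes; only the pattern carries over (a derived class re-exposing a protected accessor through a public method). Whether the real SAXParser::getScanner() can be used this way should be checked against the Xerces headers.

```cpp
#include <cstddef>

// Stand-in for the scanner that tracks how much input has been consumed.
// Hypothetical class, NOT the Xerces XMLScanner.
class Scanner {
public:
    explicit Scanner(std::size_t offset) : offset_(offset) {}
    std::size_t getSrcOffset() const { return offset_; }
private:
    std::size_t offset_;
};

// Stand-in for a parser whose scanner accessor is protected.
// Hypothetical class, NOT the Xerces SAXParser.
class Parser {
public:
    explicit Parser(std::size_t offset) : scanner_(offset) {}
    virtual ~Parser() = default;
protected:
    const Scanner& getScanner() const { return scanner_; }
private:
    Scanner scanner_;
};

// A derived class may call the protected accessor and publish the result.
class OffsetParser : public Parser {
public:
    using Parser::Parser;  // inherit the constructor
    std::size_t currentOffset() const { return getScanner().getSrcOffset(); }
};
```

Code outside the hierarchy then calls currentOffset() on an OffsetParser instead of reaching for the protected method directly.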
>
>
>>-----Original Message-----
>>From: Pete Hodgson [mailto:[EMAIL PROTECTED]
>>Sent: den 16 november 2004 16:53
>>To: [EMAIL PROTECTED]
>>Subject: Accessing file position information during a SAX parse
>>
>>Hi everyone,
>>
>>I was hoping for some advice regarding a problem my team is facing
>>related to SAX parsing in Xerces-C++. I'm new to Xerces, and SAX in
>>general, so please forgive any stupidity!
>>
>>The application we're developing is processing /very/ large XML files
>>that contain time-series data looking something like this:
>>
>><Root>
>> <Header>
>> <SomeMetaData>
>> <SomeMoreMetaData>
>> ...
>> ...
>> </Header>
>>
>> <Frame id="1">
>> <LotsOfData>
>> <LotsMoreData>
>> <YetMoreData>
>> ...
>> </Frame>
>> <Frame id="2">
>> ...
>> </Frame>
>> <Frame id="3">
>> ...
>> </Frame>
>> ...
>> ...
>> ...
>></Root>
>>
>>We've been using progressive SAX parsing to read the <Frame> data from
>>these XML files, which works great because we can deal with it as a
>>stream without having to read the entire file up front.
>>
>>We've also been using the MSXML DOM implementation to read <Header> data
>>with the same Schema as the <Header> element in the time-series files,
>>but from other, small files.
>>
>>The problem now is that we wish to access the <Header> data in these
>>extremely large files. We don't want to use DOM to parse the entire file
>>(for efficiency issues), but we'd like to re-use the existing DOM-based
>>implementation that we have for reading the <Header> schema (rather than
>>implementing a new SAX parser for the <Header>).
>>
>>So, I guess my question is, is there a way to discover the exact file
>>location of an Element as it's encountered during a SAX parse? If we
>>could get the location we could manually read the entire <Header>
>>section into a string and DOM-parse the string. We'd also like to be
>>able to access file location information for other reasons, such as to
>>pre-parse the files and build a 'look up table' for the XML file, so
>>that a particular section of the time series can be read in on demand
>>with the help of a custom LocalFileSource.
>>
>>The closest thing I've found is Locator, but that doesn't help because
>>it gives you a line and column, rather than an absolute location within
>>the file. I looked into peeking at the BinInputStream that the
>>SAX2XMLReader is using, but that doesn't work because the stream is read
>>in chunks, so calling BinInputStream::curPos() when the Header element
>>is encountered doesn't supply the exact location either. I know that
>>that would have been a kludgy solution anyways, but it would have served
>>our purposes.
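
A toy model of the chunked-read problem described above, for illustration only. ChunkedScanner is a stand-in invented for this sketch and does not reflect how Xerces buffers input internally. It simply shows that once input is fetched in fixed-size chunks, the stream position runs ahead of the position the scanner has actually consumed.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical scanner that pulls raw input in fixed-size chunks and then
// consumes the buffered bytes. NOT a Xerces class.
class ChunkedScanner {
public:
    ChunkedScanner(std::string data, std::size_t chunkSize)
        : data_(std::move(data)),
          chunkSize_(std::max<std::size_t>(1, chunkSize)) {}

    // Consume input up to and including the next occurrence of `token`.
    // Returns false if the token does not occur in the remaining input.
    bool scanTo(const std::string& token) {
        std::size_t hit = data_.find(token, consumed_);
        if (hit == std::string::npos) return false;
        std::size_t needed = hit + token.size();
        while (fetched_ < needed)  // refill in whole chunks only
            fetched_ = std::min(fetched_ + chunkSize_, data_.size());
        consumed_ = needed;
        return true;
    }

    // What a curPos()-style call on the underlying stream would report.
    std::size_t streamPos() const { return fetched_; }
    // Where the scanner really is in the document.
    std::size_t consumedPos() const { return consumed_; }

private:
    std::string data_;
    std::size_t chunkSize_;
    std::size_t fetched_ = 0;
    std::size_t consumed_ = 0;
};
```

After scanning to a token, streamPos() is rounded up to a chunk boundary (or the end of the input) while consumedPos() is exact, which is why the offset has to come from the scanner rather than from the stream.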
>>
>>Any suggestions on how best to solve this one?
>>
>>Many Thanks,
>>
>>Pete Hodgson
>>
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail: [EMAIL PROTECTED]
>>For additional commands, e-mail: [EMAIL PROTECTED]