Hi,
the documentation of DOMString::rawBuffer() says that the returned buffer is
not always null terminated. This implies that the buffer has to be copied
and a null character has to be appended to that copy of the buffer for
further use.
I would like to skip this copying for performance reasons
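The copy that the question above hopes to avoid can be sketched as follows. This is only an illustration under stated assumptions: `Char16` stands in for the parser's 16-bit character type, and `toNullTerminated` is a hypothetical helper, not part of any library. Where the consuming API accepts a pointer plus a length, the copy can be skipped by passing the string's length alongside the raw buffer.

```cpp
#include <cstddef>
#include <vector>

// Stand-in for a 16-bit XML character type; an assumption for this sketch.
using Char16 = char16_t;

// Hypothetical helper: copies `length` code units from a buffer that may
// lack a terminator and appends a single 0, so the result can be handed
// to code that expects a null-terminated string.
std::vector<Char16> toNullTerminated(const Char16* raw, std::size_t length) {
    std::vector<Char16> copy;
    copy.reserve(length + 1);        // one allocation, room for the terminator
    copy.assign(raw, raw + length);  // copy exactly `length` code units
    copy.push_back(0);               // append the null terminator
    return copy;
}
```

The cost is one allocation and one linear copy per string, which is the overhead the poster is asking about.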
This won't work. An illegal character is an illegal character, even if
it's represented as a numeric character reference. Attempting to parse a
file with results in the following error:
Fatal Error at (file test1.xml, line 4, column 16): Invalid character
reference
The only way to make
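To make the rule discussed here concrete, the check below follows the `Char` production of the XML 1.0 specification. The function name `isLegalXmlChar` is hypothetical; it is a minimal sketch for pre-screening data before writing it to a file, not what the parser itself does internally.

```cpp
#include <cstdint>

// True if `cp` matches the XML 1.0 `Char` production:
// #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
bool isLegalXmlChar(std::uint32_t cp) {
    return cp == 0x9 || cp == 0xA || cp == 0xD ||
           (cp >= 0x20 && cp <= 0xD7FF) ||
           (cp >= 0xE000 && cp <= 0xFFFD) ||
           (cp >= 0x10000 && cp <= 0x10FFFF);
}
```

A code point that fails this check is rejected by a conforming XML 1.0 parser whether it appears literally or as a numeric character reference, so such data has to be filtered or encoded some other way (Base64, for example) before it is stored.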
They are inherently illegal in XML, so it doesn't matter what encoding you
use. It's not that they can't be represented in the source encoding, but that
the parser won't accept them. You will have to escape them using character
refs, e.g.:
F;
Unless my memory is failing me, this will work beca
We are attempting to store data in an XML file. This data is ASCII-encoded
text, and because of this, some of the characters end up falling outside the
legal limits for XML characters. Specifically, I am getting this error:
Fatal Error at file "C:\natemail.xml", line 2, column 3275(4/23/2002
22
The getSrcOffset() method of XMLScanner should return you the information
you want. However, it can only do that if the source offset stuff is
supported by the transcoding system being used. For ICU and the internal
transcoders that is true. I just looked and in the latest repository files,
the Wi
"Jason E. Stewart" <[EMAIL PROTECTED]> writes:
> Any ideas what to do?
I finally broke down and read the source code for XMLScanner and
XMLReader and I'm convinced that without a major rewrite, this is
not possible.
Basically, the XMLReader calls readBytes() on the stream to fill up a
buffer
"Murphy, James" <[EMAIL PROTECTED]> writes:
> Yuck? WTF? It's beautiful! Maybe I should explain some more.
>
> I followed the same links you described and realized that without some
> hacking in and around XMLScanner getting the BinInputStream from the
> ReaderMgr is a no go. BTW, getting at
Hmm...you're right.
We get some value in SAX parsing the initial part of the document before the
glut of repeated record structures. That is where we do some "document
level" sanity checking and hang onto some other higher level data.
Thanks
Jim
> -Original Message-
> From: Dean Rod
If you can impose certain restrictions, don't even use the XML parser. Just
do a fast and dirty scan, based on known limitations of the format and break
it up yourself at maximum speed.
--
Dean Roddey
The Charmed Quark Controller
Charmed Quark Software
[EMAIL PROTECTED]
ht
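The "fast and dirty scan" suggested above might look something like the sketch below. It assumes a lot: records are never nested, the tags carry no attributes, and the tag text never appears inside comments, CDATA sections, or attribute values. The function name `splitRecords` and the tag strings passed to it are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Splits `doc` into substrings that begin at `openTag` and end at the next
// `closeTag`. This is plain string searching, not XML parsing, so it is
// only valid under the restrictions described in the lead-in.
std::vector<std::string> splitRecords(const std::string& doc,
                                      const std::string& openTag,
                                      const std::string& closeTag) {
    std::vector<std::string> chunks;
    if (openTag.empty() || closeTag.empty()) return chunks;  // avoid looping forever

    std::size_t pos = 0;
    while (true) {
        std::size_t start = doc.find(openTag, pos);
        if (start == std::string::npos) break;   // no more records
        std::size_t end = doc.find(closeTag, start + openTag.size());
        if (end == std::string::npos) break;     // unterminated record: stop
        end += closeTag.size();                  // include the closing tag
        chunks.push_back(doc.substr(start, end - start));
        pos = end;                               // continue after this record
    }
    return chunks;
}
```

For example, `splitRecords(text, "<rec>", "</rec>")` would return each `<rec>...</rec>` span as its own string. Anything that violates the stated restrictions needs a real parser.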
You're right of course, that's a very sensible approach.
But my client has an XML based product to handle communication between
trading partners. The benefits of XML are significant since it is an
integration product and honestly the instance sizes are usually very
manageable. But 5% of the tim
My XML doesn't get within 100 miles of a DTD. If I care to validate I use
schema. The chunks that I find are very well formed XML due to a priori
knowledge of the XML structure I'm parsing. They look like:
...
...
...
...
...
.
Of course, the counter argument to that is: Use a format that's designed to
handle that reasonably. XML isn't, so why use it if it's not an optimal (or
even reasonable) format to use for this kind of thing?
--
Dean Roddey
The Charmed Quark Controller
Charmed Quark Software
> I am working on a system that will be responsible for
> splitting large XML files into record sized chunks.
> These chunks will be handed off to end-users who
> want the option of parsing them with whatever parser
> they choose.
No XML compliant parser should parse such chunks, because they ar
Fair enough Dean - I'm sympathetic to your point that Xerces was designed
from an InfoSet perspective. That's cool, but when writing for
performance we are willing to make some Faustian bargains, especially
since, like Jason's, our environment stipulates single entities anyway.
Jim
> ---
Yuck? WTF? It's beautiful! Maybe I should explain some more.
I followed the same links you described and realized that without some
hacking in and around XMLScanner getting the BinInputStream from the
ReaderMgr is a no go. BTW, getting at XMLScanner from a parser would be
real handy for lots o
>
> From: Dean Roddey <[EMAIL PROTECTED]>
> Date: 2002/04/23 Tue PM 03:33:45 EDT
> To: [EMAIL PROTECTED]
> Subject: Re: RE: how to access the raw text that generated a sax event
>
> The source offset stuff is always relative to the entity, so if you have
> internal or external entity references
"Jason E. Stewart" <[EMAIL PROTECTED]> writes:
> "Murphy, James" <[EMAIL PROTECTED]> writes:
>
> > BinInputStream::curPos() const; looks promising since the built in
> > input sources actually implement it! So you should be able to call
> > this in your SAX event handler methods if you provide
"Murphy, James" <[EMAIL PROTECTED]> writes:
> BinInputStream::curPos() const; looks promising since the built in
> input sources actually implement it! So you should be able to call
> this in your SAX event handler methods if you provide your event
> handler class with the InputSource you use to
"Dean Roddey" <[EMAIL PROTECTED]> writes:
> Anyway, the whole concept of getting back to the original raw XML
> text is counter to what an XML parser is supposed to do, so it's
> never going to be easy because it wasn't designed to make that easy
> or useful to do. I always argued that we never ev
The source offset stuff is always relative to the entity, so if you have
internal or external entity references and such, you are going to have to
keep up with that fact. So if an entity reference to an internal general
entity contains elements (and it pretty much has to contain whole elements),
th
The other potential solution I've found is the XMLScanner's "getSrcOffset" method. My
only fear in using it is that it will give weird results if an XML document is
composed of more than one entity.
Does "getSrcOffset" treat the document as a continuous sequence of bytes, or is it
more low-le
Looking through the source...
BinInputStream::curPos() const; looks promising since the built in input
sources actually implement it! So you should be able to call this in your
SAX event handler methods if you provide your event handler class with the
InputSource you use to parse.
I haven't tri
I have intentionally added a violation (against the external schema) inside
the XML document but no error is being reported during:
parser->getErrorCount();
and
cerr << "Errors from errReporter ->"
<< errReporter->getSawErrors() << "<-\n";
"Murphy, James" <[EMAIL PROTECTED]> writes:
> I thought this would be really handy when parsing from a continuous buffer
> like a MemBufInputSource or a LocalFileInputSource. I have a situation
> where I SAX parse _very_ large XML instances looking for small repeating
> fragments. These fragmen
I thought this would be really handy when parsing from a continuous buffer
like a MemBufInputSource or a LocalFileInputSource. I have a situation
where I SAX parse _very_ large XML instances looking for small repeating
fragments. These fragments are operated on individually by making a DOM to
op
Both of them return a newly
allocated DOMString, hence, newly allocated memory for it.
Jorge
- Original Message -
From: Felipe Micaroni Lalli
To: [EMAIL PROTECTED]
Sent: Tuesday, April 23, 2002 1:24 AM
Subject: memory allocate in DOM_Node
Hel
Hello people,
I wrote a program using the Xerces DOM for C++, and the functions
DOM_Node::getNodeName() and DOM_Node::getNodeValue() allocate memory.
Any ideas?
Thanks,
hugs,
Felipe.
--- Murty Dasari <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I was trying to get source-code for latest stable Xerces C parser for
> Unix.
>
> I've downloaded a tar-ball, xerces-c-src1_7.0.tar.gz (Latest Stable
> source package for Unix's) from the following download site.
> http://xml.apache.org/dist
The problem with using the "locator" is that it only reports line+column info. Byte
offsets into the file would be more helpful for my purposes.
-ted
>
> From: "Joseph Kesselman/CAM/Lotus" <[EMAIL PROTECTED]>
> Date: 2002/04/23 Tue AM 08:39:09 EDT
> To: [EMAIL PROTECTED]
> Subject: Re: how t
Best suggestion I've got is to use the SAX "locator" to find the relevant
area of the document, then perform your own primitive parsing to extract a
moderately meaningful chunk thereof
... but I suspect that's more work than simply using a single parser and
routing its SAX events to the app
I need a way to tell the XMLScanner to use the
default validator.
The one is actually created during the XMLScanner
creation (fDTDValidator) when valToAdopt=NULL is passed to its
constructor.
But for my needs I need to change the validator
dynamically.
The XMLScanner::setValidator(XMLValida