"Jason E. Stewart" <[EMAIL PROTECTED]> writes:
> What I'll need to do is to put a test in there so the code looks like:
>
> %typemap(perl5, in) const XMLCh* qualifiedName (XMLCh *temp_qualifiedName) {
>   if ( SvPOK( $source ) ) {
>     if (SvUTF8($source)) {
>       // turn it into a UTF8 XMLCh*
>     } else {
>       // turn it into a ISO-8859-1 XMLCh*
>     }
>   } else {
>     croak("Type error in argument 2 of $name, Expected perl-string.");
>     XSRETURN(1);
>   }
> }
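
As a rough illustration of what the two branches in the typemap would have to do, here is a minimal, self-contained C++ sketch. It is not the actual XML::Xerces glue code and does not use the Xerces transcoders: `std::u16string` stands in for an XMLCh* buffer, the `is_utf8` parameter stands in for the result of SvUTF8(), and the helper names `latin1_to_utf16`, `utf8_to_utf16` and `to_utf16` are hypothetical. The UTF-8 decoder assumes well-formed input and does no validation.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: every ISO-8859-1 byte maps directly to the
// Unicode code point with the same value, so widening is enough.
std::u16string latin1_to_utf16(const std::string& in) {
    std::u16string out;
    for (unsigned char c : in) out += static_cast<char16_t>(c);
    return out;
}

// Hypothetical helper: decodes UTF-8 into UTF-16 code units.
// Assumes the input is well formed; a truncated tail is dropped.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        std::uint32_t cp;
        std::size_t len;
        if (b < 0x80)      { cp = b;        len = 1; }
        else if (b < 0xE0) { cp = b & 0x1F; len = 2; }
        else if (b < 0xF0) { cp = b & 0x0F; len = 3; }
        else               { cp = b & 0x07; len = 4; }
        if (i + len > in.size()) break;  // truncated sequence: stop
        for (std::size_t k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        i += len;
        if (cp < 0x10000) {
            out += static_cast<char16_t>(cp);
        } else {
            // Code points above U+FFFF need a surrogate pair in UTF-16.
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
    }
    return out;
}

// Mirrors the if/else in the typemap: pick the decoder from the flag.
std::u16string to_utf16(const std::string& bytes, bool is_utf8) {
    return is_utf8 ? utf8_to_utf16(bytes) : latin1_to_utf16(bytes);
}
```

The point of the sketch is only that the same bytes produce different XMLCh data depending on the flag, which is why the typemap has to test it before converting.
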
There are a couple of noteworthy consequences to this:
1. When Xerces returns a string that contains non-ASCII characters
   (i.e. characters outside the ASCII range 0-127), I'll have to
   transcode it from UTF-16 into UTF-8.
2. This will involve a lot of back-and-forth transcoding if the
   document contains a significant amount of non-ASCII text. I
   don't know what effect this will have on the running time of the
   code.
3. To keep the glue code that passes arguments from Perl to Xerces
   simple, I will always have to call the XMLCh* interfaces instead
   of the char* ones.
4. Currently, if a Xerces API method returns a DOMString object or an
   XMLCh*, there is no way to keep that object: the glue code converts
   all of them into Perl strings for 'convenience'. I think this is a
   feature, but it might turn out to have performance benefits to
   allow users to keep them around.
5. All of this is going to require Perl 5.6.0 or better. I get a lot
   of notices from people still using 5.004 and 5.005, so this is
   going to mean upgrading for a lot of people. It is possible that I
   can make the code conditional, so that people with 5.004 or 5.005
   could compile XML::Xerces but just not get Unicode support.
I WILL NOT ATTEMPT THIS WITHOUT ASSISTANCE => it's a lot of work.
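
To make consequence 1 concrete, the return path would need something along the lines of the following minimal C++ sketch. This is an assumption-laden illustration, not Xerces or XML::Xerces code: `std::u16string` stands in for the XMLCh* data Xerces hands back, `utf16_to_utf8` is a hypothetical helper name, and an unpaired surrogate is replaced with U+FFFD rather than reported as an error.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: encodes UTF-16 code units as a UTF-8 byte string,
// which is the form Perl expects for a string flagged as UTF-8.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        // Combine a high surrogate with the low surrogate that follows it.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
            std::uint32_t lo = in[i + 1];
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }
        // Anything still in the surrogate range is unpaired: substitute U+FFFD.
        if (cp >= 0xD800 && cp <= 0xDFFF) cp = 0xFFFD;

        if (cp < 0x80) {
            out += static_cast<char>(cp);                          // 1 byte
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));            // 2 bytes
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));           // 3 bytes
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));           // 4 bytes
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Pure ASCII text passes through one byte per character, so the cost raised in consequence 2 comes mainly from the extra copy and the per-character branching, which would have to be measured.
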
Item 4 is tricky. Perl is great about giving lots of information
about the context a method is being called in. For example, the
following all look different to Perl:
    $a = foo();    # scalar context
    @a = foo();    # list context
    foo();         # void context
So within any method, I can figure out exactly what value to return to
best satisfy the user. It is tricky because these look identical to
Perl:
my $dom_string = $element->getAttribute('foo');
my $perl_string = $element->getAttribute('foo');
Even though the first should return a reference to an
XML::Xerces::DOMString object, and the second should return a vanilla
Perl string, there is no way to tell them apart. So I'll need more
help from the user.
I believe this can be handled with a pragma similar to
use utf8;
maybe
use utf16;
The user could then turn it on for different pieces of the
application by using code blocks:
    # now we get perl strings from all functions
    my $perl_string = $element->getAttribute('foo');
    $perl_string .= 'nothing up my sleeve';
    $element->setAttribute('foo', $perl_string);
    {
        use utf16;
        # now we get DOMStrings from all functions
        my $dom_string = $element->getAttribute('baz');
        $dom_string->appendData($perl_string);
        $element->setAttribute('baz', $dom_string);
    }
    # now we get perl strings from all functions
    my $perl_string2 = $element->getAttribute('bar');
jas.