Here's an older summary of this problem and one possible
fix. Yes, it does appear to be a bug of some sort, I've run
into it myself.
--- Begin Message ---
Hello, Ken and Joaquin.

I've pursed this problem some. I submitted any earlier version
of my "ppm_fix.pl" script to the ActivePerl mailing list. I've
also sent a copy of this, the current version, to the ActiveState
developer working on the PPM problem. I saw the wierd XML entity
problem, too.

Anyway, here is my "ppm_fix.pl" program. Hope it helps. Let me
know if you run into any problems.

------------ Start of ppm_fix.pl ------------
#!/usr/bin/perl -w

=pod

=head1 NAME

ppm_fix.pl - repairs a corrupted PPM XML file

=head1 SYNOPSIS

  ppm_fix.pl > C:\temp\ppmfixed.xml
  ren C:\Perl\site\lib\ppm.xml ppm.xml.bad
  move C:\temp\ppmfixed.xml C:\Perl\site\lib\ppm.xml

=head1 DESCRIPTION

The main problem this script addresses is that for some reason
the PPM XML file starts growing one extremely long line as the
last line. This script looks for that last line that starts with
"<PACKAGE" and ends with "</PPMCONFIG>" and then splits the
PACKAGE structures into nice newline delimited XML as is the
case in the top of the file.

I now also fix any of the lines that end up with a corrupted Perl
installation root. It looks like this:

  C:\p4view\Apps\ActivePerl\MSI\data\ActivePerl\Perl

This is repaired to be $Config{installprefix} (which hopefully
hasn't been corrupted). Also changed any path separators in these
repaired lines from '/' to '\'.

For some reason the 'libwww-perl' package has an invalid path for
its packlist file. It has '\LWP\' rather than '\libwww-perl\' in
its path. I now fix this, too.

I also split the long line that merged REPOSITORY tags, the
PPMPRECIOUS tag, and the initial PACKAGE tag.

Once ActiveState fixes these PPM misbehaviors, this script should
no longer be necessary.

=head1 NOTES

The script also repairs some entities in AUTHOR tags as at least
one of the enties is unrecognised by XML::Simple . For details,
see the code below (look for "AUTHOR").

=head1 AUTHOR

Mike Arms ([EMAIL PROTECTED])

=head1 HISTORY

2002/03/28 - version 1.00 released

  Fix the last long line by splitting the tags onto their own lines.
  Clean up some unexpected XML entities in AUTHOR tags.

2002/04/03 - version 1.01 released

  Fix lines with C:\p4view\Apps\ActivePerl\MSI\data\ActivePerl\Perl
    to be the $Config{'installprefix'} .
  Fix Unix-style forward slash path delimiters to be DOS-style backwards
    slash path delimiters.
  Fix the libwww-perl packlist path.
  Issue diagnostic messages to stderr about each repair done.

2002/04/04 - version 1.02 released

  Split the long line that merged REPOSITORY tags, the PPMPRECIOUS tag,
    and the initial PACKAGE tag.

=head1 COPYRIGHT

Copyright (C) 2002 Mike Arms.

This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
as published by the Free Software Foundation; either version 2
of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

To receive a copy of the GNU General Public License, visit
http://www.gnu.org/copyleft/gpl.html . Or write to the
Free Software Foundation, Inc., 59 Temple Place - Suite 330,
Boston, MA 02111-1307, USA.

=cut

use strict;

use Config;

use Data::Dumper;
$Data::Dumper::Indent = 1;

# Predeclare all subroutines with prototypes.
sub escape_dospath ($);


my $VERSION = '1.02';

my $file = $Config{'sitelib'} . '/ppm.xml';
#print Data::Dumper->Dump( [\%Config], [qw(*Config)] );
#exit;

# Uncomment this if you want to check for well-formedness of the XML.
#use XML::Simple;
#my $xs1 = XML::Simple->new();
#my $doc = $xs1->XMLin($file);
#print Data::Dumper->Dump( [$doc], [qw(doc)] );

@ARGV = ( $file );

my $bad_root = 'C:\p4view\Apps\ActivePerl\MSI\data\ActivePerl\Perl';
my $bad_pat = escape_dospath( $bad_root );

my $install_root_pat = escape_dospath( $Config{'installprefix'} );

my $pkg = '';
my $pkg_last = '';
my %packages = ();

my $repaired = 0;

while (<>) {
        unless ( /^<PACKAGE.*<\/PPMCONFIG>$/  ||
/^<REPOSITORY.*<PACKAGE\s+NAME=".*?">$/ ) {
                $pkg = $1 if /<PACKAGE.*NAME="(.*?)"/;
                my $count = s#$bad_pat#$Config{'installprefix'}#g;
                if ( $count ) {
                        $repaired = 1;
                        my ($tag) = /<(.+?)>/;
                        warn "\n" if $pkg_last ne '' and $pkg_last ne $pkg;
                        $pkg_last = $pkg;
                        warn "$pkg bad root $tag\n";
                        my ($path) = m#>(${install_root_pat}.*?)<#;
                        my $pathdos = $path;
                        # Convert any Unix-style forward slash path
delimiters to
                        # the DOS-style backwards slash.
                        $pathdos =~ s#/#\\#g;
                        if ( $path ne $pathdos ) {
                                s#>$install_root_pat.*?<#>$pathdos<#;
                                warn "$pkg path sep $tag $path\n";
                        }

                        # If this is a INSTPACKLIST tag, check that the file
exists.
                        if ( /<INSTPACKLIST>(.*?)</ ) {
                                my $packlist = $1;
                                unless ( -f $packlist ) {
                                        if ( $pkg eq 'libwww-perl' ) {
                                                my $packlist_LWP =
$packlist;
                                                $packlist_LWP =~
s#\\LWP\\#\\$pkg\\#;
                                                if ( -f $packlist_LWP ) {
                                                        s#\\LWP\\#\\$pkg\\#;
                                                        warn "$pkg bad path
INSTPACKLIST\n"
                                                                . "   from:
$packlist\n"
                                                                . "   to:
$packlist_LWP\n";
                                                        $packlist =
$packlist_LWP;
                                                } else {
                                                        warn "*** $packlist
does not exist\n";
                                                }
                                        } else {
                                                warn "*** $packlist does not
exist\n";
                                        }
                                }

                                # Store the packlist in the %packages hash
so that we
                                # can detect if any packages have more than
one entry.
                                if ( exists $packages{$pkg} ) {
                                        warn "*** $pkg has multiple
entries\n";
                                } else {
                                        $packages{$pkg} = $packlist;
                                }
                        }
                }
                print(), next;
        }

        $repaired = 1;
        my @items = split /(<.*?>)/, $_;
        my @prev = ();
        my $tag = '';
        my $text = '';
        for (@items) {
                if ( ($tag) = /^<\/([^>\s\/]+)/ ) {
                        # an end tag
                        if ( ! @prev ) {
                                print "$_\n";
                        } elsif ( $tag eq $prev[0] ) {
                                print "$prev[1]$_\n";
                                @prev = ();
                        } else {
                                print "$prev[1]\n$_\n";
                                @prev = ();
                        }
                        print "\n" if m#^</(PACKAGE|PPMPRECIOUS)>$#;

                } elsif ( ($tag) = /^<([^>\s\/]+)/ ) {
                        # a start tag
                        if ( /\/>$/ ) {
                                print( "$prev[1]\n" ), @prev = () if @prev;
                                print "$_\n";
                                print "\n" if m#^<REPOSITORY\s+#;
                        } else {
                                print( "$prev[1]\n" ) if @prev;
                                @prev = ( $tag, $_ );
                        }
                        if ( $tag = /^<PACKAGE.*NAME="(.*?)"/ ) {
                                $pkg = $1;
                                warn "\n" if $pkg_last ne '' and $pkg_last
ne $pkg;
                                $pkg_last = $pkg;
                                warn "$pkg split from long line\n";
                        }

                } elsif ( @prev ) {
                        # text part of a tag
                        if ( $prev[0] eq 'AUTHOR' ) {
                                # This is just to clean up some unneeded
entities I have
                                # found in the AUTHOR tags (the Jenda one
has an entity
                                # that XML::Simple does not recognise; the
Jenda one
                                # is NOT from Jenda's PPM repository but
rather is an
                                # from an obsolete version on ActiveState's
PPM repository).
                                my $orig = $_;
                                s/&Atilde;&frac12;
\(Jenda\@Operamail.com\)/y (Jenda\@Krynicky.cz)/;
                                warn "$pkg fixed entities in $prev[0]\n"
                                        . "   from: $orig\n"
                                        . "   to:   $_\n"
                                        if $_ ne $orig;
                        }
                        $prev[1] .= $_;
                } elsif ( $_ ne '' && $_ ne "\n" ) {
                        warn "*** no prev when encountered '$_'\n";
                }
        }

        # Output any leftovers.
        print( "$prev[1]" ) if @prev;

}

warn "PPM XML did not need any repair\n" unless $repaired;


#----------------------------------------
sub escape_dospath ($) {
        my $path = shift;
        $path =~ s#\\#\\\\#g;
        return $path;
} # end escape_dospath()


1;
__END__
------------ End of ppm_fix.pl ------------

--
Mike Arms


-----Original Message-----
From: macnerd [mailto:[EMAIL PROTECTED]]
Sent: Monday, April 08, 2002 3:54 AM
To: 'Ken Bandes'; 'Perl-Win32-Web Mailing List'
Subject: RE: PPM Config File Corruption - An Explanation


Ken,

I think this is the wrong mailing list though.  Try PERL-ActivePerl mailing
list as well, as this mailing list has the "ActivePerl" word in it. Also, if
you can report this in Bugzilla.

Though, due to bugs in ActiveState's Bugzilla, I was not able to get onto
the system and report other bugs. :-)  Ah the irony of it.

I have been having nightmares with PPM.  Everyone suggests I go directly to
CPAN.

 - Joaquin

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED]]On
> Behalf Of Ken
> Bandes
> Sent: Tuesday, March 19, 2002 7:28 PM
> To: Perl-Win32-Web Mailing List
> Subject: PPM Config File Corruption - An Explanation
>
>
> A lot of people, myself included, have reported problems with
> ppm which were
> caused by bad character entities in the ppm.xml file.  I
> think I've tracked
> down the bug in ppm.  Maybe someone at ActiveState can fix it?
>
> To recap:  after installing certain modules, the ppm.xml file becomes
> corrupted.  The first symptom of this is that when you run
> ppm, none of your
> configuration settings (notably repositories) are set.
>
> On inspection of the ppm.xml file, it turns out that this
> file contains
> invalid character entities, such as &aring;, which cause the
> parse of this
> file to fail.  ppm only reports this if tracing is on, which
> of course it
> won't be since this setting would be in your ppm.xml file.
>
> I wrote a script to download and parse every ppd file on the
> ActiveState
> repository.  To my surprise, they all parse correctly.  They
> contain funny
> characters (funny to us non-Japanese types, anyway) but
> nothing that breaks
> the parse (remember, there are no bad characters, only bad character
> entities).
>
> The problem occurs within ppm itself which, after downloading
> the ppd file,
> substitutes in some character entities before adding the ppd info to
> ppm.xml.  This occurs in the function PPMdat_add_package (in
> ppm.pm), which
> calls PPM::XML::PPMConfig::Characters on the contents of each
> element in the
> ppd.  This is defined in PPM\XML\PPMConfig.pm, which in turn
> inherits from
> PPM\XML\Element.pm.  The culprit is the content function in
> this file, which
> calls encode_entities on the text.  Unfortunately, these are
> HTML entities.
> Since neither the ppd nor ppm.xml has a doctype declaration
> that would pull
> in the HTML (or any other) character entities, these entities
> are illegal.
>
> I haven't gotten any farther.  The next step would be to
> change this line so
> ppm doesn't translate character entities.  Or maybe it should
> only translate
> the basic xml entities, such as &amp;, &lt;, and &gt;.
>
> Anyone care to pursue this?
>
> --
> Ken Bandes

_______________________________________________
Perl-Win32-Users mailing list
[EMAIL PROTECTED]
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
--- End Message ---

Reply via email to