John E. Malmberg wrote:
Craig A. Berry wrote:
On May 3, 2009, at 9:20 PM, John E. Malmberg wrote:

John E. Malmberg wrote:

Perl_sv_upgrade(pTHX_ register SV *const sv, svtype new_type)
case SVt_PVMG:
    ...
   new_body_inline(new_body, new_type);
new_type = SVt_PVMG,
SVt_PVMG has a value of 7.
new_body = 44.
PL_Body_roots[sv_type] = 44.
From the code, it looks like this was expected to contain a valid pointer.

From looking at the source code, it appears that the linked list of bodies is corrupted. my_perl->Ibodyroots[7] has 44.

Yes, I see the same thing.

I have been looking at the S_more_bodies routine. Would it be practical to put an assert on for a pointer being added to the linked list with a value above 512? On VMS, the first page of memory is protected no access.


I haven't had much time to poke at this, but I think an assert there would only help if the body is created with a bogus pointer in the SVt_PVMG slot rather than created with a good pointer that gets clobbered later, and I think the second explanation is more likely. I merely observe (without yet a chance fully to pursue) that 44 is a suspicious number on a couple of different fronts.

I put some asserts in, and can confirm that the linked list is not corrupted when it is set up.

With the bodies code, an "arena" of memory is allocated, and for the body type 7 in question it is divided up into 32-byte chunks.

So it is possible that there is a buffer overrun if something writes 40 bytes into the body.
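
To make the overrun theory concrete, here is a minimal sketch in C. It is not the code from sv.c: the arena size, the helper names, and the assumption that the free-list link sits in the first bytes of each chunk are all hypothetical, chosen only to show how a 40-byte write into a 32-byte chunk lands on the neighbouring chunk's link.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BODY_SIZE  32  /* chunk size reported above for body type 7 */
#define BODY_COUNT 4   /* hypothetical: a tiny arena, for illustration only */
#define WRITE_SIZE 40  /* what a writer with a larger struct layout might store */

static unsigned char arena[BODY_SIZE * BODY_COUNT];

/* Thread the chunks into a singly linked free list by storing a
 * "next" pointer in the first bytes of each chunk (assumed layout). */
static void carve_arena(void)
{
    assert(BODY_SIZE >= sizeof(void *));
    for (size_t i = 0; i < BODY_COUNT; i++) {
        void *next = (i + 1 < BODY_COUNT)
                   ? (void *)(arena + (i + 1) * BODY_SIZE)
                   : NULL;
        memcpy(arena + i * BODY_SIZE, &next, sizeof next);
    }
}

/* Read back the link stored at the start of chunk i. */
static void *chunk_next(size_t i)
{
    void *next;
    memcpy(&next, arena + i * BODY_SIZE, sizeof next);
    return next;
}

/* Write WRITE_SIZE bytes into chunk 0 and report whether the link
 * bytes at the start of chunk 1 changed as a result. */
static int overrun_clobbers_next_link(void)
{
    unsigned char before[sizeof(void *)];
    carve_arena();
    memcpy(before, arena + BODY_SIZE, sizeof before);
    memset(arena, 0x2c, WRITE_SIZE);  /* 8 bytes past the 32-byte chunk */
    return memcmp(before, arena + BODY_SIZE, sizeof before) != 0;
}
```

Calling overrun_clobbers_next_link() returns nonzero when the neighbouring link has been overwritten, which is the kind of damage that would later surface as a bad pointer when that chunk is handed out.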

That might be possible if some struct is being cast on the memory and it has a different size on VMS than on other platforms due to alignment issues. On Alpha / VMS by default the compiler adds padding to structures so that the members are more naturally aligned.

The structure below will have a size of 64 bits (8 bytes), as padding will be added
so that member b starts on a longword boundary.

struct foo {
    char a;
    long b;
};
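
One way to check what a given compiler actually does with this structure is to ask it directly. The sketch below uses only standard C (sizeof and offsetof); the helper names are made up for illustration, and the numbers it reports depend on the platform's size of long and its alignment rules, so no particular values are assumed here.

```c
#include <assert.h>
#include <stddef.h>

struct foo {
    char a;
    long b;
};

/* Bytes of padding the compiler inserted between a and b. */
static size_t foo_padding(void)
{
    assert(offsetof(struct foo, b) >= sizeof(char));
    return offsetof(struct foo, b) - sizeof(char);
}

/* Nonzero if b starts on a multiple of its own size,
 * i.e. it is "naturally aligned" in the sense used above. */
static int foo_b_is_naturally_aligned(void)
{
    return offsetof(struct foo, b) % sizeof(long) == 0;
}
```

Comparing sizeof(struct foo) and foo_padding() between VMS and another platform would show quickly whether a struct overlaid on a body has a different size there.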

The corruption is consistently on this same linked list.

If this were a case of a memory cell being used after it was freed, I would expect the corruption to occur in more random places.

Running with -Dm shows that various 44-byte chunks of memory get allocated, including arenas that are multiples of 44 in size, so if there is a legitimate size of 44 that is added to something that should be a good value but is actually NULL, that might be one explanation for where the bad smell is coming from.

44 / 0x2c is the value of SS$_ABORT, which is the return value of the system() call in IPC::Cmd::_run, which is called somewhere in the chain following from CPANPLUS::Dist::_resolve_prereqs.[1] If there is something inappropriate going on with a vmsish pragma and the return value of the system() call, that's another place where something could go wrong, but that is also, as yet, another wacky theory that I haven't been able to prove.

Since no one else is reporting this failure, I will start looking at the VMS-specific code for implementing system() to see if I can find anything.

Unfortunately, many other things in VMS can return the same code, but so far that is the best theory I have seen.
I've attached a version of the test script that is slimmed down from 400+ lines to 99 lines but still produces the access violation.

Thanks, I will try that.

It consistently crashes when not run in debug, but only crashes sometimes when I have it in the debugger.

I have it crashing on my assert now instead of the access violation.

My next plan is to put some code to walk that body linked list at various places where the code implementing the system() call is writing the status value to memory to see if the corruption can be detected.
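
A list walker of that kind might look roughly like the sketch below. The struct and function names are hypothetical stand-ins, not the real body or arena types from sv.c, and the 512 threshold is the one suggested earlier in this thread for VMS, where the first page of memory is no-access. The step limit is an added assumption to keep the walk from looping forever on a corrupted, cyclic list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOW_GUARD 512  /* pointers below this are treated as bogus */

/* Hypothetical stand-in for a free body: just the link field. */
struct body_link {
    struct body_link *next;
};

/* NULL ends the list; any other value below LOW_GUARD is suspect. */
static int link_is_plausible(const struct body_link *p)
{
    return p == NULL || (uintptr_t)p >= LOW_GUARD;
}

/* The assert form, for use where a link is added to the list. */
static void assert_link(const struct body_link *p)
{
    assert(link_is_plausible(p));
}

/* Walk from root without dereferencing an implausible pointer.
 * Returns the number of bodies visited, or -1 on a bad link. */
static long check_body_list(const struct body_link *root, long max_steps)
{
    long n = 0;
    const struct body_link *p = root;
    while (p != NULL && n < max_steps) {
        if (!link_is_plausible(p))
            return -1;
        p = p->next;
        n++;
    }
    return n;
}

/* Build a three-element list, optionally plant the value 44 in the
 * middle link, and run the checker over it. */
static long run_demo(int corrupt)
{
    struct body_link nodes[3];
    nodes[0].next = &nodes[1];
    nodes[1].next = &nodes[2];
    nodes[2].next = NULL;
    assert_link(nodes[0].next);
    if (corrupt)
        nodes[1].next = (struct body_link *)(uintptr_t)44;
    return check_body_list(&nodes[0], 100);
}
```

Here run_demo(0) walks all three bodies, while run_demo(1) returns -1 as soon as it reaches the link holding 44, without dereferencing it. Calling a checker like this before and after the status value is stored should narrow down which write does the damage.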

The base of the body linked list hangs off the my_perl context variable.

[1] IPC::Cmd::_run does not quote arguments, so in its current form it's not really suitable as a cross-platform way to run Perl one-liners. For example, when it means to run:

perl "-M10000000000" "-e1"
Perl v1410065408.0.0 required--this is only v5.11.0, stopped.
BEGIN failed--compilation aborted.
%SYSTEM-F-ABORT, abort

it's actually running:

$ perl -M10000000000 -e1
syntax error at -e line 0, near "use 10000000000 ("
Execution of -e aborted due to compilation errors.
%SYSTEM-F-ABORT, abort

So the CPANPLUS::Dist test is not distinguishing between a syntax error and a version check failure. I don't think it makes any difference for the access violation, but it's something I noticed while trying to pursue that.

That probably explains some of the failures besides the access violation. The other is probably related to a sample file having a '~' character in the name.

-John
wb8...@qsl.net
Personal Opinion Only
