Never mind. There was, in fact, one stray pointer, which I introduced in the last two weeks while making these plugins. Thank the Tao for libgmalloc, I guess.
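libgmalloc (Guard Malloc) catches this class of bug by giving each allocation its own pages, with an inaccessible guard page after the block, so a stray write faults at the instruction that makes it. A much lighter in-process approximation, swapped in here only to illustrate the idea, is to surround each block with canary bytes and verify them when the block is freed. This is a minimal sketch under stated assumptions: the `canary` namespace and its functions are hypothetical, not from the thread. Unlike guard pages, it only notices writes that land in the guard bytes next to a block, and only after the fact.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical canary allocator. Layout: [size][front guard][user bytes][back guard].
namespace canary {

const std::size_t kFront = 32;  // holds the size plus front guard bytes
const std::size_t kBack = 16;   // back guard bytes
const unsigned char kPattern = 0xA5;

static_assert(kFront % alignof(std::max_align_t) == 0 && kFront > sizeof(std::size_t),
              "front region must preserve malloc alignment and hold the size");

void *allocate(std::size_t size) {
    if (size > static_cast<std::size_t>(-1) - kFront - kBack) return nullptr;  // overflow
    unsigned char *raw = static_cast<unsigned char *>(std::malloc(kFront + size + kBack));
    if (raw == nullptr) return nullptr;
    std::memcpy(raw, &size, sizeof size);                            // remember user size
    std::memset(raw + sizeof size, kPattern, kFront - sizeof size);  // front guard
    std::memset(raw + kFront + size, kPattern, kBack);               // back guard
    return raw + kFront;
}

// True if neither guard region has been overwritten.
bool intact(const void *user) {
    const unsigned char *p = static_cast<const unsigned char *>(user);
    const unsigned char *raw = p - kFront;
    std::size_t size;
    std::memcpy(&size, raw, sizeof size);
    for (std::size_t i = sizeof size; i < kFront; ++i)
        if (raw[i] != kPattern) return false;
    for (std::size_t i = 0; i < kBack; ++i)
        if (p[size + i] != kPattern) return false;
    return true;
}

void release(void *user) {
    if (user == nullptr) return;
    // In debug builds, fail loudly instead of silently freeing a damaged block.
    assert(intact(user) && "guard bytes overwritten: stray write near this block");
    std::free(static_cast<unsigned char *>(user) - kFront);
}

}  // namespace canary
```

Routing every allocation through helpers like these is roughly what the memory watcher described below does, and it carries the same risk: changing allocation timing and layout can hide the bug being hunted.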
On 5/31/10, Tim Prepscius <[email protected]> wrote:
> Well, I wrote this e-mail to a friend, but perhaps one of you may read
> it and see the solution. Any hints would be interesting:
>
> I'm having a debugging problem like I've never had before with this
> Safari plugin.
>
> Maybe explaining it to you will help me gain some sort of insight.
>
> Imagine this:
> You have a program. It is made of about 20 libraries, some other
> people's, some yours.
> The difference between the Safari plugin and the executable is about
> 100 lines of startup code, maybe 0.01% of the entire program.
>
> On both Windows and OS X, the executable functions flawlessly. 80% of
> the program has been around for at least 5 years, 18% was written in
> the last 2 years, and 1-2% in the last year. In other words, things
> have been working for a long time.
>
> As a Safari plugin, there are two points, A and B, which crash, and
> only in the release build.
>
> A crashes only 10% of the time.
> I can increase the likelihood that A crashes by delaying the event by
> about 10 seconds, at which point the likelihood is maybe 30% (but I'm
> weirded out by this and don't trust this observation). I can create
> this delay just by pausing the debugger, or by pausing the server it
> is talking to.
> A crashes during a dynamic cast of an object.
> The same dynamic cast occurred a few moments before.
> If it makes it past point A, that same code/dynamic_cast will work
> perpetually.
> This same code is called millions of times. The object it is casting
> is allocated at the very beginning and deallocated at the very end.
>
> When I turn on logging, the crash does not occur.
> When I turn off optimization, the crash does not occur. However, if I
> turn off optimization for only the module in which the crash occurs (or
> the call to dynamic_cast), it still occurs.
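Why a cast is a plausible crash site at all: `dynamic_cast` on a pointer finds the object's runtime type through its vtable pointer. A valid object of the wrong type simply yields `nullptr`, but an object whose memory has been zeroed or freed is undefined behavior and typically faults inside the cast itself. The sketch below is illustrative only; `BaseComponent`, `EventComponent`, and `checkCastPtr` are hypothetical stand-ins, since the definitions of `CheckCastPtr` and the component classes are not shown in the thread.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical stand-ins for Common::Object::Component and Object::Component.
struct BaseComponent {
    virtual ~BaseComponent() {}
};
struct EventComponent : BaseComponent {
    int eventsSeen = 0;
    void event() { ++eventsSeen; }
};

// Hypothetical checked cast in the spirit of CheckCastPtr(Type, ptr):
// returns nullptr, and logs, when the dynamic type does not match.
// This only handles *valid* objects; if *from was overwritten, the
// dynamic_cast below is undefined behavior and no check can help.
template <typename To, typename From>
To *checkCastPtr(From *from) {
    To *to = dynamic_cast<To *>(from);
    if (from != nullptr && to == nullptr)
        std::fprintf(stderr, "checkCastPtr: %p is not the requested type\n",
                     static_cast<void *>(from));
    return to;
}
```

Because a wrong-but-valid type degrades to `nullptr`, a crash inside the cast (rather than a skipped component) points toward corrupted object memory, which is where the thread's investigation ends up.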
>
> The code in which A crashes looks like this:
>
> void Dynamic::event (const Object::Event::Base *event)
> {
>     LogDebug (SnowCrash::Object::Dynamic::event, "object receiving event " << this);
>
>     std::list<Common::Object::Component *>::iterator i;
>
>     for (i=orderedComponents.begin(); i!=orderedComponents.end(); ++i)
>     {
>         Common::Object::Component *_component = *i;
>         LogDebug (SnowCrash::Object::Dynamic::event, "object distributing event to " << _component);
>         LogDebug (SnowCrash::Object::Dynamic::event, "object distributing event to " << _component->getComponentID());
>
>         Object::Component *component = CheckCastPtr(Object::Component, _component);
>         if (component)
>         {
>             component->event (event);
>         }
>     }
> }
>
> B has no pattern.
> It occurs when a piece of memory is deallocated twice, when a
> smartptr decrements its count. But this should be impossible, unless
> either a copy constructor or a copy assignment operator is not being
> called. It could be a copy construction of the object which contains
> the smart pointer, or of the smart pointer itself. Either seems very
> unlikely. Unfortunately, this bug occurs so rarely that it is hard to
> catch.
>
> --
>
> So at first my theory was, well, let's see what is happening.
> But after stepping through over and over, I can't see anything wrong
> with the object it is trying to cast. Obviously there is something.
>
> So then I thought, well, perhaps this is just a messed-up build, so I
> rebuilt everything. This sometimes happens to me on win32 if I link
> against a class whose virtual methods I've changed, but don't
> recompile the modules depending on it.
>
> So then I thought: well, given that the executables operate fine,
> maybe there is some sort of bug in static initialization.
> But the initializations *seem* to be occurring. At least some of them are.
>
> So then I thought maybe there is some sort of discord between
> Objective-C and C++ with memory management, and I investigated that for
> a while.
> However, that would not explain the fact that it always crashes in
> the same place, if it crashes at all. It seems to me that enough
> people are mixing Objective-C and C++ that this should not be a
> problem.
>
> So then I thought: OK, I think that memory is being modified,
> either by Safari or by my own threads (which function fine in the
> executable). And it is suspicious that this problem seems linked to
> time. So I wrote a memory watcher. I overrode new and delete, kept
> a set of allocated memory, and did continuous CRCs on that memory,
> looking for when bits changed. [Which, it turns out, is pretty
> interesting to watch anyway.]
>
> However, this new/delete overriding changed the timing of the program,
> and it stopped crashing.
> I tried to restrict the watched area to a specific section,
> but it still does not crash.
> When I turn the memory watching off, it crashes again.
>
> Also, perhaps the memory watching causes more allocations, and
> perhaps that changes the overall layout of the allocations.
> I say this because a *single time*, this memory watcher/debugger
> crashed, saying that it was watching NULL memory, which was impossible,
> because basically I have this:
>
> new:
>     lock memory-mutex
>     make memory, make memory tag
>     if either is NULL, return NULL
>     else add it to the set of memory to watch
>     unlock memory-mutex
>
> delete:
>     lock memory-mutex
>     if the memory is tagged
>         remove it from set and delete it
>     else just delete it
>     unlock memory-mutex
>
> test:
>     lock memory-mutex
>     evaluate CRCs of memory, compare with tags; if anything has changed,
>     print out a message
>     unlock memory-mutex
>
> This crash of the memory watcher really weirded me out, because it is
> nearly impossible unless boost+pthreads has problems on OS X, so it
> seems to me that some external process zeroed a segment of my memory.
>
> That would explain both the crash on the smart ptr decrement and the
> dynamic_cast failure.
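The new/delete/test pseudocode above can be sketched as a small C++ class. This is a minimal sketch under stated assumptions: rather than replacing the global `operator new`/`operator delete` (where the watcher's own map allocations would recurse back into the watcher), blocks are registered explicitly, and `MemoryWatcher`, `watch`, `unwatch`, `test`, and `crc32Bytes` are hypothetical names. `std::lock_guard` releases the mutex on every return path, which matters for an early-return branch like the pseudocode's `return NULL`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <mutex>
#include <unordered_map>

// Bitwise CRC-32 (reflected polynomial 0xEDB88320): slow, but dependency-free.
std::uint32_t crc32Bytes(const void *data, std::size_t len) {
    const unsigned char *p = static_cast<const unsigned char *>(data);
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < len; ++i) {
        crc ^= p[i];
        for (int bit = 0; bit < 8; ++bit)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

// Hypothetical watcher: maps each watched block to its size and last CRC.
class MemoryWatcher {
public:
    // "make memory tag": record the block and its current checksum.
    void watch(const void *p, std::size_t size) {
        assert(p != nullptr && size > 0);  // never tag NULL or empty blocks
        std::lock_guard<std::mutex> lock(mutex_);
        tags_[p] = Tag{size, crc32Bytes(p, size)};
    }

    // The "delete" step: call before the block is freed.
    void unwatch(const void *p) {
        std::lock_guard<std::mutex> lock(mutex_);
        tags_.erase(p);
    }

    // Recompute every CRC, report changed blocks, and re-baseline them.
    // Returns the number of blocks that changed since the last check.
    std::size_t test() {
        std::lock_guard<std::mutex> lock(mutex_);
        std::size_t changed = 0;
        for (auto &entry : tags_) {
            const std::uint32_t now = crc32Bytes(entry.first, entry.second.size);
            if (now != entry.second.crc) {
                std::printf("watched block %p (%zu bytes) changed\n",
                            const_cast<void *>(entry.first), entry.second.size);
                entry.second.crc = now;
                ++changed;
            }
        }
        return changed;
    }

private:
    struct Tag {
        std::size_t size;
        std::uint32_t crc;
    };
    std::mutex mutex_;
    std::unordered_map<const void *, Tag> tags_;
};
```

A background thread calling `test()` periodically would approximate the continuous checking described. As the thread found, though, any such instrumentation can shift timing and allocation layout enough to hide the bug.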
>
> So my current working theory is:
> 1. a pointer somewhere is initialized incorrectly, but always the same
>    way.
> 2. writing through it is zeroing out my memory.
> 3. this pointer may or may not be within my dylib/process space.
>
> So my question to you is:
>
> What would your approach to solving this be? My usual approach isn't
> working. Any magic bullets?
> I'm up to maybe 50 hours on this bug.
>
> -tim
>
> On 5/28/10, Tim Prepscius <[email protected]> wrote:
>> Greetings again,
>>
>> So I've been able to (perhaps) solve my OpenGL issues by switching
>> to Cocoa, basically. I'm still using AGL via the window ref of the
>> Cocoa window. It seems to function; I wonder if it will fail with some
>> update of Safari. On a side note, if anyone sees this post while
>> investigating OpenGL problems, don't bother with XULRunner on Mac! It
>> will just be a waste of time. It took me a while to figure out that
>> NPAPI was in WebKit as well.
>>
>> But now I'm seeing some extreme strangeness in other areas.
>>
>> So I have a Client application.
>> It is made up of about 20 libraries and a bit of connecting code.
>>
>> One version links as a windowed executable.
>> One version links as a plugin
>> (depending on which bit of connecting code you use).
>> However, the rest of the code for the application is exactly the
>> same in both cases: 99.999% of it.
>>
>> The strangeness I'm seeing is this:
>> The application version functions without problems in both debug and
>> release (as it has done for quite a while).
>> The plugin version crashes, but only the optimized, non-debug build.
>>
>> And it crashes in weird ways that are reminiscent of out-of-sync
>> linking problems. For instance, "dynamic_cast" is failing and causing
>> a crash in an area where that should be nearly impossible. And that
>> area of code has existed without problems for 9 years.
>>
>> There seem to be problems with variable initialization. Or perhaps a
>> copy operator/constructor is not being called correctly.
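The copy constructor / copy assignment suspicion (point B in the 5/31 message) is easiest to reason about against a minimal reference-counted pointer. This `RefPtr` is hypothetical, not the thread's smartptr class, and is single-threaded. If either copy operation were missing or bypassed, a member-wise copy would duplicate the raw pointers without incrementing the count, and the second destructor would act on an already-freed count: the double free described. The `assert` in `release()` only guards against decrementing a live count past zero; once the count block itself is freed, reading it is undefined, which is why a tool like libgmalloc is the more reliable way to catch that case.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical non-intrusive, single-threaded reference-counted pointer.
// Every copy must increment the shared count; every destruction decrements it.
template <typename T>
class RefPtr {
public:
    explicit RefPtr(T *p = nullptr)
        : ptr_(p), count_(p ? new std::size_t(1) : nullptr) {}

    // Copy constructor: share ownership and bump the count.
    RefPtr(const RefPtr &other) : ptr_(other.ptr_), count_(other.count_) {
        if (count_) ++*count_;
    }

    // Copy assignment via copy-and-swap: safe under self-assignment, and the
    // previous target is released exactly once when `tmp` is destroyed.
    RefPtr &operator=(const RefPtr &other) {
        RefPtr tmp(other);
        swap(tmp);
        return *this;
    }

    ~RefPtr() { release(); }

    void swap(RefPtr &other) {
        std::swap(ptr_, other.ptr_);
        std::swap(count_, other.count_);
    }

    T *get() const { return ptr_; }
    std::size_t useCount() const { return count_ ? *count_ : 0; }

private:
    void release() {
        if (!count_) return;
        assert(*count_ > 0 && "reference count decremented past zero");
        if (--*count_ == 0) {
            delete ptr_;
            delete count_;
        }
        ptr_ = nullptr;
        count_ = nullptr;
    }

    T *ptr_;
    std::size_t *count_;
};
```

Copy-and-swap is one common way to write the assignment operator so that the increment-then-release ordering cannot be gotten wrong; any correct version must leave the count unchanged after `p = p`.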
>>
>> I've spent the last two days investigating what could be causing this.
>> It is a mystery, because the normal application just hums along fine,
>> while the plugin crashes, not immediately, but within the first 5
>> seconds or so, as significant events occur.
>>
>> My leaning is to think there is a problem with gcc and optimized code
>> in dylibs; perhaps their static initializations are not being
>> completely performed? But I must think that the chances of this are
>> fairly small, as Apple uses dylibs everywhere, so they would make sure
>> that these function correctly.
>>
>> Has anyone else seen a situation where optimized code doesn't work
>> in a dylib, while in an executable it does? What was the workaround?
>>
>> Or does anyone know of problems with mixing Objective-C and C++ in a
>> dylib?
>>
>> As of now, I'm trying to isolate the module which causes the problem
>> in the release build, and see if I can isolate the code segment, but
>> it is slow going, and I'm not sure whether this error will manifest
>> somewhere else.
>>
>> -tim
>>
> _______________________________________________
> webkit-help mailing list
> [email protected]
> http://lists.webkit.org/mailman/listinfo.cgi/webkit-help
