The problem is that the peer_select code trying to access a std::vector
which is already destroyed.
If we call peerSelect for fwd FwdState object eg:
    peerSelect(&fwd->serverDestinations, ..., fwd);

and the fwd state object become invalid (eg because client closed the
connection) then it is possible to see this crash.

I believe the fix for these cases is easy, we need to add the following
code at the beggining of peerSelectDnsResults and peerSelectDnsPaths
methods (and any other method uses the vector):

  if (!cbdataReferenceValid(psstate->callback_data)) {
    delete psstate;
    return;
  }

Regards,
   Christos

On 03/26/2014 11:13 AM, Kinkie wrote:
> And instead..
> the code testing Alex's suspects triggered in peer_select.cc:337, this
> is the relevant snippet of stack trace:
> 
> 
> #3  0x0000000000632f2d in Vector<RefCount<Comm::Connection> >::size (
>     this=<optimized out>, this=<optimized out>) at ../src/base/Vector.h:355
> #4  0x0000000000638a45 in peerSelectDnsResults (ia=0x29ff8b0, details=...,
>     data=0x2cf59b0) at peer_select.cc:337
> #5  0x000000000060d4e6 in ipcacheCallback (i=i@entry=0x29ff890,
>     wait=wait@entry=19) at ipcache.cc:347
> #6  0x000000000060dbd4 in ipcacheHandleReply (data=<optimized out>,
>     answers=<optimized out>, na=<optimized out>, error_message=0x0)
>     at ipcache.cc:497
> #7  0x000000000058b357 in idnsCallback (q=0x24eec80, q@entry=0x24f5370,
>     error=error@entry=0x0) at dns_internal.cc:1127
> 
> 
> I'm currently investigating more in depth.
> 
> On Wed, Mar 26, 2014 at 8:32 AM, Kinkie <[email protected]> wrote:
>> Hi,
>>   I think I have isolated the changelet in FwdState::connectionList.
>> However WHAT in that changelet may be causing the crashes is still
>> unknown. The hunt for iterate-after-clear has so far turned up
>> nothing.
>>
>> On Wed, Mar 26, 2014 at 8:01 AM, Niki Gorchilov <[email protected]> wrote:
>>> Hello there,
>>>
>>> Is there any progress regarding the random coredumps related to
>>> std::vector migration?
>>>
>>> Best,
>>> Niki
>>>
>>> On Mon, Mar 17, 2014 at 6:41 PM, Kinkie <[email protected]> wrote:
>>>> Yes, I will.
>>>>
>>>> On Mon, Mar 17, 2014 at 5:24 PM, Alex Rousskov
>>>> <[email protected]> wrote:
>>>>> Hello,
>>>>>
>>>>>     I have discovered one difference between std::vector and Squid's own
>>>>> Vector implementation that might explain some of the random crashes that
>>>>> some of us have been seeing after the Vector class was replaced with
>>>>> std::vector in trunk.
>>>>>
>>>>> Squid Vector sets the number of stored elements to zero on destruction.
>>>>> Std::vector does not do that. Here is the output of a simple test code
>>>>> quoted in the postscriptum:
>>>>>
>>>>>> alive std::vector size:    1
>>>>>> deleted std::vector size:  3
>>>>>> alive Squid Vector size:   1
>>>>>> deleted Squid Vector size: 0
>>>>>
>>>>> Both vectors behave correctly, but std::vector code is optimized not to
>>>>> do extra work such as maintaining the size member. This means that
>>>>> iterating a deleted Squid Vector is often safe (until the freed memory
>>>>> is overwritten) because the broken caller code would just discover that
>>>>> there are no items in the vector and move on.
>>>>>
>>>>> Iterating the deleted std::vector usually leads to crashes because all
>>>>> iteration-related methods such as size() rely on immediately deleted and
>>>>> changed pointers (std::vector does not store its size as a data member
>>>>> but uses memory pointers to compute the number of stored elements).
>>>>>
>>>>> This difference leads to std::vector migration problems such as this
>>>>> trivially reproducible one:
>>>>>
>>>>>> Adaptation::AccessRules &
>>>>>> Adaptation::AllRules()
>>>>>> {
>>>>>>     static AccessRules TheRules;
>>>>>>     return TheRules;
>>>>>> }
>>>>>
>>>>> After TheRules array is deallocated during global destructions,
>>>>> iterating AllRules() becomes unsafe. The old Vector-based code usually
>>>>> worked because deallocated TheRules had zero size.
>>>>>
>>>>> The specific bug mentioned above is trivial to work around by allocating
>>>>> TheRules dynamically (and never deleting it). However, if similar bugs
>>>>> exist in other code where Vector is used as a data member of some
>>>>> transaction-related structure, then they will lead to crashes. It is
>>>>> quite possible that the affected structure's memory itself is never
>>>>> deleted when the offending code accesses it (due to cbdata locks, for
>>>>> example) so the [equally buggy] Vector-based code "works".
>>>>>
>>>>> One way we can try to catch such cases is by temporary switching back to
>>>>> Vector and then:
>>>>>
>>>>> * Modifying Vector to mark deleted Vector objects:
>>>>>
>>>>> Vector::~Vector() {
>>>>>     clean();
>>>>>     deleted = true;
>>>>> }
>>>>>
>>>>> * And adjusting *every* Vector method to assert if a deleted object is
>>>>> still being used. For example:
>>>>>
>>>>> template<class E>
>>>>> size_t
>>>>> Vector<E>::size() const
>>>>> {
>>>>>     assert(!deleted);
>>>>>     return count;
>>>>> }
>>>>>
>>>>> If one of those asserts is triggered, we would be closer to solving this
>>>>> mystery.
>>>>>
>>>>>
>>>>> Kinkie, if you agree with this analysis, would you be able to make the
>>>>> above modifications and test?
>>>>>
>>>>>
>>>>> Thank you,
>>>>>
>>>>> Alex.
>>>>>
>>>>>
>>>>> Goes into Squid's main.cc:
>>>>>> +#include <vector>
>>>>>> +#include "base/Vector.h"
>>>>>>  int
>>>>>>  main(int argc, char **argv)
>>>>>>  {
>>>>>> +       typedef std::vector<int> StdVector;
>>>>>> +       StdVector *stdv = new StdVector;
>>>>>> +       stdv->push_back(1);
>>>>>> +       std::cout << "alive std::vector size:    " << stdv->size() << 
>>>>>> "\n";
>>>>>> +       delete stdv;
>>>>>> +       std::cout << "deleted std::vector size:  " << stdv->size() << 
>>>>>> "\n";
>>>>>> +
>>>>>> +       typedef Vector<int> SqdVector;
>>>>>> +       SqdVector *sqdv = new SqdVector;
>>>>>> +       sqdv->push_back(1);
>>>>>> +       std::cout << "alive Squid Vector size:   " << sqdv->size() << 
>>>>>> "\n";
>>>>>> +       delete sqdv;
>>>>>> +       std::cout << "deleted Squid Vector size: " << sqdv->size() << 
>>>>>> "\n";
>>>>>> +       if (true) return 0;
>>>>>>      return SquidMainSafe(argc, argv);
>>>>>>  }
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>     Francesco
>>
>>
>>
>> --
>>     Francesco
> 
> 
> 

Reply via email to