Hi all,
Alex analysis for the crashes is correct. However I believe that we
should try fixing the crashes, not be back to squid Vector. Or even if
we go back to squid Vector, we still need to fix this problem.
The Squid Vector, just hides the problem does not solve it.
Looks that in in many cases we are trying to access deleted objects.
Even if these objects are on cbdata which are still valid allocated
memory regions, they are invalid objects.
The problem has different form in various cases. For the Adaptation
lists (AccessRules, Services, Groups) we are emptying the std::vectors
inside the the Adaptation::*::TheConfig objects destructor.
But the std::vector's destroyed before the Adaptation::Icap::TheConfig
and Adaptation::Ecap::TheConfig objects. So we are trying to access
destroyed objects.
The global objects in C++ normally destroyed in the reverse order they
created. The Adaptation::*::TheConfig objects created on squid start,
but the std::vector lists created later when the
adaptation::All(Rules|Services|Groups)() method called. On squid exit,
the std::vector lists destroyed first.
I am attaching a patch which solves this problem. This patch creates
small temporary object classes to store the std::vector lists for
Rules, Services and Groups, which are responsible to empty and release
the lists on shutdown.
These lists are not emptied from Adaptation::Config
(Adaptation::*::TheConfig objects) any more, and a new method the
Adaptation::Config::Clear() added to empty these lists on reconfigure.
On 03/17/2014 06:24 PM, Alex Rousskov wrote:
> Hello,
>
> I have discovered one difference between std::vector and Squid's own
> Vector implementation that might explain some of the random crashes that
> some of us have been seeing after the Vector class was replaced with
> std::vector in trunk.
>
> Squid Vector sets the number of stored elements to zero on destruction.
> Std::vector does not do that. Here is the output of a simple test code
> quoted in the postscriptum:
>
>> alive std::vector size: 1
>> deleted std::vector size: 3
>> alive Squid Vector size: 1
>> deleted Squid Vector size: 0
>
> Both vectors behave correctly, but std::vector code is optimized not to
> do extra work such as maintaining the size member. This means that
> iterating a deleted Squid Vector is often safe (until the freed memory
> is overwritten) because the broken caller code would just discover that
> there are no items in the vector and move on.
>
> Iterating the deleted std::vector usually leads to crashes because all
> iteration-related methods such as size() rely on immediately deleted and
> changed pointers (std::vector does not store its size as a data member
> but uses memory pointers to compute the number of stored elements).
>
> This difference leads to std::vector migration problems such as this
> trivially reproducible one:
>
>> Adaptation::AccessRules &
>> Adaptation::AllRules()
>> {
>> static AccessRules TheRules;
>> return TheRules;
>> }
>
> After TheRules array is deallocated during global destructions,
> iterating AllRules() becomes unsafe. The old Vector-based code usually
> worked because deallocated TheRules had zero size.
>
> The specific bug mentioned above is trivial to work around by allocating
> TheRules dynamically (and never deleting it). However, if similar bugs
> exist in other code where Vector is used as a data member of some
> transaction-related structure, then they will lead to crashes. It is
> quite possible that the affected structure's memory itself is never
> deleted when the offending code accesses it (due to cbdata locks, for
> example) so the [equally buggy] Vector-based code "works".
>
> One way we can try to catch such cases is by temporary switching back to
> Vector and then:
>
> * Modifying Vector to mark deleted Vector objects:
>
> Vector::~Vector() {
> clean();
> deleted = true;
> }
>
> * And adjusting *every* Vector method to assert if a deleted object is
> still being used. For example:
>
> template<class E>
> size_t
> Vector<E>::size() const
> {
> assert(!deleted);
> return count;
> }
>
> If one of those asserts is triggered, we would be closer to solving this
> mystery.
>
>
> Kinkie, if you agree with this analysis, would you be able to make the
> above modifications and test?
>
>
> Thank you,
>
> Alex.
>
>
> Goes into Squid's main.cc:
>> +#include <vector>
>> +#include "base/Vector.h"
>> int
>> main(int argc, char **argv)
>> {
>> + typedef std::vector<int> StdVector;
>> + StdVector *stdv = new StdVector;
>> + stdv->push_back(1);
>> + std::cout << "alive std::vector size: " << stdv->size() << "\n";
>> + delete stdv;
>> + std::cout << "deleted std::vector size: " << stdv->size() << "\n";
>> +
>> + typedef Vector<int> SqdVector;
>> + SqdVector *sqdv = new SqdVector;
>> + sqdv->push_back(1);
>> + std::cout << "alive Squid Vector size: " << sqdv->size() << "\n";
>> + delete sqdv;
>> + std::cout << "deleted Squid Vector size: " << sqdv->size() << "\n";
>> + if (true) return 0;
>> return SquidMainSafe(argc, argv);
>> }
>
>
=== modified file 'src/adaptation/AccessRule.cc'
--- src/adaptation/AccessRule.cc 2013-05-13 22:48:23 +0000
+++ src/adaptation/AccessRule.cc 2014-04-08 12:48:07 +0000
@@ -31,45 +31,58 @@
// try to add a one-service group
if (FindService(groupId) != NULL) {
ServiceGroupPointer g = new SingleService(groupId);
g->finalize(); // explicit groups were finalized before rules
AllGroups().push_back(g);
}
}
if (!group()) {
debugs(93, DBG_CRITICAL, "ERROR: Unknown adaptation service or group name: '" <<
groupId << "'"); // TODO: fail on failures
}
}
Adaptation::ServiceGroupPointer
Adaptation::AccessRule::group()
{
return FindGroup(groupId);
}
+class RulesCfg {
+public:
+ Adaptation::AccessRules rules;
+ ~RulesCfg() {
+ while (!rules.empty()) {
+ Adaptation::AccessRule *r = rules.back();
+ rules.pop_back();
+ delete r;
+ }
+ }
+};
+
+static RulesCfg TheRulesCfg;
+
Adaptation::AccessRules &
Adaptation::AllRules()
{
- static AccessRules TheRules;
- return TheRules;
+ return TheRulesCfg.rules;
}
// TODO: make AccessRules::find work
Adaptation::AccessRule *
Adaptation::FindRule(const AccessRule::Id &id)
{
typedef AccessRules::iterator ARI;
for (ARI i = AllRules().begin(); i != AllRules().end(); ++i) {
if ((*i)->id == id)
return *i;
}
return NULL;
}
Adaptation::AccessRule *
Adaptation::FindRuleByGroupId(const String &groupId)
{
typedef AccessRules::iterator ARI;
for (ARI i = AllRules().begin(); i != AllRules().end(); ++i) {
=== modified file 'src/adaptation/Config.cc'
--- src/adaptation/Config.cc 2014-02-19 17:01:30 +0000
+++ src/adaptation/Config.cc 2014-04-08 12:40:23 +0000
@@ -148,45 +148,40 @@
removeService((*cfg)->key);
serviceConfigs.clear();
debugs(93, 3, HERE << "rules: " << AllRules().size() << ", groups: " <<
AllGroups().size() << ", services: " << serviceConfigs.size());
}
void
Adaptation::Config::parseService()
{
ServiceConfigPointer cfg = newServiceConfig();
if (!cfg->parse()) {
fatalf("%s:%d: malformed adaptation service configuration",
cfg_filename, config_lineno);
}
serviceConfigs.push_back(cfg);
}
void
Adaptation::Config::freeService()
{
- FreeAccess();
- FreeServiceGroups();
-
- DetachServices();
-
serviceConfigs.clear();
}
void
Adaptation::Config::dumpService(StoreEntry *entry, const char *name) const
{
typedef Services::iterator SCI;
for (SCI i = AllServices().begin(); i != AllServices().end(); ++i) {
const ServiceConfig &cfg = (*i)->cfg();
storeAppendPrintf(entry, "%s " SQUIDSTRINGPH "_%s %s %d " SQUIDSTRINGPH "\n",
name,
SQUIDSTRINGPRINT(cfg.key),
cfg.methodStr(), cfg.vectPointStr(), cfg.bypass,
SQUIDSTRINGPRINT(cfg.uri));
}
}
bool
Adaptation::Config::finalize()
{
@@ -308,20 +303,27 @@
LOCAL_ARRAY(char, nom, 64);
typedef AccessRules::iterator CI;
for (CI i = AllRules().begin(); i != AllRules().end(); ++i) {
snprintf(nom, 64, "%s " SQUIDSTRINGPH, name, SQUIDSTRINGPRINT((*i)->groupId));
dump_acl_access(entry, nom, (*i)->acl);
}
}
Adaptation::Config::Config() :
onoff(0), service_failure_limit(0), oldest_service_failure(0),
service_revival_delay(0)
{}
// XXX: this is called for ICAP and eCAP configs, but deals mostly
// with global arrays shared by those individual configs
Adaptation::Config::~Config()
{
freeService();
}
+
+void Adaptation::Config::Clear()
+{
+ FreeAccess();
+ FreeServiceGroups();
+ DetachServices();
+}
=== modified file 'src/adaptation/Config.h'
--- src/adaptation/Config.h 2014-02-10 16:39:10 +0000
+++ src/adaptation/Config.h 2014-04-08 12:39:30 +0000
@@ -10,40 +10,41 @@
#include "SquidString.h"
class ConfigParser;
class HttpRequest;
class HttpReply;
namespace Adaptation
{
class Config
{
public:
static void Finalize(bool enable);
static void ParseServiceSet(void);
static void ParseServiceChain(void);
static void ParseAccess(ConfigParser &parser);
static void FreeAccess(void);
static void DumpAccess(StoreEntry *, const char *);
+ static void Clear();
friend class AccessCheck;
public:
static bool Enabled; // true if at least one adaptation mechanism is
// these are global squid.conf options, documented elsewhere
static char *masterx_shared_name; // global TODO: do we need TheConfig?
static int service_iteration_limit;
static int send_client_ip;
static int send_username;
static int use_indirect_client;
// Options below are accessed via Icap::TheConfig or Ecap::TheConfig
// TODO: move ICAP-specific options to Icap::Config and add TheConfig
int onoff;
int service_failure_limit;
time_t oldest_service_failure;
int service_revival_delay;
=== modified file 'src/adaptation/Service.cc'
--- src/adaptation/Service.cc 2014-02-04 19:47:14 +0000
+++ src/adaptation/Service.cc 2014-04-08 12:55:11 +0000
@@ -34,45 +34,57 @@
if (cfg().point != filter.point)
return false;
// sending a message to a broken service is likely to cause errors
if (cfg().bypass && broken())
return false;
if (up()) {
// Sending a message to a service that does not want it is useless.
// note that we cannot check wantsUrl for service that is not "up"
// note that even essential services are skipped on unwanted URLs!
return wantsUrl(filter.request->urlpath);
}
// The service is down and is either not bypassable or not probed due
// to the bypass && broken() test above. Thus, we want to use it!
return true;
}
+class ServicesCfg{
+public:
+ Adaptation::Services services;
+ void detach(){
+ while (!services.empty()) {
+ services.back()->detach();
+ services.pop_back();
+ }
+ }
+ ~ServicesCfg() {
+ detach();
+ }
+};
+
+static ServicesCfg TheServicesCfg;
+
Adaptation::Services &
Adaptation::AllServices()
{
- static Services TheServices;
- return TheServices;
+ return TheServicesCfg.services;
}
Adaptation::ServicePointer
Adaptation::FindService(const Service::Id& key)
{
typedef Services::iterator SI;
for (SI i = AllServices().begin(); i != AllServices().end(); ++i) {
if ((*i)->cfg().key == key)
return *i;
}
return NULL;
}
void Adaptation::DetachServices()
{
- while (!AllServices().empty()) {
- AllServices().back()->detach();
- AllServices().pop_back();
- }
+ TheServicesCfg.detach();
}
=== modified file 'src/adaptation/ServiceGroups.cc'
--- src/adaptation/ServiceGroups.cc 2014-02-02 09:42:23 +0000
+++ src/adaptation/ServiceGroups.cc 2014-04-08 12:43:40 +0000
@@ -294,39 +294,50 @@
Adaptation::ServicePointer
Adaptation::ServicePlan::next(const ServiceFilter &filter)
{
if (!atEof && !group->findLink(filter, ++pos))
atEof = true;
return current();
}
std::ostream &
Adaptation::ServicePlan::print(std::ostream &os) const
{
if (!group)
return os << "[nil]";
return os << group->id << '[' << pos << ".." << group->services.size() <<
(atEof ? ".]" : "]");
}
/* globals */
+class GroupsCfg {
+public:
+ Adaptation::Groups groups;
+ ~GroupsCfg() {
+ while (!groups.empty()) {
+ // groups are refcounted so we do not explicitly delete them
+ groups.pop_back();
+ }
+ }
+};
+
+static GroupsCfg TheGroupsCfg;
Adaptation::Groups &
Adaptation::AllGroups()
{
- static Groups TheGroups;
- return TheGroups;
+ return TheGroupsCfg.groups;
}
Adaptation::ServiceGroupPointer
Adaptation::FindGroup(const ServiceGroup::Id &id)
{
typedef Groups::iterator GI;
for (GI i = AllGroups().begin(); i != AllGroups().end(); ++i) {
if ((*i)->id == id)
return *i;
}
return NULL;
}
=== modified file 'src/cache_cf.cc'
--- src/cache_cf.cc 2014-03-30 12:00:34 +0000
+++ src/cache_cf.cc 2014-04-08 12:44:33 +0000
@@ -4362,78 +4362,80 @@
}
static void
parse_adaptation_access_type()
{
Adaptation::Config::ParseAccess(LegacyParser);
}
#endif /* USE_ADAPTATION */
#if ICAP_CLIENT
static void
parse_icap_service_type(Adaptation::Icap::Config * cfg)
{
cfg->parseService();
}
static void
free_icap_service_type(Adaptation::Icap::Config * cfg)
{
+ Adaptation::Config::Clear();
cfg->freeService();
}
static void
dump_icap_service_type(StoreEntry * entry, const char *name, const Adaptation::Icap::Config &cfg)
{
cfg.dumpService(entry, name);
}
static void
parse_icap_class_type()
{
debugs(93, DBG_CRITICAL, "WARNING: 'icap_class' is depricated. " <<
"Use 'adaptation_service_set' instead");
Adaptation::Config::ParseServiceSet();
}
static void
parse_icap_access_type()
{
debugs(93, DBG_CRITICAL, "WARNING: 'icap_access' is depricated. " <<
"Use 'adaptation_access' instead");
Adaptation::Config::ParseAccess(LegacyParser);
}
#endif
#if USE_ECAP
static void
parse_ecap_service_type(Adaptation::Ecap::Config * cfg)
{
cfg->parseService();
}
static void
free_ecap_service_type(Adaptation::Ecap::Config * cfg)
{
+ Adaptation::Config::Clear();
cfg->freeService();
}
static void
dump_ecap_service_type(StoreEntry * entry, const char *name, const Adaptation::Ecap::Config &cfg)
{
cfg.dumpService(entry, name);
}
#endif /* USE_ECAP */
#if ICAP_CLIENT
static void parse_icap_service_failure_limit(Adaptation::Icap::Config *cfg)
{
char *token;
time_t d;
time_t m;
cfg->service_failure_limit = GetInteger();
if ((token = ConfigParser::NextToken()) == NULL)