On Mon, 2009-11-16 at 17:08 +0100, Martin Nagy wrote: > Simo Sorce wrote: > > On Mon, 2009-11-16 at 08:46 +0100, Martin Nagy wrote: > > > Simo Sorce wrote: > > > > While working on a patch to use failover in the ldap driver I found a > > > > few bugs and a few things I felt missing. > > > > > > > > Attached a patch to fix bugs and add a function to get back a server > > > > name from a fo_server structure. > > > > > > > > Simo. > > > > > > > > P.S: if you are interested in the failover patch I have a working but > > > > unfinished patch here: > > > > http://fedorapeople.org/gitweb?p=simo/public_git/sssd.git;a=commit;h=29830482f5e44b5425fb91f82fd5a4ee692f3ae2 > > > > > > Just one thing. I was counting on the fact that the user of the fail > > > over will make sure that he doesn't try to create the same service > > > twice and doing so will mean a bug in the code. Looking at your patch > > > and having done a very quick look at your branch, it seems like you > > > count for this behavior (which I think is fairly reasonable). In that > > > case, could you also please change this debugging statement? > > > > > > DEBUG(1, ("Service %s already exists\n", name)); > > > > > > Please either completely remove it, or (IMHO better) change the debug > > > level. > > > > I will probably raise the debug level, but it's not a big deal for now. > > > > > Regarding the ares error code, I'm not 100% sure what's the right > > > thing to do. Perhaps we should use a constant, like EIO and count on > > > the logging statement few lines up to be sufficient for the user. Any > > > other ideas? > > > > Not sure yet, what we need from callers are 3 type of errors. > > 1. an error that tells us that no server is resolvable/available (to > > avoid looping forever) > > 2. an error that tells us the current server is not resolvable/available > > 3. any other error should mean something really bad happened and we > > should abort. > > > > We miss n.2 for now so I am treating any error except ENOENT to mean the > > current server is not resolvable and we need to try the next one. > > > > Simo. > > OK. Nothing actually wrong with the patch, so ack.
Ok I found another bug, so I am re-spinning the patch. After the SERVER_NOT_WORKING timeout for a server was expired, we were setting the status to SERVER_NAME_RESOLVED This is wrong for 2 reasons: 1. we may have marked the server as SERVER_NOT_WORKING because we could not resolve the name, this means the name was not resolved at all, by setting it as SERVER_NAME_RESOLVED we have no way to force a new lookup, we will basically never be able to lookup the server again. 2. If a sever becomes unavailable because the server has been moved, we would never have a chance to resolve it to the new address again because setting the server to SERVER_NOT_WORKING status wouldn't work (see point 1). Even if the server was correctly resolved and SERVER_NOT_WORKING was set because of other reasons (a service stopped working for example), we want to try to resolve the name again just in case. Simo. -- Simo Sorce * Red Hat, Inc * New York
>From 732cf6b816fe4fa0b23d8092470e3711cb375872 Mon Sep 17 00:00:00 2001 From: Simo Sorce <sso...@redhat.com> Date: Sun, 15 Nov 2009 18:07:36 -0500 Subject: [PATCH 1/2] Failover fixes and additions --- server/providers/fail_over.c | 28 ++++++++++++++++++++++------ server/providers/fail_over.h | 2 ++ server/resolv/async_resolv.c | 2 +- 3 files changed, 25 insertions(+), 7 deletions(-) diff --git a/server/providers/fail_over.c b/server/providers/fail_over.c index fa599e3..1b9e017 100644 --- a/server/providers/fail_over.c +++ b/server/providers/fail_over.c @@ -129,8 +129,8 @@ get_server_status(struct fo_server *server) if (timeout != 0 && server->common->server_status == SERVER_NOT_WORKING) { gettimeofday(&tv, NULL); if (STATUS_DIFF(server->common, tv) > timeout) { - server->common->server_status = SERVER_NAME_RESOLVED; - server->last_status_change.tv_sec = tv.tv_sec; + server->common->server_status = SERVER_NAME_NOT_RESOLVED; + server->last_status_change.tv_sec = 0; } } @@ -196,6 +196,9 @@ fo_new_service(struct fo_ctx *ctx, const char *name, ret = fo_get_service(ctx, name, &service); if (ret == EOK) { DEBUG(1, ("Service %s already exists\n", name)); + if (_service) { + *_service = service; + } return EEXIST; } else if (ret != ENOENT) { return ret; @@ -215,7 +218,9 @@ fo_new_service(struct fo_ctx *ctx, const char *name, DLIST_ADD(ctx->service_list, service); talloc_set_destructor(service, service_destructor); - *_service = service; + if (_service) { + *_service = service; + } return EOK; } @@ -227,7 +232,7 @@ fo_get_service(struct fo_ctx *ctx, const char *name, struct fo_service *service; DLIST_FOR_EACH(service, ctx->service_list) { - if (!strcasecmp(name, service->name)) { + if (!strcmp(name, service->name)) { *_service = service; return EOK; } @@ -476,6 +481,10 @@ fo_resolve_service_done(struct tevent_req *subreq) while ((request = common->request_list) != NULL) { DLIST_REMOVE(common->request_list, request); if (resolv_status) { + /* FIXME FIXME: resolv_status is an ARES error. + * but any caller will expect classic error codes. + * also the send() function may return ENOENT, so this mix + * IS explosive (ENOENT = 2 = ARES_EFORMER) */ tevent_req_error(request->req, resolv_status); } else { tevent_req_done(request->req); @@ -490,11 +499,13 @@ fo_resolve_service_recv(struct tevent_req *req, struct fo_server **server) state = tevent_req_data(req, struct resolve_service_state); - TEVENT_REQ_RETURN_ON_ERROR(req); - + /* always return the server if asked for, otherwise the caller + * cannot mark it as faulty in case we return an error */ if (server) *server = state->server; + TEVENT_REQ_RETURN_ON_ERROR(req); + return EOK; } @@ -532,6 +543,11 @@ fo_get_server_port(struct fo_server *server) return server->port; } +const char *fo_get_server_name(struct fo_server *server) +{ + return server->common->name; +} + struct hostent * fo_get_server_hostent(struct fo_server *server) { diff --git a/server/providers/fail_over.h b/server/providers/fail_over.h index b6e2f2f..5fa9ff0 100644 --- a/server/providers/fail_over.h +++ b/server/providers/fail_over.h @@ -101,6 +101,8 @@ void *fo_get_server_user_data(struct fo_server *server); int fo_get_server_port(struct fo_server *server); +const char *fo_get_server_name(struct fo_server *server); + struct hostent *fo_get_server_hostent(struct fo_server *server); #endif /* !__FAIL_OVER_H__ */ diff --git a/server/resolv/async_resolv.c b/server/resolv/async_resolv.c index c05461d..6ac5e41 100644 --- a/server/resolv/async_resolv.c +++ b/server/resolv/async_resolv.c @@ -78,7 +78,7 @@ return_code(int ares_code) return ENOMEM; case ARES_EFILE: default: - return -1; + return EIO; } } -- 1.6.2.5
_______________________________________________ sssd-devel mailing list sssd-devel@lists.fedorahosted.org https://fedorahosted.org/mailman/listinfo/sssd-devel