Re: [SSSD] RFC: Improving the debug messages

Dan Lavu Mon, 29 Jun 2015 20:08:13 -0700

I've been watching various logs for the past few minutes, FWIW, I think a more 
casual message will help people better understand what SSSD is doing. Look at 
debug level 4, the first instance of a user name look up (getent passwd dlavu) 
a common command we tell folks to test to see if SSSD is working

(Mon Jun 29 22:10:59 2015) [sssd[be[lab.runlevelone.lan]]] 
[be_get_account_info] (0x0200): Got request for [0x1001][1][name=dlavu] 

Just imagine if you knew nothing about SSSD and how it worked, what can you 
gather from this line of text? Timestamp, process, domain, get_account_info, 
got_request and userid and the rest is sort of gibberish and what is the 
difference between be_get_account_info and got_request_for? I think it's 
confusing, now looking at some other logs messages. 

Jun 29 21:50:37 freeipa named-pkcs11[2101]: validating @0x7f1a1c60cfc0: 
fedorahosted.org NSEC: no valid signature found 

Jun 29 22:04:15 freeipa puppet-agent[2661]: The package type's allow_virtual 
parameter will be changing its default value from false to true in a future 
release. If you do not want to allow virtual packages, please explicitly set 
allow_virtual to false. 

[28/Jun/2015:15:45:27 -0400] NSMMReplicationPlugin - 
agmt="cn=meTofreebeer.lab.runlevelone.lan" (freebeer:389): Replication bind 
with GSSAPI auth failed: LDAP error -2 (Local error) (SASL(-1): generic 
failure: GSSAPI Error: Unspecified GSS failure. Minor code may provide more 
information (Cannot contact any KDC for realm 'LAB.RUNLEVELONE.LAN')) 

The following messages are pretty clear in what they are doing and what the 
issue is, also some logs have a structure to help readability (not sure if it 
was intentional or not) but it would be a few short messages, then the long 
message, something to break it up. I'm not saying we remove any of the logs, 
but maybe insert more layman descriptions that way people can use them as 
markers. So they can find the relevant event and look after that message. 

Maybe we format those as a standard syslog message 
(timestamp,loghost,application,level,message_id,message) and enable it by 
default? It might help administrators catch errors early and might help because 
its a little more familiar, I don't think we should roll with systemd just yet. 
;) 

Also I'll take a look at the ldap man page and see if I can do anything to make 
it easier to follow. 

Dan 

----- Original Message -----

> From: "Simo Sorce" <s...@redhat.com>
> To: "Development of the System Security Services Daemon"
> <sssd-devel@lists.fedorahosted.org>
> Cc: "George Agriogiannis" <gagri...@redhat.com>, "Dan Lavu"
> <dl...@redhat.com>
> Sent: Monday, June 29, 2015 5:53:45 PM
> Subject: Re: [SSSD] RFC: Improving the debug messages

> On Mon, 2015-06-29 at 18:13 +0200, Jakub Hrozek wrote:
> > Hi,
> >
> > I spent many hours debugging SSSD in different scenarios last week
> > and I
> > admit it wasn't always easy -- and I have the source code knowledge
> > I
> > can use. I imagine it's considerably harder for users and admins..
> >
> > So this is a brainstorm request on how can we make debugging with
> > SSSD
> > easier. Maybe there are some low-hanging fruits that can be fixed
> > easily. Off the top of my head:
> >
> > - it should be easier to see start and end of a request in the back
> > end.
> > Instead of:
> > [be_get_account_info] (0x0200): Got request for
> > [0x1001][1][name=admin]
> > [acctinfo_callback] (0x0100): Request processed. Returned
> > 0,0,Success (Success)
> > We could make the debug messages more explicit:
> > [be_get_account_info] (0x0200): Received request for
> > [object=user][key=name][value=admin]
> > [acctinfo_callback] (0x0200): Finished request for
> > [object=user][key=name][value=admin]. Returned 0,0,Success
> >
> > Then we could document the messages in our troubleshooting
> > document.
> > Please note I'm not proposing to turn debug messages into any kind
> > of
> > API and keep them the same forever, but decorate the usual flow
> > with
> > messages that make sense without source level knowledge.
> >
> > - same for authentication
> >
> > - same for responder cache requests. We seem to have gotten better
> > with
> > the new cache_req code there, so this is mostly about using the new
> > code in all responders. But also the commands we receive from
> > sockets
> > should be printed in human-readable form.
> >
> > - Running sssd in environment where all actions complete
> > successfully
> > should emit no debug messages. Default log level should be moved to
> > SSSDBG_OP_FAILURE or CRIT_FAILURE. (This basically amounts to
> > checking
> > all OP, FATAL and CRIT failure messages..)
> >
> > The reason is that sometimes sssd fails, but because logging is
> > totally silent, we don't know what happened at all. Currently we
> > have
> > a couple of small bugs where we might print a loud DEBUG message
> > just
> > because we search for an entry which is not there etc.
> >
> > - anything that causes SSSD to fail to start should also emit a
> > syslog
> > message. Admins don't really know about sssd debug logs.
> >
> > - our man pages are not structured well, especially the LDAP man
> > page is
> > too big and contains too many options.
> >
> > One reason I'm bringing this up now is that we'll have a new SSSD
> > developer
> > starting soon and these might be nice tasks to start with AND
> > they're
> > also needed.

> +1

> Simo.

> --
> Simo Sorce * Red Hat, Inc * New York

> _______________________________________________
> sssd-devel mailing list
> sssd-devel@lists.fedorahosted.org
> https://lists.fedorahosted.org/mailman/listinfo/sssd-devel

_______________________________________________
sssd-devel mailing list
sssd-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/mailman/listinfo/sssd-devel

Re: [SSSD] RFC: Improving the debug messages

Reply via email to