On 05/21/2018 03:51 PM, Simo Sorce wrote:
On Mon, 2018-05-21 at 11:52 +0200, Pavel Březina wrote:
On 05/18/2018 09:50 PM, Simo Sorce wrote:
On Fri, 2018-05-18 at 16:11 +0200, Sumit Bose wrote:
On Fri, May 18, 2018 at 02:33:32PM +0200, Pavel Březina wrote:
Hi folks,
I sent a mail about the new sbus implementation (I'll refer to it as sbus2) [1].

Sorry Pavel,
but I need to ask, why a new bus instead of something like varlink?

This is old work; we did not know about varlink until it was already
finished. But since we still provide a public D-Bus API, we need a
way to work with it anyway.

Ack, thanks, wasn't sure how old the approach was, so I just asked :-)

Now, I'm integrating it into SSSD. The work is quite difficult since it
touches all parts of SSSD and the changes are usually interconnected, but I'm
slowly moving towards the goal [2].

At this moment, I'm trying to take the "minimum changes" path so the code can
be built and function with sbus2; however, taking full advantage of it
will require further improvements (which will not be very difficult).

There is one big change that I would like to take though, that needs to be
discussed. It is about how we currently handle sbus connections.

In the current state, the monitor and each backend create a private sbus server.
The current implementation of a private sbus server is not a message bus; it
only serves as an address to create point-to-point nameless connections. Thus
each client must maintain several connections (a rough count follows below):
   - each responder is connected to the monitor and to all backends
   - each backend is connected to the monitor
   - we have monitor + number-of-backends private servers
   - each private server maintains about 10 active connections
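
For example (hypothetical numbers): with five responders and two backends,
each responder keeps three connections (monitor plus both backends) and each
backend keeps one (monitor), so there are 17 client connections spread over
three private servers. A single message bus would need only seven, one per
process.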

This has several disadvantages: there are many connections, we cannot
broadcast signals, and if a process wants to talk to another process it needs
to connect to that process's server and maintain the connection. Since
responders do not currently provide a server, they cannot talk to each other.

This design has a key advantage: a single process going down does not
affect all other processes' communication. How do you recover if the
"switch-board" goes down during message processing with sbus?

The "switch-board" will be restarted and other processes will reconnect.
The same way as it is today when one backend dies.

Yes, but what about in-flight operations?
Will both client and server abort and retry?
Will the server just keep data around forever?
It'd be nice to understand the mechanics of recovery to make sure the
actual clients do not end up being impacted by a lack of service.

See below.

sbus2 implements a proper private message bus, so it can work in the same way
as the session or system bus. It is a server that maintains the connections,
keeps track of their names and then routes messages from one connection to
another.

My idea is to have only one sbus server managed by monitor.

This conflicts with the idea of getting rid of the monitor process. I do
not know if this is currently still pursued, but it was brought up
many times that we might want to use systemd as the "monitor"
and let socket activation deal with the rest.

I chose the monitor process for the message bus since 1) it is stable, 2)
it is idle most of the time. However, it can be a process of its own.

Not sure that moving it to another process makes a difference, the
concern would be the same I think.

Yes.

That being said, it does not conflict with removing the monitoring
functionality. We would only keep a single message bus.

Right, but at that point we might as well retain monitoring...


Other processes
will connect to this server with a named connection (e.g. sssd.nss,
sssd.backend.dom1, sssd.backend.dom2). We can then send a message to this
message bus (only one connection) and set the destination to a name (e.g.
sssd.nss to invalidate memcache). We can also send signals to this bus and it
will broadcast them to all connections that listen to these signals. So it is
the proper way to do it. It will simplify things and allow us to send
signals and have better IPC in general.
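
To make it concrete, here is a minimal sketch with raw libdbus calls (this is
an illustration only, not the sbus2 API; the address, names, object path and
method are made up):

#include <dbus/dbus.h>

/* Open a private connection to the sbus server and claim a well-known
 * name, e.g. "sssd.nss". */
static DBusConnection *connect_named(const char *address, const char *name)
{
    DBusError error;
    DBusConnection *conn;

    dbus_error_init(&error);

    conn = dbus_connection_open_private(address, &error);
    if (conn == NULL) {
        return NULL;
    }

    /* Register with the bus (Hello) so messages can be routed to us. */
    if (!dbus_bus_register(conn, &error)) {
        dbus_connection_close(conn);
        dbus_connection_unref(conn);
        return NULL;
    }

    dbus_bus_request_name(conn, name, 0, &error);
    return conn;
}

/* Send a method call addressed to a named peer; the bus routes it. */
static void invalidate_memcache(DBusConnection *conn)
{
    DBusMessage *msg = dbus_message_new_method_call(
        "sssd.nss",            /* destination name */
        "/sssd",               /* object path (made up) */
        "sssd.Responder",      /* interface (made up) */
        "InvalidateMemcache"); /* method (made up) */

    dbus_connection_send(conn, msg, NULL);
    dbus_message_unref(msg);
}

/* A signal has no destination; the bus broadcasts it to every connection
 * that added a match for it. */
static void announce_domain_online(DBusConnection *conn)
{
    DBusMessage *sig = dbus_message_new_signal("/sssd", "sssd.Events",
                                               "DomainOnline");

    dbus_connection_send(conn, sig, NULL);
    dbus_message_unref(sig);
}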

I know we want to eventually get rid of the monitor; the process would stay
only as an sbus server. It would become a single point of failure, but it
can be restarted automatically by systemd in case of a crash.

Also, here is a bonus question: do any of you remember why we use a private
server at all?

In the very original design there was a "switch-board" process which
received a request from one component and forwarded it to the right
target. I guess at that time we didn't know enough about DBus to
implement this properly. In the end we thought it was a useless overhead
and removed it. I think we didn't think about signals to all components
or the backend sending requests to the frontends.

Why don't we connect to system message bus?

Mainly because we do not trust it to handle plain text passwords and
other credentials with the needed care.

That, and because at some point there was a potential chicken-and-egg issue
at startup, and also because we didn't want to handle additional error
recovery if the system message bus was restarted.

Fundamentally, the system message bus is useful only for services
offering a "public" service; otherwise it is just overhead and has
security implications.

Thank you for the explanation.

I do not see any benefit in having a private server.

There is no way to break into sssd via a bug in the system message bus.
This is one good reason, aside from the others above.

Fundamentally we needed a private structured messaging system we could
easily integrate with tevent. The only usable option back then was
D-Bus, and given we already had ideas about offering some public
interface over the message bus, we went that way so we could later reuse
the integration.

Today we'd probably go with something a lot more lightweight like
varlink.

If I understood you correctly, we not only have 'a' private server but 4
of them in a typical minimal setup (monitor, pam, nss, backend).

Given your arguments above, I think using a private message bus would
have benefits. Two questions come to my mind. First, what
happens to ongoing requests if the monitor dies and is restarted? E.g.
if the backend is processing a user lookup request and the monitor is
restarted, can the backend just send the reply to the freshly started
instance so that the nss responder will finally get it? Or is some
state lost which would force the nss responder to resend the request?

It works the same way as now. If a backend dies, responders will reconnect
once it is up again, so no messages are lost.

If the message bus dies, clients will reconnect and then send the pending
replies. Also, the sbus code will be pretty much stable, so it is far less
likely to crash (of course I expect some issues during review).

So you expect requests to still be serviceable if the message bus dies.
How does a client find out that a service died and it needs to send a new
request? Will it have to time out and try again? Or is there any
messaging that lets a client know it has to restart ASAP?
And if the message bus dies and a service dies before it comes back up,
how does a client find out?

How would the responder even know the other side died? Is there a way
for clients to know that services died and all requests in flight need
to be resent?

If a client sends a request to a destination that is not available, it will
get back a specific error code. The client can decide how to deal with it
(return cached data or an error, or resend the message once the destination
is available).

I am not concerned with messages sent while a service is down, more
about what happens while the client is waiting.

There are D-Bus signals (NameOwnerChanged, NameLost, ...) that can be
used as well.

Given our use case, we can also queue messages in the message bus until the
destination is available (this is currently not implemented, but it is doable).
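
Just to illustrate the mechanics with plain libdbus (again not the sbus2 API;
the error name and the match rule are the standard D-Bus ones):

#include <dbus/dbus.h>

/* True if the reply says the destination name currently has no owner,
 * i.e. the target process is not connected to the bus. */
static dbus_bool_t destination_unavailable(DBusMessage *reply)
{
    return dbus_message_is_error(reply, DBUS_ERROR_SERVICE_UNKNOWN);
}

/* NameOwnerChanged is emitted by the bus whenever a named connection
 * appears or disappears, so a client can resend or fail pending
 * requests as soon as that happens instead of waiting for a timeout. */
static void watch_name_changes(DBusConnection *conn)
{
    dbus_bus_add_match(conn,
                       "type='signal',sender='org.freedesktop.DBus',"
                       "interface='org.freedesktop.DBus',"
                       "member='NameOwnerChanged'",
                       NULL);
}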

It is important to recover speedily; if at any point a crash leads to a
cascade of timeouts, this will be very disruptive and will have a much
bigger impact than the current behavior.

The reconnection mechanism is the same as it is now; I just polished it and fixed one or two corner cases (a rough sketch of the retry loop is below). The code is available here:
https://github.com/pbrezina/sbus/blob/master/src/sbus/connection/sbus_dispatcher.c#L88
https://github.com/pbrezina/sbus/blob/master/src/sbus/connection/sbus_reconnect.c

1) The dispatcher wants to send/read messages and finds out that the connection was dropped.
2) We try to reconnect until we get a new connection or we reach the maximum number of attempts.
3) We get a new connection. I am not sure whether the dispatcher will process the queued messages or they are overwritten; this is internal D-Bus stuff. The behavior is the same as with the current sbus implementation.

Depending on 3), a timeout may or may not occur. This needs to be tested.
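
As a rough sketch of the retry idea with a tevent timer (the names below are
made up for illustration; the real logic is in sbus_reconnect.c linked above):

#include <stdbool.h>
#include <tevent.h>

struct reconnect_state {
    struct tevent_context *ev;    /* talloc-allocated, also used as mem ctx */
    unsigned int attempts;
    unsigned int max_attempts;
};

/* Stand-in for re-establishing the D-Bus connection; always fails here
 * so the sketch stays self-contained. */
static bool attempt_reconnect(struct reconnect_state *state)
{
    (void)state;
    return false;
}

static void reconnect_handler(struct tevent_context *ev,
                              struct tevent_timer *te,
                              struct timeval tv,
                              void *pvt)
{
    struct reconnect_state *state = pvt;

    if (attempt_reconnect(state)) {
        /* Step 3: we have a new connection, the dispatcher resumes. */
        return;
    }

    if (++state->attempts >= state->max_attempts) {
        /* Step 2 exhausted: give up; callers will see a connection error. */
        return;
    }

    /* Try again a bit later (fixed 1 second delay in this sketch). */
    tevent_add_timer(state->ev, state->ev, tevent_timeval_current_ofs(1, 0),
                     reconnect_handler, state);
}

/* Step 1: called by the dispatcher when it notices the connection dropped. */
static void schedule_reconnect(struct reconnect_state *state)
{
    state->attempts = 0;
    tevent_add_timer(state->ev, state->ev, tevent_timeval_current_ofs(1, 0),
                     reconnect_handler, state);
}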

The second is about the overhead. Do you have any numbers on how much
longer e.g. the nss responder has to wait e.g. for a "backend is offline"
reply? I would expect that we lose more time at other places;
nevertheless it would be good to have some basic understanding of the
overhead.

This needs to be measured. But we currently implement sbus servers in
busy processes, so logically it takes more time to process a message
there than to route it through a single-purpose process.

I do not think this follows. Processing messages is relatively fast;
the problem with a third process is two more context switches. Context
switches add latency, thrash more caches, and may cause a severe
performance drop depending on the workload.

Latency is what we should be worried about. One other reason to go with
direct connections is that you do not have to wait for 3 processes to
be awake and scheduled (client/monitor/server) but only 2
(client/server). On busy machines the latency can be (relatively) quite
high if an additional process needs to be scheduled just to pass along
a message.

This needs to be measured in such an environment.

Yes, it would be nice to have some numbers with clients never hitting
the fast cache and looping through requests that have to go all the way
to the backend each time. For example, create an LDAP server with 10k
users, each with only a private group, then issue 10k getpwnam
requests and see the difference between the current code and the new code.
Running multiple tests under the same conditions will be important,
i.e. a first dummy run to prime the LDAP server caches, then a number of
runs to average over.
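
On the client side something as simple as this would do (a sketch only; the
user naming scheme and the assumption that the fast cache is bypassed, so the
lookups really reach the backend, are mine):

#include <pwd.h>
#include <stdio.h>
#include <time.h>

int main(void)
{
    struct timespec start, end;
    char name[64];
    const int count = 10000;
    double elapsed;
    int i;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (i = 0; i < count; i++) {
        /* Hypothetical naming scheme: user0 ... user9999 exist in LDAP. */
        snprintf(name, sizeof(name), "user%d", i);
        getpwnam(name); /* result ignored; we only measure latency */
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    elapsed = (end.tv_sec - start.tv_sec)
              + (end.tv_nsec - start.tv_nsec) / 1e9;
    printf("total %.3f s, avg %.3f ms per lookup\n",
           elapsed, elapsed * 1000.0 / count);
    return 0;
}

Run it once to prime the LDAP server caches, then repeat several times and
average, once with the current code and once with the new code.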

I'm not going to do this change unless the code works completely with a message bus in each backend. If we agree that this is something that can be done, we can get the numbers then and switch back if it turns out to have significant latency.