On Tue, Nov 20, 2012 at 06:31:36PM +0100, Lennart Poettering wrote:
> On Tue, 20.11.12 03:35, Zbigniew Jędrzejewski-Szmek (zbys...@in.waw.pl) wrote:
>
> > > My intention was to speak only HTTP for all of this, so that we can
> > > nicely work through firewalls.
> > Yeah, probably that's more useful than raw stream for normal purposes,
> > since it allows for authentication and whatnot.
>
> Yeah, and not just that. I also want to beef up the server side so that
> it optionally can run as CGI and as fastCGI, so that people can
> integrate that into their existing web servers, if they wish.
>
> But yeah, using HTTP solves many many issues, such as auth, encryption,
> firewall/proxy support, and so on. On top of this the semantics of log
> syncing fit really nicely into the GET/POST model of HTTP.
>
> > > I think it would make sense to drop things into
> > > /var/log/journal/<hostname>/*.journal by default. The hostname would
> > > have to be determined from the URL the user specified on the command
> > > line. Ideally we'd use the machine ID here, but since the machine ID is
> > > hardly something the user should specify on the command line (and we
> > > cannot just take the machine ID supplied from the other side, because we
> > > probably should not trust that and hence allow it to tell us to
> > > overwrite another host's data), the hostname is the next best
> > > thing. Currently libsystemd-journald will ignore directories that are
> > > not machine IDs when browsing, but we could easily drop that limitation.
> > So it seems that this mapping (url/source/whatever -> .journal path)
> > will require some thought.
> >
> > I'd imagine that people will want to use this most often as a syslogd
> > replacement, i.e. launch systemd-journal-remote on a central host, and
> > then let all other hosts stream messages live. In this case we know
> > only two things: _MACHINE_ID specified remotely, and the remote
> > IP:PORT and thus hostname.
> > Actually, I thought that since all those
> > things are "unreliable" (IP only to some extent, but still), they
> > wouldn't be used to determine the output file, and all output would go
> > into one .journal.
>
> So, my thinking here is that hostnames generally suck for identifying
> machines since they are not unique, can change and sometimes are not set
> at all. However, that is only true in the general case. In the specific
> case where admins want to set up an infrastructure for centralizing logs
> they first set up a network, and as part of that I am pretty sure they
> came up with a sane naming/addressing scheme first, that makes the name
> unique in their local setup, makes the names fixed and ensures the name
> is always there. Or to put this in other words: to be able to sync logs
> from another host you first need to think about how you can contact
> that other host, and hence had to introduce a naming scheme first, and
> we should be able to just build on that.

Exactly. I was thinking about --trust-hostname=no|cert|always as
described in the other mail.
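
To make the hostname-based default a bit more concrete, here is a rough
Python sketch of the mapping from a source URL to an output file. It is
only an illustration, not what the tool does: the function name
default_journal_path is made up, the "remote.journal" file name is just
the example layout under /var/log/journal/<hostname>/ discussed in this
thread, and refusing URLs without a usable hostname is my assumption.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Default location discussed in this thread; the layout is an assumption.
JOURNAL_ROOT = PurePosixPath("/var/log/journal")


def default_journal_path(url: str) -> PurePosixPath:
    """Map a source URL to /var/log/journal/<hostname>/remote.journal."""
    # urlsplit() lowercases the hostname and strips the port and userinfo.
    host = urlsplit(url).hostname
    if not host:
        raise ValueError(f"cannot derive a hostname from {url!r}")
    # Never let the name escape the journal root.
    if host in (".", ".."):
        raise ValueError(f"refusing unsafe hostname {host!r}")
    return JOURNAL_ROOT / host / "remote.journal"


if __name__ == "__main__":
    for source in ("http://host1", "http://host2:19531/entries"):
        print(source, "->", default_journal_path(source))
```

Note that the name comes from what the admin typed on the command line,
not from anything the remote side sends, which is the whole point of
using it for the default.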
> > I remember that samba does (did?) something like what you suggest, and
> > kept separate logs based on the information under control of the
> > connecting host. On a host connected to the internet this would lead
> > to hundreds of log files.
> >
> > In addition, .journal files have a fairly big overhead: ~180kB for
> > an "empty" file. This overhead might be unwanted if there are many
> > sources.
> >
> > Maybe there's no one answer, and choices will have to be provided.
>
> I think it definitely makes sense to allow admins to name the local
> destination dir as they want. I am mostly just interested in finding a
> good default, and I'd vote for extracting the "basename" of the URL used to
> access the remote journal for that.
>
> > > > Push mode is not implemented... (but it would be a separate program
> > > > anyway).
> > >
> > > My intention was actually to keep this in the same tool. So that we'd
> > > have for input and output:
> > >
> > > A) HTTP GET
> > > B) HTTP POST
> > > C) SSH PULL (would invoke "journalctl -o export" via ssh)
> > > D) SSH PUSH (would invoke systemd-journald-remote via ssh)
> > > E) A directory for direct read access (which would allow us to merge
> > >    multiple files into one with this tool)
> > > F) A directory for direct write access (which is of course the
> > >    default)
> >
> > Also useful:
> > B1) socket listen() without HTTP
>
> Where would I want to use that instead of B?

It's much easier to write a non-HTTP client. And it's a natural
extension of allowing it locally, through a pipe.

> > B2) HTTPS POST (I'm assuming that POST means to listen)
>
> HTTPS for me is just a special case of HTTP. When I meant HTTP above I
> meant HTTP with and without TLS, and with and without authentication.

Yeah, but usually one listens for the one or the other. Upgrades from
HTTP to HTTPS don't work well.

> > E1) a specific file for read access
> > F1) a specific file for write access
>
> That's something we have to think about anyway: i.e.
> whether we should
> allow accessing a separate journal file via libsystemd-journal?
> Currently we only allow accessing dirs. The reason for that is more or
> less that accessing files probably doesn't do what people assume it
> would do, since files are subject to rotation and referencing a file
> hence quickly becomes a dangling reference...

Reading - for debugging and other special purposes.
Writing - for example when I want to transfer a journal file to
somebody, it is much easier with one file than with multiple files.

> > B1, F, F1 are implemented; A is implemented but ugly (curl).
> > E and E1 would require pulling in journalctl functionality.
> >
> > > We should always require that either E or F is used, but in any
> > > combination with any of the others.
> > I think it is useful to allow the output directory to be implicit
> > (e.g. /var/log/journal/<hostname>/remote.journal can be used).
>
> Yes, definitely.
>
> > > > Examples:
> > > > journalctl -o export | systemd-journal-remoted --stdin -o /tmp/dir/
> > >
> > > Sounds pretty cool. Pretty close to what I'd have in mind.
> > >
> > > To make this even shorter I'd suggest though that we take two normal
> > > args for source and dest, and that "-" is used as stdin/stdout
> > > respectively, and the dest can be omitted:
> >
> > It started this way during development, but I'm not so sure if it'll
> > be always clear what is meant:
> > B, B1, and B2 can also come from socket activation, thus not appearing
> > on
>
> Well, but socket activation can easily be detected, and be treated
> specially? I.e. if sd_listen_fds() returns > 0 we could always go into
> activation mode?

Yes, so for example, I want to start systemd-journal-remote with two
sockets from socket activation and write to /data/mylog. In my scheme
I'll say

ExecStart=.../systemd-journal-remote -o /data/mylog

> > the command line, but output might still be specified.
> > OTOH, there might be multiple sources, and the implicit output dir.
>
> Multiple sources? What do you mean?

For example, pulling from three hosts:

systemd-journal-remote -o /data/mylog http://host1 http://host2 http://host3

Currently, this is more or less working, and I think that it is worth
supporting.

Zbyszek
_______________________________________________
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel
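
P.S. On detecting socket activation: the check suggested above (go into
activation mode when sd_listen_fds() returns > 0) boils down to looking
at the LISTEN_PID and LISTEN_FDS environment variables. Below is a rough
Python sketch of that logic, only to illustrate the decision. The
function names listen_fds and choose_mode are made up, and the policy of
ignoring command-line sources once sockets were passed in is an
assumption, not something that has been decided.

```python
import os

# First passed-in descriptor, same value as SD_LISTEN_FDS_START in sd-daemon.h.
SD_LISTEN_FDS_START = 3


def listen_fds(environ=None, pid=None):
    """Return the fds passed in by socket activation, or [] if there are none."""
    environ = os.environ if environ is None else environ
    pid = os.getpid() if pid is None else pid
    try:
        # The variables only apply to us if LISTEN_PID names our own process.
        if int(environ.get("LISTEN_PID", "")) != pid:
            return []
        count = int(environ.get("LISTEN_FDS", ""))
    except ValueError:
        # Unset or malformed variables: treat as "not socket activated".
        return []
    if count <= 0:
        return []
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))


def choose_mode(sources, environ=None, pid=None):
    """Hypothetical policy: activation sockets win over command-line sources."""
    fds = listen_fds(environ, pid)
    if fds:
        return ("activation", fds)
    return ("command-line", list(sources))


if __name__ == "__main__":
    print(choose_mode(["http://host1", "http://host2", "http://host3"]))
```

With two sockets passed in, as in the ExecStart= example, listen_fds()
would return descriptors 3 and 4 and the output could still be given
with -o.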