Hi,
I am getting back to gmon/profil(2), replying to all feedback. Sorry for the
delay.
On Wed, May 03, 2023 at 08:45:39AM -0500, Scott Cheloha wrote:
> On Tue, Apr 11, 2023 at 09:28:31AM +0200, Sebastien Marie wrote:
>
> > One place where I needed to cheat a bit is at moncontrol(0) call in
> > _mcleanup()
> > function.
> >
> > moncontrol(0) will stop profiling (change the state to GMON_PROF_OFF), and
> > it is
> > calling profil(2) to disable program counter profiling. _mcleanup() is the
> > atexit(3) handler.
> >
> > Instead of changing pledge(2) permission for profil(2) (which could go deep
> > in
> > the kernel to change the clockrate with setstatclockrate()), I just assumed
> > that
> > it isn't necessary to disable it here: we are in atexit(3), so the process
> > is
> > about to call _exit(2), and the kernel will stop the profiling itself if
> > still
> > running.
>
> This is an acceptable workaround if you are using moncontrol(3).
> profil(2) is set up before main() and the teardown is transparent to
> the process. We can probably neglect to turn profil(2) off.
>
> If you wanted to be *really* safe you could remove the write
> permissions from the profil(2) buffer with mprotect(2) during
> _mcleanup(). This would prevent the kernel from racing with you.
>
> On the other hand, it would be really nice if profil(2) could fit into
> the "stdio" pledge(2) promise. I don't quite know what the
> implications of that would be.
>
> profil(2) tells the kernel: write sample data at *this* address.
> Later, if that piece of memory is not readable and writable by the
> calling process, the copyin(9) or copyout(9) will fail when the thread
> is returning from the clock interrupt and profil(2) is quietly
> disabled for the process.
>
> Is that set of actions a security concern? Sorry if that's a dumb
> question.
>
> It seems like the process is basically just operating on its own
> memory, but maybe it violates the spirit of the "stdio" promise?
>
I didn't change profil(2) regarding pledge(2) as I basically doesn't understand
the underline kernel code (which is a very good reason for me). All I saw is it
is playing which {start,stop}profclock(9) which is going deep in the kernel,
and
regulary write in user supplied memory address.
So, as I had a somehow valid workaround (have profil(2) enabled before any
pledge(2) call, and don't call it at end), I didn't looked to add profil(2) to
"stdio". It wasn't strictly required.
Please note that adding it in "stdio" means almost every programs (even sshd
sandbox) will gain it.
> > For post-execution gmon.out extraction, an helper might be needed:
> > ktrace.out
> > could contain multiple gmon.out files.
> >
> > ## compile and collect profil information (-tu option on ktrace is optional)
> > $ cc -static -pg test.c
> > $ ktrace -di -tu ./a.out
> >
> > ## get gmon.out file
> > $ kdump -u gmon.out | unvis > gmon.out
> >
> > ## get gmon.out.$name.$pid for multiple processes
> > ## - first get pid process-name
> > ## - extract each gmon.out for each pid and store in "gmon.out.$name.$pid"
> > file
> > $ kdump -tu | sed -ne 's/^ \([0-9][0-9]*\) \([^ ]*\) .*/\1 \2/p' | sort -u \
> > | while read pid name; do kdump -u gmon.out -p $pid | unvis >
> > gmon.out.$name.$pid ; done
> >
> > kdump diff from otto@ mallocdump is need for 'kdump -u label'.
>
> Have you taken a shot at changing kdump(8) to produce the gmon.out
> files gprof(1) expects?
>
> If not, I will take a shot at it.
I looked a bit at kdump(8) to see how to be more user-friendly. Currently, I
only added an option to produce binary data with -u label (basically to not use
vis(3)).
When you have only on process data, it means using:
$ kdump -U -u gmon.out > gmon.out
I didn't look to make kdump(8) directly create files as it is currently a
read-only tool (pledged "stdio rpath getpw"). But it might be a way.
On Thu, May 04, 2023 at 11:09:34AM +0200, Omar Polo wrote:
> On 2023/04/11 09:28:31 +0200, Sebastien Marie <[email protected]> wrote:
>
> I've used this to profile `gotadmin pack' a couple of times. got uses
> several "libexec helpers" to parse data on disk, so profiling usually
> gets awkward. You'd need to build only the program you're interested
> in (e.g. got-read-pack) with PROFILE=1 and collect the data. $PROFDIR
> doesn't work out-of-the-box due to unveil (PROFILE builds disable
> pledge but leave unveil and permit "gmon.out" rwc.) With gotd it gets
> a little more tricky since it's a single binary that gets fork+exec'd
> multiple times. The argument popped up recently on the mailing list:
>
> http://marc.gameoftrees.org/mail/1683051301.32966_0.html
>
> With this instead, I can do one single run and collect the data for
> all processes, and keep pledge/unveil in place.
>
> To test, I've rebuilt and installed libc (and libz, and libutil -
> probably wasn't needed), ran unifdef -m -UPROFILE on got and rebuild
> with PROFILE=1 (needed to get -pg -static and the various -l*_p.)
>
I did most of my tests with got(1) too, in mostly the same way that you (but I
prefered change the Makefile to not pass -DPROFILE).
Only a new libc are required to test (the diff doesn't change the
instrumentation done by cc).
> I've uploaded the png obtained with `gprof2dot | dot -Tpng' here:
>
> https://ftp.omarpolo.com/tmp/
>
> There are two directories: one for `gotadmin pack -a' in src.git and
> one in got.git.
>
> I've observed that not always the profile data seems fine. Each run,
> I have a few processes that report 100% on every function. At first I
> thought it was an issue in how PROFILE=1 builds are done, but then I
> noticed that "good" and "bad" data can be gathered for different run
> of the same executable.
>
> Compare for example
>
> https://ftp.omarpolo.com/tmp/pack-got/profile.got-read-pack.52583.png
> (bad)
>
> and
>
> https://ftp.omarpolo.com/tmp/pack-got/profile.got-read-pack.37417.png
> (good)
>
> both originating from the same run. I still have the gmon.out files
> if needed.
It might be interesting to know if you have the same problem with current gmon.
> One comment below
>
> > --- lib/libc/gmon/gmon.c
> > +++ lib/libc/gmon/gmon.c
> > @@ -51,6 +52,32 @@ void
> > PROTO_NORMAL(moncontrol);
> > PROTO_DEPRECATED(monstartup);
> >
> > +static void
> > +montrace(void *addr, size_t len)
> > +{
> > + for (;;) {
> > + if (len < KTR_USER_MAXLEN) {
>
> should be len <= KTR_USER_MAXLEN ?
>
yes, utrace(2) will accept KTR_USER_MAXLEN bytes for data.
I think I mixed errors condition presented in the man page (the label should
not
be longer than KTR_USER_MAXIDLEN-1, but for data it is KTR_USER_MAXLEN).
> > + if (utrace("gmon.out", addr, len) == -1)
> > + ERR("error on utrace(), truncated gmon.out");
> > + return;
> > + }
> > + if (utrace("gmon.out", addr, KTR_USER_MAXLEN) == -1)
> > + ERR("error on utrace(), truncated gmon.out");
>
> ERR expands to write(2, "..."), so I'd add a return here (very
> unlikely for utrace to fail, but if we check for failures we should
> also return IMHO)
I added the return.
Below an updated diff. It includes kdump(8) -U option.
Thanks
--
Sebastien Marie
diff e5bbc67172ce757f6e3f1f35e95f307733e7a018
39747cbba49da1d3b062fdff8672c79c825f4ed0
commit - e5bbc67172ce757f6e3f1f35e95f307733e7a018
commit + 39747cbba49da1d3b062fdff8672c79c825f4ed0
blob - f09ffd91837040d44fb17a9e8c0197c72ba9459a
blob + f917f94a8263abcdf28b4a16674d81ba3f58530c
--- lib/libc/gmon/gmon.c
+++ lib/libc/gmon/gmon.c
@@ -29,6 +29,7 @@
*/
#include <sys/time.h>
+#include <sys/ktrace.h>
#include <sys/gmon.h>
#include <sys/mman.h>
#include <sys/sysctl.h>
@@ -51,6 +52,34 @@ void
PROTO_NORMAL(moncontrol);
PROTO_DEPRECATED(monstartup);
+static void
+montrace(void *addr, size_t len)
+{
+ for (;;) {
+ if (len <= KTR_USER_MAXLEN) {
+ if (utrace("gmon.out", addr, len) == -1)
+ ERR("error on utrace(), truncated gmon.out");
+ return;
+ }
+ if (utrace("gmon.out", addr, KTR_USER_MAXLEN) == -1) {
+ ERR("error on utrace(), truncated gmon.out");
+ return;
+ }
+
+ len -= KTR_USER_MAXLEN;
+ addr += KTR_USER_MAXLEN;
+ }
+}
+
+#ifdef DEBUG
+static int
+monlog(char *msg)
+{
+ size_t len = strlen(msg);
+ return utrace("gmon.log", msg, len);
+}
+#endif
+
void
monstartup(u_long lowpc, u_long highpc)
{
@@ -136,7 +165,6 @@ _mcleanup(void)
void
_mcleanup(void)
{
- int fd;
int fromindex;
int endfrom;
u_long frompc;
@@ -147,11 +175,8 @@ _mcleanup(void)
struct clockinfo clockinfo;
const int mib[2] = { CTL_KERN, KERN_CLOCKRATE };
size_t size;
- char *profdir;
- char *proffile;
- char buf[PATH_MAX];
#ifdef DEBUG
- int log, len;
+ int len;
char dbuf[200];
#endif
@@ -169,67 +194,16 @@ _mcleanup(void)
clockinfo.profhz = clockinfo.hz; /* best guess */
}
- moncontrol(0);
+ /*
+ * Do not call moncontrol(0) (neither profil(2)) as we might be pledged.
+ * We are in _mcleanup(), so the process is inside exit(3).
+ */
+ p->state = GMON_PROF_OFF;
- if (issetugid() == 0 && (profdir = getenv("PROFDIR")) != NULL) {
- char *s, *t, *limit;
- pid_t pid;
- long divisor;
-
- /* If PROFDIR contains a null value, no profiling
- output is produced */
- if (*profdir == '\0') {
- return;
- }
-
- limit = buf + sizeof buf - 1 - 10 - 1 -
- strlen(__progname) - 1;
- t = buf;
- s = profdir;
- while((*t = *s) != '\0' && t < limit) {
- t++;
- s++;
- }
- *t++ = '/';
-
- /*
- * Copy and convert pid from a pid_t to a string. For
- * best performance, divisor should be initialized to
- * the largest power of 10 less than PID_MAX.
- */
- pid = getpid();
- divisor=10000;
- while (divisor > pid) divisor /= 10; /* skip leading zeros */
- do {
- *t++ = (pid/divisor) + '0';
- pid %= divisor;
- } while (divisor /= 10);
- *t++ = '.';
-
- s = __progname;
- while ((*t++ = *s++) != '\0')
- ;
-
- proffile = buf;
- } else {
- proffile = "gmon.out";
- }
-
- fd = open(proffile , O_CREAT|O_TRUNC|O_WRONLY, 0664);
- if (fd == -1) {
- perror( proffile );
- return;
- }
#ifdef DEBUG
- log = open("gmon.log", O_CREAT|O_TRUNC|O_WRONLY, 0664);
- if (log == -1) {
- perror("mcount: gmon.log");
- close(fd);
- return;
- }
snprintf(dbuf, sizeof dbuf, "[mcleanup1] kcount 0x%x ssiz %d\n",
p->kcount, p->kcountsize);
- write(log, dbuf, strlen(dbuf));
+ monlog(dbuf);
#endif
hdr = (struct gmonhdr *)&gmonhdr;
bzero(hdr, sizeof(*hdr));
@@ -238,8 +212,8 @@ _mcleanup(void)
hdr->ncnt = p->kcountsize + sizeof(gmonhdr);
hdr->version = GMONVERSION;
hdr->profrate = clockinfo.profhz;
- write(fd, (char *)hdr, sizeof *hdr);
- write(fd, p->kcount, p->kcountsize);
+ montrace(hdr, sizeof *hdr);
+ montrace(p->kcount, p->kcountsize);
endfrom = p->fromssize / sizeof(*p->froms);
for (fromindex = 0; fromindex < endfrom; fromindex++) {
if (p->froms[fromindex] == 0)
@@ -254,15 +228,14 @@ _mcleanup(void)
"[mcleanup2] frompc 0x%x selfpc 0x%x count %d\n" ,
frompc, p->tos[toindex].selfpc,
p->tos[toindex].count);
- write(log, dbuf, strlen(dbuf));
+ monlog(dbuf);
#endif
rawarc.raw_frompc = frompc;
rawarc.raw_selfpc = p->tos[toindex].selfpc;
rawarc.raw_count = p->tos[toindex].count;
- write(fd, &rawarc, sizeof rawarc);
+ montrace(&rawarc, sizeof rawarc);
}
}
- close(fd);
#ifdef notyet
if (p->kcount != NULL) {
munmap(p->kcount, p->kcountsize);
blob - 936c630a1096cec78204ac075c6df83a2744e8e2
blob + 8c91cddf450404159851a2429154c2e283ad0bac
--- usr.bin/kdump/kdump.1
+++ usr.bin/kdump/kdump.1
@@ -37,7 +37,7 @@
.Nd display kernel trace data
.Sh SYNOPSIS
.Nm kdump
-.Op Fl dHlnRTXx
+.Op Fl dHlnRTUXx
.Op Fl f Ar file
.Op Fl m Ar maxdata
.Op Fl p Ar pid
@@ -116,9 +116,19 @@ escaping, without
.Ar label
as strings with
.Xr vis 3
-escaping, without
+escaping (if
+.Fl U
+isn't used),
+without
.Xr ktrace 2
header information.
+.It Fl U
+Display
+.Xr utrace 2
+tracepoints as binary data (without
+.Xr vis 3
+escaping) when used with
+.Fl u label .
.It Fl X
Display I/O data with hexadecimal data and printable ASCII characters
side by side.
blob - 6b1c118cdcd75fb5003c08990fb4c46c4455c827
blob + 791c409ae1dc84fa697f64e6b6425a034a974f3a
--- usr.bin/kdump/kdump.c
+++ usr.bin/kdump/kdump.c
@@ -84,11 +84,11 @@ int needtid, tail, basecol;
} timestamp = TIMESTAMP_NONE;
int decimal, iohex, fancy = 1, maxdata = INT_MAX;
-int needtid, tail, basecol;
+int needtid, tail, basecol, utracevis = 1;
char *tracefile = DEF_TRACEFILE;
struct ktr_header ktr_header;
pid_t pid_opt = -1;
-char* utracefilter;
+char *utracefilter;
#define eqs(s1, s2) (strcmp((s1), (s2)) == 0)
@@ -169,7 +169,7 @@ main(int argc, char *argv[])
screenwidth = 80;
}
- while ((ch = getopt(argc, argv, "f:dHlm:np:RTt:u:xX")) != -1)
+ while ((ch = getopt(argc, argv, "f:dHlm:np:RTt:u:UxX")) != -1)
switch (ch) {
case 'f':
tracefile = optarg;
@@ -218,6 +218,11 @@ main(int argc, char *argv[])
utracefilter = optarg;
trpoints = KTRFAC_USER;
break;
+ case 'U':
+ if (isatty(STDOUT_FILENO) == 1)
+ errx(1, "couldn't enable utrace binary output
on terminal");
+ utracevis = 0;
+ break;
case 'x':
iohex = 1;
break;
@@ -1267,8 +1272,13 @@ showbuf(unsigned char *dp, size_t datalen)
char visbuf[4 * KTR_USER_MAXLEN + 1];
if (utracefilter != NULL) {
- strvisx(visbuf, dp, datalen, VIS_SAFE | VIS_OCTAL);
- printf("%s", visbuf);
+ if (utracevis) {
+ strvisx(visbuf, dp, datalen, VIS_SAFE | VIS_OCTAL);
+ printf("%s", visbuf);
+ } else {
+ for (i = 0; i < datalen; i++)
+ putchar(dp[i]);
+ }
return;
}
if (iohex == 1) {