Hi,

I am getting back to gmon/profil(2), replying to all feedback. Sorry for the 
delay.

On Wed, May 03, 2023 at 08:45:39AM -0500, Scott Cheloha wrote:
> On Tue, Apr 11, 2023 at 09:28:31AM +0200, Sebastien Marie wrote:
> 
> > One place where I needed to cheat a bit is at moncontrol(0) call in 
> > _mcleanup() 
> > function.
> > 
> > moncontrol(0) will stop profiling (change the state to GMON_PROF_OFF), and 
> > it is 
> > calling profil(2) to disable program counter profiling. _mcleanup() is the 
> > atexit(3) handler.
> > 
> > Instead of changing pledge(2) permission for profil(2) (which could go deep 
> > in 
> > the kernel to change the clockrate with setstatclockrate()), I just assumed 
> > that 
> > it isn't necessary to disable it here: we are in atexit(3), so the process 
> > is 
> > about to call _exit(2), and the kernel will stop the profiling itself if 
> > still 
> > running.
> 
> This is an acceptable workaround if you are using moncontrol(3).
> profil(2) is set up before main() and the teardown is transparent to
> the process.  We can probably neglect to turn profil(2) off.
> 
> If you wanted to be *really* safe you could remove the write
> permissions from the profil(2) buffer with mprotect(2) during
> _mcleanup().  This would prevent the kernel from racing with you.
> 
> On the other hand, it would be really nice if profil(2) could fit into
> the "stdio" pledge(2) promise.  I don't quite know what the
> implications of that would be.
> 
> profil(2) tells the kernel: write sample data at *this* address.
> Later, if that piece of memory is not readable and writable by the
> calling process, the copyin(9) or copyout(9) will fail when the thread
> is returning from the clock interrupt and profil(2) is quietly
> disabled for the process.
> 
> Is that set of actions a security concern?  Sorry if that's a dumb
> question.
> 
> It seems like the process is basically just operating on its own
> memory, but maybe it violates the spirit of the "stdio" promise?
> 

I didn't change profil(2) regarding pledge(2) as I basically doesn't understand 
the underline kernel code (which is a very good reason for me). All I saw is it 
is playing which {start,stop}profclock(9) which is going deep in the kernel, 
and 
regulary write in user supplied memory address.

So, as I had a somehow valid workaround (have profil(2) enabled before any 
pledge(2) call, and don't call it at end), I didn't looked to add profil(2) to 
"stdio". It wasn't strictly required.

Please note that adding it in "stdio" means almost every programs (even sshd 
sandbox) will gain it.

> > For post-execution gmon.out extraction, an helper might be needed: 
> > ktrace.out 
> > could contain multiple gmon.out files.
> > 
> > ## compile and collect profil information (-tu option on ktrace is optional)
> > $ cc -static -pg test.c
> > $ ktrace -di -tu ./a.out
> > 
> > ## get gmon.out file
> > $ kdump -u gmon.out | unvis > gmon.out
> > 
> > ## get gmon.out.$name.$pid for multiple processes
> > ##  - first get pid process-name
> > ##  - extract each gmon.out for each pid and store in "gmon.out.$name.$pid" 
> > file
> > $ kdump -tu | sed -ne 's/^ \([0-9][0-9]*\) \([^ ]*\) .*/\1 \2/p' | sort -u \
> >     | while read pid name; do kdump -u gmon.out -p $pid | unvis > 
> > gmon.out.$name.$pid ; done
> > 
> > kdump diff from otto@ mallocdump is need for 'kdump -u label'.
> 
> Have you taken a shot at changing kdump(8) to produce the gmon.out
> files gprof(1) expects?
> 
> If not, I will take a shot at it.

I looked a bit at kdump(8) to see how to be more user-friendly. Currently, I 
only added an option to produce binary data with -u label (basically to not use 
vis(3)).

When you have only on process data, it means using:

$ kdump -U -u gmon.out > gmon.out

I didn't look to make kdump(8) directly create files as it is currently a 
read-only tool (pledged "stdio rpath getpw"). But it might be a way.


On Thu, May 04, 2023 at 11:09:34AM +0200, Omar Polo wrote:
> On 2023/04/11 09:28:31 +0200, Sebastien Marie <sema...@online.fr> wrote:
> 
> I've used this to profile `gotadmin pack' a couple of times.  got uses
> several "libexec helpers" to parse data on disk, so profiling usually
> gets awkward.  You'd need to build only the program you're interested
> in (e.g. got-read-pack) with PROFILE=1 and collect the data.  $PROFDIR
> doesn't work out-of-the-box due to unveil (PROFILE builds disable
> pledge but leave unveil and permit "gmon.out" rwc.)  With gotd it gets
> a little more tricky since it's a single binary that gets fork+exec'd
> multiple times.  The argument popped up recently on the mailing list:
> 
> http://marc.gameoftrees.org/mail/1683051301.32966_0.html
> 
> With this instead, I can do one single run and collect the data for
> all processes, and keep pledge/unveil in place.
> 
> To test, I've rebuilt and installed libc (and libz, and libutil -
> probably wasn't needed), ran unifdef -m -UPROFILE on got and rebuild
> with PROFILE=1 (needed to get -pg -static and the various -l*_p.)
> 

I did most of my tests with got(1) too, in mostly the same way that you (but I 
prefered change the Makefile to not pass -DPROFILE).

Only a new libc are required to test (the diff doesn't change the 
instrumentation done by cc).

> I've uploaded the png obtained with `gprof2dot | dot -Tpng' here:
> 
>       https://ftp.omarpolo.com/tmp/
> 
> There are two directories: one for `gotadmin pack -a' in src.git and
> one in got.git.
> 
> I've observed that not always the profile data seems fine.  Each run,
> I have a few processes that report 100% on every function.  At first I
> thought it was an issue in how PROFILE=1 builds are done, but then I
> noticed that "good" and "bad" data can be gathered for different run
> of the same executable.
> 
> Compare for example
> 
> https://ftp.omarpolo.com/tmp/pack-got/profile.got-read-pack.52583.png
> (bad)
> 
> and
> 
> https://ftp.omarpolo.com/tmp/pack-got/profile.got-read-pack.37417.png
> (good)
> 
> both originating from the same run.  I still have the gmon.out files
> if needed.

It might be interesting to know if you have the same problem with current gmon.

> One comment below
> 
> > --- lib/libc/gmon/gmon.c
> > +++ lib/libc/gmon/gmon.c
> > @@ -51,6 +52,32 @@ void
> >  PROTO_NORMAL(moncontrol);
> >  PROTO_DEPRECATED(monstartup);
> >  
> > +static void
> > +montrace(void *addr, size_t len)
> > +{
> > +   for (;;) {
> > +           if (len < KTR_USER_MAXLEN) {
> 
> should be len <= KTR_USER_MAXLEN ?
> 

yes, utrace(2) will accept KTR_USER_MAXLEN bytes for data.

I think I mixed errors condition presented in the man page (the label should 
not 
be longer than KTR_USER_MAXIDLEN-1, but for data it is KTR_USER_MAXLEN).

> > +                   if (utrace("gmon.out", addr, len) == -1)
> > +                           ERR("error on utrace(), truncated gmon.out");
> > +                   return;
> > +           }
> > +           if (utrace("gmon.out", addr, KTR_USER_MAXLEN) == -1)
> > +                   ERR("error on utrace(), truncated gmon.out");
> 
> ERR expands to write(2, "..."), so I'd add a return here (very
> unlikely for utrace to fail, but if we check for failures we should
> also return IMHO)

I added the return.

Below an updated diff. It includes kdump(8) -U option.

Thanks
-- 
Sebastien Marie

diff e5bbc67172ce757f6e3f1f35e95f307733e7a018 
39747cbba49da1d3b062fdff8672c79c825f4ed0
commit - e5bbc67172ce757f6e3f1f35e95f307733e7a018
commit + 39747cbba49da1d3b062fdff8672c79c825f4ed0
blob - f09ffd91837040d44fb17a9e8c0197c72ba9459a
blob + f917f94a8263abcdf28b4a16674d81ba3f58530c
--- lib/libc/gmon/gmon.c
+++ lib/libc/gmon/gmon.c
@@ -29,6 +29,7 @@
  */
 
 #include <sys/time.h>
+#include <sys/ktrace.h>
 #include <sys/gmon.h>
 #include <sys/mman.h>
 #include <sys/sysctl.h>
@@ -51,6 +52,34 @@ void
 PROTO_NORMAL(moncontrol);
 PROTO_DEPRECATED(monstartup);
 
+static void
+montrace(void *addr, size_t len)
+{
+       for (;;) {
+               if (len <= KTR_USER_MAXLEN) {
+                       if (utrace("gmon.out", addr, len) == -1)
+                               ERR("error on utrace(), truncated gmon.out");
+                       return;
+               }
+               if (utrace("gmon.out", addr, KTR_USER_MAXLEN) == -1) {
+                       ERR("error on utrace(), truncated gmon.out");
+                       return;
+               }
+               
+               len -= KTR_USER_MAXLEN;
+               addr += KTR_USER_MAXLEN;
+       }
+}
+
+#ifdef DEBUG
+static int
+monlog(char *msg)
+{
+       size_t len = strlen(msg);
+       return utrace("gmon.log", msg, len);
+}
+#endif
+
 void
 monstartup(u_long lowpc, u_long highpc)
 {
@@ -136,7 +165,6 @@ _mcleanup(void)
 void
 _mcleanup(void)
 {
-       int fd;
        int fromindex;
        int endfrom;
        u_long frompc;
@@ -147,11 +175,8 @@ _mcleanup(void)
        struct clockinfo clockinfo;
        const int mib[2] = { CTL_KERN, KERN_CLOCKRATE };
        size_t size;
-       char *profdir;
-       char *proffile;
-       char  buf[PATH_MAX];
 #ifdef DEBUG
-       int log, len;
+       int len;
        char dbuf[200];
 #endif
 
@@ -169,67 +194,16 @@ _mcleanup(void)
                clockinfo.profhz = clockinfo.hz;        /* best guess */
        }
 
-       moncontrol(0);
+       /*
+        * Do not call moncontrol(0) (neither profil(2)) as we might be pledged.
+        * We are in _mcleanup(), so the process is inside exit(3).
+        */
+       p->state = GMON_PROF_OFF;
 
-       if (issetugid() == 0 && (profdir = getenv("PROFDIR")) != NULL) {
-               char *s, *t, *limit;
-               pid_t pid;
-               long divisor;
-
-               /* If PROFDIR contains a null value, no profiling
-                  output is produced */
-               if (*profdir == '\0') {
-                       return;
-               }
-
-               limit = buf + sizeof buf - 1 - 10 - 1 -
-                   strlen(__progname) - 1;
-               t = buf;
-               s = profdir;
-               while((*t = *s) != '\0' && t < limit) {
-                       t++;
-                       s++;
-               }
-               *t++ = '/';
-
-               /*
-                * Copy and convert pid from a pid_t to a string.  For
-                * best performance, divisor should be initialized to
-                * the largest power of 10 less than PID_MAX.
-                */
-               pid = getpid();
-               divisor=10000;
-               while (divisor > pid) divisor /= 10;    /* skip leading zeros */
-               do {
-                       *t++ = (pid/divisor) + '0';
-                       pid %= divisor;
-               } while (divisor /= 10);
-               *t++ = '.';
-
-               s = __progname;
-               while ((*t++ = *s++) != '\0')
-                       ;
-
-               proffile = buf;
-       } else {
-               proffile = "gmon.out";
-       }
-
-       fd = open(proffile , O_CREAT|O_TRUNC|O_WRONLY, 0664);
-       if (fd == -1) {
-               perror( proffile );
-               return;
-       }
 #ifdef DEBUG
-       log = open("gmon.log", O_CREAT|O_TRUNC|O_WRONLY, 0664);
-       if (log == -1) {
-               perror("mcount: gmon.log");
-               close(fd);
-               return;
-       }
        snprintf(dbuf, sizeof dbuf, "[mcleanup1] kcount 0x%x ssiz %d\n",
            p->kcount, p->kcountsize);
-       write(log, dbuf, strlen(dbuf));
+       monlog(dbuf);
 #endif
        hdr = (struct gmonhdr *)&gmonhdr;
        bzero(hdr, sizeof(*hdr));
@@ -238,8 +212,8 @@ _mcleanup(void)
        hdr->ncnt = p->kcountsize + sizeof(gmonhdr);
        hdr->version = GMONVERSION;
        hdr->profrate = clockinfo.profhz;
-       write(fd, (char *)hdr, sizeof *hdr);
-       write(fd, p->kcount, p->kcountsize);
+       montrace(hdr, sizeof *hdr);
+       montrace(p->kcount, p->kcountsize);
        endfrom = p->fromssize / sizeof(*p->froms);
        for (fromindex = 0; fromindex < endfrom; fromindex++) {
                if (p->froms[fromindex] == 0)
@@ -254,15 +228,14 @@ _mcleanup(void)
                        "[mcleanup2] frompc 0x%x selfpc 0x%x count %d\n" ,
                                frompc, p->tos[toindex].selfpc,
                                p->tos[toindex].count);
-                       write(log, dbuf, strlen(dbuf));
+                       monlog(dbuf);
 #endif
                        rawarc.raw_frompc = frompc;
                        rawarc.raw_selfpc = p->tos[toindex].selfpc;
                        rawarc.raw_count = p->tos[toindex].count;
-                       write(fd, &rawarc, sizeof rawarc);
+                       montrace(&rawarc, sizeof rawarc);
                }
        }
-       close(fd);
 #ifdef notyet
        if (p->kcount != NULL) {
                munmap(p->kcount, p->kcountsize);
blob - 936c630a1096cec78204ac075c6df83a2744e8e2
blob + 8c91cddf450404159851a2429154c2e283ad0bac
--- usr.bin/kdump/kdump.1
+++ usr.bin/kdump/kdump.1
@@ -37,7 +37,7 @@
 .Nd display kernel trace data
 .Sh SYNOPSIS
 .Nm kdump
-.Op Fl dHlnRTXx
+.Op Fl dHlnRTUXx
 .Op Fl f Ar file
 .Op Fl m Ar maxdata
 .Op Fl p Ar pid
@@ -116,9 +116,19 @@ escaping, without
 .Ar label
 as strings with
 .Xr vis 3
-escaping, without
+escaping (if
+.Fl U
+isn't used),
+without
 .Xr ktrace 2
 header information.
+.It Fl U
+Display
+.Xr utrace 2
+tracepoints as binary data (without
+.Xr vis 3
+escaping) when used with
+.Fl u label .
 .It Fl X
 Display I/O data with hexadecimal data and printable ASCII characters
 side by side.
blob - 6b1c118cdcd75fb5003c08990fb4c46c4455c827
blob + 791c409ae1dc84fa697f64e6b6425a034a974f3a
--- usr.bin/kdump/kdump.c
+++ usr.bin/kdump/kdump.c
@@ -84,11 +84,11 @@ int needtid, tail, basecol;
 } timestamp = TIMESTAMP_NONE;
 
 int decimal, iohex, fancy = 1, maxdata = INT_MAX;
-int needtid, tail, basecol;
+int needtid, tail, basecol, utracevis = 1;
 char *tracefile = DEF_TRACEFILE;
 struct ktr_header ktr_header;
 pid_t pid_opt = -1;
-char* utracefilter;
+char *utracefilter;
 
 #define eqs(s1, s2)    (strcmp((s1), (s2)) == 0)
 
@@ -169,7 +169,7 @@ main(int argc, char *argv[])
                        screenwidth = 80;
        }
 
-       while ((ch = getopt(argc, argv, "f:dHlm:np:RTt:u:xX")) != -1)
+       while ((ch = getopt(argc, argv, "f:dHlm:np:RTt:u:UxX")) != -1)
                switch (ch) {
                case 'f':
                        tracefile = optarg;
@@ -218,6 +218,11 @@ main(int argc, char *argv[])
                        utracefilter = optarg;
                        trpoints = KTRFAC_USER;
                        break;
+               case 'U':
+                       if (isatty(STDOUT_FILENO) == 1)
+                               errx(1, "couldn't enable utrace binary output 
on terminal");
+                       utracevis = 0;
+                       break;
                case 'x':
                        iohex = 1;
                        break;
@@ -1267,8 +1272,13 @@ showbuf(unsigned char *dp, size_t datalen)
        char visbuf[4 * KTR_USER_MAXLEN + 1];
 
        if (utracefilter != NULL) {
-               strvisx(visbuf, dp, datalen, VIS_SAFE | VIS_OCTAL);
-               printf("%s", visbuf);
+               if (utracevis) {
+                       strvisx(visbuf, dp, datalen, VIS_SAFE | VIS_OCTAL);
+                       printf("%s", visbuf);
+               } else {
+                       for (i = 0; i < datalen; i++)
+                               putchar(dp[i]);
+               }
                return;
        }
        if (iohex == 1) {

Reply via email to