Hi all,

For a while now I've been meaning to improve our profiler.  I finally
got around to writing a new profiler a few weeks ago and I've been
testing it on and off since then.  I think now it's ready to start
integrating it.  I've successfully made some performance improvements in
uri-generic and intarweb.  The irregex performance improvement I sent to
chicken-hackers was also a result from running it with the new profiler.

I believe the profiler needs improving because of these limitations:

* Instrumentation adds a whole lot of overhead and prevents various
   optimizations due to requiring dynamic-wind and turning all calls
   to such procedures into a CPS call.  This skews the results,
   sometimes so extremely to make the profile output useless.
   If you want to see an example of this, try profiling the "nbody"
   benchmark from Mario's chicken-benchmarks repo.  It goes from
   1s to 17s on my laptop.  Other examples include "knucleotide",
   "primes" and "ray2".
* Only toplevel procedures are instrumented.  This means that programs
   with large procedures or programs which rely heavily on closures will
   be hard to profile.  It's possible to work around this by tweaking
   the source code, but this is very annoying to do.
* Checking performance of a program requires recompilation with
   -profile.  This is only a minor annoyance, but sometimes you
   want to look "deeper" inside the library calls that your program
   performs, and that means you'll have to recompile those libraries
   with -profile as well (and possibly libraries they use, and so on),
   which gets annoying pretty quickly.  Also, you can easily forget you
   did that for a library, and this will slow down all programs using
   this library.

The attached patches add a *second* profiler to CHICKEN, which uses
statistical sampling to figure out where a program spends most of its
time.  It registers a timer with setitimer().  This will trigger a
signal to be delivered every 10ms (this is customisable via "-:P").

When a signal arrives, we look at the top of the trace buffer to
find the currently running procedure.  This means it needs *no*
additional instrumentation or even recompilation: libraries or
programs which are compiled without -no-trace or -d0 (i.e., most
of them) are instantly profileable!

This statistical sampling method is basically what gprof(1) and profil(3)
also do, from what I understand of their documentation.

To see it in action, just compile the target program as normal, and then
when running it, pass it the "-:p" switch.  This is recognised by the
runtime system and therefore any program can be passed this switch.
On exit, it will write out a "PROFILE.1234" file, where 1234 is the pid,
just like under the instrumentation-based profiler.  This can only be
read with the new version of chicken-profile, which I've tweaked a tiny
bit to understand the new profile format, which has a header now.

The tweak to chicken-profile is necessary because the statistical
profiler has very little context to work with: it only sees the top
of the trace buffer, so it knows what procedure is currently running.
Because it takes a sample every 10 ms, it can count how much time
is spent in that procedure, but it doesn't really know how it got there,
as it can't carefully track procedure entry/exit events like the
instrumentation does under the old profiler.

So, we know how much time is spent in a procedure, we can only calculate
the percentage of the entire running time spent, individually per
procedure.  This means the percentages add up to 100, which is different
from the original profiler.  There, the percentage will be 100 for the
"main" procedure, and less for everything called by it.  Because of this,
both profilers now write a "header" symbol to the start of the profile,
which chicken-profile recognises and will change its behaviour on.

The new profiler has none of the disadvantages of the old profiler, but
comes with its own set of caveats:

* If a program blocks signals, profiling is unreliable.  I got a report
   from Kooda that SDL's vsync() seems to do this, so this is not always
   directly under the programmer's control.
* The profile results are very different: it shows how much time is
   spent in the procedure that was called last, making it very low-level.
   This can sometimes result in a harder to read profile, with lots of
   details.
   The old profiler would simply state "procedure X is slow", while the
   new will say "call (A) on line 1 is slowest followed by call (B) on
   line 1507 and call (C) on line 5", even though lines 1 and 5 are both
   part of procedure X.  Ideally you would want some way to sum the
   calls belonging to the same procedure.  On the other hand, it can
   also be useful to know what each *part* of a procedure contributes
   to the profile.
* Due to the statistical approach, the call counts can be *completely*
   wrong.  If you have very fast procedures which are called extremely
   often, the number of calls will be way too low because it has missed
   several calls in between two samples.

Finally, I should mention two more caveats, but I don't think these are
huge problems:

* Due to the statistical approach, the time spent can be over- or under-
   allocated.  We simply allocate the 10ms to the currently running
   procedure at the time we take the sample.  This should in general not
   be a problem because we take so many samples that it will average out:
   In the long run, small, fast procedures should not be running too often
   while a sample is taken.
* The code is all in C, this is required to reduce the impact on
   performance as much as possible, and to avoid having to recompile.
   Also, code running with interrupts disabled would not receive the
   signal if we processed the signal in Scheme.  Finally, the profiler
   might always see itself running when taking a sample if we did it in
   Scheme!

Currently, profiling only works for compiled programs, but I think with
some more or less invasive changes it may be possible to even profile
programs running under the interpreter.  This could not be made to work
with the old profiler, as far as I can see.

Given the above laundry list of caveats and the elegant approach of
the old profiler (all in Scheme, very little code, precise and more
high-level results), I think we should not be removing that one just
yet.  In fact, I think we could even merge some of these parts of the
code, which should also remove some of the overhead of the old profiler,
reducing some of its problems.

I'm sure the above text sounds very rambling, so please feel free to
ask if you have questions.  I'm also preparing a blog post that
describes the new profiler in a more structured and easy-going way.

There's a lot more we could potentially do to improve either profiler,
but let's get started by applying this first.

Cheers,
Peter
From 1f86fb74396e07b0d230e0c204b6c7baab95f8b7 Mon Sep 17 00:00:00 2001
From: Peter Bex <pe...@more-magic.net>
Date: Sat, 5 Dec 2015 16:28:59 +0100
Subject: [PATCH 1/3] Add simple statistical profiler to runtime library

This enables collection of profiling data via statistical sampling to
every program built with CHICKEN.  It relies on trace information for
determining which procedure is running.  This also means it has a finer
granularity than the default instrumentation-based profiler.  This can
be an advantage or disadvantage depending on what you're trying to do.
---
 chicken.h                 |   1 +
 library.scm               |   2 +
 manual/Using the compiler |   6 +-
 runtime.c                 | 214 ++++++++++++++++++++++++++++++++++++++++++++--
 support.scm               |   2 +-
 5 files changed, 216 insertions(+), 9 deletions(-)

diff --git a/chicken.h b/chicken.h
index 5f78ac6..5a41f73 100644
--- a/chicken.h
+++ b/chicken.h
@@ -1827,6 +1827,7 @@ C_fctexport C_cpsproc(C_dump_heap_state) C_noret;
 C_fctexport C_cpsproc(C_filter_heap_objects) C_noret;
 
 C_fctexport time_t C_fcall C_seconds(C_long *ms) C_regparm;
+C_fctexport C_word C_i_dump_statistical_profile();
 C_fctexport C_word C_a_i_list(C_word **a, int c, ...);
 C_fctexport C_word C_a_i_string(C_word **a, int c, ...);
 C_fctexport C_word C_a_i_record(C_word **a, int c, ...);
diff --git a/library.scm b/library.scm
index a07d314..9fa4761 100644
--- a/library.scm
+++ b/library.scm
@@ -3946,6 +3946,8 @@ EOF
   (when (##sys#fudge 37)		; -:H given?
     (##sys#print "\n" #f ##sys#standard-error)
     (##sys#dump-heap-state))
+  (when (##sys#fudge 45)		; -:p or -:P given?
+    (##core#inline "C_i_dump_statistical_profile"))
   (let loop ()
     (let ((tasks ##sys#cleanup-tasks))
       (set! ##sys#cleanup-tasks '())
diff --git a/manual/Using the compiler b/manual/Using the compiler
index 39ec22c..3ea2f78 100644
--- a/manual/Using the compiler	
+++ b/manual/Using the compiler	
@@ -137,7 +137,7 @@ the source text should be read from standard input.
 ; -prelude EXPRESSIONS : Add {{EXPRESSIONS}} before all other toplevel expressions in the compiled file.  This option may be given multiple times. Processing of this option takes place before processing of {{-prologue}}.
 
 ; -profile : 
-; -accumulate-profile : Instruments the source code to count procedure calls and execution times. After the program terminates (either via an explicit {{exit}} or implicitly), profiling statistics are written to a file named {{PROFILE.<randomnumber>}}. Each line of the generated file contains a list with the procedure name, the number of calls and the time spent executing it. Use the {{chicken-profile}} program to display the profiling information in a more user-friendly form. Enter {{chicken-profile -help}} at the command line to get a list of available options. The {{-accumulate-profile}} option is similar to {{-profile}}, but the resulting profile information will be appended to any existing {{PROFILE}} file. {{chicken-profile}} will merge and sum up the accumulated timing information, if several entries for the same procedure calls exist. Only profiling information for global procedures will be collected.
+; -accumulate-profile : Instruments the source code to count procedure calls and execution times. After the program terminates (either via an explicit {{exit}} or implicitly), profiling statistics are written to a file named {{PROFILE.<randomnumber>}}. Each line of the generated file contains a list with the procedure name, the number of calls and the time spent executing it. Use the {{chicken-profile}} program to display the profiling information in a more user-friendly form. Enter {{chicken-profile -help}} at the command line to get a list of available options. The {{-accumulate-profile}} option is similar to {{-profile}}, but the resulting profile information will be appended to any existing {{PROFILE}} file. {{chicken-profile}} will merge and sum up the accumulated timing information, if several entries for the same procedure calls exist. Only profiling information for global procedures will be collected.  See the {{-:p}} option under [[#runtime-options|"Runtime options"]] below for statistical profiling support.
 
 ; -profile-name FILENAME : Specifies name of the generated profile information (which defaults to {{PROFILE.<randomnumber>}}. Implies {{-profile}}.
 
@@ -224,6 +224,10 @@ compiler itself) accept a small set of runtime options:
 
 ; {{-:o}} : Disables detection of stack overflows at run-time.
 
+; {{-:p}} : Enable collection of statistics for profiling purposes and write to PROFILE.{{pid}} on exit.  This functions at a granularity defined by the trace information in the binary and libraries: each traced function will show up in the output.  See the {{-profile}} compiler option for instrumentation-based profiling.  The {{PROFILE.pid}} format is compatible with the format generated by instrumentation-based profiling.
+
+; {{-:Pfreq}} : Same as {{-:p}} but set the sampling frequency in microseconds (default is 10000 microseconds or every 10 milliseconds).
+
 ; {{-:r}} : Writes trace output to stderr. This option has no effect with in files compiled with the {{-no-trace}} options.
 
 ; {{-:sNUMBER}} : Specifies stack size.
diff --git a/runtime.c b/runtime.c
index d65c3a5..57ebdc7 100644
--- a/runtime.c
+++ b/runtime.c
@@ -63,6 +63,15 @@
 # define EOVERFLOW  0
 #endif
 
+/* ITIMER_PROF is more precise, but Cygwin doesn't support it... */
+#ifdef __CYGWIN__
+# define C_PROFILE_SIGNAL SIGALRM
+# define C_PROFILE_TIMER  ITIMER_REAL
+#else
+# define C_PROFILE_SIGNAL SIGPROF
+# define C_PROFILE_TIMER  ITIMER_PROF
+#endif
+
 /* TODO: Include sys/select.h? Windows doesn't seem to have it... */
 #ifndef NO_POSIX_POLL
 #  include <poll.h>
@@ -154,6 +163,7 @@ static C_TLS int timezone;
 #define TEMPORARY_STACK_SIZE	       4096
 #define STRING_BUFFER_SIZE             4096
 #define DEFAULT_MUTATION_STACK_SIZE    1024
+#define PROFILE_TABLE_SIZE             1024
 
 #define MAX_PENDING_INTERRUPTS         100
 
@@ -302,6 +312,14 @@ typedef struct hdump_bucket_struct
   struct hdump_bucket_struct *next;
 } HDUMP_BUCKET;
 
+typedef struct profile_bucket_struct
+{
+  C_char *key;
+  C_uword sample_count; /* Multiplied by profile freq = time spent */
+  C_uword call_count;   /* Distinct calls seen while sampling */
+  struct profile_bucket_struct *next;
+} PROFILE_BUCKET;
+
 
 /* Variables: */
 
@@ -351,7 +369,9 @@ C_TLS C_uword
   C_heap_growth,
   C_heap_shrinkage;
 C_TLS C_uword C_maximal_heap_size;
-C_TLS time_t C_startup_time_seconds;
+C_TLS time_t
+  C_startup_time_seconds,
+  profile_frequency = 10000;
 C_TLS char 
   **C_main_argv,
   *C_dlerror;
@@ -424,7 +444,9 @@ static C_TLS int
   chicken_ran_once,
   pass_serious_signals = 1,
   callback_continuation_level;
-static volatile C_TLS int serious_signal_occurred = 0;
+static volatile C_TLS int
+  serious_signal_occurred = 0,
+  profiling = 0;
 static C_TLS unsigned int
   mutation_count,
   tracked_mutation_count,
@@ -459,6 +481,7 @@ static C_TLS FINALIZER_NODE
 static C_TLS void *current_module_handle;
 static C_TLS int flonum_print_precision = FLONUM_PRINT_PRECISION;
 static C_TLS HDUMP_BUCKET **hdump_table;
+static C_TLS PROFILE_BUCKET **profile_table = NULL;
 static C_TLS int 
   pending_interrupts[ MAX_PENDING_INTERRUPTS ],
   pending_interrupts_count,
@@ -491,6 +514,7 @@ static void C_fcall really_remark(C_word *x) C_regparm;
 static C_word C_fcall intern0(C_char *name) C_regparm;
 static void C_fcall update_locative_table(int mode) C_regparm;
 static LF_LIST *find_module_handle(C_char *name);
+static void take_profile_sample();
 
 static C_cpsproc(call_cc_wrapper) C_noret;
 static C_cpsproc(call_cc_values_wrapper) C_noret;
@@ -715,12 +739,15 @@ int CHICKEN_initialize(int heap, int stack, int symbols, void *toplevel)
   dlopen_flags = 0;
 #endif
 
-  /* setup signal handlers */
-  if(!pass_serious_signals) {
 #ifdef HAVE_SIGACTION
     sa.sa_flags = 0;
     sigfillset(&sa.sa_mask); /* See note in C_establish_signal_handler() */
     sa.sa_handler = global_signal_handler;
+#endif
+
+  /* setup signal handlers */
+  if(!pass_serious_signals) {
+#ifdef HAVE_SIGACTION
     C_sigaction(SIGBUS, &sa, NULL);
     C_sigaction(SIGFPE, &sa, NULL);
     C_sigaction(SIGILL, &sa, NULL);
@@ -759,6 +786,21 @@ int CHICKEN_initialize(int heap, int stack, int symbols, void *toplevel)
   callback_continuation_level = 0;
   gc_ms = 0;
   (void)C_randomize(C_fix(time(NULL)));
+
+  if (profiling) {
+#ifdef HAVE_SIGACTION
+    C_sigaction(C_PROFILE_SIGNAL, &sa, NULL);
+#else
+    C_signal(C_PROFILE_SIGNAL, global_signal_handler);
+#endif
+
+    profile_table = (PROFILE_BUCKET **)C_malloc(PROFILE_TABLE_SIZE * sizeof(PROFILE_BUCKET *));
+
+    if(profile_table == NULL)
+      panic(C_text("out of memory - can not allocate profile table"));
+
+    C_memset(profile_table, 0, sizeof(PROFILE_BUCKET *) * PROFILE_TABLE_SIZE);
+  }
   
   /* create k to invoke code for system-startup: */
   k0 = (C_SCHEME_BLOCK *)C_align((C_word)C_fromspace_top);
@@ -1081,7 +1123,10 @@ void global_signal_handler(int signum)
   }
 #endif
 
-  C_raise_interrupt(signal_mapping_table[ signum ]);
+  /* TODO: Make full use of sigaction: check that /our/ timer expired */
+  if (signum == C_PROFILE_SIGNAL && profiling) take_profile_sample();
+  else C_raise_interrupt(signal_mapping_table[ signum ]);
+
 #ifndef HAVE_SIGACTION
   /* not necessarily needed, but older UNIXen may not leave the handler installed: */
   C_signal(signum, global_signal_handler);
@@ -1246,6 +1291,8 @@ void CHICKEN_parse_command_line(int argc, char *argv[], C_word *heap, C_word *st
 		 " -:hsPERCENTAGE   set heap shrink percentage\n"
 		 " -:hSIZE          set fixed heap size\n"
 		 " -:r              write trace output to stderr\n"
+		 " -:p              collect statistical profile and write to file at exit\n"
+		 " -:PFREQ          like -:p, specifying sampling frequency in us (default: 10000)\n"
 		 " -:sSIZE          set nursery (stack) size\n"
 		 " -:tSIZE          set symbol-table size\n"
                  " -:fSIZE          set maximal number of pending finalizers\n"
@@ -1340,6 +1387,15 @@ void CHICKEN_parse_command_line(int argc, char *argv[], C_word *heap, C_word *st
 	  C_enable_gcweak = 1;
 	  break;
 
+	case 'P':
+	  profiling = 1;
+	  profile_frequency = arg_val(ptr);
+          goto next;
+
+	case 'p':
+	  profiling = 1;
+          break;
+
 	case 'r':
 	  show_trace = 1;
 	  break;
@@ -1408,6 +1464,18 @@ C_word CHICKEN_run(void *toplevel)
   chicken_is_running = chicken_ran_once = 1;
   return_to_host = 0;
 
+  if(profiling) {
+    struct itimerval itv;
+
+    itv.it_value.tv_sec = profile_frequency / 1000000;
+    itv.it_value.tv_usec = profile_frequency % 1000000;
+    itv.it_interval.tv_sec = itv.it_value.tv_sec;
+    itv.it_interval.tv_usec = itv.it_value.tv_usec;
+
+    if (setitimer(C_PROFILE_TIMER, &itv, NULL) == -1)
+      panic(C_text("error setting timer for profiling"));
+  }
+
 #if C_STACK_GROWS_DOWNWARD
   C_stack_limit = (C_word *)((C_byte *)C_stack_pointer - stack_size);
 #else
@@ -1436,6 +1504,18 @@ C_word CHICKEN_run(void *toplevel)
     ((C_proc)C_restart_trampoline)(C_restart_c, p);
   }
 
+  if(profiling) {
+    struct itimerval itv;
+
+    itv.it_value.tv_sec = 0;
+    itv.it_value.tv_usec = 0;
+    itv.it_interval.tv_sec = itv.it_value.tv_sec;
+    itv.it_interval.tv_usec = itv.it_value.tv_usec;
+
+    if (setitimer(C_PROFILE_TIMER, &itv, NULL) == -1)
+      panic(C_text("error clearing timer for profiling"));
+  }
+
   chicken_is_running = 0;
   return C_restore;
 }
@@ -3786,6 +3866,59 @@ C_regparm void *C_fcall C_retrieve2_symbol_proc(C_word val, char *name)
   return C_fast_retrieve_proc(val);
 }
 
+/* Bump profile count for current top of trace buffer */
+static void take_profile_sample()
+{
+  PROFILE_BUCKET **bp, *b;
+  C_char *key;
+  TRACE_INFO *tb;
+  /* To count distinct calls of a procedure, remember last call */
+  static C_char *prev_key = NULL;
+  static TRACE_INFO *prev_tb = NULL;
+
+  /* trace_buffer_top points *beyond* the topmost entry: Go back one */
+  if (trace_buffer_top == trace_buffer) {
+    if (!trace_buffer_full) return; /* No data yet */
+    tb = trace_buffer_limit - 1;
+  } else {
+    tb = trace_buffer_top - 1;
+  }
+
+  key = tb->raw;
+  if (key == NULL) return; /* May happen while in C_trace() */
+
+  /* We could also just hash the pointer but that's a bit trickier */
+  bp = profile_table + hash_string(C_strlen(key), key, PROFILE_TABLE_SIZE, 0, 0);
+  b = *bp;
+
+  /* First try to find pre-existing item in hash table */
+  while(b != NULL) {
+    if(b->key == key) {
+      b->sample_count++;
+      if (prev_key != key && prev_tb != tb)
+        b->call_count++;
+      goto done;
+    }
+    else b = b->next;
+  }
+
+  /* Not found, allocate a new item and use it as bucket's new head */
+  b = (PROFILE_BUCKET *)C_malloc(sizeof(PROFILE_BUCKET));
+
+  if(b == NULL)
+    panic(C_text("out of memory - cannot allocate profile table-bucket"));
+
+  b->next = *bp;
+  b->key = key;
+  *bp = b;
+  b->sample_count = 1;
+  b->call_count = 1;
+
+done:
+  prev_tb = tb;
+  prev_key = key;
+}
+
 
 C_regparm void C_fcall C_trace(C_char *name)
 {
@@ -3869,7 +4002,9 @@ C_char *C_dump_trace(int start)
 
 C_regparm void C_fcall C_clear_trace_buffer(void)
 {
-  int i;
+  int i, old_profiling = profiling;
+
+  profiling = 0;
 
   if(trace_buffer == NULL) {
     if(C_trace_buffer_size < MIN_TRACE_BUFFER_SIZE)
@@ -3890,15 +4025,18 @@ C_regparm void C_fcall C_clear_trace_buffer(void)
     trace_buffer[ i ].cooked2 = C_SCHEME_FALSE;
     trace_buffer[ i ].thread = C_SCHEME_FALSE;
   }
+  profiling = old_profiling;
 }
 
 C_word C_resize_trace_buffer(C_word size) {
-  int old_size = C_trace_buffer_size;
+  int old_size = C_trace_buffer_size, old_profiling = profiling;
   assert(trace_buffer);
+  profiling = 0;
   free(trace_buffer);
   trace_buffer = NULL;
   C_trace_buffer_size = C_unfix(size);
   C_clear_trace_buffer();
+  profiling = old_profiling;
   return(C_fix(old_size));
 }
 
@@ -4417,6 +4555,9 @@ C_regparm C_word C_fcall C_fudge(C_word fudge_factor)
   case C_fix(44):  /* whether debugger is active */
     return C_mk_bool(C_debugging);
 
+  case C_fix(45):  /* Whether we're currently profiling */
+    return C_mk_bool(profiling);
+
   default: return C_SCHEME_UNDEFINED;
   }
 }
@@ -9264,6 +9405,65 @@ C_i_get_keyword(C_word kw, C_word args, C_word def)
   return def;
 }
 
+C_word C_i_dump_statistical_profile()
+{
+  PROFILE_BUCKET *b, *b2, **bp;
+  FILE *fp;
+  C_char *k1, *k2 = NULL;
+  int n;
+  double ms;
+  struct itimerval itv;
+
+  assert(profiling);
+  assert(profile_table != NULL);
+
+  itv.it_value.tv_sec = 0;
+  itv.it_value.tv_usec = 0;
+  itv.it_interval.tv_sec = itv.it_value.tv_sec;
+  itv.it_interval.tv_usec = itv.it_value.tv_usec;
+
+  if (setitimer(C_PROFILE_TIMER, &itv, NULL) == -1)
+    panic(C_text("error clearing timer for profiling"));
+
+  profiling = 0; /* In case a SIGPROF is delivered late */
+  bp = profile_table;
+
+  C_snprintf(buffer, STRING_BUFFER_SIZE, C_text("PROFILE.%d"), C_getpid());
+
+  if(debug_mode)
+    C_dbg(C_text("debug"), C_text("dumping statistical profile to `%s'...\n"), buffer);
+
+  fp = C_fopen(buffer, "w");
+  if (fp == NULL)
+    panic(C_text("could not write profile!"));
+
+  for(n = 0; n < PROFILE_TABLE_SIZE; ++n) {
+    for(b = bp[ n ]; b != NULL; b = b2) {
+      b2 = b->next;
+
+      k1 = b->key;
+      C_fputs(C_text("(|"), fp);
+      /* Dump raw C string as if it were a symbol */
+      while((k2 = C_strpbrk(k1, C_text("\\|"))) != NULL) {
+        C_fwrite(k1, 1, k2-k1, fp);
+        C_fputc('\\', fp);
+        C_fputc(*k2, fp);
+        k1 = k2+1;
+      }
+      C_fputs(k1, fp);
+      ms = (double)b->sample_count * (double)profile_frequency / 1000.0;
+      C_fprintf(fp, C_text("| " UWORD_COUNT_FORMAT_STRING " %lf)\n"),
+                b->call_count, ms);
+      C_free(b);
+    }
+  }
+
+  C_fclose(fp);
+  C_free(profile_table);
+  profile_table = NULL;
+
+  return C_SCHEME_UNDEFINED;
+}
 
 void C_ccall C_dump_heap_state(C_word c, C_word *av)
 {
diff --git a/support.scm b/support.scm
index 888933d..28437ff 100644
--- a/support.scm
+++ b/support.scm
@@ -168,7 +168,7 @@
 	((string? x) (string->symbol x))
 	(else (string->symbol (sprintf "~a" x))) ) )
 
-(define (backslashify s) (string-translate (->string s) "\\" "\\\\"))
+(define (backslashify s) (string-translate* (->string s) '(("\\" . "\\\\"))))
 
 (define (uncommentify s) (string-translate* (->string s) '(("*/" . "*_/"))))
   
-- 
2.1.4

From 4a3a494b9845d102ec73e31e43d26a640776bd40 Mon Sep 17 00:00:00 2001
From: Peter Bex <pe...@more-magic.net>
Date: Sun, 6 Dec 2015 17:02:12 +0100
Subject: [PATCH 2/3] Support profiling on Windows with native timers

In MingW there's no setitimer support, so we'll have to use a native
Windows API.  This API unfortunately only supports millisecond precision.
The CreateWaitableTimer API seems like it supports better precision, but
it requires that you wait for it using WaitFor{Single,Multiple}Object.
---
 runtime.c | 100 ++++++++++++++++++++++++++++++++++++--------------------------
 1 file changed, 59 insertions(+), 41 deletions(-)

diff --git a/runtime.c b/runtime.c
index 57ebdc7..8c09564 100644
--- a/runtime.c
+++ b/runtime.c
@@ -63,15 +63,6 @@
 # define EOVERFLOW  0
 #endif
 
-/* ITIMER_PROF is more precise, but Cygwin doesn't support it... */
-#ifdef __CYGWIN__
-# define C_PROFILE_SIGNAL SIGALRM
-# define C_PROFILE_TIMER  ITIMER_REAL
-#else
-# define C_PROFILE_SIGNAL SIGPROF
-# define C_PROFILE_TIMER  ITIMER_PROF
-#endif
-
 /* TODO: Include sys/select.h? Windows doesn't seem to have it... */
 #ifndef NO_POSIX_POLL
 #  include <poll.h>
@@ -83,8 +74,19 @@
 # include <sys/resource.h>
 # include <sys/wait.h>
 
+/* ITIMER_PROF is more precise, but Cygwin doesn't support it... */
+# ifdef __CYGWIN__
+#  define C_PROFILE_SIGNAL SIGALRM
+#  define C_PROFILE_TIMER  ITIMER_REAL
+# else
+#  define C_PROFILE_SIGNAL SIGPROF
+#  define C_PROFILE_TIMER  ITIMER_PROF
+# endif
+
 #else
 
+# define C_PROFILE_SIGNAL -1          /* Stupid way to avoid error */
+
 #ifdef ECOS
 #include <cyg/kernel/kapi.h>
 static C_TLS int timezone;
@@ -514,6 +516,7 @@ static void C_fcall really_remark(C_word *x) C_regparm;
 static C_word C_fcall intern0(C_char *name) C_regparm;
 static void C_fcall update_locative_table(int mode) C_regparm;
 static LF_LIST *find_module_handle(C_char *name);
+static void set_profile_timer(C_uword freq);
 static void take_profile_sample();
 
 static C_cpsproc(call_cc_wrapper) C_noret;
@@ -788,10 +791,12 @@ int CHICKEN_initialize(int heap, int stack, int symbols, void *toplevel)
   (void)C_randomize(C_fix(time(NULL)));
 
   if (profiling) {
-#ifdef HAVE_SIGACTION
+#ifndef C_NONUNIX
+# ifdef HAVE_SIGACTION
     C_sigaction(C_PROFILE_SIGNAL, &sa, NULL);
-#else
+# else
     C_signal(C_PROFILE_SIGNAL, global_signal_handler);
+# endif
 #endif
 
     profile_table = (PROFILE_BUCKET **)C_malloc(PROFILE_TABLE_SIZE * sizeof(PROFILE_BUCKET *));
@@ -1464,17 +1469,7 @@ C_word CHICKEN_run(void *toplevel)
   chicken_is_running = chicken_ran_once = 1;
   return_to_host = 0;
 
-  if(profiling) {
-    struct itimerval itv;
-
-    itv.it_value.tv_sec = profile_frequency / 1000000;
-    itv.it_value.tv_usec = profile_frequency % 1000000;
-    itv.it_interval.tv_sec = itv.it_value.tv_sec;
-    itv.it_interval.tv_usec = itv.it_value.tv_usec;
-
-    if (setitimer(C_PROFILE_TIMER, &itv, NULL) == -1)
-      panic(C_text("error setting timer for profiling"));
-  }
+  if(profiling) set_profile_timer(profile_frequency);
 
 #if C_STACK_GROWS_DOWNWARD
   C_stack_limit = (C_word *)((C_byte *)C_stack_pointer - stack_size);
@@ -1504,17 +1499,7 @@ C_word CHICKEN_run(void *toplevel)
     ((C_proc)C_restart_trampoline)(C_restart_c, p);
   }
 
-  if(profiling) {
-    struct itimerval itv;
-
-    itv.it_value.tv_sec = 0;
-    itv.it_value.tv_usec = 0;
-    itv.it_interval.tv_sec = itv.it_value.tv_sec;
-    itv.it_interval.tv_usec = itv.it_value.tv_usec;
-
-    if (setitimer(C_PROFILE_TIMER, &itv, NULL) == -1)
-      panic(C_text("error clearing timer for profiling"));
-  }
+  if(profiling) set_profile_timer(0);
 
   chicken_is_running = 0;
   return C_restore;
@@ -3866,6 +3851,46 @@ C_regparm void *C_fcall C_retrieve2_symbol_proc(C_word val, char *name)
   return C_fast_retrieve_proc(val);
 }
 
+#ifdef C_NONUNIX
+VOID CALLBACK win_timer(PVOID data_ignored, BOOLEAN wait_or_fired)
+{
+  if (profiling) take_profile_sample();
+}
+#endif
+
+static void set_profile_timer(C_uword freq)
+{
+#ifdef C_NONUNIX
+  static HANDLE timer = NULL;
+
+  if (freq == 0) {
+    assert(timer != NULL);
+    if (!DeleteTimerQueueTimer(NULL, timer, NULL)) goto error;
+    timer = NULL;
+  } else if (freq < 1000) {
+    panic(C_text("On Windows, sampling can only be done in milliseconds"));
+  } else {
+    if (!CreateTimerQueueTimer(&timer, NULL, win_timer, NULL, 0, freq/1000, 0))
+      goto error;
+  }
+#else
+  struct itimerval itv;
+
+  itv.it_value.tv_sec = freq / 1000000;
+  itv.it_value.tv_usec = freq % 1000000;
+  itv.it_interval.tv_sec = itv.it_value.tv_sec;
+  itv.it_interval.tv_usec = itv.it_value.tv_usec;
+
+  if (setitimer(C_PROFILE_TIMER, &itv, NULL) == -1) goto error;
+#endif
+
+  return;
+
+error:
+  if (freq == 0) panic(C_text("error clearing timer for profiling"));
+  else panic(C_text("error setting timer for profiling"));
+}
+
 /* Bump profile count for current top of trace buffer */
 static void take_profile_sample()
 {
@@ -9412,18 +9437,11 @@ C_word C_i_dump_statistical_profile()
   C_char *k1, *k2 = NULL;
   int n;
   double ms;
-  struct itimerval itv;
 
   assert(profiling);
   assert(profile_table != NULL);
 
-  itv.it_value.tv_sec = 0;
-  itv.it_value.tv_usec = 0;
-  itv.it_interval.tv_sec = itv.it_value.tv_sec;
-  itv.it_interval.tv_usec = itv.it_value.tv_usec;
-
-  if (setitimer(C_PROFILE_TIMER, &itv, NULL) == -1)
-    panic(C_text("error clearing timer for profiling"));
+  set_profile_timer(0);
 
   profiling = 0; /* In case a SIGPROF is delivered late */
   bp = profile_table;
-- 
2.1.4

From 3bd25d9033889b8482e95e3422547cfc70cb490b Mon Sep 17 00:00:00 2001
From: Peter Bex <pe...@more-magic.net>
Date: Sun, 6 Dec 2015 20:07:30 +0100
Subject: [PATCH 3/3] Fix statistical percentage chicken-profile output

Now both profiling types write a header in the first line to indicate
the type of file, so that chicken-profiles knows whether to take the
highest or the total of the run time as 100% when dividing.
---
 chicken-profile.scm | 24 +++++++++++++++++-------
 profiler.scm        |  8 +++++++-
 runtime.c           |  1 +
 3 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/chicken-profile.scm b/chicken-profile.scm
index 63dd804..b54674a 100644
--- a/chicken-profile.scm
+++ b/chicken-profile.scm
@@ -152,15 +152,17 @@ EOF
       (error "invalid argument to -decimals option" arg)))
 
 (define (read-profile)
-  (let ((hash (make-hash-table eq?)))
-    (do ((line (read) (read)))
+  (let* ((hash (make-hash-table eq?))
+	 (header (read))
+	 (type (if (symbol? header) header 'instrumented)))
+    (do ((line (if (symbol? header) (read) header) (read)))
 	((eof-object? line))
       (hash-table-set!
        hash (first line)
        (map (lambda (x y) (and x y (+ x y)))
 	    (hash-table-ref/default hash (first line) '(0 0)) 
 	    (cdr line))))
-    (hash-table->alist hash)))
+    (cons type (hash-table->alist hash))))
 
 (define (format-string str cols #!optional right (padc #\space))
   (let* ((len (string-length str))
@@ -183,17 +185,25 @@ EOF
 
 (define (write-profile)
   (print "reading `" file "' ...\n")
-  (let* ((data0 (with-input-from-file file read-profile))
-	 (max-t (foldl (lambda (r t) (max r (third t))) 0 data0))
+  (let* ((type&data0 (with-input-from-file file read-profile))
+	 (type  (car type&data0))
+	 (data0 (cdr type&data0))
+	 ;; Instrumented profiling results in total runtime being
+	 ;; counted for the outermost "main" procedure, while
+	 ;; statistical counts time spent only inside the procedure
+	 ;; itself.  Ideally we'd have both, but that's tricky to do.
+	 (total-t (foldl (if (eq? type 'instrumented)
+			     (lambda (r t) (max r (third t)))
+			     (lambda (r t) (+ r (third t)))) 0 data0))
 	 (data (sort (map
 		      (lambda (t)
 			(append
 			 t
 			 (let ((c (second t)) ; count
-			       (t (third t))) ; total time
+			       (t (third t))) ; time tallied to procedure
 			   (list (or (and c (> c 0) (/ t c)) ; time / count
 				     0)
-				 (or (and (> max-t 0) (* (/ t max-t) 100)) ; % of max-time
+				 (or (and (> total-t 0) (* (/ t total-t) 100)) ; % of total-time
 				     0)
 				 ))))
 		      data0)
diff --git a/profiler.scm b/profiler.scm
index a4b4958..2a7bc63 100644
--- a/profiler.scm
+++ b/profiler.scm
@@ -35,6 +35,9 @@
 
 (define-foreign-variable profile-id int "C_getpid()")
 
+(define empty-file? (foreign-lambda* bool ((scheme-object p))
+		      "C_return(ftell(C_port_file(p)) == 0);"))
+
 (define-constant profile-info-entry-size 5)
 
 
@@ -121,7 +124,10 @@
 	(##sys#print "[debug] writing profile...\n" #f ##sys#standard-error) )
       (apply
        with-output-to-file ##sys#profile-name
-       (lambda () 
+       (lambda ()
+	 (when (empty-file? (current-output-port)) ; header needed?
+	   (write 'instrumented)
+	   (write-char #\newline))
 	 (for-each
 	  (lambda (vec)
 	    (let ([len (##sys#size vec)])
diff --git a/runtime.c b/runtime.c
index 8c09564..58b8f49 100644
--- a/runtime.c
+++ b/runtime.c
@@ -9455,6 +9455,7 @@ C_word C_i_dump_statistical_profile()
   if (fp == NULL)
     panic(C_text("could not write profile!"));
 
+  C_fputs(C_text("statistical\n"), fp);
   for(n = 0; n < PROFILE_TABLE_SIZE; ++n) {
     for(b = bp[ n ]; b != NULL; b = b2) {
       b2 = b->next;
-- 
2.1.4

From b1422f120990a660110624004e6cb6ac893a6b45 Mon Sep 17 00:00:00 2001
From: Peter Bex <pe...@more-magic.net>
Date: Sat, 2 Jan 2016 17:20:30 +0100
Subject: [PATCH 1/3] Add simple statistical profiler to runtime library

This enables collection of profiling data via statistical sampling to
every program built with CHICKEN.  It relies on trace information for
determining which procedure is running.  This also means it has a finer
granularity than the default instrumentation-based profiler.  This can
be an advantage or disadvantage depending on what you're trying to do.

Conflicts:
	chicken.h
	support.scm
---
 chicken.h                 |   1 +
 library.scm               |   2 +
 manual/Using the compiler |   6 +-
 runtime.c                 | 214 ++++++++++++++++++++++++++++++++++++++++++++--
 4 files changed, 215 insertions(+), 8 deletions(-)

diff --git a/chicken.h b/chicken.h
index aaa82fd..854bd95 100644
--- a/chicken.h
+++ b/chicken.h
@@ -1964,6 +1964,7 @@ C_fctexport time_t C_fcall C_seconds(C_long *ms) C_regparm;
 C_fctexport C_word C_fcall C_bignum_simplify(C_word big) C_regparm;
 C_fctexport C_word C_fcall C_allocate_scratch_bignum(C_word **ptr, C_word size, C_word negp, C_word initp) C_regparm;
 C_fctexport C_word C_fcall C_bignum_rewrap(C_word **p, C_word big) C_regparm;
+C_fctexport C_word C_i_dump_statistical_profile();
 C_fctexport C_word C_a_i_list(C_word **a, int c, ...);
 C_fctexport C_word C_a_i_string(C_word **a, int c, ...);
 C_fctexport C_word C_a_i_record(C_word **a, int c, ...);
diff --git a/library.scm b/library.scm
index 5203c52..6548982 100644
--- a/library.scm
+++ b/library.scm
@@ -4677,6 +4677,8 @@ EOF
   (when (##sys#fudge 37)		; -:H given?
     (##sys#print "\n" #f ##sys#standard-error)
     (##sys#dump-heap-state))
+  (when (##sys#fudge 45)		; -:p or -:P given?
+    (##core#inline "C_i_dump_statistical_profile"))
   (let loop ()
     (let ((tasks ##sys#cleanup-tasks))
       (set! ##sys#cleanup-tasks '())
diff --git a/manual/Using the compiler b/manual/Using the compiler
index 79b53c0..137736c 100644
--- a/manual/Using the compiler	
+++ b/manual/Using the compiler	
@@ -137,7 +137,7 @@ the source text should be read from standard input.
 ; -prelude EXPRESSIONS : Add {{EXPRESSIONS}} before all other toplevel expressions in the compiled file.  This option may be given multiple times. Processing of this option takes place before processing of {{-prologue}}.
 
 ; -profile : 
-; -accumulate-profile : Instruments the source code to count procedure calls and execution times. After the program terminates (either via an explicit {{exit}} or implicitly), profiling statistics are written to a file named {{PROFILE.<randomnumber>}}. Each line of the generated file contains a list with the procedure name, the number of calls and the time spent executing it. Use the {{chicken-profile}} program to display the profiling information in a more user-friendly form. Enter {{chicken-profile -help}} at the command line to get a list of available options. The {{-accumulate-profile}} option is similar to {{-profile}}, but the resulting profile information will be appended to any existing {{PROFILE}} file. {{chicken-profile}} will merge and sum up the accumulated timing information, if several entries for the same procedure calls exist. Only profiling information for global procedures will be collected.
+; -accumulate-profile : Instruments the source code to count procedure calls and execution times. After the program terminates (either via an explicit {{exit}} or implicitly), profiling statistics are written to a file named {{PROFILE.<randomnumber>}}. Each line of the generated file contains a list with the procedure name, the number of calls and the time spent executing it. Use the {{chicken-profile}} program to display the profiling information in a more user-friendly form. Enter {{chicken-profile -help}} at the command line to get a list of available options. The {{-accumulate-profile}} option is similar to {{-profile}}, but the resulting profile information will be appended to any existing {{PROFILE}} file. {{chicken-profile}} will merge and sum up the accumulated timing information, if several entries for the same procedure calls exist. Only profiling information for global procedures will be collected.  See the {{-:p}} option under [[#runtime-options|"Runtime options"]] below for statistical profiling support.
 
 ; -profile-name FILENAME : Specifies name of the generated profile information (which defaults to {{PROFILE.<randomnumber>}}. Implies {{-profile}}.
 
@@ -224,6 +224,10 @@ compiler itself) accept a small set of runtime options:
 
 ; {{-:o}} : Disables detection of stack overflows at run-time.
 
+; {{-:p}} : Enable collection of statistics for profiling purposes and write to PROFILE.{{pid}} on exit.  This functions at a granularity defined by the trace information in the binary and libraries: each traced function will show up in the output.  See the {{-profile}} compiler option for instrumentation-based profiling.  The {{PROFILE.pid}} format is compatible with the format generated by instrumentation-based profiling.
+
+; {{-:Pfreq}} : Same as {{-:p}} but set the sampling frequency in microseconds (default is 10000 microseconds or every 10 milliseconds).
+
 ; {{-:r}} : Writes trace output to stderr. This option has no effect with in files compiled with the {{-no-trace}} options.
 
 ; {{-:sNUMBER}} : Specifies stack size.
diff --git a/runtime.c b/runtime.c
index 25ba96a..6286ec8 100644
--- a/runtime.c
+++ b/runtime.c
@@ -63,6 +63,15 @@
 # define EOVERFLOW  0
 #endif
 
+/* ITIMER_PROF is more precise, but Cygwin doesn't support it... */
+#ifdef __CYGWIN__
+# define C_PROFILE_SIGNAL SIGALRM
+# define C_PROFILE_TIMER  ITIMER_REAL
+#else
+# define C_PROFILE_SIGNAL SIGPROF
+# define C_PROFILE_TIMER  ITIMER_PROF
+#endif
+
 /* TODO: Include sys/select.h? Windows doesn't seem to have it... */
 #ifndef NO_POSIX_POLL
 #  include <poll.h>
@@ -165,6 +174,7 @@ static C_TLS int timezone;
 #define TEMPORARY_STACK_SIZE	       4096
 #define STRING_BUFFER_SIZE             4096
 #define DEFAULT_MUTATION_STACK_SIZE    1024
+#define PROFILE_TABLE_SIZE             1024
 
 #define MAX_PENDING_INTERRUPTS         100
 
@@ -319,6 +329,14 @@ typedef struct hdump_bucket_struct
   struct hdump_bucket_struct *next;
 } HDUMP_BUCKET;
 
+typedef struct profile_bucket_struct
+{
+  C_char *key;
+  C_uword sample_count; /* Multiplied by profile freq = time spent */
+  C_uword call_count;   /* Distinct calls seen while sampling */
+  struct profile_bucket_struct *next;
+} PROFILE_BUCKET;
+
 
 /* Variables: */
 
@@ -375,7 +393,9 @@ C_TLS C_uword
   C_heap_growth,
   C_heap_shrinkage;
 C_TLS C_uword C_maximal_heap_size;
-C_TLS time_t C_startup_time_seconds;
+C_TLS time_t
+  C_startup_time_seconds,
+  profile_frequency = 10000;
 C_TLS char 
   **C_main_argv,
 #ifdef SEARCH_EXE_PATH
@@ -452,7 +472,9 @@ static C_TLS int
   chicken_ran_once,
   pass_serious_signals = 1,
   callback_continuation_level;
-static volatile C_TLS int serious_signal_occurred = 0;
+static volatile C_TLS int
+  serious_signal_occurred = 0,
+  profiling = 0;
 static C_TLS unsigned int
   mutation_count,
   tracked_mutation_count,
@@ -487,6 +509,7 @@ static C_TLS FINALIZER_NODE
 static C_TLS void *current_module_handle;
 static C_TLS int flonum_print_precision = FLONUM_PRINT_PRECISION;
 static C_TLS HDUMP_BUCKET **hdump_table;
+static C_TLS PROFILE_BUCKET **profile_table = NULL;
 static C_TLS int 
   pending_interrupts[ MAX_PENDING_INTERRUPTS ],
   pending_interrupts_count,
@@ -548,6 +571,7 @@ static void C_fcall really_remark(C_word *x) C_regparm;
 static C_word C_fcall intern0(C_char *name) C_regparm;
 static void C_fcall update_locative_table(int mode) C_regparm;
 static LF_LIST *find_module_handle(C_char *name);
+static void take_profile_sample();
 
 static C_cpsproc(call_cc_wrapper) C_noret;
 static C_cpsproc(call_cc_values_wrapper) C_noret;
@@ -785,12 +809,15 @@ int CHICKEN_initialize(int heap, int stack, int symbols, void *toplevel)
   dlopen_flags = 0;
 #endif
 
-  /* setup signal handlers */
-  if(!pass_serious_signals) {
 #ifdef HAVE_SIGACTION
     sa.sa_flags = 0;
     sigfillset(&sa.sa_mask); /* See note in C_establish_signal_handler() */
     sa.sa_handler = global_signal_handler;
+#endif
+
+  /* setup signal handlers */
+  if(!pass_serious_signals) {
+#ifdef HAVE_SIGACTION
     C_sigaction(SIGBUS, &sa, NULL);
     C_sigaction(SIGFPE, &sa, NULL);
     C_sigaction(SIGILL, &sa, NULL);
@@ -834,6 +861,21 @@ int CHICKEN_initialize(int heap, int stack, int symbols, void *toplevel)
   callback_continuation_level = 0;
   gc_ms = 0;
   (void)C_randomize(C_fix(time(NULL)));
+
+  if (profiling) {
+#ifdef HAVE_SIGACTION
+    C_sigaction(C_PROFILE_SIGNAL, &sa, NULL);
+#else
+    C_signal(C_PROFILE_SIGNAL, global_signal_handler);
+#endif
+
+    profile_table = (PROFILE_BUCKET **)C_malloc(PROFILE_TABLE_SIZE * sizeof(PROFILE_BUCKET *));
+
+    if(profile_table == NULL)
+      panic(C_text("out of memory - can not allocate profile table"));
+
+    C_memset(profile_table, 0, sizeof(PROFILE_BUCKET *) * PROFILE_TABLE_SIZE);
+  }
   
   /* create k to invoke code for system-startup: */
   k0 = (C_SCHEME_BLOCK *)C_align((C_word)C_fromspace_top);
@@ -1169,7 +1211,10 @@ void global_signal_handler(int signum)
   }
 #endif
 
-  C_raise_interrupt(signal_mapping_table[ signum ]);
+  /* TODO: Make full use of sigaction: check that /our/ timer expired */
+  if (signum == C_PROFILE_SIGNAL && profiling) take_profile_sample();
+  else C_raise_interrupt(signal_mapping_table[ signum ]);
+
 #ifndef HAVE_SIGACTION
   /* not necessarily needed, but older UNIXen may not leave the handler installed: */
   C_signal(signum, global_signal_handler);
@@ -1335,6 +1380,8 @@ void CHICKEN_parse_command_line(int argc, char *argv[], C_word *heap, C_word *st
 		 " -:hsPERCENTAGE   set heap shrink percentage\n"
 		 " -:hSIZE          set fixed heap size\n"
 		 " -:r              write trace output to stderr\n"
+		 " -:p              collect statistical profile and write to file at exit\n"
+		 " -:PFREQ          like -:p, specifying sampling frequency in us (default: 10000)\n"
 		 " -:sSIZE          set nursery (stack) size\n"
 		 " -:tSIZE          set symbol-table size\n"
                  " -:fSIZE          set maximal number of pending finalizers\n"
@@ -1429,6 +1476,15 @@ void CHICKEN_parse_command_line(int argc, char *argv[], C_word *heap, C_word *st
 	  C_enable_gcweak = 1;
 	  break;
 
+	case 'P':
+	  profiling = 1;
+	  profile_frequency = arg_val(ptr);
+          goto next;
+
+	case 'p':
+	  profiling = 1;
+          break;
+
 	case 'r':
 	  show_trace = 1;
 	  break;
@@ -1497,6 +1553,18 @@ C_word CHICKEN_run(void *toplevel)
   chicken_is_running = chicken_ran_once = 1;
   return_to_host = 0;
 
+  if(profiling) {
+    struct itimerval itv;
+
+    itv.it_value.tv_sec = profile_frequency / 1000000;
+    itv.it_value.tv_usec = profile_frequency % 1000000;
+    itv.it_interval.tv_sec = itv.it_value.tv_sec;
+    itv.it_interval.tv_usec = itv.it_value.tv_usec;
+
+    if (setitimer(C_PROFILE_TIMER, &itv, NULL) == -1)
+      panic(C_text("error setting timer for profiling"));
+  }
+
 #if C_STACK_GROWS_DOWNWARD
   C_stack_limit = (C_word *)((C_byte *)C_stack_pointer - stack_size);
 #else
@@ -1526,6 +1594,18 @@ C_word CHICKEN_run(void *toplevel)
     ((C_proc)C_restart_trampoline)(C_restart_c, p);
   }
 
+  if(profiling) {
+    struct itimerval itv;
+
+    itv.it_value.tv_sec = 0;
+    itv.it_value.tv_usec = 0;
+    itv.it_interval.tv_sec = itv.it_value.tv_sec;
+    itv.it_interval.tv_usec = itv.it_value.tv_usec;
+
+    if (setitimer(C_PROFILE_TIMER, &itv, NULL) == -1)
+      panic(C_text("error clearing timer for profiling"));
+  }
+
   chicken_is_running = 0;
   return C_restore;
 }
@@ -4205,6 +4285,59 @@ C_regparm void *C_fcall C_retrieve2_symbol_proc(C_word val, char *name)
   return C_fast_retrieve_proc(val);
 }
 
+/* Bump profile count for current top of trace buffer */
+static void take_profile_sample()
+{
+  PROFILE_BUCKET **bp, *b;
+  C_char *key;
+  TRACE_INFO *tb;
+  /* To count distinct calls of a procedure, remember last call */
+  static C_char *prev_key = NULL;
+  static TRACE_INFO *prev_tb = NULL;
+
+  /* trace_buffer_top points *beyond* the topmost entry: Go back one */
+  if (trace_buffer_top == trace_buffer) {
+    if (!trace_buffer_full) return; /* No data yet */
+    tb = trace_buffer_limit - 1;
+  } else {
+    tb = trace_buffer_top - 1;
+  }
+
+  key = tb->raw;
+  if (key == NULL) return; /* May happen while in C_trace() */
+
+  /* We could also just hash the pointer but that's a bit trickier */
+  bp = profile_table + hash_string(C_strlen(key), key, PROFILE_TABLE_SIZE, 0, 0);
+  b = *bp;
+
+  /* First try to find pre-existing item in hash table */
+  while(b != NULL) {
+    if(b->key == key) {
+      b->sample_count++;
+      if (prev_key != key && prev_tb != tb)
+        b->call_count++;
+      goto done;
+    }
+    else b = b->next;
+  }
+
+  /* Not found, allocate a new item and use it as bucket's new head */
+  b = (PROFILE_BUCKET *)C_malloc(sizeof(PROFILE_BUCKET));
+
+  if(b == NULL)
+    panic(C_text("out of memory - cannot allocate profile table-bucket"));
+
+  b->next = *bp;
+  b->key = key;
+  *bp = b;
+  b->sample_count = 1;
+  b->call_count = 1;
+
+done:
+  prev_tb = tb;
+  prev_key = key;
+}
+
 
 C_regparm void C_fcall C_trace(C_char *name)
 {
@@ -4288,7 +4421,9 @@ C_char *C_dump_trace(int start)
 
 C_regparm void C_fcall C_clear_trace_buffer(void)
 {
-  int i;
+  int i, old_profiling = profiling;
+
+  profiling = 0;
 
   if(trace_buffer == NULL) {
     if(C_trace_buffer_size < MIN_TRACE_BUFFER_SIZE)
@@ -4309,15 +4444,18 @@ C_regparm void C_fcall C_clear_trace_buffer(void)
     trace_buffer[ i ].cooked2 = C_SCHEME_FALSE;
     trace_buffer[ i ].thread = C_SCHEME_FALSE;
   }
+  profiling = old_profiling;
 }
 
 C_word C_resize_trace_buffer(C_word size) {
-  int old_size = C_trace_buffer_size;
+  int old_size = C_trace_buffer_size, old_profiling = profiling;
   assert(trace_buffer);
+  profiling = 0;
   free(trace_buffer);
   trace_buffer = NULL;
   C_trace_buffer_size = C_unfix(size);
   C_clear_trace_buffer();
+  profiling = old_profiling;
   return(C_fix(old_size));
 }
 
@@ -4836,6 +4974,9 @@ C_regparm C_word C_fcall C_fudge(C_word fudge_factor)
   case C_fix(44):  /* whether debugger is active */
     return C_mk_bool(C_debugging);
 
+  case C_fix(45):  /* Whether we're currently profiling */
+    return C_mk_bool(profiling);
+
   default: return C_SCHEME_UNDEFINED;
   }
 }
@@ -13038,6 +13179,65 @@ C_i_get_keyword(C_word kw, C_word args, C_word def)
   return def;
 }
 
+C_word C_i_dump_statistical_profile()
+{
+  PROFILE_BUCKET *b, *b2, **bp;
+  FILE *fp;
+  C_char *k1, *k2 = NULL;
+  int n;
+  double ms;
+  struct itimerval itv;
+
+  assert(profiling);
+  assert(profile_table != NULL);
+
+  itv.it_value.tv_sec = 0;
+  itv.it_value.tv_usec = 0;
+  itv.it_interval.tv_sec = itv.it_value.tv_sec;
+  itv.it_interval.tv_usec = itv.it_value.tv_usec;
+
+  if (setitimer(C_PROFILE_TIMER, &itv, NULL) == -1)
+    panic(C_text("error clearing timer for profiling"));
+
+  profiling = 0; /* In case a SIGPROF is delivered late */
+  bp = profile_table;
+
+  C_snprintf(buffer, STRING_BUFFER_SIZE, C_text("PROFILE.%d"), C_getpid());
+
+  if(debug_mode)
+    C_dbg(C_text("debug"), C_text("dumping statistical profile to `%s'...\n"), buffer);
+
+  fp = C_fopen(buffer, "w");
+  if (fp == NULL)
+    panic(C_text("could not write profile!"));
+
+  for(n = 0; n < PROFILE_TABLE_SIZE; ++n) {
+    for(b = bp[ n ]; b != NULL; b = b2) {
+      b2 = b->next;
+
+      k1 = b->key;
+      C_fputs(C_text("(|"), fp);
+      /* Dump raw C string as if it were a symbol */
+      while((k2 = C_strpbrk(k1, C_text("\\|"))) != NULL) {
+        C_fwrite(k1, 1, k2-k1, fp);
+        C_fputc('\\', fp);
+        C_fputc(*k2, fp);
+        k1 = k2+1;
+      }
+      C_fputs(k1, fp);
+      ms = (double)b->sample_count * (double)profile_frequency / 1000.0;
+      C_fprintf(fp, C_text("| " UWORD_COUNT_FORMAT_STRING " %lf)\n"),
+                b->call_count, ms);
+      C_free(b);
+    }
+  }
+
+  C_fclose(fp);
+  C_free(profile_table);
+  profile_table = NULL;
+
+  return C_SCHEME_UNDEFINED;
+}
 
 void C_ccall C_dump_heap_state(C_word c, C_word *av)
 {
-- 
2.1.4

From 9f9772c99a2feecd0afc539481ae127090b62bd8 Mon Sep 17 00:00:00 2001
From: Peter Bex <pe...@more-magic.net>
Date: Sat, 2 Jan 2016 17:20:58 +0100
Subject: [PATCH 2/3] Support profiling on Windows with native timers

In MingW there's no setitimer support, so we'll have to use a native
Windows API.  This API unfortunately only supports millisecond precision.
The CreateWaitableTimer API seems like it supports better precision, but
it requires that you wait for it using WaitFor{Single,Multiple}Object.
---
 runtime.c | 100 ++++++++++++++++++++++++++++++++++++--------------------------
 1 file changed, 59 insertions(+), 41 deletions(-)

diff --git a/runtime.c b/runtime.c
index 6286ec8..8e171cf 100644
--- a/runtime.c
+++ b/runtime.c
@@ -63,15 +63,6 @@
 # define EOVERFLOW  0
 #endif
 
-/* ITIMER_PROF is more precise, but Cygwin doesn't support it... */
-#ifdef __CYGWIN__
-# define C_PROFILE_SIGNAL SIGALRM
-# define C_PROFILE_TIMER  ITIMER_REAL
-#else
-# define C_PROFILE_SIGNAL SIGPROF
-# define C_PROFILE_TIMER  ITIMER_PROF
-#endif
-
 /* TODO: Include sys/select.h? Windows doesn't seem to have it... */
 #ifndef NO_POSIX_POLL
 #  include <poll.h>
@@ -83,8 +74,19 @@
 # include <sys/resource.h>
 # include <sys/wait.h>
 
+/* ITIMER_PROF is more precise, but Cygwin doesn't support it... */
+# ifdef __CYGWIN__
+#  define C_PROFILE_SIGNAL SIGALRM
+#  define C_PROFILE_TIMER  ITIMER_REAL
+# else
+#  define C_PROFILE_SIGNAL SIGPROF
+#  define C_PROFILE_TIMER  ITIMER_PROF
+# endif
+
 #else
 
+# define C_PROFILE_SIGNAL -1          /* Stupid way to avoid error */
+
 #ifdef ECOS
 #include <cyg/kernel/kapi.h>
 static C_TLS int timezone;
@@ -571,6 +573,7 @@ static void C_fcall really_remark(C_word *x) C_regparm;
 static C_word C_fcall intern0(C_char *name) C_regparm;
 static void C_fcall update_locative_table(int mode) C_regparm;
 static LF_LIST *find_module_handle(C_char *name);
+static void set_profile_timer(C_uword freq);
 static void take_profile_sample();
 
 static C_cpsproc(call_cc_wrapper) C_noret;
@@ -863,10 +866,12 @@ int CHICKEN_initialize(int heap, int stack, int symbols, void *toplevel)
   (void)C_randomize(C_fix(time(NULL)));
 
   if (profiling) {
-#ifdef HAVE_SIGACTION
+#ifndef C_NONUNIX
+# ifdef HAVE_SIGACTION
     C_sigaction(C_PROFILE_SIGNAL, &sa, NULL);
-#else
+# else
     C_signal(C_PROFILE_SIGNAL, global_signal_handler);
+# endif
 #endif
 
     profile_table = (PROFILE_BUCKET **)C_malloc(PROFILE_TABLE_SIZE * sizeof(PROFILE_BUCKET *));
@@ -1553,17 +1558,7 @@ C_word CHICKEN_run(void *toplevel)
   chicken_is_running = chicken_ran_once = 1;
   return_to_host = 0;
 
-  if(profiling) {
-    struct itimerval itv;
-
-    itv.it_value.tv_sec = profile_frequency / 1000000;
-    itv.it_value.tv_usec = profile_frequency % 1000000;
-    itv.it_interval.tv_sec = itv.it_value.tv_sec;
-    itv.it_interval.tv_usec = itv.it_value.tv_usec;
-
-    if (setitimer(C_PROFILE_TIMER, &itv, NULL) == -1)
-      panic(C_text("error setting timer for profiling"));
-  }
+  if(profiling) set_profile_timer(profile_frequency);
 
 #if C_STACK_GROWS_DOWNWARD
   C_stack_limit = (C_word *)((C_byte *)C_stack_pointer - stack_size);
@@ -1594,17 +1589,7 @@ C_word CHICKEN_run(void *toplevel)
     ((C_proc)C_restart_trampoline)(C_restart_c, p);
   }
 
-  if(profiling) {
-    struct itimerval itv;
-
-    itv.it_value.tv_sec = 0;
-    itv.it_value.tv_usec = 0;
-    itv.it_interval.tv_sec = itv.it_value.tv_sec;
-    itv.it_interval.tv_usec = itv.it_value.tv_usec;
-
-    if (setitimer(C_PROFILE_TIMER, &itv, NULL) == -1)
-      panic(C_text("error clearing timer for profiling"));
-  }
+  if(profiling) set_profile_timer(0);
 
   chicken_is_running = 0;
   return C_restore;
@@ -4285,6 +4270,46 @@ C_regparm void *C_fcall C_retrieve2_symbol_proc(C_word val, char *name)
   return C_fast_retrieve_proc(val);
 }
 
+#ifdef C_NONUNIX
+VOID CALLBACK win_timer(PVOID data_ignored, BOOLEAN wait_or_fired)
+{
+  if (profiling) take_profile_sample();
+}
+#endif
+
+static void set_profile_timer(C_uword freq)
+{
+#ifdef C_NONUNIX
+  static HANDLE timer = NULL;
+
+  if (freq == 0) {
+    assert(timer != NULL);
+    if (!DeleteTimerQueueTimer(NULL, timer, NULL)) goto error;
+    timer = NULL;
+  } else if (freq < 1000) {
+    panic(C_text("On Windows, sampling can only be done in milliseconds"));
+  } else {
+    if (!CreateTimerQueueTimer(&timer, NULL, win_timer, NULL, 0, freq/1000, 0))
+      goto error;
+  }
+#else
+  struct itimerval itv;
+
+  itv.it_value.tv_sec = freq / 1000000;
+  itv.it_value.tv_usec = freq % 1000000;
+  itv.it_interval.tv_sec = itv.it_value.tv_sec;
+  itv.it_interval.tv_usec = itv.it_value.tv_usec;
+
+  if (setitimer(C_PROFILE_TIMER, &itv, NULL) == -1) goto error;
+#endif
+
+  return;
+
+error:
+  if (freq == 0) panic(C_text("error clearing timer for profiling"));
+  else panic(C_text("error setting timer for profiling"));
+}
+
 /* Bump profile count for current top of trace buffer */
 static void take_profile_sample()
 {
@@ -13186,18 +13211,11 @@ C_word C_i_dump_statistical_profile()
   C_char *k1, *k2 = NULL;
   int n;
   double ms;
-  struct itimerval itv;
 
   assert(profiling);
   assert(profile_table != NULL);
 
-  itv.it_value.tv_sec = 0;
-  itv.it_value.tv_usec = 0;
-  itv.it_interval.tv_sec = itv.it_value.tv_sec;
-  itv.it_interval.tv_usec = itv.it_value.tv_usec;
-
-  if (setitimer(C_PROFILE_TIMER, &itv, NULL) == -1)
-    panic(C_text("error clearing timer for profiling"));
+  set_profile_timer(0);
 
   profiling = 0; /* In case a SIGPROF is delivered late */
   bp = profile_table;
-- 
2.1.4

From 030bc9eb9e8e66703736b90460a7688ca48e0756 Mon Sep 17 00:00:00 2001
From: Peter Bex <pe...@more-magic.net>
Date: Sat, 2 Jan 2016 17:24:37 +0100
Subject: [PATCH 3/3] Fix statistical percentage chicken-profile output

Now both profiling types write a header in the first line to indicate
the type of file, so that chicken-profiles knows whether to take the
highest or the total of the run time as 100% when dividing.

Conflicts:
	chicken-profile.scm
---
 chicken-profile.scm | 24 +++++++++++++++++-------
 profiler.scm        |  8 +++++++-
 runtime.c           |  1 +
 3 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/chicken-profile.scm b/chicken-profile.scm
index 61ec005..6534f8f 100644
--- a/chicken-profile.scm
+++ b/chicken-profile.scm
@@ -159,8 +159,10 @@ EOF
   (make-vector symbol-table-size '()))
 
 (define (read-profile)
-  (let ((hash (make-symbol-table)))
-    (do ((line (read) (read)))
+  (let* ((hash (make-symbol-table))
+	 (header (read))
+	 (type (if (symbol? header) header 'instrumented)))
+    (do ((line (if (symbol? header) (read) header) (read)))
 	((eof-object? line))
       (##sys#hash-table-set!
        hash (first line)
@@ -172,7 +174,7 @@ EOF
        (lambda (sym counts)
 	 (set! alist (alist-cons sym counts alist)))
        hash)
-      alist)))
+      (cons type alist))))
 
 (define (format-string str cols #!optional right (padc #\space))
   (let* ((len (string-length str))
@@ -195,17 +197,25 @@ EOF
 
 (define (write-profile)
   (print "reading `" file "' ...\n")
-  (let* ((data0 (with-input-from-file file read-profile))
-	 (max-t (foldl (lambda (r t) (max r (third t))) 0 data0))
+  (let* ((type&data0 (with-input-from-file file read-profile))
+	 (type  (car type&data0))
+	 (data0 (cdr type&data0))
+	 ;; Instrumented profiling results in total runtime being
+	 ;; counted for the outermost "main" procedure, while
+	 ;; statistical counts time spent only inside the procedure
+	 ;; itself.  Ideally we'd have both, but that's tricky to do.
+	 (total-t (foldl (if (eq? type 'instrumented)
+			     (lambda (r t) (max r (third t)))
+			     (lambda (r t) (+ r (third t)))) 0 data0))
 	 (data (sort (map
 		      (lambda (t)
 			(append
 			 t
 			 (let ((c (second t)) ; count
-			       (t (third t))) ; total time
+			       (t (third t))) ; time tallied to procedure
 			   (list (or (and c (> c 0) (/ t c)) ; time / count
 				     0)
-				 (or (and (> max-t 0) (* (/ t max-t) 100)) ; % of max-time
+				 (or (and (> total-t 0) (* (/ t total-t) 100)) ; % of total-time
 				     0)
 				 ))))
 		      data0)
diff --git a/profiler.scm b/profiler.scm
index 621f4ed..8586d2d 100644
--- a/profiler.scm
+++ b/profiler.scm
@@ -35,6 +35,9 @@
 
 (define-foreign-variable profile-id int "C_getpid()")
 
+(define empty-file? (foreign-lambda* bool ((scheme-object p))
+		      "C_return(ftell(C_port_file(p)) == 0);"))
+
 (define-constant profile-info-entry-size 5)
 
 
@@ -122,7 +125,10 @@
 	(##sys#print "[debug] writing profile...\n" #f ##sys#standard-error) )
       (apply
        with-output-to-file ##sys#profile-name
-       (lambda () 
+       (lambda ()
+	 (when (empty-file? (current-output-port)) ; header needed?
+	   (write 'instrumented)
+	   (write-char #\newline))
 	 (for-each
 	  (lambda (vec)
 	    (let ([len (##sys#size vec)])
diff --git a/runtime.c b/runtime.c
index 8e171cf..230603f 100644
--- a/runtime.c
+++ b/runtime.c
@@ -13229,6 +13229,7 @@ C_word C_i_dump_statistical_profile()
   if (fp == NULL)
     panic(C_text("could not write profile!"));
 
+  C_fputs(C_text("statistical\n"), fp);
   for(n = 0; n < PROFILE_TABLE_SIZE; ++n) {
     for(b = bp[ n ]; b != NULL; b = b2) {
       b2 = b->next;
-- 
2.1.4

Attachment: signature.asc
Description: Digital signature

_______________________________________________
Chicken-hackers mailing list
Chicken-hackers@nongnu.org
https://lists.nongnu.org/mailman/listinfo/chicken-hackers

Reply via email to