I found an issue with my use of setFinal to free allocated strings that was causing a memory leak. I hadn't appreciated a behavioural detail of setFinal, though it seems obvious now I know. In particular, for setFinal f v, it is (the C value referenced by) v at the time f is called, not at the time f is registered using setFinal, that is passed to f.

To help me understand exactly what is going on, I wrote some notes on setFinal covering matters in detail - see attached. I don't know how accurate this is though. Although the detail is, perhaps, excessive for some users, I found a low-level description quite useful.

Phil

Notes on setFinal, Phil Clayton, 2011-11-18

= Description =

setFinal f v registers the C function f as the finaliser for the volatile v.  As
a result, f is called when v is garbage-collected, with (the C value referenced
by) v passed as the argument to f.

Importantly, the argument to f is (the C value referenced by) v at the time f is
called, not at the time f is registered using setFinal.  Therefore, as a general
rule, a volatile that has a finaliser should not be reused for other purposes.


= Implementation detail =

For each volatile v, there is an associated entry, vol, in the vol array that
references a C-space value, x.  A finaliser f is associated with vol, not the
value of x.  It is helpful to see the relationship between SML heap values, the
vol array and the C-space:

    SML         VOL ARRAY               C SPACE

    v   <---->  vol
                  C_finaliser   ----->  void f (void *)
                  C_pointer     ----->  x

In the Poly/ML implementation (foreign.cpp), each entry in the vol array is a
C struct that includes

  void *C_pointer
  void (*C_finaliser) (void *)

When v is garbage-collected, C_finaliser is (effectively) called with argument
*(void **) C_pointer, that is, f is called with argument x.  Importantly, f is
passed the value of x at the time of the call, not at the time f was registered
with setFinal.


= Warning =

As the C-space value referenced by a volatile can be updated, in particular,
between registering and calling a finaliser, there is considerable scope for
subtle bugs.  Typically, finalisers are used to free allocated memory so such
bugs would include memory leaks (free not called) or assertion failures/memory
access faults (attempt to free non-allocated memory).

The onus is on the application to ensure that a volatile references the C-space
value intended for the finaliser.  The C-space value referenced by a volatile
can be updated either by SML code (using CInterface.assign) or by C code (which
has been passed a reference to the C-space value).  An example of the latter is
shown below, with an explanation of why it goes wrong and a work around.


== Example of memory leak ==

Consider the following C function whose parameter could be considered as an
"in out" char *, achieved by an extra level of indirection.  The function
exports a new string that is a copy of the imported string:

  void dup (char **ptrstr) {
    *ptrstr = strdup (*ptrstr)
  }

This is imported into SML by

  val dup = call1 (get_sym "<libname>.so" "dup") POINTER VOID

Now consider the following program, where volatile v1 is an allocated string
which we duplicate using the above function dup.  Both the original string and
the duplicated string must be freed (using C function free_sym) when no longer
reachable, so a finaliser is added for both:

  setFinal free_sym v1  (* free original string when not reachable *)

  val v2 = address v1   (* v2 is reference to v1 *)

  dup v2                (* dup updates reference: v1 is duplicated string *)

  setFinal free_sym v1  (* free duplicated string when not reachable *)

Upon completion, v1 references the duplicated the string.  Consequently, the
original string will never be freed.  When v1 is garbage-collected, the
duplicated string will be freed.  Registering free_sym as a finaliser for v1
after dup v2 has no effect, as free_sym was already a finaliser for v1.  In this
example, we simply get a silent memory leak.

The following sections show what happens after each step.


=== Step 1 ===

Set free_sym as the finaliser for volatile v1:

  setFinal free_sym v1

Afterwards, we have the following in memory:

    SML         VOL ARRAY               C SPACE

    v1  <---->  vol1
                  C_finaliser   ----->  void free_sym (void *)
                  C_pointer     ----->  char *p1
                                           |
                  ...                      `->  char a0
                                                char a1
                                                char a2
                                                ...

(The C names p1, a0, a1, a2 are introduced for explanatory purposes.)


=== Step 2 ===

Take the address of v1:

  val v2 = address v1

We now have:

    SML         VOL ARRAY               C SPACE

    v2  <---->  vol2
                  C_pointer     ----->  char **p2
                                           |
                                           |
                                           |
    v1  <---->  vol1                       |
                  C_finaliser   ----->     |    void free_sym (void *)
                  C_pointer     ----->     `->  char *p1
                  C_finaliser                      |
                  ...                              `->  char a0
                                                        char a1
                                                        char a2
                                                        ...


=== Step 3 ===

Evaluate

  dup v2

The C function dup is called with ptrstr = p2.  Therefore, the assignment to
*ptrstr overwrites the C space memory location p1.  We now have:

    SML         VOL ARRAY               C SPACE

    v2  <---->  vol2
                  C_pointer     ----->  char **p2
                                           |
                                           |
                                           |
    v1  <---->  vol1                       |
                  C_finaliser   ----->     |    void free_sym (void *)
                  C_pointer     ----->     `->  char *p1
                  ...                              |
                                                   |    char a0
                                                   |    char a1
                                                   |    char a2
                                                   |    ...
                                                   |
                                                   `->  char b0
                                                        char b1
                                                        char b2
                                                        ...

Consequently, when the finaliser function is called, *C_pointer is no longer 
the original string but the the duplicate string.  It can be seen that the 
string
with characters a0, a1, a2, ... is no longer referenced, so will never be freed.


=== Step 4 ===

  setFinal free_sym v1

has no effect because vol1 already has free_sym set as the finaliser.


=== Work around ===

After step 1, it is necessary to create a copy of v1 as follows:

  v3 = alloc 1 voidStar
  assign voidStar v3 v1

Then we have:

    SML         VOL ARRAY               C SPACE

    v3  <---->  vol3
                  C_pointer     ----->  char *p3
                                           |
                                           |
                                           |
    v1  <---->  vol1                       |
                  C_finaliser   ----->     |    void free_sym (void *)
                  C_pointer     ----->     |    char *p1
                                           |       |
                  ...                      `->     `->  char a0
                                                        char a1
                                                        char a2
                                                        ...

Now we can safely continue with v3 instead of v1 because p3 (above), not p1, is
overwritten by the assignment in C to *ptrstr, inside the call to dup:

  val v2 = address v3

  dup v2

  setFinal free_sym v3
_______________________________________________
polyml mailing list
polyml@inf.ed.ac.uk
http://lists.inf.ed.ac.uk/mailman/listinfo/polyml

Reply via email to