linprocfs...

Bruce Evans Tue, 02 May 2017 06:32:16 -0700

On Tue, 2 May 2017, Konstantin Belousov wrote:

On Tue, May 02, 2017 at 09:25:40PM +1000, Bruce Evans wrote:

On Tue, 2 May 2017, Konstantin Belousov wrote:

On Mon, May 01, 2017 at 06:25:11PM +1000, Bruce Evans wrote:

XX Index: subr_counter.c
XX ===================================================================
XX --- subr_counter.c   (revision 317604)
XX +++ subr_counter.c   (working copy)
XX @@ -78,11 +78,15 @@
XX  sysctl_handle_counter_u64(SYSCTL_HANDLER_ARGS)
XX  {
XX      uint64_t out;
XX +    uint32_t out32;
XX      int error;
XX
XX      out = counter_u64_fetch(*(counter_u64_t *)arg1);
XX
XX -    error = SYSCTL_OUT(req, &out, sizeof(uint64_t));
XX +    if (req->oldlen == 4 && (out32 = out) == out)
XX +            error = SYSCTL_OUT(req, &out32, sizeof(out32));
XX +    else
XX +            error = SYSCTL_OUT(req, &out, sizeof(out));
XX
XX      if (error || !req->newptr)
XX              return (error);

This does the minimum necessary to fix the current problem.

This works until the counters become too large for a u_int.  There
could be an option to get truncation with no error, but that would
require an API change, so applications should be required to do the
truncation for themself if that is what they want.


Yes, this is approximately what I consider to be enough fix, but with
some details which I do not like and which did not worked well on my
test machine with enough memory to generate lot of increments.

Below is what I consider to be good enough.  I will proceed with this
version unless very strong objections appear.


I prefer my version.  It is the same as for sysctl_bufspace(), except
for simplifications and bug fixes that are easier because the types
are unsigned.

Your version is especially broken for the case of a lot of increments.

You consider the 'differentiators' as the main users, but I only want
systat and top to not abort, I have no intent of making them work
for updated counters if overflown.  Instead, I want something human-visible
so that person looking at raw counters values using old tools detected the
failure.


Clamping doesn't work for that either.  Truncating either just works using
differences like it has done for 30 years or so, or gives weirder values
that clamping.

Arguably, the API is that the counters may wrap, and applications must
handle this as well as possible by sampling them more often than they
wrap, so that only the initial values might be wrong.  Applications should
also have attempt to guess when the initial values are wrong, and print
hints about this that are more visible than printing blindly wrapped
value.

Clamping breaks many cases in systat -v which just needs unsigned counters
to calculate small differences.  Not many counters overflow faster than
the refresh interval in systat or top.  These cases used to appear to
work perfectly for most counters when the kernel did blind truncations.
Clamping breaks the API.  Not returing ENOMEM breaks the API.

In my version, this is handled as specified by the API by truncating and
returning ENOMEM.  Then:
- top sees the ENOMEM and mishandles it by aborting
- systat sees the ENOMEM (and also sees a short count, if it has the newer
   breakage of hard-coding 64-bit counters instead of u_int ones, and is
   run on a kernel with u_int counters) and mishandles it better by
   printing an error message but not aborting.  Thus it appears to work
   just as well as when the kernel did blind truncation, except for the
   error message.

ENOMEM is, of course, the situation which I want to avoid.


Then you have to return no error, but truncate the value instead of
clamping.  Anything else is incompatible.

To find the buggy applications, it is better to return a new errno, not
ENOMEM.  EOVERFLOW resulted in top aborting with a useful error message.

...
+       out = counter_u64_fetch(*(counter_u64_t *)arg1);
+#ifdef COMPAT_FREEBSD11
+       if (req->oldlen == sizeof(uint32_t)) {


sizeof(uint32_t) is an obfuscated spelling of '4'.  The size is already
hard-coded in '32'.  This depends on sizeof(char) being 1.  It is, and
won't change.  POSIX started specifying this in 2001, and the mere
existence of uint8_t probably requires 8-bit chars since u_char must
be the smallest type of an object.  This depends on the size of a u_int
being 32 bits/4 bytes in the old ABI.  This won't change, and is already
hard-coded as 32, as it must be because sizeof(u_int) may change.

I prefer the sizeof over 4.


sizeof is really bloated when you handle all the sizes as in my larger
patch.  The basic sizes are 1, 2, 4 and 8 bytes (and possibly a larger
intmax_t), and it is a detail that the sizes in the integer types are
in bits.  However, once we have set up the variables to match the sizes
in bytes, it is not so bad to use sizeof(var) instead of sizeof(typename).
The size is then hard-coded in only 1 place.  That works here too:
use sizeof(out32).

+               out1 = 0xffffffff;
+               if (out < out1)
+                       out1 = out;


This does the clamping.  Clamping is probably better expressed as:

                out1 = MIN(out, 0xffffffff)

Changed as well.


I forgot to mention that I don't really like MIN().  It is an unsafe macro
with ugly spelling indicating that it is unsafe.  4.4BSD removed it in the
kernel and tried to replace it by the imin() family of inline functions, but
this API is hard to use because it is not type-generic.  The kernel was
reinfected with MIN() via contrib'ed code using home-made versions of MIN().
uqmin() would work here, but quad_t is another mistake...

+               return (SYSCTL_OUT(req, &out1, sizeof(uint32_t)));


sizeof(var) works even better here when it is used above.


Here sizeof(uint32_t) is an obfuscated spelling of sizeof(out1).  I think
the sizeof() is justified in this case, to limit the hard-coding to of
sizes to 1 place.  Truncation by assigment to the destination variable also
helps limit this
    (sysctl_bufspace() could also be improved by truncation by assignment.
    Its integers are signed, so the result is implementation-defined, but
    we know that it is benign 2's complement and don't really care what it
    is provided it is reasonable.  This is better than laboriously checking
    the lower limit or neglecting to check it).
Here the limit 0xffffffff also depends on the size.

Yes, there is no UINT32_MAX, and never will.


You mean that you don't want to use it here.

Of course, UINT32_MAX does exist, and is quite delicate due to
complications to give it the correct type.  It must have the type of
its default promotion, and this depends on the size of int and long.
So it is defined in an MD file, and has an unnecessary U suffix on
i386.  This contrasts with UINT16_MAX, which is also defined in an MD
file but has type signed int on all arches and must be defined without
a U suffix, since all arches have more than 16 bits in it.  There is
also the UINT32_C() macro which has to append U on x86 since smaller
constants or the same constant in decimal would not be naturally
unsigned.

+       }
+#endif
+       return (SYSCTL_OUT(req, &out, sizeof(uint64_t)));
+}
+
#define VM_STATS(parent, var, descr) \
-    SYSCTL_COUNTER_U64(parent, OID_AUTO, var, CTLFLAG_RD, &vm_cnt.var, descr)
+    SYSCTL_OID(parent, OID_AUTO, var, CTLTYPE_U64 | CTLFLAG_MPSAFE | \
+    CTLFLAG_RD, &vm_cnt.var, 0, sysctl_handle_vmstat, "QU", descr);


I see a new problem, shared with my version.  The sysctl data says that this
sysctl is pure 64 bits, but it can return 32 bits.  The sysctl data just
can't describe the variation.  I guess this is not much a problem, since it
takes magic to read the sysctl data from the kernel and not many programs
except sysctl know how to do it.  There programs will see the pure 64
bits and only attempt to use 64 bit types.


It can return 32bit only on improper use, which we allow for ABI compat.
So I do not see this as a problem either.


I'm trying to fix the API.

Another bug in the public API is that there is no way to probe for the
sign of a type.  This is the only useful thing in the sysct data.

diff --git a/sys/vm/vm_meter.c b/sys/vm/vm_meter.c
index 5f4cd46ab1e..67f24609b5a 100644
--- a/sys/vm/vm_meter.c
+++ b/sys/vm/vm_meter.c
@@ -266,8 +266,27 @@ static SYSCTL_NODE(_vm_stats, OID_AUTO, vm, CTLFLAG_RW, 0,
        "VM meter vm stats");
SYSCTL_NODE(_vm_stats, OID_AUTO, misc, CTLFLAG_RW, 0, "VM meter misc stats");

+static int
+sysctl_handle_vmstat(SYSCTL_HANDLER_ARGS)
+{
+       uint64_t out;
+#ifdef COMPAT_FREEBSD11
+       uint32_t out32;
+#endif
+
+       out = counter_u64_fetch(*(counter_u64_t *)arg1);
+#ifdef COMPAT_FREEBSD11
+       if (req->oldlen == sizeof(uint32_t)) {
+               out32 = MIN(out, 0xffffffff);


Incompatible for negative advantages.  It should truncate.

+               return (SYSCTL_OUT(req, &out32, sizeof(uint32_t)));


It seems to be necessary to drop the error reporting for compatility.
How does sysctl_bufspace() work when it intentionally keeps the error
reporting and even has a comment about this?

+       }
+#endif
+       return (SYSCTL_OUT(req, &out, sizeof(uint64_t)));
+}


This still has the style bug 'sizeof(typename)'.

If you want to use the type name too much, then you can also use it to
spell 0xffffffff as (uint32_t)-1.  Then the size 4 bytes would be
consistently represented as 32 bits.  Going from type variable out32
to its maximum should be spellable as __maxof(out32) but __maxof()
doesn't exist.

+
#define VM_STATS(parent, var, descr) \
-    SYSCTL_COUNTER_U64(parent, OID_AUTO, var, CTLFLAG_RD, &vm_cnt.var, descr)
+    SYSCTL_OID(parent, OID_AUTO, var, CTLTYPE_U64 | CTLFLAG_MPSAFE | \
+    CTLFLAG_RD, &vm_cnt.var, 0, sysctl_handle_vmstat, "QU", descr);
#define VM_STATS_VM(var, descr)         VM_STATS(_vm_stats_vm, var, descr)
#define VM_STATS_SYS(var, descr)        VM_STATS(_vm_stats_sys, var, descr)


Bruce
_______________________________________________
svn-src-all@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"

Re: svn commit: r317061 - in head: libexec/rpc.rstatd sys/amd64/amd64 sys/amd64/include sys/arm/arm sys/arm/include sys/arm64/include sys/cddl/contrib/opensolaris/uts/common/fs/zfs sys/compat/linprocfs...

Reply via email to