It's good to put the power or two thing in its own function. 

That said, about the comment changes,  hash size changes and auto tuning 
changes respectively....

Comments: Dude... It's not like removing the helpful comments is going to speed 
up either compiling this code or how fast it can process packets.  Please leave 
those in. 

The hash: Your comment about it not being perfect is incorrect. By making it 
1/8 you guarantee that it can never be perfect when fully loaded. If it is so 
important for you to guarantee that the minimum hash traversal on this hash is 
at LEAST FOUR when fully loaded then by all means make the change.  I think 
it's unbelievably wrong, penny wise/dollar foolish or maybe bit wise/CPU 
foolish but for some reason others agree with you for reasons that still do not 
make sense to me. So again by all means pessimize the hash table to save a few 
bytes. 

The tuning: Additionally you've removed the informational output that shows 
what the input was and what it was changed to. It's as if you want to make it 
harder for my techs to figure out what had gone wrong. 

Tuning 2: Additionally it appears you've removed the safety net in this code 
for clipping it down to min 512.

I don't get why this is such an issue, you do realize this code is run only 
once and is not in the critical path?  We should be optimizing those code for 
utility, user-friendliness and general readability not as if it was part of 
TCP's fast path. 


Sent from my iPhone

On Dec 7, 2012, at 7:06 AM, Andre Oppermann <an...@freebsd.org> wrote:

> On 28.11.2012 04:04, Alfred Perlstein wrote:
>> On 11/27/12 2:12 PM, Andre Oppermann wrote:
>>> On 27.11.2012 04:04, Alfred Perlstein wrote:
>>>> Author: alfred
>>>> Date: Tue Nov 27 03:04:24 2012
>>>> New Revision: 243594
>>>> URL: http://svnweb.freebsd.org/changeset/base/243594
>>>> 
>>>> Log:
>>>>   Auto size the tcbhashsize structure based on max sockets.
>>>> 
>>>>   While here, also make the code that enforces power-of-two more
>>>>   forgiving, instead of just resetting to 512, graciously round-down
>>>>   to the next lower power of two.
>>> 
>>> The porthash (also a parameter to in_pcbinfo_init()) should not be
>>> scaled the same as tcbhashsize.  It's unlikely that anyone is going
>>> to have more than a thousand listen ports.
>>> 
>>> The default tcbhashsize scaling factor of maxsockets / 4 is IMHO
>>> excessive.  If someone is running at the limit he's going to up
>>> maxsockets to double the target load.  Dividing by 8 is a better
>>> balance for normal and well used large memory machines.
>>> The hashsize calculation logic is done a bit in a roundabout way.
>>> 
>>> What do think of the attached patch to fix the porthash and to simplify
>>> the hashsize logic?
>> I like the porthash fix.
> 
> After tracking down how the porthash works in in_pcb.c it became
> apparent that every connection ends up there too.  So in the end
> the port hash should be the same size as the main hash for now.
> Having every connection on the port hash doesn't make a lot of
> sense, at least for TCP.  Only ports from listen sockets should
> end up there.  Changing that seems to be non-trivial though.
> 
>> However your change:
>> 1) removes explanation of what the code is doing. (comments/clarity)
> 
> Added your comments back in, in a more condensed form.  The comment
> about the perfect hash is nice but applies to every hash usage and
> is in every book about data structures.  The problem with the inpcb
> hash is that it can never be perfect because the input in not known.
> Even with an 1:1 mapping of maxsockets to the inpcb hash there will
> be collisions based on pure chance since the hash is derived from
> srcip:dstip:srcport:dstport.
> 
>> 2) removes the more forgiving code that rounds-down the tcp_tcbhashsize.
> >
>> Can you please maintain the comments (probably keeping the logic of 
>> maketcp_hashsize() in a separate
>> function).
> 
> We have a quite a number of places where this power-of-2 requirement
> applies.  I think it's better to add a generic macro for this rounding
> instead of re-inventing the wheel every time we have the same situation.
> 
> The comment about being more forgiving is sufficient in a commit
> message as the reader of the code wouldn't know that it previously
> was clamped down to TCBMINHASHSIZE (after printing a warning though).
> 
>> Can you please maintain the more forgiving code for dealing with non-power 
>> of 2 tunable?  I like
>> this, it makes it easier for admins and less error prone.
>> 
>> -       if (!powerof2(hashsize)) {
>> -               int oldhashsize = hashsize;
>> 
>> -               hashsize = maketcp_hashsize(hashsize);
>> -               /* prevent absurdly low value */
>> -               if (hashsize < 16)
>> -                       hashsize = 16;
>> -               printf("%s: WARNING: TCB hash size not a power of 2, "
>> -                   "clipped from %d to %d.\n", __func__, oldhashsize,
>> -                   hashsize);
> 
> Please see revised patch below.
> 
> -- 
> Andre
> 
> Index: netinet/tcp_subr.c
> ===================================================================
> --- netinet/tcp_subr.c    (revision 243985)
> +++ netinet/tcp_subr.c    (working copy)
> @@ -229,13 +229,13 @@
>            void *ip4hdr, const void *ip6hdr);
> 
> /*
> - * Target size of TCP PCB hash tables. Must be a power of two.
> + * Minimal size of TCP PCB hash tables. Must be a power of two.
>  *
>  * Note that this can be overridden by the kernel environment
>  * variable net.inet.tcp.tcbhashsize
>  */
> -#ifndef TCBHASHSIZE
> -#define TCBHASHSIZE    0
> +#ifndef TCBMINHASHSIZE
> +#define TCBMINHASHSIZE    512
> #endif
> 
> /*
> @@ -282,35 +282,10 @@
>    return (0);
> }
> 
> -/*
> - * Take a value and get the next power of 2 that doesn't overflow.
> - * Used to size the tcp_inpcb hash buckets.
> - */
> -static int
> -maketcp_hashsize(int size)
> -{
> -    int hashsize;
> -
> -    /*
> -     * auto tune.
> -     * get the next power of 2 higher than maxsockets.
> -     */
> -    hashsize = 1 << fls(size);
> -    /* catch overflow, and just go one power of 2 smaller */
> -    if (hashsize < size) {
> -        hashsize = 1 << (fls(size) - 1);
> -    }
> -    return (hashsize);
> -}
> -
> void
> tcp_init(void)
> {
> -    const char *tcbhash_tuneable;
> -    int hashsize;
> 
> -    tcbhash_tuneable = "net.inet.tcp.tcbhashsize";
> -
>    if (hhook_head_register(HHOOK_TYPE_TCP, HHOOK_TCP_EST_IN,
>        &V_tcp_hhh[HHOOK_TCP_EST_IN], HHOOK_NOWAIT|HHOOK_HEADISINVNET) != 0)
>        printf("%s: WARNING: unable to register helper hook\n", __func__);
> @@ -318,48 +293,21 @@
>        &V_tcp_hhh[HHOOK_TCP_EST_OUT], HHOOK_NOWAIT|HHOOK_HEADISINVNET) != 0)
>        printf("%s: WARNING: unable to register helper hook\n", __func__);
> 
> -    hashsize = TCBHASHSIZE;
> -    TUNABLE_INT_FETCH(tcbhash_tuneable, &hashsize);
> -    if (hashsize == 0) {
> -        /*
> -         * Auto tune the hash size based on maxsockets.
> -         * A perfect hash would have a 1:1 mapping
> -         * (hashsize = maxsockets) however it's been
> -         * suggested that O(2) average is better.
> -         */
> -        hashsize = maketcp_hashsize(maxsockets / 4);
> -        /*
> -         * Our historical default is 512,
> -         * do not autotune lower than this.
> -         */
> -        if (hashsize < 512)
> -            hashsize = 512;
> -        if (bootverbose)
> -            printf("%s: %s auto tuned to %d\n", __func__,
> -                tcbhash_tuneable, hashsize);
> -    }
>    /*
> -     * We require a hashsize to be a power of two.
> -     * Previously if it was not a power of two we would just reset it
> -     * back to 512, which could be a nasty surprise if you did not notice
> -     * the error message.
> -     * Instead what we do is clip it to the closest power of two lower
> -     * than the specified hash value.
> +     * Auto tune the hash size based on maxsockets.
> +     * It is set to 1/8 of maxsockets and rounded down
> +     * to the next power of two with a lower bound of
> +     * TCBMINHASHSIZE.
>     */
> -    if (!powerof2(hashsize)) {
> -        int oldhashsize = hashsize;
> -
> -        hashsize = maketcp_hashsize(hashsize);
> -        /* prevent absurdly low value */
> -        if (hashsize < 16)
> -            hashsize = 16;
> -        printf("%s: WARNING: TCB hash size not a power of 2, "
> -            "clipped from %d to %d.\n", __func__, oldhashsize,
> -            hashsize);
> +    tcp_tcbhashsize = powerof2rdown(maxsockets / 8);
> +    TUNABLE_INT_FETCH("net.inet.tcp.tcbhashsize", &tcp_tcbhashsize);
> +    if (!powerof2(tcp_tcbhashsize)) {
> +        tcp_tcbhashsize = powerof2rdfloor(tcp_tcbhashsize, TCBMINHASHSIZE);
> +        printf("WARNING: TCB hash size not a power of 2, rounded down\n");
>    }
> -    in_pcbinfo_init(&V_tcbinfo, "tcp", &V_tcb, hashsize, hashsize,
> -        "tcp_inpcb", tcp_inpcb_init, NULL, UMA_ZONE_NOFREE,
> -        IPI_HASHFIELDS_4TUPLE);
> +    in_pcbinfo_init(&V_tcbinfo, "tcp", &V_tcb, tcp_tcbhashsize,
> +        tcp_tcbhashsize, "tcp_inpcb", tcp_inpcb_init, NULL,
> +        UMA_ZONE_NOFREE, IPI_HASHFIELDS_4TUPLE);
> 
>    /*
>     * These have to be type stable for the benefit of the timers.
> @@ -393,7 +341,6 @@
>        tcp_rexmit_min = 1;
>    tcp_rexmit_slop = TCPTV_CPU_VAR;
>    tcp_finwait2_timeout = TCPTV_FINWAIT2_TIMEOUT;
> -    tcp_tcbhashsize = hashsize;
> 
>    TUNABLE_INT_FETCH("net.inet.tcp.soreceive_stream", &tcp_soreceive_stream);
>    if (tcp_soreceive_stream) {
> Index: sys/param.h
> ===================================================================
> --- sys/param.h    (revision 243985)
> +++ sys/param.h    (working copy)
> @@ -284,6 +284,9 @@
> #define    MIN(a,b) (((a)<(b))?(a):(b))
> #define    MAX(a,b) (((a)>(b))?(a):(b))
> 
> +#define    powerof2rdown(x)    (powerof2(x) ? (x) : (0x1 << fls((x) - 1) - 
> 1))
> +#define    powerof2rdfloor(x, y)    (MAX(powerof2rdown(x), (y)))
> +
> #ifdef _KERNEL
> /*
>  * Basic byte order function prototypes for non-inline functions.
> 
_______________________________________________
svn-src-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"

Reply via email to