netdisco-users Digest, Vol 184, Issue 5

netdisco-users-request Sat, 06 Nov 2021 18:37:28 -0700

Send netdisco-users mailing list submissions to
        netdisco-users@lists.sourceforge.net


To subscribe or unsubscribe via the World Wide Web, visit
        https://lists.sourceforge.net/lists/listinfo/netdisco-users
or, via email, send a message with subject or body 'help' to
        netdisco-users-requ...@lists.sourceforge.net

You can reach the person managing the list at
        netdisco-users-ow...@lists.sourceforge.net

When replying, please edit your Subject line so it is more specific
than "Re: Contents of netdisco-users digest..."

Today's Topics:

   1. Re: Issues with netdisco and postgresql (Christian Ramseyer)
   2. Re: Issues with netdisco and postgresql (Michael Butash)

--- Begin Message ---

We did a lot of experimenting and tuning over the years, but in the end
macsucking through a large Cisco network is just very slow due to the
community-based indexing. It effectively turns a linear problem into a
quadratic one, and a DC-style switch with a bunch of FEX and a large
number of VLANs can easily clog up a poller for ten minutes.
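A minimal sketch of what "community-based indexing" means for polling, assuming Cisco IOS-style switches and SNMPv2c: the BRIDGE-MIB forwarding table is read per VLAN by appending "@<vlan>" to the community string, so a switch with many VLANs needs one full walk per VLAN. It uses net-snmp's snmpwalk; HOST, COMMUNITY and VLANS are placeholders, and vlan_community/walk_all_vlans are hypothetical helper names, not part of Netdisco.

```shell
#!/bin/sh
# Sketch: one BRIDGE-MIB walk per VLAN via Cisco community string indexing
# (community@vlan, SNMPv2c). HOST, COMMUNITY and VLANS are placeholders.
HOST="${HOST:-192.0.2.10}"
COMMUNITY="${COMMUNITY:-public}"
VLANS="${VLANS:-1 10 20}"
FDB_PORT_OID=1.3.6.1.2.1.17.4.3.1.2   # BRIDGE-MIB dot1dTpFdbPort

# Build the per-VLAN community string, e.g. "public@10".
vlan_community() {
  printf '%s@%s\n' "$1" "$2"
}

# Walk the forwarding table once per VLAN and report entries and elapsed seconds.
walk_all_vlans() {
  for vlan in $VLANS; do
    start=$(date +%s)
    count=$(snmpwalk -v2c -c "$(vlan_community "$COMMUNITY" "$vlan")" \
      -On "$HOST" "$FDB_PORT_OID" 2>/dev/null | wc -l)
    end=$(date +%s)
    echo "vlan $vlan: $count entries in $((end - start))s"
  done
}
```

After sourcing the file, something like HOST=192.0.2.10 VLANS="1 10 20 30" walk_all_vlans prints one timing line per VLAN; the device's total macsuck time is roughly the sum of those lines, which is why adding VLANs to a large FEX-heavy switch hurts so much.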

I was thinking lately about trying non-SNMP variants of macsuck, e.g. by
leveraging ntc-templates for parsing all the "show mac-address table"
IOS/NX-OS variants and then writing the results back via the API. I'm
pretty sure that would speed things up a lot. In the meantime, we just
throw CPUs and workers at the problem until we finish in the desired
cycle time :)



Cheers
Christian


On 06.11.21 08:42, Oliver Gorwits wrote:

    All you can really do is distribute collection, spread out timed jobs

Yes, it's worth looking into the features we have... you can run
multiple backends (restricted to certain device IPs if you wish, or
not), you can spread out the polling by having different arpwalk/etc.
commands limited to certain device IPs, and of course simply alter the
number of pollers on the backend.

Similar to Mike, I would be interested to hear examples of how people
have used these features (I do know that distributed backends have been
helpful to those running global networks, to have the SNMP happen
locally and DB data sent over the WAN back home) -- and any tips for DB
scaling, as that becomes the central bottleneck for it all.


Thanks,

On Sat, 6 Nov 2021 at 07:11, Michael Butash <mich...@butash.net <mailto:mich...@butash.net>> wrote:


    Do you have a lot of FEX or stack switch ports?  With Ciscos I
    always notice VSS/stack/FEX setups are terribly slow in returning SNMP
    data, and the delays from having to remotely query the cluster member
    devices add up; with enough of them they'll back monitoring up.  I've
    seen this in bigger environments when polling a lot of pseudo-devices
    like those, particularly remotely a few states away over VPN.  I find
    these often good reasons not to use VSS/stack/FEX, and I've seen enough
    tools conversely impact (i.e. crash) the devices from over-polling.
    Hell, the last ASR1000 I pointed Netdisco at started running at 100%
    CPU; never underestimate vendor bugs/stupidity.

    All you can really do is distribute collection, spread out timed
    jobs, or squeeze performance out of the app/db themselves by debugging
    the box.  Check iotop and DB tools for performance bottlenecks too.
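A rough sketch of the host-level half of "check iotop and DB tools", assuming a Linux server with iotop and procps' vmstat installed (iotop generally needs root). host_io_check is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: host-level CPU and disk pressure on the Netdisco/PostgreSQL server.
# Assumes Linux with iotop (usually needs root) and procps vmstat installed.
host_io_check() {
  # Batch mode, only processes actually doing I/O, 3 samples 2 seconds apart.
  iotop -b -o -n 3 -d 2
  # 5 samples 2 seconds apart; a high "wa" column means CPUs are waiting on disk,
  # while high "us"/"sy" with low "wa" points at CPU-bound work instead.
  vmstat 2 5
}
```

Running host_io_check once while macsuck/arpnip jobs are active and once during a quiet period gives a simple before/after comparison.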

    I'd be curious what a scale-out model of Netdisco would look like
    for large/huge environments, with distributed layers of collection,
    database, web UIs, etc., but I've never needed one.

    -mb


    On Fri, Nov 5, 2021 at 9:01 PM Muris <alcat...@gmail.com
    <mailto:alcat...@gmail.com>> wrote:

        Hi Trent, thanks for that.

        What I found out was that, because the environment is so big, the
        arpnips and macsucks are overlapping. I have them on an hourly
        basis, so I don't think they are finishing on time. I have now
        separated them out by 2-3 hours, and it seems to be working a lot
        better. When arpnips/macsucks are running they seem to use 90-100%
        CPU, and then web requests can time out looking up things.

        Will check out the vacuum things too.

        Muris

        *From: *Trent Curtis <trent.cur...@gmail.com
        <mailto:trent.cur...@gmail.com>>
        *Date: *Thursday, 4 November 2021 at 22:07
        *To: *Dominik Müller <mue...@t-online.de
        <mailto:mue...@t-online.de>>
        *Cc: *<netdisco-users@lists.sourceforge.net
        <mailto:netdisco-users@lists.sourceforge.net>>
        *Subject: *Re: [Netdisco] Issues with netdisco and postgresql

        I would suggest stopping services, restarting postgres, performing
        a full database vacuum, and potentially clearing the admin table.

        1. Sudo to the netdisco user and stop netdisco-backend and
           netdisco-web:

           ~/bin/netdisco-backend stop
           ~/bin/netdisco-web stop

        2. Restart the postgres service per your distro's instructions.

        3. Sudo to the postgres user and execute:

           psql netdisco
           set statement_timeout=0;
           vacuum full;

        The vacuum will take a while depending on the table sizes.  I'd
        suggest also making the pgtune tweaks if you have not already.
        It would also be good to take a look at your postgres logs to see
        if the vacuum is actually running. Another tip would be to clear
        the admin table altogether to clear any schedule backlogs; this
        could be one of the culprits of the slowness.
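One way to answer "is the vacuum actually running?" from inside PostgreSQL rather than from the logs, sketched under the assumption that the database is called netdisco (as in step 3) and that psql is 9.6 or newer (older versions accept only one -c per call). DEAD_ROWS_SQL, RUNNING_VACUUM_SQL and vacuum_status are illustrative names, not part of Netdisco.

```shell
#!/bin/sh
# Sketch: is vacuuming keeping up on the Netdisco database?
# Assumes the database is named "netdisco" and psql >= 9.6 (multiple -c options).
# Run as the postgres user so the query text of other sessions is visible.
DB="${DB:-netdisco}"

# Tables with the most dead rows, and when each was last vacuumed.
DEAD_ROWS_SQL="SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;"

# Any VACUUM currently running (manual or autovacuum worker), excluding this query.
RUNNING_VACUUM_SQL="SELECT pid, now() - query_start AS runtime, left(query, 60) AS query
FROM pg_stat_activity
WHERE query ILIKE '%vacuum%' AND pid <> pg_backend_pid();"

vacuum_status() {
  psql -d "$DB" -c "$DEAD_ROWS_SQL" -c "$RUNNING_VACUUM_SQL"
}
```

A large n_dead_tup on busy tables combined with an old or empty last_autovacuum suggests autovacuum is not keeping up with the write load.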

        To clear the admin queue, sudo to the netdisco user and execute:

           ~/bin/netdisco-do psql -e "DELETE FROM admin;"
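Before wiping the queue, it may help to see what the backlog actually consists of. This sketch reuses the netdisco-do psql -e form from the command above and assumes the admin table has action and status columns (check with \d admin in psql if unsure); ND_DO, BACKLOG_SQL and show_backlog are hypothetical names.

```shell
#!/bin/sh
# Sketch: summarise the Netdisco job queue before clearing it.
# Assumes the admin table has "action" and "status" columns; run as the netdisco user.
ND_DO="${ND_DO:-$HOME/bin/netdisco-do}"

# Count jobs per action/status pair, biggest groups first.
BACKLOG_SQL="SELECT action, status, count(*) AS jobs FROM admin GROUP BY action, status ORDER BY jobs DESC;"

show_backlog() {
  "$ND_DO" psql -e "$BACKLOG_SQL"
}
```

If most rows turn out to be queued macsuck/arpnip jobs, that points at the same schedule overlap discussed in this thread rather than at PostgreSQL itself.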

        Also, please share your schedule config from the deployment.yml
        file. Ultimately you want to avoid any job overlaps to prevent
        scheduling backlogs.

        Hope this helps.

        - Trent

        On Thu, Nov 4, 2021, 1:21 AM Dominik Müller <mue...@t-online.de
        <mailto:mue...@t-online.de>> wrote:

            Hi,

            it sounds like something is constantly writing to your DB.
            How often do you discover your network?

            BR

            Dominik

                On 04.11.2021 at 04:47, Kurt Buff
                <kurt.b...@gmail.com <mailto:kurt.b...@gmail.com>> wrote:

                A pure guess - is the vacuum running?

                https://www.postgresql.org/docs/9.1/sql-vacuum.html

                Kurt

                On Wed, Nov 3, 2021 at 7:55 AM Muris <alcat...@gmail.com
                <mailto:alcat...@gmail.com>> wrote:

                    Hi All,

                    I have a bit of a problem: the Netdisco database
                    seems to have stopped working, and so have lookups
                    from the web interface. Restarting postgresql,
                    netdisco-web and netdisco-backend helps a bit, but
                    then it starts happening again.

                    The database is very slow to access when looking at
                    items through the web interface, and looking up a
                    device comes up with “Search failed! Please contact
                    your site administrator (server error).”

                    Netdisco-backend and netdisco-log don’t show any
                    errors, so I think it's something PostgreSQL-related
                    that’s not going right…

                    When I stop netdisco-web and netdisco-backend,
                    there's this “postgres: checkpointer” process that
                    constantly hangs around and takes CPU.

                    What can I do to fix it or see what's going on with
                    postgres?

                    Thanks

                    Muris
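A sketch for the "see what's going on with postgres" question, run as the postgres user against the netdisco database: list non-idle sessions by runtime, then compare timed and requested checkpoints. If checkpoints_req is much larger than checkpoints_timed, checkpoints are being forced by WAL volume rather than by checkpoint_timeout, which often means max_wal_size is small for the write load (PostgreSQL's checkpoint_warning setting logs a hint about this too). Column names are those of PostgreSQL 16 and earlier, psql 9.6+ is assumed for multiple -c options, and pg_busy is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: what is PostgreSQL busy with? Run as the postgres user.
# Assumes the database is named "netdisco" and psql >= 9.6 (multiple -c options).
DB="${DB:-netdisco}"

# Non-idle sessions, longest-running first.
ACTIVITY_SQL="SELECT pid, state, now() - query_start AS runtime, left(query, 80) AS query
FROM pg_stat_activity
WHERE state <> 'idle'
ORDER BY runtime DESC NULLS LAST;"

# Timed vs. requested checkpoints (PostgreSQL 16 and earlier column names;
# PostgreSQL 17 moved them to pg_stat_checkpointer as num_timed/num_requested).
CHECKPOINT_SQL="SELECT checkpoints_timed, checkpoints_req FROM pg_stat_bgwriter;"

pg_busy() {
  psql -d "$DB" -c "$ACTIVITY_SQL" -c "$CHECKPOINT_SQL" -c "SHOW max_wal_size;"
}
```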

                    _______________________________________________
                    Netdisco mailing list
                    netdisco-users@lists.sourceforge.net
                    <mailto:netdisco-users@lists.sourceforge.net>
                    https://sourceforge.net/p/netdisco/mailman/netdisco-users/


--- End Message ---

--- Begin Message ---

Oddly, I'm not sure CLI scraping would help with cluster/stack/FEX things.

The first time I ran into this was with old 3750 stacks in the mid 2000s,
and later even worse with Nexus FEX.  If you snmpwalk and watch like a
real geek, you'll notice it slows down significantly/horribly when it hits
non-local ports, i.e. stack/FEX members.  Log onto the CLI and do a "show
interfaces counters" or something else that shows all ports, and you'll
notice the output is much, much slower for the same non-local ports too.
Hmm, that really sucks, you say.
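To "snmpwalk and watch like a real geek" with numbers attached, a minimal sketch: prefix each line of a walk with the seconds elapsed since it started, so the stretch where the walk reaches stack or FEX member ports shows up as a jump. It assumes net-snmp's snmpwalk and SNMPv2c; HOST and COMMUNITY are placeholders, timestamp_lines and walk_with_times are hypothetical names, and ifHCInOctets is just one convenient per-port column to walk.

```shell
#!/bin/sh
# Sketch: timestamp each line of an snmpwalk so slow stretches stand out.
# HOST and COMMUNITY are placeholders; assumes net-snmp and SNMPv2c.
HOST="${HOST:-192.0.2.10}"
COMMUNITY="${COMMUNITY:-public}"
IF_HC_IN_OCTETS=1.3.6.1.2.1.31.1.1.1.6   # IF-MIB ifHCInOctets, one row per port

# Prefix every stdin line with whole seconds elapsed since the function started.
timestamp_lines() {
  start=$(date +%s)
  while IFS= read -r line; do
    printf '%ss %s\n' "$(( $(date +%s) - start ))" "$line"
  done
}

walk_with_times() {
  snmpwalk -v2c -c "$COMMUNITY" -On "$HOST" "$IF_HC_IN_OCTETS" | timestamp_lines
}
```

Whole-second resolution is coarse, but it is enough to spot multi-second stalls between the master's local ports and the member ports.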

In all the Cisco VSS/stack/FEX relationships, one switch is the master and
polls the rest.  You hit the master and get fast SNMP as normal, but then it
has to reach out and grab data from the non-local devices via some RPC/IPC
mechanism, etc., and feed that back via the master's SNMP; the delay adds
up quickly.  I've not dealt with Juniper, Extreme, or other stack-or-die
vendors enough to know how much they suck too, but I know Cisco sucks at it.

I'm all for (usually Arista) VXLAN/EVPN these days with just a bunch of
switches, each polled, and they work fine; fsck stacking/clustering.  I
really do hate the stupid stack/cluster mentality, as they never work quite
right.  Again, Cisco can't get out of their own way in code anymore, and
I'm not sure the other vendors do either.  Deprecating SNMP is a really
bad idea until the OAM&P tools catch up.

Maybe API-based methods improve this with some upper-layer caching or
such mojo, but too bad there still isn't any single good standard between
vendors like SNMP in that realm.  Or just keep SNMP working as always as a
lowest common denominator that even some monolithic dinosaur like AT&T can
understand, but these are the challenges of evolution.  If they don't,
shoot them in the face as a bad actor.

I'm sure Oliver and company are rewriting all SNMP polling as API
functions right now, though, right?  :D

-mb


On Sat, Nov 6, 2021 at 4:28 PM Christian Ramseyer <ramse...@netnea.com>
wrote:


--- End Message ---

_______________________________________________
Netdisco mailing list - Digest Mode
netdisco-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/netdisco-users
