Send netdisco-users mailing list submissions to
netdisco-users@lists.sourceforge.net
To subscribe or unsubscribe via the World Wide Web, visit
https://lists.sourceforge.net/lists/listinfo/netdisco-users
or, via email, send a message with subject or body 'help' to
netdisco-users-requ...@lists.sourceforge.net
You can reach the person managing the list at
netdisco-users-ow...@lists.sourceforge.net
When replying, please edit your Subject line so it is more specific
than "Re: Contents of netdisco-users digest..."
Today's Topics:
1. Re: Issues with netdisco and postgresql (Christian Ramseyer)
2. Re: Issues with netdisco and postgresql (Michael Butash)
--- Begin Message ---
We did a lot of experimenting and tuning over the years, but in the end
macsucking through a large Cisco network is just very slow due to the
community-based indexing. It effectively turns a linear problem into a
quadratic-ish one, and a DC-style switch with a bunch of FEX and a large
number of VLANs can easily clog up a poller for ten minutes.
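To put rough numbers on the "quadratic-ish" remark, here is a back-of-the-envelope sketch. It assumes a deliberately simplified model: one SNMP request returns a fixed number of rows, and with community-based indexing the same port-sized table is walked once per VLAN context. The function names, the rows-per-request figure and the switch sizes are all made up for illustration, not measurements.

```python
import math


def walk_requests(ports, rows_per_request=25):
    """Requests needed for one pass over a table with one row per port."""
    return math.ceil(ports / rows_per_request)


def per_vlan_requests(ports, vlans, rows_per_request=25):
    """The same pass repeated once per VLAN context (community@vlan)."""
    return vlans * walk_requests(ports, rows_per_request)


# Hypothetical switch sizes as (ports, vlans); not taken from any real network.
for ports, vlans in [(48, 10), (480, 100), (1500, 300)]:
    print(ports, vlans, walk_requests(ports), per_vlan_requests(ports, vlans))
```

Because ports and VLANs tend to grow together on big DC switches, the per-VLAN figure grows roughly with their product, while a single pass grows only with the port count.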
I was thinking lately about trying non-SNMP variants of macsuck, e.g. by
leveraging ntc-templates to parse all the "show mac address-table"
IOS/NXOS variants and then writing the results back via the API. I'm
pretty sure that would speed things up a lot. In the meantime, we just
throw CPUs and workers at the problem until we finish in the desired
cycle time :)
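A minimal sketch of the parsing half of that idea. It does not use ntc-templates. A hand-written regex stands in for the template, and it only handles one IOS-like table layout, so NX-OS and other variants would need their own patterns. The sample output, the `parse_mac_table` name and the result shape are assumptions for illustration, and writing the results back via the API is not shown.

```python
import re

# One table row: VLAN id, dotted MAC, entry type, port name.
ROW = re.compile(
    r"^\s*(?P<vlan>\d+)\s+"
    r"(?P<mac>[0-9a-f]{4}\.[0-9a-f]{4}\.[0-9a-f]{4})\s+"
    r"(?P<type>\S+)\s+(?P<port>\S+)\s*$",
    re.IGNORECASE,
)


def parse_mac_table(output):
    """Return a list of {vlan, mac, port} dicts from IOS-style CLI output."""
    entries = []
    for line in output.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # headers, separators and the summary line are skipped
        digits = m.group("mac").replace(".", "").lower()
        mac = ":".join(digits[i:i + 2] for i in range(0, 12, 2))
        entries.append(
            {"vlan": int(m.group("vlan")), "mac": mac, "port": m.group("port")}
        )
    return entries


# Hypothetical sample output, not captured from a real device.
SAMPLE = """\
          Mac Address Table
-------------------------------------------

Vlan    Mac Address       Type        Ports
----    -----------       --------    -----
  10    0011.2233.4455    DYNAMIC     Gi1/0/1
  20    AABB.CCDD.EEFF    STATIC      Po1
Total Mac Addresses for this criterion: 2
"""

entries = parse_mac_table(SAMPLE)
```

One CLI command returns every VLAN in a single pass, which is where the hoped-for speedup over per-VLAN SNMP walks would come from.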
Cheers
Christian
On 06.11.21 08:42, Oliver Gorwits wrote:
> All you can really do is distribute collection, spread out timed jobs
Yes, it's worth looking into the features we have... you can run
multiple backends (restricted to certain device IPs if you wish, or
not), you can spread out the polling by having different arpwalk/etc
commands limited to certain device IPs, and of course simply alter the
number of pollers on the backend.
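As a planning aid for the "restricted to certain device IPs" part, here is a small sketch that splits a device list between backends by prefix. It is not Netdisco configuration, since the actual restriction is set in Netdisco's own config. The backend names, prefixes and the `assign_backends` helper are hypothetical.

```python
import ipaddress


def assign_backends(device_ips, backend_prefixes, default="central"):
    """Map each device IP to the first backend whose prefixes contain it."""
    nets = {
        name: [ipaddress.ip_network(p) for p in prefixes]
        for name, prefixes in backend_prefixes.items()
    }
    plan = {name: [] for name in nets}
    plan.setdefault(default, [])
    for ip in device_ips:
        addr = ipaddress.ip_address(ip)
        owner = next(
            (name for name, ns in nets.items() if any(addr in n for n in ns)),
            default,
        )
        plan[owner].append(ip)
    return plan


# Hypothetical regions and addresses.
plan = assign_backends(
    ["10.1.0.5", "10.2.3.4", "192.0.2.9"],
    {"emea": ["10.1.0.0/16"], "apac": ["10.2.0.0/16"]},
)
```

Checking the split on paper first makes it easier to see whether any one backend ends up with most of the slow devices.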
Similar to Mike, I would be interested to hear examples of how people
have used these features (I do know that distributed backends have been
helpful to those running global networks, to have the SNMP happen
locally and DB data sent over the WAN back home) -- and any tips for DB
scaling, as that becomes the central bottleneck for it all.
Thanks,
On Sat, 6 Nov 2021 at 07:11, Michael Butash <mich...@butash.net> wrote:
Do you have a lot of fex or stack switch ports? With ciscos I
always notice vss/stack/fex's are terribly slow in returning snmp
data, and they add huge delays because the cluster member devices
have to be queried remotely; with enough of them, they'll back monitoring up.
I've seen this in bigger environments when polling a lot of
pseudo-devices like those, particularly remotely, a few states away
over vpn. I find these often good reasons not to use vss/stack/fex's,
and I've also seen enough tools adversely impact (i.e. crash) the
devices through over-polling. Hell, the last asr1000 I pointed netdisco
at went to 100% cpu; never underestimate vendor bugs/stupidity.
All you can really do is distribute collection, spread out timed
jobs, or squeeze performance out of the app/db themselves by debugging
the box. Check iotop and db tools for performance bottlenecks too.
I'd be curious what a scale-out model of netdisco would look like
for large/huge environments, with distributed layers of collection,
database, graphical UIs, etc., but I've never needed one.
-mb
On Fri, Nov 5, 2021 at 9:01 PM Muris <alcat...@gmail.com> wrote:
Hi Trent, thanks for that.

What I found out was that, because the environment is so big, the
arpnips and macsucks were overlapping. I had them on an hourly
basis, so I don’t think they were finishing on time. I have now
separated them out by 2-3 hours, and it seems to be
working a lot better. When arpnips/macsucks are running, they seem to
use 90-100% CPU, and then web requests can time out looking things up.

Will check out the vacuum things too.

Muris

From: Trent Curtis <trent.cur...@gmail.com>
Date: Thursday, 4 November 2021 at 22:07
To: Dominik Müller <mue...@t-online.de>
Cc: <netdisco-users@lists.sourceforge.net>
Subject: Re: [Netdisco] Issues with netdisco and postgresql

I would suggest stopping services, restarting postgres, and
performing a full database vacuum, and potentially clearing the
admin table.

1. Sudo as netdisco and stop netdisco-backend and netdisco-web:
    ~/bin/netdisco-backend stop
    ~/bin/netdisco-web stop

2. Restart the postgres service per your distro instructions.

3. Sudo as the postgres user and execute:
    psql netdisco
    set statement_timeout=0;
    vacuum full;

The vacuum will take a while depending on the table sizes. I'd
suggest also making the pgtune tweaks if you have not already.
It would also be good to take a look at your postgres logs to see
if the vacuum is actually running. Another tip would be to clear
the admin table altogether to clear any schedule backlogs;
this could be one of the culprits of the slowness.

To clear the admin queue, sudo as netdisco and execute:
    ~/bin/netdisco-do psql -e "DELETE FROM admin;"

Also please share your schedule config from the deployment.yml
file. Ultimately you want to avoid any job overlaps to prevent
scheduling backlogs.
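A quick way to sanity-check a schedule for this is to compare each job's interval with how long a run typically takes. The sketch below assumes you have measured run times yourself. The `overruns` helper and the durations are hypothetical, and only the hourly interval and the later 2-3 hour spacing come from this thread.

```python
from datetime import timedelta


def overruns(schedule):
    """Names of jobs whose typical run time meets or exceeds their interval."""
    return [
        name
        for name, (interval, duration) in schedule.items()
        if duration >= interval
    ]


# job name -> (interval between starts, typical run time); run times are made up.
before = {
    "macsuck": (timedelta(hours=1), timedelta(minutes=75)),
    "arpnip": (timedelta(hours=1), timedelta(minutes=40)),
}
after = {
    "macsuck": (timedelta(hours=3), timedelta(minutes=75)),
    "arpnip": (timedelta(hours=2), timedelta(minutes=40)),
}

print(overruns(before))
print(overruns(after))
```

Any job listed is queued again before its previous run has finished, which is how a backlog builds up in the admin table.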

Hope this helps.

- Trent

On Thu, Nov 4, 2021, 1:21 AM Dominik Müller <mue...@t-online.de> wrote:
Hi,

It sounds like there is constant writing to your db.
How often do you discover your network?

BR

Dominik

On 04.11.2021 at 04:47, Kurt Buff <kurt.b...@gmail.com> wrote:

A pure guess - is the vacuum running?
https://www.postgresql.org/docs/9.1/sql-vacuum.html

Kurt

On Wed, Nov 3, 2021 at 7:55 AM Muris <alcat...@gmail.com> wrote:
Hi All,

I have a bit of a problem where the netdisco database
seems to have stopped working, along with lookups from the web
interface. I have tried restarting postgresql,
netdisco-web and netdisco-backend, which helps a bit, but
then it starts happening again.

The database is very slow to access when looking at items
through the web interface, and looking up a device
comes up with “Search failed! Please contact your
site administrator (server error).”

Netdisco-backend and netdisco-log don’t show any
errors, so I think it’s something postgresql-related
that’s not going right…

I then stop netdisco-web and netdisco-backend, and
there’s this “postgres : checkpointer” that
constantly hangs around and takes CPU.

What can I do to fix it or see what’s going on with
postgres?

Thanks
Muris
_______________________________________________
Netdisco mailing list
netdisco-users@lists.sourceforge.net
https://sourceforge.net/p/netdisco/mailman/netdisco-users/
--- End Message ---
--- Begin Message ---
Oddly, I'm not sure cli scraping would help with cluster/stack/fex things.
The first time I ran into this was with old 3750 stacks in the mid-2000s, and
later even worse with nexus fex. If you snmpwalk and watch like a real geek,
you'll notice it slows down significantly/horribly when hitting non-local
ports, i.e. stack/fex members. Log onto the cli and do a "show interface
counter" or something else that shows output across all ports, and you'll
notice the returned output is much, much slower for the same non-local ports
too. Hmm, that really sucks, you say.
In all the vss/stack/fex relations a la cisco, one switch is master and
polls the rest. You hit the master and get fast snmp as normal, but then it has
to reach out and grab data via some rpc/ipc relation to the non-local devices,
etl it, and feed it back via the master's cluster snmp, so the delay adds up
quickly. I've not dealt with juniper, extreme, or other stack-or-die vendors
enough to know how much they suck too, but I know cisco sucks at it.
I'm all for (usually arista) vxlan/evpn these days with just a bunch of
switches, each polled individually, and they work fine; fsck
stacking/clustering. I really do hate the stupid stack/cluster mentality, as
they never work quite right. Again, Cisco can't code themselves out of their
own way anymore, and I'm not sure the other vendors really can either.
Deprecating snmp is a really bad idea until the oamp tools catch up.
Maybe api-based methods improve this with some upper-layer caching or similar
mojo, but it's too bad there still isn't any single good standard across
vendors akin to snmp in that realm. Or just keep snmp working as always, as
a lowest common denominator that even some monolithic dinosaur like AT&T can
understand, but these are the challenges of evolution. If they don't,
shoot them in the face as a bad actor.
I'm sure Oliver and company are rewriting all snmp polling to api functions
right now, though, right? :D
-mb
--- End Message ---
_______________________________________________
Netdisco mailing list - Digest Mode
netdisco-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/netdisco-users