I also applied these configuration options earlier today to all the servers in
one of my pools, which was experiencing high IO load and repeated SigAlarms:
command_timeout: 600
wserver_timeout: 30
max_recover: 150

And since then, everything has been quiet:

IO on the main node that gossips externally: https://i.imgur.com/ERgz0Xo.jpg 

IO from another node in the same pool that gossips internally with the above
node: https://i.imgur.com/wsaxrJ5.jpg

Hopefully this can help other operators keep things in better shape for the 
time being.

-T


> On Feb 6, 2019, at 3:22 AM, Rolf Wuerdemann <ro...@digitalis.org> wrote:
> 
> With your suggestions:
> 
> load average below 1
> Traffic: ~150G/day
> 
> Best,
> 
>   Rolf
> 
> On 2019-02-04 12:52, Martin Dobrev wrote:
>> Hi,
>> I spent last week trying to optimize the configuration as much as
>> possible. Following advice from a previous mail, I've added:
>>> command_timeout: 600
>>> wserver_timeout: 30
>>> max_recover: 150
>> to my sksconf, and it seems this fixed the majority of the EventLoop
>> failures. I've also added a DB_CONFIG file to the KDB/PTree folders
>> to get rid of the DB archive logs, which were causing plenty of IO
>> load too.
>> My clusters are now happily responding to queries and the load
>> average is below one. Traffic-wise things look better too, ~20GB/day.
>> Kind regards,
>> Martin Dobrev
>> P.S. Adding/changing DB_CONFIG might cause an error in the databases
>> that you can easily fix by running
>> db_recover -e -v -h <path to SKS>/{KDB,PTree}
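For reference, a minimal DB_CONFIG along the lines Martin describes might look
like the sketch below. It is an assumption, not his actual file: the only
setting shown asks Berkeley DB to delete log files it no longer needs instead
of keeping them as archive logs. The keyword depends on the Berkeley DB
version SKS is linked against, so check the documentation for that version,
and expect to run the db_recover command above after adding the file.

```
# Hypothetical minimal DB_CONFIG, one copy in KDB/ and one in PTree/.
# Berkeley DB 4.7 and later: delete log files no longer needed
# for normal recovery instead of keeping them as archive logs.
set_log_config DB_LOG_AUTO_REMOVE
# Berkeley DB releases older than 4.7 spell the same option as:
# set_flags DB_LOG_AUTOREMOVE
```

The trade-off is that, once old logs are removed automatically, they are no
longer available for catastrophic recovery.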
>> On 04/02/2019 09:49, Rolf Wuerdemann wrote:
>>> Hi,
>>> Don't get me wrong, but within three days I've had 450G of traffic,
>>> 99.9% of which can be attributed to SKS. Extrapolated to 30 days,
>>> this means 4.5T (which agrees well with your 2+T/key for these
>>> two poison keys).
>>> With this amount of traffic, and the possibility of getting more
>>> of these keys (and thus more traffic) at any moment, I think it's
>>> only a matter of time until the network, with the current
>>> implementation, vanishes. Traffic increased by roughly a factor of
>>> 300 (15G->4.5T) within twelve months, while the number of nodes in
>>> the network decreased by at least a factor of two over the same
>>> period.
>>> So: where to go and how?
>>> Just my 2ct,
>>> rowue
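Rolf's extrapolation can be checked with a few lines of Python. This is a
sketch that assumes traffic scales linearly with time and that 1 T = 1000 G;
the function names are made up for illustration.

```python
def extrapolate_monthly(sample_gb: float, sample_days: float,
                        month_days: float = 30) -> float:
    """Linearly scale traffic seen over sample_days to a month_days period."""
    return sample_gb / sample_days * month_days


def growth_factor(before_gb: float, after_gb: float) -> float:
    """Ratio between a later and an earlier traffic figure."""
    return after_gb / before_gb


monthly_gb = extrapolate_monthly(450, 3)   # 450G over 3 days -> 4500 GB/month
factor = growth_factor(15, monthly_gb)     # 15G -> 4.5T a year earlier
print(f"{monthly_gb / 1000:.1f} TB/month, growth x{factor:.0f}")
```

Running it prints "4.5 TB/month, growth x300", matching the figures above.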
>>> On 2019-01-30 22:09, Martin Dobrev wrote:
>>> Hi,
>>> My observations so far show that the two keys generate 2+ TB/month
>>> of traffic on average across all my clustered nodes. I'm running
>>> nginx + a Varnish in-memory cache tuned to a 5-minute TTL, which
>>> leaves plenty of CPU cycles for the never-ending EventLoop alarm
>>> loops. The latter cause load-average spikes of up to 10 with just
>>> 4 Docker containers running on a 12-core system.
>>> Don't get me wrong. The throttling penalty is something I'd
>>> swallow as long as we keep the network running.
>>> Regards,
>>> Martin
>>> keyserver.dobrev.eu | pgp.dobrev.it
>>> -------- Original message --------
>>> From: Kristian Fiskerstrand
>>> <kristian.fiskerstr...@sumptuouscapital.com>
>>> Date: 30/01/2019 20:18 (GMT+00:00)
>>> To: Shengjing Zhu <zsj950...@gmail.com>, sks-devel@nongnu.org
>>> Subject: Re: [Sks-devel] Unusual traffic for key 0x69D2EAD9 and
>>> 0xB33B4659
>>> On 1/12/19 8:15 PM, Shengjing Zhu wrote:
>>> I think these requests are quite unusual.
>>> Does anyone know what happens to these two keys?
>>> Just to add a comment on this: adding a cache on the load balancer
>>> is a really nice way to cut down on hits to the underlying SKS
>>> nodes. I keep the cache for 10 minutes in nginx, which really makes
>>> life more pleasant.
>>> --
>>> ----------------------------
>>> Kristian Fiskerstrand
>>> Blog: https://blog.sumptuouscapital.com
>>> Twitter: @krifisk
>>> ----------------------------
>>> Public OpenPGP keyblock at hkp://pool.sks-keyservers.net
>>> fpr:94CB AFDD 3034 5109 5618 35AA 0B7F 8B60 E3ED FAE3
>>> ----------------------------
>>> "Action is the foundational key to all success"
>>> (Pablo Picasso)
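A load-balancer cache like the one Kristian describes could be sketched in
nginx roughly as follows. This is an illustrative fragment, not his
configuration: the cache path, zone name and sizes, the server name, and the
assumption that SKS serves HKP on 127.0.0.1:11371 are all placeholders. The
proxy_cache_path line belongs in the http context.

```
# Hypothetical sketch: cache HKP lookups in front of SKS for 10 minutes.
# Place inside http { }; path, zone name and sizes are placeholders.
proxy_cache_path /var/cache/nginx/sks levels=1:2 keys_zone=sks_cache:10m
                 max_size=1g inactive=10m;

server {
    listen 80;
    server_name keyserver.example.org;    # placeholder name

    location /pks {
        proxy_pass http://127.0.0.1:11371;  # assumed SKS HKP address/port
        proxy_cache sks_cache;
        proxy_cache_valid 200 10m;          # keep successful lookups 10 minutes
    }
}
```

By default nginx only caches GET and HEAD responses, so key submissions
POSTed to /pks/add still reach SKS directly. Martin's Varnish setup with a
5-minute TTL follows the same idea with a shorter lifetime.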
>>> _______________________________________________
>>> Sks-devel mailing list
>>> Sks-devel@nongnu.org
>>> https://lists.nongnu.org/mailman/listinfo/sks-devel
> 
> --
> Security is an illusion - Datasecurity twice
> Rolf Würdemann  -  ro...@digitalis.org  -  DL9ROW
> GnuPG fingerprint:    EEDC BEA9 EFEA 54A9 E1A9  2D54 69CC 9F31 6C64 206A
> xmpp: ro...@digitalis.org E1189573 6B4A150C A0C2BF5A 5553F865 0B9CBF7A
>      ro...@jabber.ccc.de 64CBBB68 0A3514A4 026FC1E7 5328CE87 AEE2185F
> 

Attachment: signature.asc
Description: Message signed with OpenPGP
