Re: generic message-id redirector

2021-02-01 Thread Konstantin Ryabitsev
On Mon, Feb 01, 2021 at 02:26:30PM +0100, Uwe Kleine-König wrote: > > PublicInbox::NewsWWW fallback lets //$host/$message_id work (no /r/). > > It can be run as a standalone PSGI, too, see examples/newswww.psgi > > Huh, it seems I have to dig deeper into the internals of Plack. Thanks. > > > At

Re: generic message-id redirector

2021-02-01 Thread Eric Wong
Uwe Kleine-König wrote: > On Mon, Feb 01, 2021 at 11:10:49AM +, Eric Wong wrote: > > To get /r/, you can use the "mount" directive in the > > Plack::Builder DSL as shown in examples/newswww.psgi > > > > Is there some additional code or configuration necessary to make this > > > work? Am I

Re: generic message-id redirector

2021-02-01 Thread Uwe Kleine-König
Hello, [adding Konstantin to Cc:] On Mon, Feb 01, 2021 at 11:10:49AM +, Eric Wong wrote: > Uwe Kleine-König wrote: > > I'm currently trying to get up a public-inbox instance and I fail to > > setup a generic message-id redirector as lore.kernel.org implements it. > > That is a request to

Re: [PATCH 2/2] doc: add lei-overview(7)

2021-02-01 Thread Eric Wong
Eric Wong wrote: > Kyle Meyer wrote: > > +=item $ lei q -t -o t.mbox --format mboxrd --mua=mutt s:lei s:skeleton > > > > +Write mboxrd-formatted results to t.mbox and enter mutt to view the > > +file by invoking C. > > Thanks for this series. I'll take a closer look later (or > tomorrow) It

Re: generic message-id redirector

2021-02-01 Thread Eric Wong
Uwe Kleine-König wrote: > Hello, > > I'm currently trying to get up a public-inbox instance and I fail to > setup a generic message-id redirector as lore.kernel.org implements it. > That is a request to https://lore.kernel.org/r/message@id is redirected > to

generic message-id redirector

2021-02-01 Thread Uwe Kleine-König
Hello, I'm currently trying to get up a public-inbox instance and I fail to setup a generic message-id redirector as lore.kernel.org implements it. That is a request to https://lore.kernel.org/r/message@id is redirected to https://lore.kernel.org/somelist/message@id for a list "somelist" that has
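
The redirect described in this message can be sketched as a small WSGI application. This is only an illustration in Python, not how public-inbox (which is written in Perl) or lore.kernel.org implements it. The INBOXES table, find_list() and redirector() are hypothetical names, and a real instance would look the Message-ID up in its own index rather than in an in-memory dict.

```python
from urllib.parse import quote

# Hypothetical index: list name -> Message-IDs archived in that list.
# A real server would query its per-list message index instead.
INBOXES = {
    "somelist": {"message@id"},
    "otherlist": {"another@id"},
}


def find_list(message_id):
    """Return the name of the first list that has message_id, or None."""
    for name, ids in INBOXES.items():
        if message_id in ids:
            return name
    return None


def redirector(environ, start_response):
    """WSGI app: redirect /r/<message-id> to /<list>/<message-id>."""
    path = environ.get("PATH_INFO", "")
    if path.startswith("/r/"):
        # Tolerate optional angle brackets and a trailing slash.
        message_id = path[len("/r/"):].strip("<>/")
        name = find_list(message_id)
        if name is not None:
            location = "/%s/%s" % (name, quote(message_id, safe="@"))
            start_response("302 Found", [("Location", location)])
            return [b""]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found\n"]
```

With this table, a request for /r/message@id gets a 302 response whose Location header is /somelist/message@id, and an unknown Message-ID gets a 404.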

Perl debug patches used to track down source of segfault

2021-02-01 Thread Eric Wong
Attached are two patches against the Debian-packaged perl 5.28.1-6+deb10u1 which I used for tracking down the attempt to access @DB::args of PublicInbox::Listener::event_step as the source of the segfault. I don't know Perl internals very well, and I was never an advanced gdb user when I hacked

[PATCH 11/21] lei: deep clone {ovv} for l2m workers

2021-02-01 Thread Eric Wong
We don't need to send the temporary xsearch {git} object over to workers, just the directory name. --- lib/PublicInbox/LEI.pm | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/lib/PublicInbox/LEI.pm b/lib/PublicInbox/LEI.pm index 4f7ed171..08554932 100644 ---

[PATCH 12/21] sharedkv: lock and explicitly disconnect {dbh}

2021-02-01 Thread Eric Wong
It may be possible for updates or changes to be uncommitted until disconnect, so we'll use flock() as we do elsewhere to avoid the polling retry behavior of SQLite. We also need to clear CachedKids before disconnecting to avoid warnings like: ->disconnect invalidates 1 active statement
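
The general idea of serializing SQLite writers with flock() and disconnecting before the lock is released can be sketched as follows. This is a Python illustration under stated assumptions, not the SharedKV code from the patch: set_key(), get_key(), the kv table and the separate lock file are all hypothetical, and fcntl.flock() is only available on Unix-like systems.

```python
import fcntl
import sqlite3


def set_key(db_path, lock_path, key, value):
    """Store key=value while holding an exclusive flock() on lock_path."""
    with open(lock_path, "a") as lock_fh:
        # Blocks in the kernel until the lock is free, so concurrent
        # writers never reach SQLite's own busy/retry polling.
        fcntl.flock(lock_fh, fcntl.LOCK_EX)
        conn = sqlite3.connect(db_path)
        try:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")
            conn.execute(
                "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value))
            conn.commit()
        finally:
            # Disconnect explicitly while the lock is still held.
            conn.close()
        fcntl.flock(lock_fh, fcntl.LOCK_UN)


def get_key(db_path, key):
    """Return the stored value for key, or None if it is absent."""
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        return row[0] if row else None
    finally:
        conn.close()
```

If an exception escapes before the explicit unlock, closing the lock file at the end of the with block still releases the lock.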

[PATCH 06/21] lei: remove syslog dependency

2021-02-01 Thread Eric Wong
It doesn't seem necessary now that we redirect and write stuff to errors.log, which gets checked every run. --- lib/PublicInbox/LEI.pm | 17 ++--- 1 file changed, 6 insertions(+), 11 deletions(-) diff --git a/lib/PublicInbox/LEI.pm b/lib/PublicInbox/LEI.pm index 22cd20f6..c0b90451

[PATCH 21/21] doc: note optional BSD::Resource use

2021-02-01 Thread Eric Wong
We've actually been capable of using this since 2019(*) in our spawn code for PSGI limiters. And it's been used since 2016 in our tests. It's a dependency of SpamAssassin, and Danga::Socket used it, too. (*) commit 721368cd04bfbd03c0d9173fff633ae34f16409a ("spawn: support RLIMIT_CPU,

[PATCH 18/21] ds: guard against stack-not-refcounted quirk of Perl 5

2021-02-01 Thread Eric Wong
The Perl 5 stack is weakly-referenced for performance reasons. This means it's possible for items in the stack to be freed while executing further down the stack. In lei (and perhaps public-facing read-only daemons in the future), we'll fork and call PublicInbox::DS->Reset in the child process.

[PATCH 07/21] sharedkv: release {dbh} before rmtree

2021-02-01 Thread Eric Wong
This may be needed to avoid warnings/errors when operating in single process mode in the future. --- lib/PublicInbox/SharedKV.pm | 9 - 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/lib/PublicInbox/SharedKV.pm b/lib/PublicInbox/SharedKV.pm index 94f2429f..f5d09cc1 100644

[PATCH 20/21] lei: avoid ETOOMANYREFS, cleanup imports

2021-02-01 Thread Eric Wong
As with PublicInbox::IPC, we'll attempt to bump RLIMIT_NOFILE and transparently workaround ETOOMANYREFS. If that fails, we'll give the user a hint to bump RLIMIT_NOFILE since ETOOMANYREFS is an uncommon error which users may be unfamiliar with. Found while stress testing for segfaults. ---
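
The "bump RLIMIT_NOFILE, otherwise print a hint" approach can be sketched with Python's standard resource module. This is an illustration only and not the code from the patch: bump_nofile() is a hypothetical helper and the wording of the hint is invented.

```python
import resource
import sys


def bump_nofile():
    """Try to raise the soft RLIMIT_NOFILE to the hard limit.

    Returns the soft limit in effect after the attempt.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == hard:
        return soft  # nothing left to raise
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
    except (ValueError, OSError):
        # Could not raise it; tell the user what to change instead.
        print("hint: consider raising RLIMIT_NOFILE (currently %d)" % soft,
              file=sys.stderr)
        return soft
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]
```

An unprivileged process may raise its soft limit up to the hard limit, which is why the sketch does not try to go beyond it.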

[PATCH 19/21] ds: next_tick: avoid $_ in top-level loop iterator

2021-02-01 Thread Eric Wong
$_ at the top of a potentially deep stack below may cause surprising behavior as I experienced with ExtSearchIdx. In the future, we'll limit our $_ usage to easily-auditable bits (e.g. map, grep, and small for loops) --- lib/PublicInbox/DS.pm | 8 1 file changed, 4 insertions(+), 4

[PATCH 17/21] import: reap git-config(1) synchronously

2021-02-01 Thread Eric Wong
This avoids a zombie if another step of the event loop takes too long. --- lib/PublicInbox/Import.pm | 1 + 1 file changed, 1 insertion(+) diff --git a/lib/PublicInbox/Import.pm b/lib/PublicInbox/Import.pm index 8a06a661..a070aa1e 100644 --- a/lib/PublicInbox/Import.pm +++
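
Reaping a short-lived child synchronously means waiting for it right after it is spawned, rather than leaving the wait to a later iteration of the event loop. A minimal Python sketch, where run_and_reap() is a hypothetical helper and not part of public-inbox:

```python
import subprocess


def run_and_reap(argv):
    """Spawn argv and wait for it immediately; return its exit status."""
    proc = subprocess.Popen(argv, stdout=subprocess.DEVNULL)
    # wait() collects the exit status now, so the child never lingers
    # as a zombie while other work runs.
    return proc.wait()
```

This trades a brief block in the caller for the guarantee that no unreaped child is left behind, which is reasonable for a command expected to finish quickly.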

[PATCH 14/21] sharedkv: use lock_for_scope_fast

2021-02-01 Thread Eric Wong
This allows us to avoid repeated open() and close() syscalls and speeds up the new xt/stress-sharedkv.t maintainer test by roughly 7%. --- MANIFEST| 1 + lib/PublicInbox/Lock.pm | 17 + lib/PublicInbox/SharedKV.pm | 14 +-- xt/stress-sharedkv.t

[PATCH 16/21] sharedkv: do not set cache_size by default

2021-02-01 Thread Eric Wong
These DBs will probably be too small to be worth increasing the cache size of. --- lib/PublicInbox/SharedKV.pm | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/lib/PublicInbox/SharedKV.pm b/lib/PublicInbox/SharedKV.pm index b0588060..d65c3158 100644 ---

[PATCH 15/21] lei_to_mail: reduce spew on Maildir removal

2021-02-01 Thread Eric Wong
At most, we'll only warn once per worker when a Maildir disappears from under us. We'll also use the '!' OpPipe to note the exceptional condition, and use '|' to SIGPIPE so it'll be a bit easier for hackers to remember. --- lib/PublicInbox/LEI.pm| 8

[PATCH 13/21] lei: increase initial timeout

2021-02-01 Thread Eric Wong
PublicInbox::Listener unconditionally sets O_NONBLOCK upon accept(), so we need a larger timeout under heavy load since there's no "dataready" accept filter on the listener. With O_NONBLOCK already set, we don't have to set it at ->event_step_init --- lib/PublicInbox/LEI.pm | 7 ---

[PATCH 10/21] lei_xsearch: load PublicInbox::Smsg

2021-02-01 Thread Eric Wong
We use $smsg->populate here, so ensure it's loaded although PublicInbox::Search currently loads it. --- lib/PublicInbox/LeiXSearch.pm | 1 + 1 file changed, 1 insertion(+) diff --git a/lib/PublicInbox/LeiXSearch.pm b/lib/PublicInbox/LeiXSearch.pm index b4a9b89d..4d390ee4 100644 ---

[PATCH 09/21] lei_dedupe: use Digest::SHA

2021-02-01 Thread Eric Wong
While it's loaded by ContentHash, we use Digest::SHA directly in this package for smsg and OID-only deduplication. --- lib/PublicInbox/LeiDedupe.pm | 1 + 1 file changed, 1 insertion(+) diff --git a/lib/PublicInbox/LeiDedupe.pm b/lib/PublicInbox/LeiDedupe.pm index e3ae8e33..55488376 100644 ---

[PATCH 08/21] lei: keep $lei around until workers are reaped

2021-02-01 Thread Eric Wong
This prevents SharedKV->DESTROY in lei-daemon from triggering before DB handles are closed in lei2mail processes. The {each_smsg_not_done} pipe was not sufficient in this case: that gets closed at the end of the last git_to_mail callback invocation. --- lib/PublicInbox/IPC.pm| 10

[PATCH 05/21] ipc: more helpful ETOOMANYREFS error messages

2021-02-01 Thread Eric Wong
ETOOMANYREFS is probably an unfamiliar error to most users, so give a hint about RLIMIT_NOFILE. This can be hit on my system running 3 simultaneous queries with my system default limit of 1024. There's also no need to import Errno constants for uncommon errors, so we'll stop using Errno, here.

[PATCH 04/21] lei: remove SIGPIPE handler

2021-02-01 Thread Eric Wong
It doesn't save us any code, and the action-at-a-distance element was making it confusing to track down actual problems. Another potential problem was keeping references alive too long. So do like we would a C100K server and check every write while still ensuring lei(1) exit with a proper SIGPIPE

[PATCH 03/21] lei: remove per-child SIG{__WARN__}

2021-02-01 Thread Eric Wong
The top-level $SIG{__WARN__} using $current_lei does the job, already. --- lib/PublicInbox/LEI.pm | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/lib/PublicInbox/LEI.pm b/lib/PublicInbox/LEI.pm index 3ed330f9..ceba16e4 100644 --- a/lib/PublicInbox/LEI.pm +++

[PATCH 02/21] ipc: switch wq to use the event loop

2021-02-01 Thread Eric Wong
This will let us maximize the capability of our asynchronous git API. This lets us avoid relying on EOF to notify lei2mail workers; thus giving us the option of running fewer lei_xsearch worker processes in parallel than local sources. I tried using a synchronous git API; and even with

[PATCH 01/21] lei: more consistent dedupe and ovv_buf init

2021-02-01 Thread Eric Wong
This fixes "--dedupe none" with Maildir where we don't create the object at all. --- lib/PublicInbox/LeiDedupe.pm | 4 ++-- lib/PublicInbox/LeiOverview.pm | 18 ++ lib/PublicInbox/LeiToMail.pm | 3 +-- 3 files changed, 13 insertions(+), 12 deletions(-) diff --git