[PATCH 3/7] view: move '<' and '>' outside <a>

2019-10-23 Thread Eric Wong
Browsers may underline '<' and '>' in links, which may be confused with '≤' and '≥'. So have the Message-ID header display follow what we do with In-Reply-To headers and move the "<" and ">" outside of <a> in the HTML. --- lib/PublicInbox/View.pm | 18 +- t/psgi_v2.t | 2
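
A minimal Python sketch of the resulting markup. public-inbox itself is Perl, and both the `mid_link_html` helper and the `../MID/` link layout are assumptions for illustration, not View.pm's code. The escaped brackets are emitted as plain text around the `<a>` element, so a browser's link underline covers only the Message-ID.

```python
from html import escape
from urllib.parse import quote


def mid_link_html(mid: str) -> str:
    """Render a Message-ID with '<' and '>' outside the <a> element.

    The "../MID/" URL layout is a hypothetical choice for this sketch.
    """
    href = "../" + quote(mid, safe="@.+-_") + "/"
    # The escaped brackets sit outside <a>, so only the ID is underlined.
    return '&lt;<a href="%s">%s</a>&gt;' % (escape(href), escape(mid))
```

For `a@b`, this yields `&lt;<a href="../a@b/">a@b</a>&gt;`.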

[RFC 7/7] view: show X-Alt-Message-ID in permalink view, too

2019-10-23 Thread Eric Wong
Since we index X-Alt-Message-ID (because we need to placate some NNTP clients), we now display it as well, since that Message-ID could be the X-Alt-Message-ID that the reader is actually interested in. --- lib/PublicInbox/View.pm | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) di

[PATCH 1/7] search: support multiple From/To/Cc/Subject headers

2019-10-23 Thread Eric Wong
We can easily support searching on messages with multiple From/To/Cc/Subject headers just like we do with multiple Message-ID headers. This matches the normal mutt pager display behavior. --- lib/PublicInbox/SearchMsg.pm | 4 ++-- t/v2reindex.t| 16 2 files chang

[PATCH 2/7] view: display redundant headers in permalink

2019-10-23 Thread Eric Wong
Mail headers can contain multiple headers of any type, so ensure we don't hide any information we're getting in the per-message permalink views. This means it's possible to have multiple From, Date, To, Cc, Subject, and In-Reply-To headers displayed. The thread indices are a special case, I guess
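
Python's standard `email` package, used here only as an illustration since public-inbox relies on Perl modules, returns every instance of a repeated header from `Message.get_all()`. That is the behavior both patches aim for: search over, and display, all From/To/Cc/Subject values instead of only the first. The sample message and the `header_values` helper are made up.

```python
from email import message_from_string
from email.policy import default

# Hypothetical message carrying duplicated From and Subject headers.
RAW = (
    "From: Alice <alice@example.com>\n"
    "From: Bob <bob@example.com>\n"
    "Subject: first\n"
    "Subject: second\n"
    "\n"
    "body\n"
)


def header_values(msg, name):
    """Return every value of a possibly repeated header, in order."""
    # get_all() returns the fallback ([] here) when the header is absent.
    return [str(v) for v in msg.get_all(name, [])]


msg = message_from_string(RAW, policy=default)
for field in ("From", "To", "Cc", "Subject"):
    for value in header_values(msg, field):
        print("%s: %s" % (field, value))
```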

[RFC 6/7] index: allow search/lookups on X-Alt-Message-ID

2019-10-23 Thread Eric Wong
Since we replace extra Message-ID headers with X-Alt-Message-ID to placate NNTP clients, we should allow searching and indexing on X-Alt-Message-ID just like we do with Message-ID. --- lib/PublicInbox/MID.pm | 27 +-- lib/PublicInbox/OverIdx.pm | 4 ++-- lib/Public
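
A rough Python sketch, assuming both headers hold angle-bracketed IDs: gather IDs from `Message-ID` and `X-Alt-Message-ID` alike so that either one can serve as a lookup key. The `all_mids` name is hypothetical, and MID.pm's actual parsing rules for malformed IDs may differ.

```python
import re
from email import message_from_string

# Assumes IDs are wrapped in angle brackets; real-world mail can be messier.
MID_RE = re.compile(r"<([^<>\s]+)>")


def all_mids(msg):
    """Collect IDs from Message-ID and X-Alt-Message-ID, deduplicated."""
    mids = []
    for name in ("Message-ID", "X-Alt-Message-ID"):
        for value in msg.get_all(name, []):
            for mid in MID_RE.findall(value):
                if mid not in mids:  # keep first-seen order
                    mids.append(mid)
    return mids


msg = message_from_string(
    "Message-ID: <a@example.com>\n"
    "X-Alt-Message-ID: <b@example.com>\n"
    "\n"
)
print(all_mids(msg))  # ['a@example.com', 'b@example.com']
```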

Re: RFC: monthly epochs for v2

2019-10-24 Thread Eric Wong
Konstantin Ryabitsev wrote: > Hi, all: > > With public-inbox now providing manifest files, it is easy to communicate to > mirroring services when an epoch rolls over. What do you think if we make > these roll-overs month-based instead of size-based. So, instead of: > > git/ > 0.git > 1.git >

[PATCH] HACKING: add a note about avoiding recursion

2019-10-24 Thread Eric Wong
Bad things happen when user data can control our stack size. --- Maybe there are programming languages where deep recursion isn't a problem, but they're not widely available on common GNU/Linux or *BSD systems. HACKING | 5 + 1 file changed, 5 insertions(+) diff --git a/HACKING b/HACKING

Re: RFC: monthly epochs for v2

2019-10-24 Thread Eric Wong
Konstantin Ryabitsev wrote: > On Thu, Oct 24, 2019 at 08:35:03PM +0000, Eric Wong wrote: > > > - if someone is only interested in a few months worth of archives, they > > > don't have to clone the entire collection > > > - similarly, someone using public-in

what should happen when mda sees multiple List-ID headers?

2019-10-24 Thread Eric Wong
Given my recent traumatic experience[*] around multiple From/To/Cc/Subject headers, I guess we should prepare for the possibility of multiple List-ID headers showing up in -mda. Right now, we handle the first one (and I'm updating -learn to support List-ID, too), but it's possible that multiple Li

Re: what should happen when mda sees multiple List-ID headers?

2019-10-24 Thread Eric Wong
"Eric W. Biederman" wrote: ... nothing? Just checked my mail server and it's not out of space and I'm not seeing any errors in logs. Anyways I'm offline for a bit and will be back (hopefully :x) -- unsubscribe: meta+unsubscr...@public-inbox.org archive: https://public-inbox.org/meta/

Re: what should happen when mda sees multiple List-ID headers?

2019-10-25 Thread Eric Wong
"Eric W. Biederman" wrote: > There are two reasonable things that can be done, and I suggest > we do them both. > - Print a warning. (To be deleted if this case turns out to be common). > - Deliver to all of the lists you have mailboxes for the List-IDs. Thanks, I was leaning towards delivering t

Re: RFC: monthly epochs for v2

2019-10-25 Thread Eric Wong
Eric Wong wrote: > Konstantin Ryabitsev wrote: > > On Thu, Oct 24, 2019 at 08:35:03PM +0000, Eric Wong wrote: > > > > - if someone is only interested in a few months worth of archives, they > > > > don't have to clone the entire collection > > >

Re: RFC: monthly epochs for v2

2019-10-25 Thread Eric Wong
Konstantin Ryabitsev wrote: > On Fri, Oct 25, 2019 at 12:22:14PM +0000, Eric Wong wrote: > > > I'm not sure about a libpublicinbox... I have been really > > > hesitant to depend on shared C/C++ libraries whenever I use Perl > > > or Ruby because of build and

[PATCH 00/14] learn: sync w/ -mda changes and add manpage

2019-10-28 Thread Eric Wong
What started with adding a manpage for public-inbox-learn ended up being a bunch of fixes and improvements to catch up to -mda changes. -mda also learned to deal with multiple List-ID headers in the meantime. Eric Wong (14): learn: support multiple To/Cc headers learn: only map recipient

[PATCH 01/14] learn: support multiple To/Cc headers

2019-10-28 Thread Eric Wong
It's possible to specify these headers multiple times, and PublicInbox::MDA->precheck takes that into account, so -learn should, too. --- script/public-inbox-learn | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/script/public-inbox-learn b/script/public-inbox-learn inde

[PATCH 05/14] learn: hoist out remove_or_add subroutine

2019-10-28 Thread Eric Wong
We'll be reusing it for List-ID processing in the next commit. --- script/public-inbox-learn | 56 ++- 1 file changed, 31 insertions(+), 25 deletions(-) diff --git a/script/public-inbox-learn b/script/public-inbox-learn index 299f75a0..56739f88 100755 --- a/scr

[PATCH 02/14] learn: only map recipient list on "ham" or "rm"

2019-10-28 Thread Eric Wong
It's assumed that "spam" can end up anywhere due to Bcc:, so we need to scan every single inbox. However, "rm" is usually more targeted and "ham" obviously only belongs in some inboxes. --- script/public-inbox-learn | 71 +++ 1 file changed, 35 insertions(+
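
The selection rule can be sketched in Python as follows. `Inbox` and `target_inboxes` are hypothetical stand-ins, not public-inbox-learn's code, and addresses are assumed to be stored lower-cased: "spam" fans out to every inbox because Bcc: hides the real destination, while "ham" and "rm" only touch inboxes whose address appears among the recipients.

```python
from dataclasses import dataclass, field


@dataclass
class Inbox:
    """Hypothetical stand-in for a configured inbox."""
    name: str
    addresses: set = field(default_factory=set)  # stored lower-cased


def target_inboxes(action, inboxes, recipients):
    """Pick the inboxes a learn action should touch."""
    if action == "spam":
        # Bcc: can hide the real destination, so every inbox is a candidate.
        return list(inboxes)
    if action in ("ham", "rm"):
        wanted = {addr.lower() for addr in recipients}
        return [ibx for ibx in inboxes if ibx.addresses & wanted]
    raise ValueError("unknown action: %r" % (action,))


inboxes = [Inbox("meta", {"meta@public-inbox.org"}),
           Inbox("git", {"git@vger.kernel.org"})]
```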

[PATCH 03/14] learn: update usage statement

2019-10-28 Thread Eric Wong
Use since that seems to be the favored notation for required command args (taking a hint from git(1) manpage). While we're at it, remove the space after '<' for the redirect to match git.git coding style. --- script/public-inbox-learn | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) dif

[PATCH 07/14] filter/base: remove MAX_MID_SIZE constant

2019-10-28 Thread Eric Wong
We don't need it in the filter, here, since we have one in the MDA package. --- lib/PublicInbox/Filter/Base.pm | 1 - 1 file changed, 1 deletion(-) diff --git a/lib/PublicInbox/Filter/Base.pm b/lib/PublicInbox/Filter/Base.pm index 052cd332..7a0c720f 100644 --- a/lib/PublicInbox/Filter/Base.pm +++

[PATCH 10/14] inboxwritable: add assert_usable_dir sub

2019-10-28 Thread Eric Wong
And use it for mda, since "0" could be a usable directory if somebody insists on using relative paths... --- lib/PublicInbox/InboxWritable.pm | 9 - lib/PublicInbox/V2Writable.pm| 5 ++--- script/public-inbox-mda | 4 +++- t/import.t | 8 t/v

[PATCH 13/14] learn: allow running without spamc

2019-10-28 Thread Eric Wong
It's possible that a user will want to disable SpamAssassin via "publicinboxmda.spamcheck=none" in public-inbox-config(5) when injecting ham into an inbox. Fixes: 466df3e029fe ("mda: allow configuring globally without spamc support") --- script/public-inbox-learn | 8 +--- 1 file changed, 5 in

[PATCH 04/14] learn: GIT_COMMITTER_ may be "" or "0"

2019-10-28 Thread Eric Wong
Users may be zeroes or blanks. --- script/public-inbox-learn | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/script/public-inbox-learn b/script/public-inbox-learn index ad132985..299f75a0 100755 --- a/script/public-inbox-learn +++ b/script/public-inbox-learn @@ -65,8 +65,8
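
Perl's `||` treats both "" and "0" as false, so a `$ENV{...} || default` style fallback silently replaces legitimate values. A Python sketch of the safer pattern, which only falls back when a variable is unset (`committer_identity` is a hypothetical helper; in Perl, the analogous operator is the defined-or `//`):

```python
def committer_identity(env, default_name, default_email):
    """Fall back to defaults only when a variable is unset, not when empty."""
    name = env.get("GIT_COMMITTER_NAME")
    email = env.get("GIT_COMMITTER_EMAIL")
    # `env.get(k) or default` would discard "" (and Perl's || also "0").
    if name is None:
        name = default_name
    if email is None:
        email = default_email
    return name, email


print(committer_identity({"GIT_COMMITTER_NAME": "0",
                          "GIT_COMMITTER_EMAIL": ""}, "x", "y"))  # ('0', '')
```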

[PATCH 06/14] mda: hoist out List-ID handling and reuse in -learn

2019-10-28 Thread Eric Wong
It's now possible to inject false-positive ham into an inbox the same way -mda does via List-ID. --- lib/PublicInbox/MDA.pm| 15 +++ script/public-inbox-learn | 8 +++- script/public-inbox-mda | 5 + 3 files changed, 23 insertions(+), 5 deletions(-) mode change 100755

[PATCH 08/14] mda: hoist out mda_filter_adjust

2019-10-28 Thread Eric Wong
It makes it easier to document that the default -mda behavior is stricter than normal, including "public-inbox-learn ham" --- script/public-inbox-mda | 18 -- 1 file changed, 12 insertions(+), 6 deletions(-) diff --git a/script/public-inbox-mda b/script/public-inbox-mda index 3ff318c9.

[PATCH 14/14] doc: add public-inbox-learn(1) manpage

2019-10-28 Thread Eric Wong
Tools intended for end users need manpages. --- Documentation/include.mk | 1 + Documentation/public-inbox-learn.pod | 86 MANIFEST | 1 + 3 files changed, 88 insertions(+) create mode 100644 Documentation/public-inbox-learn.p

[PATCH 12/14] mda: support multiple List-ID matches

2019-10-28 Thread Eric Wong
While it's not RFC2919-conformant, mail software can theoretically set multiple List-ID headers. Deliver to all inboxes which match a given List-ID since that's likely what was intended. Cc: Eric W. Biederman Link: https://public-inbox.org/meta/87pniltscf@x220.int.ebiederm.org/ --- lib/PublicInb
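
A minimal Python sketch of "warn, then deliver to every matching inbox", following the two suggestions quoted in the thread. The `inbox_by_list_id` mapping (keyed by lower-cased list identifiers) and the `deliver_targets` name are assumptions for illustration, not MDA.pm's interface.

```python
import re
import sys

LIST_ID_RE = re.compile(r"<([^<>\s]+)>")


def deliver_targets(list_id_values, inbox_by_list_id):
    """Return every configured inbox matching any List-ID header."""
    if len(list_id_values) > 1:
        # Warn, in case multiple List-ID headers turn out to be common.
        print("warning: %d List-ID headers" % len(list_id_values),
              file=sys.stderr)
    targets = []
    for value in list_id_values:
        for list_id in LIST_ID_RE.findall(value):
            ibx = inbox_by_list_id.get(list_id.lower())
            if ibx is not None and ibx not in targets:
                targets.append(ibx)
    return targets


inboxes = {"meta.public-inbox.org": "meta", "git.vger.kernel.org": "git"}
headers = ["public-inbox <meta.public-inbox.org>", "<git.vger.kernel.org>"]
print(deliver_targets(headers, inboxes))  # ['meta', 'git']
```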

[PATCH 11/14] mda: prepare for multiple destinations

2019-10-28 Thread Eric Wong
Multiple List-ID headers will be supported in the next commit --- script/public-inbox-mda | 92 - 1 file changed, 55 insertions(+), 37 deletions(-) diff --git a/script/public-inbox-mda b/script/public-inbox-mda index c122984f..821bd9cc 100755 --- a/script/p

[PATCH 09/14] mda: skip MIME parsing if spam

2019-10-28 Thread Eric Wong
We don't want to waste cycles parsing the message for MIME bits if it's spam. --- script/public-inbox-mda | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/script/public-inbox-mda b/script/public-inbox-mda index 71c5d937..69354616 100755 --- a/script/public-inbox-mda +++ b/script

Re: how's memory usage on public-inbox-httpd?

2019-10-28 Thread Eric Wong
Eric Wong wrote: > Cool, but 1GB is still an order of magnitude worse than what > I'd expect :< I remember Email::MIME had huge explosions with > some 30MB+ spam messages: > https://public-inbox.org/meta/20190609083918.gfr2kurah7f2hysx@dcvr/ > (maybe gmime can help

Re: I have figured out IMAP IDLE

2019-10-29 Thread Eric Wong
"Eric W. Biederman" wrote: > > A few days ago I stumbled upon this magic decoder ring for IMAP. > The "Ten Commandments of How to Write an IMAP client" > > https://www.washington.edu/imap/documentation/commndmt.txt > > The part I was most clearly missing was that for IMAP it is better to > open

Re: RFC: monthly epochs for v2

2019-10-29 Thread Eric Wong
Konstantin Ryabitsev wrote: > On Tue, Oct 29, 2019 at 10:03:43AM -0500, Eric W. Biederman wrote: > > So not monthly epochs. But it would be very handy to have a > > public-inbox command that refreshes git mirrors. It would > > be even more awesome if there was something like the IMAP I

WWW::Curl [was: I have figured out IMAP IDLE]

2019-10-29 Thread Eric Wong
Eric Wong wrote: > WWW::Curl is also packaged for CentOS/RHEL7, so it should not be > tough to install. But still a pain to build from source, if need be: https://bugs.debian.org/843432 And it looks like upstream's been missing for a few years and distros have mostly ke

[PATCH] wwwlisting: fix spelling and clarify sub location

2019-10-30 Thread Eric Wong
Spell "Schwartzian" correctly, and clarify the location of "modified" since we have multiple subs named "modified" --- lib/PublicInbox/WwwListing.pm | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/PublicInbox/WwwListing.pm b/lib/PublicInbox/WwwListing.pm index c5e16eb2..035

[PATCH] search: add note about SCHEMA_VERSION 15

2019-10-30 Thread Eric Wong
--reindex has gotten better over the years, and having parallel Xapian DB directories would exceed all available disk space for some users with giant inboxes. --- lib/PublicInbox/Search.pm | 3 +++ 1 file changed, 3 insertions(+) diff --git a/lib/PublicInbox/Search.pm b/lib/PublicInbox/Search.pm

Re: [PATCH 12/14] mda: support multiple List-ID matches

2019-10-30 Thread Eric Wong
"Eric W. Biederman" wrote: > Eric Wong writes: > > > While it's not RFC2919-conformant, mail software can > > theoretically set multiple List-ID headers. Deliver to all > > inboxes which match a given List-ID since that's likely the > > inten

[PATCH] msgiter: do not assume UTF-8 if Email::MIME->body_str succeeds

2019-10-30 Thread Eric Wong
ISO-2022-JP and other non-UTF-8 messages need to be displayed correctly. Fixes: 7d82a8bc04ce ('handle "multipart/mixed" messages which are not multipart') --- MANIFEST | 1 + lib/PublicInbox/MsgIter.pm | 3 ++- t/iso-2202-jp.mbox | 10 ++ t/msg_iter.t
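
A minimal Python illustration of the principle, not of the MsgIter.pm fix: decode a text body with the charset its Content-Type declares rather than assuming UTF-8. With `email.policy.default`, `get_content()` does exactly that. The sample message is made up.

```python
from email import message_from_bytes
from email.policy import default


def body_text(raw: bytes) -> str:
    """Decode a single-part text body using its declared charset."""
    msg = message_from_bytes(raw, policy=default)
    # get_content() honors the charset parameter (ISO-2022-JP below).
    return msg.get_content()


raw = (b"Content-Type: text/plain; charset=iso-2022-jp\r\n"
       b"Content-Transfer-Encoding: 7bit\r\n\r\n"
       + "こんにちは".encode("iso-2022-jp"))
print(body_text(raw))
```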

[PATCH 1/2] msgiter: attempt to decode all text/* bodies

2019-10-30 Thread Eric Wong
We want to index text/x-patch and text/x-diff, at least, since "git format-patch" can generate a patch series as attachments using --attach. --- lib/PublicInbox/MsgIter.pm | 16 +++- 1 file changed, 11 insertions(+), 5 deletions(-) diff --git a/lib/PublicInbox/MsgIter.pm b/lib/PublicI
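
A Python sketch of the idea (an illustration, not MsgIter.pm's code): walk every leaf part and decode any `text/*` subtype, so an attachment labeled `text/x-patch` is indexed alongside `text/plain`. The sample message is invented.

```python
from email import message_from_bytes
from email.policy import default

SAMPLE = (
    b"MIME-Version: 1.0\n"
    b'Content-Type: multipart/mixed; boundary="b"\n'
    b"\n"
    b"--b\n"
    b"Content-Type: text/plain; charset=utf-8\n"
    b"\n"
    b"cover letter\n"
    b"--b\n"
    b"Content-Type: text/x-patch; charset=utf-8\n"
    b'Content-Disposition: attachment; filename="0001.patch"\n'
    b"\n"
    b"diff --git a/x b/x\n"
    b"--b--\n"
)


def text_parts(raw: bytes):
    """Yield (content_type, decoded_text) for every text/* leaf part."""
    msg = message_from_bytes(raw, policy=default)
    for part in msg.walk():
        if part.is_multipart():
            continue
        # Accept any text/* subtype, not just text/plain.
        if part.get_content_maintype() == "text":
            yield part.get_content_type(), part.get_content()


for ctype, text in text_parts(SAMPLE):
    print(ctype, repr(text))
```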

[PATCH 0/2] attached patches and false-positive dfpost:

2019-10-30 Thread Eric Wong
Once reindexed, all the patches attached at https://public-inbox.org/git/b9fb52b8-8168-6bf0-9a72-1e6c44a28...@oracle.com/ should be "solvable". Eric Wong (2): msgiter: attempt to decode all text/* bodies solvergit: deal with false-positive dfpost: results lib/PublicInbox/MsgIter

[PATCH 2/2] solvergit: deal with false-positive dfpost: results

2019-10-30 Thread Eric Wong
When solving for blob 81c1164ae5 in https://public-inbox.org/git/, at least two messages get indexed with the dfpost result for that blob (after fixing MsgIter to decode all text/* parts): 1. https://public-inbox.org/git/b9fb52b8-8168-6bf0-9a72-1e6c44a28...@oracle.com/ 2. https://public-inbox.org/

[PATCH] qspawn: psgi_qx: delay callback until waitpid returns

2019-10-30 Thread Eric Wong
We need to detect "git apply" failures reliably when patches fail. This is necessary for solving for blob 81c1164ae5 in https://public-inbox.org/git/ when at least two messages can solve for it (and one of them fails): 1. https://public-inbox.org/git/b9fb52b8-8168-6bf0-9a72-1e6c44a28...@oracle.co
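
A synchronous Python analogy of the ordering fix; psgi_qx itself is event-driven Perl, and `qx_then` is a hypothetical name. The point is to hand output to the callback only after the child has been reaped, so a failing "git apply" is reported as a failure. A Python one-liner stands in for git.

```python
import subprocess
import sys


def qx_then(argv, callback):
    """Capture argv's stdout; call back only once its exit status is known."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE)
    out, _ = proc.communicate()  # reads to EOF *and* reaps the child
    # Calling back at EOF alone could report success before a failing
    # command has exited; proc.returncode is only reliable after the wait.
    return callback(out, proc.returncode)


result = qx_then(
    [sys.executable, "-c", "import sys; print('x'); sys.exit(1)"],
    lambda out, status: (out.strip(), status),
)
print(result)  # (b'x', 1)
```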

[PATCH 0/2] CSS and source highlighting fixes

2019-10-31 Thread Eric Wong
We'll only be supporting highlight.pm this release, so at least try to do it right for people not using dark color schemes. My eyes are tired now after attempting to use dillo and light colors :< Eric Wong (2): contrib/css/216light: improve contrast a bit hval: replace "&#

[PATCH 2/2] hval: replace "'" with "'" for compatibility

2019-10-31 Thread Eric Wong
While testing 216light.css changes, I managed to hit some cases where dillo failed to render ' correctly, but I also can't reproduce it reliably. Anyways, it's definitely a problem with some old browsers and newer versions of highlight already work around it, but Debian 10.x has 3.41, so use "'" t

[PATCH 1/2] contrib/css/216light: improve contrast a bit

2019-10-31 Thread Eric Wong
"#ff0" foreground on a "#fff" background is just too difficult to distinguish, among other things. So choose slightly darker colors when using a (painful) "#fff" background. --- contrib/css/216light.css | 24 1 file changed, 24 insertions(+) diff --git a/contrib/css/216

[PATCH] doc: add public-inbox-purge(1) manpage

2019-11-02 Thread Eric Wong
Tools intended for end users need manpages, and doubly so to convince potential users NOT to use them :) --- Documentation/include.mk | 1 + Documentation/public-inbox-purge.pod | 69 MANIFEST | 1 + 3 files changed, 71 inserti

[PATCH] doc: add public-inbox.cgi(1) manpage

2019-11-02 Thread Eric Wong
Yet another case of documenting things which should NOT be used :> --- Documentation/.gitignore | 1 + Documentation/include.mk | 1 + Documentation/public-inbox.cgi.pod | 34 ++ MANIFEST | 1 + 4 files changed, 37 insert

[PATCH] TODO: update item for multiple Date: headers

2019-11-02 Thread Eric Wong
That's the only head-scratcher of the bunch remaining, since that relies on ranges. --- TODO | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/TODO b/TODO index f9122a5d..922163f8 100644 --- a/TODO +++ b/TODO @@ -137,5 +137,5 @@ all need to be considered for everything we int

[PATCH] doc: mknews: force plain-text NEWS to UTF-8

2019-11-02 Thread Eric Wong
We'll have non-7-bit ASCII in the 1.2.0 release notes. --- Documentation/mknews.perl | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/mknews.perl b/Documentation/mknews.perl index 509da3b1..e78c1cb8 100755 --- a/Documentation/mknews.perl +++ b/Documentation/mknews.

[PATCH] build: add "git-dist" target for making gzipped tarballs

2019-11-02 Thread Eric Wong
Since MANIFEST is tied to files tracked by git, adding generated files such as NEWS to that is more effort than it's worth (esp. when I'm wondering if MakeMaker is useful compared to only using GNU make). I also have trouble reading CamelCase, so lower-case names are nicer and more consistent with

[ANNOUNCE] public-inbox 1.2.0

2019-11-02 Thread Eric Wong
* first non-pre/rc release with v2 format support for scalability. See public-inbox-v2-format(5) manpage for more details. * new admin tools for v2 inboxes: - public-inbox-convert - converts v1 to v2 repo formats - public-inbox-compact - v2 convenience wrapper for xapian-compact(1) - publi

[PATCH] public-inbox v1.2.0

2019-11-02 Thread Eric Wong
cumentation/RelNotes/v1.2.0.eml similarity index 91% rename from Documentation/RelNotes/v1.2.0.wip rename to Documentation/RelNotes/v1.2.0.eml index 8df3e4f9..2eeb0de0 100644 --- a/Documentation/RelNotes/v1.2.0.wip +++ b/Documentation/RelNotes/v1.2.0.eml @@ -1,5 +1,8 @@ +From: Eric Wong To: meta@publi

[PATCH] searchidxshard: reuse $SIG{__WARN__} callback from Admin

2019-11-02 Thread Eric Wong
We don't want to define $SIG{__WARN__} in the worker to call an existing non-default callback. Instead update ->{current_info} the same way the V2Writable master process does. I noticed this while reindexing with a large XAPIAN_FLUSH_THRESHOLD and seeing the wrong epoch on my terminal from a sh

HTTP::Date replacing Date::Parse [was: TODO: remove done items, add some more]

2019-11-03 Thread Eric Wong
> +* consider using HTTP::Date instead of Date::Parse, since we need the > + former is capable of parsing RFC822-ish dates, used by Plack, and > + the latter is missing from OpenBSD and maybe other distros. Ugh, HTTP::Date will try to use Time::Zone under the hood if available for non-conformant

[PATCH] index: "git log" failures are fatal

2019-11-03 Thread Eric Wong
While I've never seen "git log" fail on its own, it could happen one day and we should be prepared to abort indexing when it happens. Beef up tests for t/spawn.t to ensure close() behaves on popen_rd the way we expect it to. --- lib/PublicInbox/SearchIdx.pm | 8 ++-- lib/PublicInbox/V2Writab
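
The Python counterpart of checking close() on a popen_rd handle is checking the exit status after reading to EOF. A sketch with a hypothetical `stream_or_die` helper; a Python one-liner that prints two lines and exits with status 3 stands in for "git log".

```python
import subprocess
import sys


def stream_or_die(argv):
    """Yield a command's stdout lines, then fail loudly on a bad exit."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, text=True)
    with proc.stdout:
        for line in proc.stdout:
            yield line.rstrip("\n")
    # Reaching EOF is not success: the exit status must be checked too.
    status = proc.wait()
    if status != 0:
        raise subprocess.CalledProcessError(status, argv)


# Stand-in for "git log": prints two lines, then exits with status 3.
cmd = [sys.executable, "-c", "print('a'); print('b'); raise SystemExit(3)"]
```

Indexing code consuming `stream_or_die(cmd)` sees both lines and then gets an exception instead of silently treating partial output as complete.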

[PATCH] t/httpd-corner.t: check for curl(1) errors in big async test

2019-11-03 Thread Eric Wong
curl(1) can fail and we need to invalidate the test in the rare case it fails. --- t/httpd-corner.t | 1 + 1 file changed, 1 insertion(+) diff --git a/t/httpd-corner.t b/t/httpd-corner.t index 4077a6d1..fbba4623 100644 --- a/t/httpd-corner.t +++ b/t/httpd-corner.t @@ -272,6 +272,7 @@ SKIP: {

[PATCH 0/5] tiny test overhead reductions

2019-11-03 Thread Eric Wong
from most tests (and explicitly test it in one place) since we don't need to repeat ourselves. Eric Wong (5): t/*.t: remove IPC::Run dependency for git commands t/httpd-corner.t: drop unnecessary bytes:: for length() t/httpd-corner.t: get rid of IPC::Run for running curl t/hl_mod.t: r

[PATCH 3/5] t/httpd-corner.t: get rid of IPC::Run for running curl

2019-11-03 Thread Eric Wong
We already load PublicInbox::Spawn, so there's no need to add another dependency to make life difficult for potential contributors. --- t/httpd-corner.t | 21 +++-- 1 file changed, 11 insertions(+), 10 deletions(-) diff --git a/t/httpd-corner.t b/t/httpd-corner.t index 50aa28e3..1

[PATCH 2/5] t/httpd-corner.t: drop unnecessary bytes:: for length()

2019-11-03 Thread Eric Wong
We don't need to force byte semantics for a buffer we clearly create (via ->read) with byte semantics. Since we didn't "use bytes" in t/httpd-corner.t, it was inadvertently made available by IPC::Run (which goes away, next). --- t/httpd-corner.t | 2 +- 1 file changed, 1 insertion(+), 1 deletion(

[PATCH 1/5] t/*.t: remove IPC::Run dependency for git commands

2019-11-03 Thread Eric Wong
One small step towards making tests easier-to-run. We can rely on "local $ENV{GIT_DIR}" for potentially shell-unsafe path names, and the rest of our path names are relative and don't contain characters which require escaping. --- t/git.t | 12 ++-- t/www_listing.t | 11 ++-
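
In Python, the closest equivalent of a scoped `local $ENV{GIT_DIR}` for a child process is passing a modified copy of the environment; unlike Perl's `local`, this never changes the parent's own environment. A sketch, with `run_with_git_dir` as a hypothetical helper and a Python one-liner standing in for git:

```python
import os
import subprocess
import sys


def run_with_git_dir(git_dir, argv):
    """Run argv with GIT_DIR set for the child only; return its stdout."""
    env = dict(os.environ, GIT_DIR=git_dir)  # os.environ is left untouched
    # argv is a list, so no shell parses git_dir; spaces are harmless.
    return subprocess.run(argv, env=env, stdout=subprocess.PIPE,
                          text=True, check=True).stdout


out = run_with_git_dir("/tmp/dir with spaces/x.git",
                       [sys.executable, "-c",
                        "import os; print(os.environ['GIT_DIR'])"])
print(out.strip())  # /tmp/dir with spaces/x.git
```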

[PATCH 5/5] t/*.t: disable nntpd/httpd worker processes in most tests

2019-11-03 Thread Eric Wong
And explicitly test for respawning in t/httpd-corner.t. There's no need to have extra entries in the process table for most tests we run, since that's not what we're testing. --- t/httpd-corner.psgi | 3 +++ t/httpd-corner.t| 21 + t/httpd.t | 2 +- t/v2mirro

[PATCH 4/5] t/hl_mod.t: remove IPC::Run (and File::Temp) dependency

2019-11-03 Thread Eric Wong
We already load PublicInbox::Spawn for which(), so using spawn() isn't unreasonable. And rely on "skip" to log the omitted test if w3m is missing, which means we need to update the "&&" escaping test to be self-referential on the same line. File::Temp was totally unused, there; and we can use "op

[PATCH] tests: rely on PublicInbox::Git for pathname safety

2019-11-04 Thread Eric Wong
It's possible (but unlikely) a user will put spaces in TMPDIR and cause File::Temp::tempdir() to return a temporary directory with spaces in the filename, making it unsafe for shell expansion. PublicInbox::Git didn't exist when t/mda.t was written, and I just forgot about PublicInbox::Git->qx for
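
Shell word-splitting is what breaks when TMPDIR contains a space. Python's `shlex.split()` imitates POSIX splitting, so it can make the failure visible, along with the quoting that would avoid it. The patches themselves sidestep the shell entirely via PublicInbox::Git->qx, which is the more robust approach; the path below is hypothetical.

```python
import shlex

path = "/tmp/my tmp/inbox.git"  # hypothetical TMPDIR containing a space

unsafe = "git --git-dir=%s log" % path
safe = "git --git-dir=%s log" % shlex.quote(path)

# shlex.split() applies POSIX shell word-splitting rules.
print(shlex.split(unsafe))  # ['git', '--git-dir=/tmp/my', 'tmp/inbox.git', 'log']
print(shlex.split(safe))    # ['git', '--git-dir=/tmp/my tmp/inbox.git', 'log']
```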

[PATCH 2/1] t/edit: use PublicInbox::Git::qx for pathname safety

2019-11-04 Thread Eric Wong
Another case where spaces can be in TMPDIR and cause shell expansion with `command` to fail. --- t/edit.t | 13 ++--- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/t/edit.t b/t/edit.t index 1e9597f1..5cb66a65 100644 --- a/t/edit.t +++ b/t/edit.t @@ -41,7 +41,7 @@ my $mid =

[PATCH] doc: actually document publicinboxwatch.watchspam

2019-11-07 Thread Eric Wong
Instead of copy-pasting the documentation for `spamcheck'. --- Documentation/public-inbox-config.pod | 16 ++-- 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/Documentation/public-inbox-config.pod b/Documentation/public-inbox-config.pod index 1c5ba01..a5275a4 100644 --

[PATCH 0/2] edit: minor bug fixes

2019-11-08 Thread Eric Wong
A few minor bugfixes I noticed while working on other stuff... Eric Wong (2): edit: propagate correct editor exit code edit: check for write errors writing "From_" line script/public-inbox-edit | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-)

[PATCH 1/2] edit: propagate correct editor exit code

2019-11-08 Thread Eric Wong
exit($?) is never correct, since ($? >> 8) is needed to extract the correct exit code, as other information (such as the signal) is encoded in $? in addition to the exit code. --- script/public-inbox-edit | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/script/public-inbox
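
On POSIX systems, only the low 8 bits of an exit value survive, so an editor exiting with status 1 leaves $? at 256, and exit(256) reports success (0). A POSIX-only Python sketch of decoding a raw wait status; the 128 + signal mapping is an assumed shell convention, not taken from the patch. Python 3.9+ also offers `os.waitstatus_to_exitcode()`, which returns a negative signal number instead.

```python
import os


def exit_code_from_status(status: int) -> int:
    """Convert a raw wait(2) status (what Perl stores in $?) to an exit code."""
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)  # equivalent to status >> 8
    if os.WIFSIGNALED(status):
        # Assumption: follow the shell convention of 128 + signal number.
        return 128 + os.WTERMSIG(status)
    return 1


print(exit_code_from_status(1 << 8))  # 1, whereas 256 & 0xff would be 0
```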

[PATCH 2/2] edit: check for write errors writing "From_" line

2019-11-08 Thread Eric Wong
We need to check every print to a regular file for errors, because storage devices inevitably fail. --- script/public-inbox-edit | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/script/public-inbox-edit b/script/public-inbox-edit index 24b7ed8b..43ce9900 100755 --- a/script/pu

Re: [PATCH] doc: drop a repeated word

2019-11-09 Thread Eric Wong
Applied, thanks :>

Re: Archiving HTML mail

2019-11-12 Thread Eric Wong
Florian Weimer wrote: > New contributors tend to send text/html. We are currently rejecting > such email, which is proving more and more problematic. I think a > change would be easier to justify if I can show that this will not > break our mailing list archives (in the sense that they become >

Re: Archiving HTML mail

2019-11-12 Thread Eric Wong
Florian Weimer wrote: > * Eric Wong: > > text/html is currently shown inline as raw HTML since > > https://public-inbox.org/meta/20191031031220.21048-...@80x24.org/ > > But maybe the HTML part shouldn't be shown inline at all in > > multiparts parents. > >

Re: Archiving HTML mail

2019-11-12 Thread Eric Wong
Florian Weimer wrote: > * Eric Wong: > > >> My feeling is that it would need some post-processing, maybe stripping > >> image links and forms (and Javascript of course). Plus the separate > >> domain thing for additional XSS protection (like bugzilla.m

Message-ID awareness on the rise \o/

2019-11-12 Thread Eric Wong
So since May 2019, I've seen some GoogleGroups-appended mailing list signatures with the typical unsubscribe/optout info, and also an archive link in the form of: https://groups.google.com/d/msgid/$GROUP_NAME/$MESSAGE_ID Maybe public-inbox had a role in influencing that? who knows...

Re: Archiving HTML mail

2019-11-12 Thread Eric Wong
Konstantin Ryabitsev wrote: > On Tue, Nov 12, 2019 at 10:29:32PM +0000, Eric Wong wrote: > > > You have to rewrite the HTML parts anyway, to resolve RFC 2392 cid: > > > links, prior to handing them to web browsers. I don't think web > > > browsers support the

[PATCH] xapcmd: localize %SIG changes using "local"

2019-11-12 Thread Eric Wong
Perl's "local" allows changes to %SIG (and %ENV) to be limited to its enclosing block. This allows us to get rid of a global variable and ad-hoc method for restoring signal handlers. --- lib/PublicInbox/Xapcmd.pm | 31 --- 1 file changed, 12 insertions(+), 19 deletions
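
Python has no `local`, but a context manager gives the same block-scoped behavior for signal handlers: install on entry, restore the previous handler on exit. A sketch with a hypothetical `local_signal` helper (it must run in the main thread, as Python requires for `signal.signal()`):

```python
import contextlib
import signal


@contextlib.contextmanager
def local_signal(signum, handler):
    """Temporarily install a signal handler, restoring the old one on exit."""
    previous = signal.signal(signum, handler)
    try:
        yield
    finally:
        # signal.signal() returns None if the old handler was not set from
        # Python; fall back to SIG_DFL in that case.
        signal.signal(signum,
                      previous if previous is not None else signal.SIG_DFL)
```

Unlike a global variable holding saved handlers, the restore happens even if the block raises.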

[PATCH] solvergit: use --unidiff-zero with git-apply(1)

2019-11-13 Thread Eric Wong
I sometimes post context-free documentation patches generated with "-U0" to reduce size and bandwidth overhead when replacing URLs or updating copyright notices. git-apply(1) needs the --unidiff-zero switch to work properly with context-free patches. Given our search looks for blob OIDs, and we'r

[PATCH] inboxwritable: drop {-importer} cyclic reference

2019-11-13 Thread Eric Wong
InboxWritable caching the result of ->importer leads to a circular reference, since the returned (V2Writable|Import) object holds onto the calling InboxWritable object. With public-inbox-watch, this leads to a memory leak if a user is reloading via SIGHUP after a message is imported (it would only beco
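
The patch drops the cached `{-importer}` field; a different technique for the same cycle is a weak back-reference, sketched here in Python with hypothetical `Inbox` and `Importer` classes. Python's cycle collector would eventually reclaim such a cycle anyway, whereas Perl relies on reference counting alone, which is why the cycle leaks there.

```python
import weakref


class Importer:
    """Hypothetical importer that refers back to its inbox weakly."""

    def __init__(self, inbox):
        self._inbox = weakref.ref(inbox)  # no strong back-reference

    @property
    def inbox(self):
        ibx = self._inbox()
        if ibx is None:
            raise RuntimeError("inbox was already freed")
        return ibx


class Inbox:
    def __init__(self, name):
        self.name = name
        self._importer = None

    def importer(self):
        # Caching is safe here because Importer holds only a weak reference.
        if self._importer is None:
            self._importer = Importer(self)
        return self._importer
```

In CPython, dropping the last strong reference to an `Inbox` frees it immediately, even while its `Importer` is still alive.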

[PATCH 1/4] t/common: inline stream_to_string into t/feed.t

2019-11-13 Thread Eric Wong
We only use it in one place and have favored test_psgi in newer tests, so move it out-of-the-way to reduce startup overhead of other *.t files. --- t/common.perl | 11 --- t/feed.t | 9 - 2 files changed, 8 insertions(+), 12 deletions(-) diff --git a/t/common.perl b/t/common

[PATCH 4/4] doc: check-man: save the result of successful runs

2019-11-13 Thread Eric Wong
We can keep a stamp around if the corresponding manpage hasn't changed to avoid re-running man(1) and awk(1). --- .gitignore | 1 + Documentation/include.mk | 13 +++-- 2 files changed, 8 insertions(+), 6 deletions(-) diff --git a/.gitignore b/.gitignore index 66f165e2..167

[PATCH 0/4] some minor test updates

2019-11-13 Thread Eric Wong
A few minor, low impact improvements to avoid loading and running redundant code while I'm ironing out some major test improvements. Eric Wong (4): t/common: inline stream_to_string into t/feed.t t/common: move unix_server to t/httpd-corner.t t/psgi_mount: require SearchIdx before

[PATCH 2/4] t/common: move unix_server to t/httpd-corner.t

2019-11-13 Thread Eric Wong
unix_server() is not commonly used, only t/httpd-corner.t uses it and most HTTP tests use TCP since most HTTP libraries only support TCP. --- t/common.perl| 10 -- t/httpd-corner.t | 10 ++ 2 files changed, 10 insertions(+), 10 deletions(-) diff --git a/t/common.perl b/t/commo

[PATCH 3/4] t/psgi_mount: require SearchIdx before using

2019-11-13 Thread Eric Wong
We may not implicitly load it via other means in the future. --- t/psgi_mount.t | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/t/psgi_mount.t b/t/psgi_mount.t index aa7c863f..7de2bc0e 100644 --- a/t/psgi_mount.t +++ b/t/psgi_mount.t @@ -90,9 +90,10 @@ test_psgi($app, sub {

[PATCH] convert: remove duplicated GetOptions() call

2019-11-14 Thread Eric Wong
We only need to parse the command-line once. --- script/public-inbox-convert | 1 - 1 file changed, 1 deletion(-) diff --git a/script/public-inbox-convert b/script/public-inbox-convert index 3182410e..9bee5e7a 100755 --- a/script/public-inbox-convert +++ b/script/public-inbox-convert @@ -20,7 +20

[PATCH] doc: mknews: support Email::MIME <1.930

2019-11-14 Thread Eric Wong
Email::MIME::header_str is not available until 1.930, so the rest of our code uses Email::MIME::header for compatibility with distros, since CentOS 7.x only has 1.926. --- Documentation/mknews.perl | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/mknews.perl b/Docu

[PATCH 04/29] index: pass global variables into subs

2019-11-15 Thread Eric Wong
Avoid 'Variable "%s" will not stay shared' warnings when the contents of this script are eval'ed into a sub. --- script/public-inbox-index | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/script/public-inbox-index b/script/public-inbox-index index 139b6e56..102381c3 100755 --- a/s

[PATCH 03/29] admin: get rid of singleton $CFG var

2019-11-15 Thread Eric Wong
PublicInbox::Admin::config() just adds an extra layer of indirection which we barely rely on. So get rid of this global variable and make it easier to run tests in the future without relying on global state. --- lib/PublicInbox/Admin.pm | 9 +++-- script/public-inbox-edit | 7 +++ 2 files

[PATCH 01/29] edit: pass global variables into subs

2019-11-15 Thread Eric Wong
Avoid 'Variable "%s" will not stay shared' warnings when the contents of this script are eval'ed into a sub. --- script/public-inbox-edit | 17 + 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/script/public-inbox-edit b/script/public-inbox-edit index 43ce9900..c9884053 1

[PATCH 05/29] init: pass global variables into subs

2019-11-15 Thread Eric Wong
Avoid 'Variable "%s" will not stay shared' warnings when the contents of this script are eval'ed into a sub. We also need to rely on ->DESTROY instead of END{} to unlink the lock file on sub exit. --- script/public-inbox-init | 27 ++- 1 file changed, 22 insertions(+), 5 delet

[PATCH 00/29] speed up tests by preloading

2019-11-15 Thread Eric Wong
references and minor bugs were found (and fixed) in preparation for this. Most of the changes were to explicitly pass global variables into subs to avoid warnings. TEST_RUN_MODE=0 can be set in the environment to restore real-world behavior with (v)fork && execve. Eric Wong (29):

[PATCH 14/29] t/init: convert to using run_script

2019-11-15 Thread Eric Wong
This gives a 2-3x speedup on the test with the default run_mode=1. --- t/init.t | 85 1 file changed, 37 insertions(+), 48 deletions(-) diff --git a/t/init.t b/t/init.t index 0cd6f31f..2442eeec 100644 --- a/t/init.t +++ b/t/init.t @@ -7,55

[PATCH 07/29] learn: pass global variables into subs

2019-11-15 Thread Eric Wong
Avoid 'Variable "%s" will not stay shared' warnings when the contents of this script are eval'ed into a sub. --- script/public-inbox-learn | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/script/public-inbox-learn b/script/public-inbox-learn index 3073294a..93aece2e 100644 -

[PATCH 09/29] import: only pass Inbox object to SearchIdx->new

2019-11-15 Thread Eric Wong
SearchIdx->new no longer accepts a GIT_DIR path as its argument since commit 585314673236d664729fe3ab2d4fb229d1c0f2d5 ("searchidx: require PublicInbox::Inbox (or InboxWritable) ref") --- lib/PublicInbox/Import.pm | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/lib/PublicInb

[PATCH 11/29] spawn: which: allow embedded slash for relative path

2019-11-15 Thread Eric Wong
This makes the subroutine behave more like the which(1) command and will make using spawn() in tests easier. --- lib/PublicInbox/Spawn.pm | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/PublicInbox/Spawn.pm b/lib/PublicInbox/Spawn.pm index e2868a55..b946a663 100644 --- a/lib/Pu
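For illustration only, here is a Python sketch of which(1)-style lookup (not the Perl code in Spawn.pm). It assumes the usual convention that a name containing a slash anywhere, such as "bin/sh" or "./foo", is treated as a path and checked directly rather than searched for in PATH. The function name and the `search_path` parameter are hypothetical.

```python
import os

def which(name, search_path=None):
    """Resolve an executable name roughly the way which(1) does.

    A name containing a slash anywhere ("./foo", "bin/foo", "/usr/bin/foo")
    is treated as a path (relative or absolute) and checked directly;
    only bare names are looked up in PATH.
    """
    if "/" in name:
        if os.path.isfile(name) and os.access(name, os.X_OK):
            return name
        return None
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    for directory in search_path.split(os.pathsep):
        # An empty PATH entry conventionally means the current directory.
        candidate = os.path.join(directory or ".", name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

Python's own shutil.which() applies the same rule: a name that has a directory component is checked as given and is never looked up in PATH.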

[PATCH 13/29] t/edit: switch to use run_script

2019-11-15 Thread Eric Wong
Perl parsing is slow, and run_script's default behavior allows us to speed up t/edit.t by over 100% in my case. --- t/edit.t | 65 1 file changed, 32 insertions(+), 33 deletions(-) diff --git a/t/edit.t b/t/edit.t index 5cb66a65..09e0cddd 1

[PATCH 02/29] edit: use OO API of File::Temp to shorten lifetime

2019-11-15 Thread Eric Wong
Instead of relying on END{} blocks, rely on ->DESTROY so the temporary files go out of scope and system resources get released sooner. --- script/public-inbox-edit | 12 +--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/script/public-inbox-edit b/script/public-inbox-edit
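As a rough Python analogue (an assumption for illustration, not the File::Temp change itself), a temporary file can be tied to a lexical scope so it is removed as soon as that scope ends. This avoids waiting for a process-exit hook, which is the role END{} plays in Perl. The helper name `edit_via_tempfile` is hypothetical.

```python
import os
import tempfile

def edit_via_tempfile(text):
    """Round-trip text through a temporary file scoped to a with-block.

    Returns (contents_read_back, path). The file is deleted when the
    with-block exits, not when the interpreter shuts down.
    """
    with tempfile.NamedTemporaryFile(mode="w+", suffix=".eml") as tmp:
        tmp.write(text)
        tmp.flush()
        # An external editor could be run on tmp.name at this point.
        tmp.seek(0)
        result = tmp.read()
        path = tmp.name
    # Leaving the block closed and unlinked the file.
    return result, path
```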

[PATCH 15/29] t/purge: convert to run_script

2019-11-15 Thread Eric Wong
This nets us another sizeable speedup. --- t/purge.t | 20 ++-- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/t/purge.t b/t/purge.t index 67c4e58d..bcdbad52 100644 --- a/t/purge.t +++ b/t/purge.t @@ -6,12 +6,12 @@ use Test::More; use File::Temp qw/tempdir/; requ

[PATCH 12/29] t/common: introduce run_script wrapper for t/cgi.t

2019-11-15 Thread Eric Wong
This will give us a consistent interface for running test scripts in more performant ways, while still letting us recreate real-world behavior via spawn() (fork + execve) if needed. The default run_mode (1) is faster and can run within the test process with some minor adju

[PATCH 10/29] xapcmd: do not fire END and DESTROY handlers in child

2019-11-15 Thread Eric Wong
We need to bypass whatever Test::More does with END/DESTROY handlers for use in long-lived processes. This doesn't affect any of our normal code since we don't use END/DESTROY for Xapcmd and its callers. --- lib/PublicInbox/Xapcmd.pm | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --gi
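A minimal Python sketch of the same idea, assuming a fork-without-exec child that should not repeat the parent's exit-time cleanup. Leaving the child via os._exit() skips atexit handlers and normal interpreter shutdown, including stdio flushing, so that cleanup runs only once, in the parent. The helper `run_in_child` is hypothetical and is not part of Xapcmd.

```python
import os

def run_in_child(fn):
    """Fork, run fn() in the child, and return the child's exit code.

    The child always leaves via os._exit(), which bypasses atexit
    handlers and interpreter shutdown inherited from the parent.
    """
    pid = os.fork()
    if pid == 0:  # child
        code = 1  # assume failure unless fn() returns normally
        try:
            fn()
            code = 0
        finally:
            os._exit(code)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
```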

[PATCH 08/29] inboxwritable: add ->cleanup method

2019-11-15 Thread Eric Wong
We've been using this in -edit, and will be using it in some more scripts and tests to optimize for run_mode=2 with run_script. It stays in the *Writable modules since I don't see it being useful for the WWW and NNTP read-only interfaces, which use PublicInbox::Inbox. --- lib/PublicInbox/Inbox

[PATCH 16/29] t/v2mirror: get rid of IPC::Run dependency

2019-11-15 Thread Eric Wong
We are not taking advantage of the faster run modes in run_script yet, since some lifetime problems still need to be sorted out. --- t/v2mirror.t | 25 + 1 file changed, 13 insertions(+), 12 deletions(-) diff --git a/t/v2mirror.t b/t/v2mirror.t index f826775c..3c238093 100644 --- a/t/v2mirror.

[PATCH 06/29] mda: pass global variables into subs

2019-11-15 Thread Eric Wong
Avoid 'Variable "%s" will not stay shared' warnings when the contents of this script are eval'ed into a sub. --- script/public-inbox-mda | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/script/public-inbox-mda b/script/public-inbox-mda index dca8a0ea..9da2d90f 100755 --

[PATCH 20/29] t/v2mirror: switch to default run_mode for speedup

2019-11-15 Thread Eric Wong
We need to be careful and explicitly close FDs before doing -index, since we can't rely on FD_CLOEXEC without execve(2) syscalls. --- t/v2mirror.t | 27 +++ 1 file changed, 19 insertions(+), 8 deletions(-) diff --git a/t/v2mirror.t b/t/v2mirror.t index 3c238093..2c7f6a84 1
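The general POSIX pitfall can be sketched in Python (an illustration, not the t/v2mirror.t change). Close-on-exec only takes effect at execve(2), so a forked child that never execs still holds copies of every descriptor. A reader in that child never sees EOF until the child closes its own copy of the pipe's write end. The helper name is hypothetical, and the byte count is returned through the exit status purely to keep the example self-contained.

```python
import os

def bytes_seen_by_child(data):
    """Pipe data to a forked child (fork without exec) and return how many
    bytes the child read before EOF, via its exit status (capped at 254)."""
    r, w = os.pipe()
    # Since Python 3.4 os.pipe() FDs are non-inheritable (close-on-exec),
    # but that flag only acts at execve(2); fork() still copies both ends.
    pid = os.fork()
    if pid == 0:  # child
        code = 255  # reported if anything below fails
        try:
            # Without this close, the child holds a write end itself and
            # os.read() below blocks forever instead of returning b"".
            os.close(w)
            total = 0
            while True:
                chunk = os.read(r, 4096)
                if not chunk:  # EOF: every write end is now closed
                    break
                total += len(chunk)
            code = min(total, 254)
        finally:
            os._exit(code)
    # Parent: drop the read end, send the data, then close the write end
    # so the child sees EOF.
    os.close(r)
    view = memoryview(data)
    while view:
        n = os.write(w, view)
        view = view[n:]
    os.close(w)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)
```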
