[PATCH] index: v2: parallelize if --reindex or --jobs is specified

2020-05-16 Thread Eric Wong
`--reindex' involves chomping down lots of mail, so it benefits from parallelization just like the initial indexing. It's also a bit surprising to specify `--jobs/-j' without parallel processes, so ensure we turn on parallelization there, too. We can simplify initialization here, as well, since n

[PATCH] index: add --batch-size=SIZE option

2020-05-17 Thread Eric Wong
On powerful systems, having this option is preferable to XAPIAN_FLUSH_THRESHOLD due to lock granularity and contention with other processes (-learn, -mda, -watch). Setting XAPIAN_FLUSH_THRESHOLD can cause -learn, -mda, and -watch to get stuck until an epoch is completely processed. --- Documentat

[PATCH] favor readline() and print() as functions

2020-05-17 Thread Eric Wong
In our inbox-writing code paths, ->getline as an OO method may be confused with the various definitions of `getline' used by the PSGI interface. It's also easier to do: "perldoc -f readline" than to figure out which class "->getline" belongs to (IO::Handle) and lookup documentation for that. ->pr

Re: [PATCH] index: add --batch-size=SIZE option

2020-05-17 Thread Eric Wong
Kyle Meyer wrote: > Eric Wong writes: > > > +Increase this value on powerful systems improve throughput at > > +the expense of memory use. The reduction of lock granularity > > I think this is missing a "to" in front of "improve". Thanks, fix queued

Re: [PATCH] index: add --batch-size=SIZE option

2020-05-17 Thread Eric Wong
Eric Wong wrote: > On powerful systems, having this option is preferable to > XAPIAN_FLUSH_THRESHOLD due to lock granularity and contention > with other processes (-learn, -mda, -watch). > > Setting XAPIAN_FLUSH_THRESHOLD can cause -learn, -mda, and > -watch to get stuck

MUA and client-side limits w.r.t. IMAP

2020-05-19 Thread Eric Wong
TL;DR: is 50K messages a reasonable IMAP folder size? So far, the IMAP daemon code shares quite a bit of data and logic with the NNTP code. NNTP article numbers are reusable as IMAP UIDs; and for small inboxes, newsgroups can be an IMAP folder. But that only works for small inboxes... One major

[PATCH] scripts/import_*: remove PublicInbox::MIME usage

2020-05-19 Thread Eric Wong
These aren't really supported and will probably be replaced with better tools, but PublicInbox::Eml should be readily available to anybody who already has our source tree. --- scripts/import_maildir | 7 +++ scripts/import_slrnspool | 4 ++-- 2 files changed, 5 insertions(+), 6 deletions(-)

[PATCH] spamcheck/spamc: use localized slurp to read from spamc

2020-05-19 Thread Eric Wong
The perlop, `readline', and `read' functions will all retry on EINTR, so there's no need to retry and loop ourselves with `sysread'. --- lib/PublicInbox/Spamcheck/Spamc.pm | 10 +- 1 file changed, 1 insertion(+), 9 deletions(-) diff --git a/lib/PublicInbox/Spamcheck/Spamc.pm b/lib/Publi

[PATCH] t/edit: use eml_load here, too

2020-05-19 Thread Eric Wong
I missed this instance of file slurping into an Email::MIME-like object the other week when tearing Email::MIME usage out. --- t/edit.t | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/t/edit.t b/t/edit.t index 1a5698f6..4b004c1c 100644 --- a/t/edit.t +++ b/t/edit.t @@ -24

[PATCH] convert: describe the release of fast-import pipes

2020-05-20 Thread Eric Wong
Upon rereading the code, it wasn't immediately obvious to me why we didn't check for errors with `close($w)' instead of relying on `undef'. So add a comment for the benefit of future readers. --- script/public-inbox-convert | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/scrip

[PATCH] t/eml.t: favor ->header over ->header_str

2020-05-20 Thread Eric Wong
This test may still run against ancient versions of Email::MIME for comparisons. --- t/eml.t | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/t/eml.t b/t/eml.t index 1892b001..a2479f6f 100644 --- a/t/eml.t +++ b/t/eml.t @@ -80,7 +80,7 @@ for my $cls (@classes) { $eml->he

[PATCH] spawn: fix compatibility with old Inline::C

2020-05-20 Thread Eric Wong
Older versions of Inline (e.g. 0.53 in CentOS 7) did not accept the `directory' parameter, so use conditional assignment to set a default value on $ENV{PERL_INLINE_DIRECTORY}, instead. --- lib/PublicInbox/Spawn.pm | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/lib/PublicIn

[PATCH] inboxidle: new class to detect inbox changes

2020-05-21 Thread Eric Wong
This will be used to implement IMAP IDLE, first. Eventually, it may be used to trigger other things: * incremental internal updates for manifest.js.gz * restart `git cat-file' processes on pack index unlink * IMAP IDLE-like long-polling HTTP endpoint And maybe more things we haven't thought of,

Re: [PATCH] inboxidle: new class to detect inbox changes

2020-05-21 Thread Eric Wong
Eric Wong wrote: > --- a/lib/PublicInbox/Inbox.pm Naming is (still) hard. > +# $obj must respond to >inbox_changed, which takes Inbox ($self) as an arg ^ That should be ->on_inbox_unlock > +sub subscribe_unlo

[PATCH] v2writable: only load Xapian when a shard is found

2020-05-23 Thread Eric Wong
We don't need to load Xapian until we have a directory which looks like a shard, otherwise we're wasting cycles on memory when running short-lived processes. --- lib/PublicInbox/V2Writable.pm | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/lib/PublicInbox/V2Writable.pm

[PATCH] msgmap: tmp_clone: use in-memory journal

2020-05-24 Thread Eric Wong
This prevents $TMPDIR from being littered with *-journal files after running the test suite. This shouldn't cause excessive memory use since $v2w->{mm_tmp} doesn't see big transactions. There's no need to worry about data loss, here, either, since this is just a temporary clone we've even disabled

[PATCH] view: do not offer links to 0-byte multipart attachments

2020-05-25 Thread Eric Wong
Offering links to download 0-byte files is useless. We could waste memory by preserving $eml->{bdy} during iteration, but offering attachments of type "multipart" is not very useful, as users are usually interested in decoded attachments or the entire raw message. Fixes: e60231148eb604a3 ("descen

[PATCH 1/2] learn: fix buggy typo on List-ID mapping

2020-05-26 Thread Eric Wong
There is obviously a typo here, so fix it and add a test case to guard against future regressions. Fixes: 74a3206babe0572a ("mda: support multiple List-ID matches") --- script/public-inbox-learn | 2 +- t/mda.t | 10 +- 2 files changed, 10 insertions(+), 2 deletions(-)

[PATCH 2/2] learn: support --all with `rm'

2020-05-26 Thread Eric Wong
I found myself wanting to remove a message from all inboxes while working on a test case in another branch. I figure this could also be useful for globally removing messages which are in the grey area or too big for spamc. --- Documentation/public-inbox-learn.pod | 8 ++-- script/public-inbox

[PATCH 0/2] -learn fixes and updates

2020-05-26 Thread Eric Wong
I noticed -learn was lacking in some areas while working on other stuff. Maybe more consistency improvements with other CLI tools coming... Eric Wong (2): learn: fix buggy typo on List-ID mapping learn: support --all with `rm' Documentation/public-inbox-learn.pod | 8 ++-- s

Re: Search based on data in follow-ups

2020-05-26 Thread Eric Wong
Konstantin Ryabitsev wrote: > Hello: > > I suspect this would be Pretty Hard To Do, but wanted to mention it on > the list anyway, just as a "musing out loud." It would be cool to be > able to exclude/include results based on conditions in thread > follow-ups. E.g.: Yup, I've wanted something

[PATCH] treat $INBOX_DIR/description and gitweb.owner as UTF-8

2020-05-28 Thread Eric Wong
Julien Moutinho wrote: > public-inbox-httpd does not output $INBOX_DIR/description > using the expected Unicode code points. thanks for the bug report. Below is a patch + tests which should fix the bug. > Debugging > - > This may be due to using: ascii_html($ibx->description); Nope, a

[PATCH] testcommon: speed up wait_for_tail() on GNU/Linux

2020-05-30 Thread Eric Wong
Somewhat recent versions of GNU tail(1) use inotify(7) on Linux; so don't penalize hackers using TAIL='tail -F' to run their tests with extra delays. Ironically, we still need to busy loop on /proc/$TAIL_PID/{fd,fdinfo} since inotify doesn't seem to support procfs. --- lib/PublicInbox/TestCommon.

[PATCH] wwwlisting: utf8::decode before undef

2020-05-31 Thread Eric Wong
Assisted by commit a73957b5b05f2a00f7a85353b1658b6d8cde05ae ("testcommon: speed up wait_for_tail() on GNU/Linux") Fixes: 846161e3d1207d59 ("treat $INBOX_DIR/description and gitweb.owner as UTF-8") --- lib/PublicInbox/WwwListing.pm | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff -

[PATCH 09/13] nntp: smsg_range_i: favor ->{$field} lookups when possible

2020-06-01 Thread Eric Wong
PublicInbox::Smsg::date remains the only exception which requires any subroutine calls, here, so we'll just have a branch just for that. --- lib/PublicInbox/NNTP.pm | 12 ++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/lib/PublicInbox/NNTP.pm b/lib/PublicInbox/NNTP.pm i

[PATCH 04/13] import: modernize to use Perl 5.10 features

2020-06-01 Thread Eric Wong
First, prefer the leaner "parent" module over the heavy "base" module to establish ISA relationships, since "base" is only needed for "fields". The "//" and "//=" operators allow us to simplify our code and fix minor bugs where a value of "0" was disallowed. Yes, we'll allow "0" as an email address,

[PATCH 02/13] wwwatomstream: convert callers to use smsg_eml

2020-06-01 Thread Eric Wong
We can simplify WwwAtomStream callbacks by performing ->smsg_eml calls in the `feed_entry' sub itself. This simplifies callers, by reducing the number of places which can load an Eml object into memory. --- lib/PublicInbox/Feed.pm | 2 +- lib/PublicInbox/SearchView.pm| 2 +- lib/Publ

[PATCH 08/13] www: remove smsg_mime API and adjust callers

2020-06-01 Thread Eric Wong
To further simplify callers and avoid embarrassing memory explosions[1], we can finally eliminate this method in favor of smsg_eml. [1] commit 7d02b9e64455831d3bda20cd2e64e0c15dc07df5 ("view: stop storing all MIME objects on large threads") fixed a huge memory blowup. --- Documentation/mkn

[PATCH 11/13] smsg: remove ->bytes and ->lines methods

2020-06-01 Thread Eric Wong
They're stored directly in Xapian and SQLite document data. NNTP accesses those fields directly to avoid method invocation overhead so there's no reason to waste several kilobytes for each sub. --- lib/PublicInbox/Smsg.pm | 4 1 file changed, 4 deletions(-) diff --git a/lib/PublicInbox/Smsg.

[PATCH 01/13] inbox: introduce smsg_eml method

2020-06-01 Thread Eric Wong
The goal of this is to eventually remove the $smsg->{mime} field which is easy-to-misuse and cause memory explosions which necessitated fixes like commit 7d02b9e64455831d ("view: stop storing all MIME objects on large threads"). --- lib/PublicInbox/Inbox.pm | 6 ++ lib/PublicInbox/SolverGi

[PATCH 00/13] smsg: remove tricky {mime} field

2020-06-01 Thread Eric Wong
storing all MIME objects on large threads") Eric Wong (13): inbox: introduce smsg_eml method wwwatomstream: convert callers to use smsg_eml v2writable: fix non-sensical interpolation in BUG message import: modernize to use Perl 5.10 features smsg: introduce ->populate method smsg

[PATCH 10/13] smsg: get rid of remaining {mime} users

2020-06-01 Thread Eric Wong
We'll let $smsg->populate take care of everything all at once without hanging onto the header object for too long. --- lib/PublicInbox/OverIdx.pm | 1 - lib/PublicInbox/Smsg.pm| 33 +++-- 2 files changed, 3 insertions(+), 31 deletions(-) diff --git a/lib/PublicInb

[PATCH 12/13] smsg: remove remaining accessor methods

2020-06-01 Thread Eric Wong
We'll continue to favor simpler data models that can be used directly rather than wasting time and memory with accessor APIs. The ->from, ->to, ->cc, ->mid, ->subject, ->references methods can all be trivially replaced by hash lookups since all their values are stored in doc_data. Most remaining ca

[PATCH 13/13] wwwatomstream: drop smsg->{mid} fallback for non-SQLite

2020-06-01 Thread Eric Wong
It's no longer necessary to populate the smsg->{mid} field now that ->smsg_eml calls smsg->populate in rare cases where the smsg did not originate from SQLite. --- lib/PublicInbox/WwwAtomStream.pm | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/lib/PublicInbox/WwwAtomStream.p

[PATCH 07/13] inbox: msg_by_*: remove $(size)ref args

2020-06-01 Thread Eric Wong
None of our current callers care about the size of the blob we're retrieving, so stop wasting stack space and code for it. --- lib/PublicInbox/Inbox.pm | 24 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/lib/PublicInbox/Inbox.pm b/lib/PublicInbox/Inbox.pm

[PATCH 06/13] smsg: get rid of ->wrap initializer, too

2020-06-01 Thread Eric Wong
We'll just use `bless' like most current PublicInbox::Smsg callers. --- lib/PublicInbox/SearchIdx.pm | 2 +- lib/PublicInbox/Smsg.pm | 5 - 2 files changed, 1 insertion(+), 6 deletions(-) diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm index eb228e6bba7..f10a9104

[PATCH 03/13] v2writable: fix non-sensical interpolation in BUG message

2020-06-01 Thread Eric Wong
No point in attempting to print the value of an undefined variable if there's a bug. Fortunately, (AFAIK) we've never hit that bug check :> --- lib/PublicInbox/V2Writable.pm | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writa

[PATCH 05/13] smsg: introduce ->populate method

2020-06-01 Thread Eric Wong
This will eventually replace the __hdr() calling methods and eradicate {mime} usage from Smsg. For now, we can eliminate PublicInbox::Smsg->new since most callers already rely on an open `bless' to avoid the old {mime} arg. --- lib/PublicInbox/Import.pm | 40

Re: [PATCH 00/13] smsg: remove tricky {mime} field

2020-06-01 Thread Eric Wong
Eric Wong wrote: > Furthermore, $smsg->$field dispatch has always been measurably > faster than $smsg->{$field} access in NNTP. s/faster/slower/ :x

[PATCH] search: index byte size of a message for IMAP search

2020-06-01 Thread Eric Wong
Searching for messages smaller than a certain size is allowed by offlineimap(1), mbsync(1), and possibly other tools. Maybe public-inbox-watch will support it, too. I don't see a reason to expose searching by size via WWW search right now (but maybe in the future, I could be convinced to). Note:

Re: MUA and client-side limits w.r.t. IMAP

2020-06-04 Thread Eric Wong
Eric Wong wrote: > * default, same as newsgroup name, messages "expire" > inbox.comp.version-control.git =>most recent 50K messages I think that's best left an empty parent folder, since having the most recent X messages means the client will have to refetc

[PATCH] searchidx: v1: fix retries when Xapian and Msgmap are out-of-sync

2020-06-04 Thread Eric Wong
We forcibly stop git-log here, so erroring out on git-log close failures is wrong since it sees SIGPIPE. Noticed while reindexing a large v1 inbox for IMAP changes. Fixes: b32b47fb12a3043d ("index: "git log" failures are fatal") --- lib/PublicInbox/SearchIdx.pm | 5 + 1 file changed, 1 inser

Re: [Patch] Update 24-hour times to use two digits for the hour

2020-06-05 Thread Eric Wong
Varun Varada wrote: > Hello, > > Here is a patch to update the timestamps displayed to have 2 digits > for the hour when since it is using the 24-hour clock: Hello Varun, thanks for your interest in the project. But why this patch? It's a requirement to document the "why?" for submitting any p

Re: [Patch] Update 24-hour times to use two digits for the hour

2020-06-05 Thread Eric Wong
Varun Varada wrote: > Hi Eric, > > The "why?" is that leading zeroes are standard for virtually any > 24-hour clock in the world > (https://www.google.com/search?q=24-hour+time). This is even codified > in the ISO 8601 standard > (https://en.wikipedia.org/wiki/ISO_8601#Times), which the project >

Re: [Patch] Update 24-hour times to use two digits for the hour

2020-06-05 Thread Eric Wong
Varun Varada wrote: > Hi Eric, > > The issue of being able to determine whether a reply was on a > different day still arises with your method, as replies that occurred > before 12:00 on the same day would have the same problem. True, there's cases where reading the date column is necessary; but

[PATCH] index: v2: parallel by default

2020-06-07 Thread Eric Wong
InboxWritable should only set $v2w->{parallel} if the $parallel flag is defined to 0 or 1. We want indexing a new inbox to utilize SMP, just like --reindex. -index once again allows -j0/--jobs=0 to force single-process use, and we'll be ensuring that works in tests to maintain performance on smal

IMAP server notes, maybe JMAP?

2020-06-09 Thread Eric Wong
OK, so I almost have something that won't kill clients or trigger OOMs on the server. I think I'll have to implement MSN (message sequence numbers) properly for some clients, cheaply. I know there's also interest in getting search usable via an HTTP(S) API, so maybe JMAP[1] is worth looking into

Re: news.public-inbox.org misbehaving?

2020-06-09 Thread Eric Wong
Kyle Meyer wrote: > Any ideas what might be going on? Thanks for pointing that out. Worked around for now via SIGHUP to reload -nntpd. I used -compact (and via `-index -c --reindex'), and -nntpd didn't pick up the movement of over.sqlite3 automatically. Will have to think about how -compact ne

[PATCH 00/82] public-inbox-imapd: read-only IMAP server

2020-06-10 Thread Eric Wong
elism :) Anyways, I'll probably be porting some of the scalability and slow-storage work to older parts of the code before fiddling with more IMAP extensions. Eric Wong (82): doc: add some IMAP standards nntpd: restrict allowed newsgroup names preliminary imap server implementation inbo

[PATCH 01/82] doc: add some IMAP standards

2020-06-10 Thread Eric Wong
There's more, but IMAP is big and complex already. --- Documentation/standards.perl | 9 + 1 file changed, 9 insertions(+) diff --git a/Documentation/standards.perl b/Documentation/standards.perl index 34ab829e2e3..37309956f39 100755 --- a/Documentation/standards.perl +++ b/Documentation/

[PATCH 02/82] nntpd: restrict allowed newsgroup names

2020-06-10 Thread Eric Wong
We'll be using newsgroup names as mailbox names for IMAP, too, so ensure we don't send wonky characters in responses. I doubt this affects any real-world instances, but a BOFH could choose strange names to cause grief for clients. --- lib/PublicInbox/NNTPD.pm | 6 ++ 1 file changed, 6 inserti

[PATCH 04/82] inboxidle: new class to detect inbox changes

2020-06-10 Thread Eric Wong
This will be used to implement IMAP IDLE, first. Eventually, it may be used to trigger other things: * incremental internal updates for manifest.js.gz * restart `git cat-file' processes on pack index unlink * IMAP IDLE-like long-polling HTTP endpoint And maybe more things we haven't thought of,

[PATCH 08/82] imap: implement STATUS command

2020-06-10 Thread Eric Wong
I'm not sure if there's much use for this command, but it's part of RFC3501 and works read-only. --- lib/PublicInbox/IMAP.pm | 29 + t/imapd.t | 5 + 2 files changed, 34 insertions(+) diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm ind

[PATCH 10/82] imap: support LIST command

2020-06-10 Thread Eric Wong
We'll optimize for the common case of: $TAG LIST "" * and rely on the grep perlfunc to handle trickier cases. --- lib/PublicInbox/IMAP.pm | 14 ++ lib/PublicInbox/IMAPD.pm | 25 + t/imapd.t| 59 3 files changed, 98 i
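
The fast path described above can be sketched as follows. This is a Python illustration, not the patch's Perl: the function names and the sample mailbox list are hypothetical, and only the wildcard semantics (`*` matches anything, `%` stops at the hierarchy delimiter) come from RFC 3501.

```python
import re


def list_pattern_to_regex(pattern, delimiter="."):
    """Translate an IMAP LIST mailbox pattern into a compiled regex.

    Hypothetical helper: "*" matches any characters, "%" matches any
    characters except the hierarchy delimiter.
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "%":
            parts.append("[^" + re.escape(delimiter) + "]*")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def list_mailboxes(mailboxes, pattern, delimiter="."):
    """Return the mailbox names matching an IMAP LIST pattern."""
    if pattern == "*":
        # Common case: the client asks for everything, so skip the regex.
        return list(mailboxes)
    rx = list_pattern_to_regex(pattern, delimiter)
    # Trickier cases: filter the full list, much like Perl's grep would.
    return [name for name in mailboxes if rx.fullmatch(name)]
```

Matching here is case-sensitive; whether the server should fold case is a separate question.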

[PATCH 06/82] msgmap: split ->max into its own method

2020-06-10 Thread Eric Wong
There's enough places where we only care about the max NNTP article number to warrant avoiding a call into SQLite. Using ->num_highwater in read-only packages such as PublicInbox::IMAP is also incorrect, since that memoizes and won't pick up changes made by other processes. --- lib/PublicInbox/IM

[PATCH 12/82] imap: support fetch for BODYSTRUCTURE and BODY

2020-06-10 Thread Eric Wong
I'm not sure which clients use these, but it could be useful down the line. --- lib/PublicInbox/Eml.pm | 7 +-- lib/PublicInbox/IMAP.pm | 106 +++- t/imapd.t | 16 +- 3 files changed, 123 insertions(+), 6 deletions(-) diff --git a/lib/Publ

[PATCH 19/82] imap: support sequence number FETCH

2020-06-10 Thread Eric Wong
We'll return dummy messages for now when sequence numbers go missing, in case clients can't handle missing messages. --- lib/PublicInbox/IMAP.pm | 87 - t/imapd.t | 18 + 2 files changed, 96 insertions(+), 9 deletions(-) diff --git a/l

[PATCH 05/82] imap: support IDLE

2020-06-10 Thread Eric Wong
It seems to be working as far as Mail::IMAPClient is concerned. --- Documentation/standards.perl | 2 +- lib/PublicInbox/IMAP.pm | 58 ++-- lib/PublicInbox/IMAPD.pm | 20 - lib/PublicInbox/NNTPD.pm | 6 ++-- t/imapd.t|

[PATCH 21/82] imap: support the CLOSE command

2020-06-10 Thread Eric Wong
It seems worthless to support CLOSE for read-only inboxes, but mutt sends it, so don't return a BAD error with proper use. --- lib/PublicInbox/IMAP.pm | 6 ++ t/imapd.t | 3 ++- 2 files changed, 8 insertions(+), 1 deletion(-) diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbo

[PATCH 07/82] imap: delay InboxIdle start, support refresh

2020-06-10 Thread Eric Wong
InboxIdle should not be holding onto Inbox objects after the Config object they came from expires, and Config objects may expire on SIGHUP. Old Inbox objects still persist due to IMAP clients holding onto them, but that's a concern we'll deal with at another time, or not at all, since all clients

[PATCH 11/82] t/imapd: support FakeInotify and KQNotify

2020-06-10 Thread Eric Wong
We can fill in some missing pieces from the emulation APIs to enable IMAP IDLE tests on non-Linux platforms. --- lib/PublicInbox/FakeInotify.pm | 22 +- lib/PublicInbox/KQNotify.pm| 6 ++ t/imapd.t | 3 ++- 3 files changed, 29 insertions(+), 2 del

[PATCH 23/82] git: async: flatten the inflight array

2020-06-10 Thread Eric Wong
Small array refs have considerable overhead in Perl, so reduce AV/SV overhead and instead allow the inflight array to grow twice as large. --- lib/PublicInbox/Git.pm | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/lib/PublicInbox/Git.pm b/lib/PublicInbox/Git.pm index 84

[PATCH 03/82] preliminary imap server implementation

2020-06-10 Thread Eric Wong
It shares a bit of code with NNTP. It's copy+pasted for now since this provides new ground to experiment with APIs for dealing with slow storage and many inboxes. --- Documentation/public-inbox-imapd.pod | 91 + MANIFEST | 7 + lib/PublicInbox/Daemon.pm

[PATCH 20/82] imap: do not include ".PEEK" in responses

2020-06-10 Thread Eric Wong
They're not specified in RFC 3501 for responses, and at least mutt fails to handle it. --- lib/PublicInbox/IMAP.pm | 32 ++-- t/imap.t| 12 ++-- t/imapd.t | 14 ++ 3 files changed, 30 insertions(+), 28 deletions(-) diff

[PATCH 09/82] imap: use Text::ParseWords::parse_line to handle quoted words

2020-06-10 Thread Eric Wong
IMAP clients may quote args and escape similar to POSIX shell, so attempt to handle them properly using this standard library module. --- lib/PublicInbox/IMAP.pm | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm index a2d59e5cc
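
For readers unfamiliar with shell-style word splitting, here is a rough Python analogue using the standard `shlex` module in place of Perl's Text::ParseWords. The function name is hypothetical, and this sketch makes no attempt to handle IMAP literals (`{N}` byte counts).

```python
import shlex


def split_imap_line(line):
    """Split a command line into words, honoring double quotes and
    backslash escapes the way a POSIX shell would."""
    return shlex.split(line, posix=True)


if __name__ == "__main__":
    # An empty quoted string survives as an empty word.
    print(split_imap_line('t1 LIST "" "inbox.*"'))
```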

[PATCH 14/82] imap: allow fetch of partial of BODY[...] and headers

2020-06-10 Thread Eric Wong
IMAP supports a high level of granularity when it comes to fetching, but fortunately Perl makes it fairly easy to support. --- MANIFEST| 1 + lib/PublicInbox/IMAP.pm | 154 ++-- t/imap.t| 43 +++ t/imapd.t

[PATCH 22/82] imap: speed up HEADER.FIELDS[.NOT] range fetches

2020-06-10 Thread Eric Wong
While we can't memoize the regexp forever like we do with other Eml users, we can still benefit from caching regexp compilation on a per-request basis. A FETCH request from mutt on a 4K message inbox is around 8% faster after this. Since regexp compilation via qr// isn't unbearably slow, a shared
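
The per-request caching idea can be illustrated with a small Python sketch. The function names and the header-matching regexp are assumptions made for illustration; the patch's actual Perl regexp is not shown in this snippet. The point is only that the pattern is compiled once per FETCH request and reused for every message in the range.

```python
import re


def header_fields_regex(names):
    """Compile one regex matching the named header fields, including
    folded continuation lines. Hypothetical helper for illustration."""
    alternation = "|".join(re.escape(n) for n in names)
    return re.compile(
        r"^(?:%s):[^\n]*\n(?:[ \t][^\n]*\n)*" % alternation,
        re.IGNORECASE | re.MULTILINE,
    )


def fetch_header_fields(raw_headers, names):
    """Extract the requested fields from each message's header block."""
    rx = header_fields_regex(names)  # compiled once per request
    # The same compiled object is reused for every message in the range.
    return ["".join(rx.findall(hdr)) for hdr in raw_headers]
```

Python's `re` module also keeps its own internal cache of compiled patterns, so the gain in Python would be smaller than the figure quoted for Perl.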

[PATCH 15/82] imap: always include `resp-text' in responses

2020-06-10 Thread Eric Wong
Mail::IMAPClient doesn't seem to mind the lack of `resp-text'; but it's required by RFC 3501. Preliminary tests with offlineimap(1) indicate the presence of `resp-text' is necessary, even if it's just the freeform `text'. And make the `text' more consistent, favoring "done" over "complete" or "c

[PATCH 18/82] imap: simplify partial fetch structure

2020-06-10 Thread Eric Wong
While the contents of normal %want hash keys are bounded in size, %partial can cause more overhead and lead to repeated sort calls on multi-message fetches. So sort it once and use arrayrefs to make the data structure more compact. --- lib/PublicInbox/IMAP.pm | 13 ++--- 1 file changed, 1

[PATCH 17/82] imap: fix multi-message partial header fetches

2020-06-10 Thread Eric Wong
We must keep the contents of {-partial} around when handling a request to fetch multiple messages. --- lib/PublicInbox/IMAP.pm | 2 +- t/imapd.t | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm index 4292c564f01.

[PATCH 13/82] eml: each_part: single part $idx is 1

2020-06-10 Thread Eric Wong
Instead of counts starting at 0, we start the single-part message at 1 like we do with subparts of a multipart message. This will make it easier to map offsets for "BODY[$SECTION]" when using IMAP FETCH, since $SECTION must contain non-zero numbers according to RFC 3501. This doesn't make any diff

[PATCH 16/82] imap: split out unit tests and benchmarks

2020-06-10 Thread Eric Wong
This makes the test code easier-to-manage and allows us to run faster unit tests which don't involve loading Mail::IMAPClient. --- MANIFEST| 1 + t/imap.t| 20 ++ t/imapd.t | 49 + xt/perf-imap-list.t |

[PATCH 24/82] git: do our own read buffering for cat-file

2020-06-10 Thread Eric Wong
To work with our event loop, we must perform read buffering ourselves or risk starvation, as there doesn't appear to be a way to check the amount of data buffered in userspace by the PerlIO layers without resorting to C or XS. This lets us perform fewer syscalls at the expense of more Perl ops.

[PATCH 27/82] imap: support LSUB command

2020-06-10 Thread Eric Wong
Since we only support read-only operation, we can't save subscriptions requested by clients. So just list no inboxes as subscribed, since some MUAs may blindly try to fetch everything they're subscribed to. --- lib/PublicInbox/IMAP.pm | 5 + 1 file changed, 5 insertions(+) diff --git a/lib/PublicInbox

[PATCH 30/82] testcommon: tcp_(server|connect): BAIL_OUT on failure

2020-06-10 Thread Eric Wong
None of our tests rely on this failing, so just bail out if the system is out of resources. --- lib/PublicInbox/TestCommon.pm | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/lib/PublicInbox/TestCommon.pm b/lib/PublicInbox/TestCommon.pm index 246047b1357..5e7dc8b0d3f 1006

[PATCH 33/82] git: cat_async: provide requested OID + "missing" on missing blobs

2020-06-10 Thread Eric Wong
This will make it easier to implement the retries on alternates_changed() of the synchronous ->cat_file API. --- lib/PublicInbox/Git.pm | 13 - t/git.t| 2 +- 2 files changed, 9 insertions(+), 6 deletions(-) diff --git a/lib/PublicInbox/Git.pm b/lib/PublicInbox/Git.pm

[PATCH 32/82] imap: fix pipelining with async git

2020-06-10 Thread Eric Wong
Since IMAP yields control to GitAsyncCat, IMAP->event_step may be invoked with {long_cb} still active. We must be sure to bail out of IMAP->event_step if that happens and continue to let GitAsyncCat drive IMAP. This also improves fairness by never processing more than one request per ->event_step

[PATCH 37/82] xt: add imapd-validate and imapd-mbsync-oimap

2020-06-10 Thread Eric Wong
imapd-validate is a beefed up version of our nntpd-validate test which hammers the server with parallel connections over regular IMAP, IMAPS, IMAP+STARTTLS; and COMPRESS=DEFLATE variants of each of those. It uses $START_UID:$END_UID fetch ranges to reduce requests and slurp many responses at once

[PATCH 31/82] *deflate: drop invalid comment about rbuf

2020-06-10 Thread Eric Wong
It must be a scalar reference, unlike ->write --- lib/PublicInbox/IMAPdeflate.pm | 1 - lib/PublicInbox/NNTPdeflate.pm | 1 - 2 files changed, 2 deletions(-) diff --git a/lib/PublicInbox/IMAPdeflate.pm b/lib/PublicInbox/IMAPdeflate.pm index 9366db7a7fc..67c9a9738d5 100644 --- a/lib/PublicInbox/IM

[PATCH 28/82] imap: FETCH: support comma-delimited ranges

2020-06-10 Thread Eric Wong
The RFC 3501 `sequence-set' definition allows comma-delimited ranges, so we'll support it in case clients send them. Coalescing overlapping ranges isn't required, so we won't support it as such an attempt to save bandwidth would waste memory on the server, instead. --- lib/PublicInbox/IMAP.pm | 9
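
A minimal Python sketch of parsing such a set, under the stated design choice of not coalescing overlapping ranges. The function name and the `max_uid` parameter are hypothetical; `*` is taken to mean the highest number in use, and `a:b` is treated the same as `b:a`, both as RFC 3501 describes.

```python
def parse_sequence_set(spec, max_uid):
    """Turn "1:3,5,7:*" into a list of inclusive (lo, hi) pairs.

    Ranges are returned in the order given and are NOT merged,
    even when they overlap.
    """
    ranges = []
    for item in spec.split(","):
        lo_s, sep, hi_s = item.partition(":")
        lo = max_uid if lo_s == "*" else int(lo_s)
        if sep:
            hi = max_uid if hi_s == "*" else int(hi_s)
        else:
            hi = lo  # a single number is a one-element range
        if lo > hi:
            lo, hi = hi, lo  # "4:2" is equivalent to "2:4"
        ranges.append((lo, hi))
    return ranges
```

Keeping the ranges as given means a client that sends overlapping ranges simply receives some messages twice, and the server holds no merged structure in memory.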

[PATCH 38/82] imap: support out-of-bounds ranges

2020-06-10 Thread Eric Wong
"$UID_START:*" needs to return at least one message according to RFC 3501 section 6.4.8. While we're in the area, coerce ranges to (unsigned) integers by adding zero ("+ 0") to reduce memory overhead. --- lib/PublicInbox/IMAP.pm | 8 +--- t/imapd.t | 13 + 2 files c
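
The rule can be sketched in a few lines of Python. The function name is hypothetical and this is not the patch's code; `int()` plays the role that `+ 0` plays in Perl.

```python
def resolve_star_range(uid_start, max_uid):
    """Resolve "uid_start:*" to an inclusive (lo, hi) pair of integers.

    If uid_start is beyond the highest assigned UID, the range still
    includes the last message rather than coming back empty.
    """
    uid_start = int(uid_start)  # coerce the string form to a number
    lo = min(uid_start, max_uid)
    return (lo, max_uid)
```

For an empty mailbox (`max_uid` of 0) this sketch yields `(0, 0)`, which matches no message.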

[PATCH 35/82] git: async: automatic retry on alternates change

2020-06-10 Thread Eric Wong
This matches the behavior of the existing synchronous ->cat_file method. In fact, ->cat_file now becomes a small wrapper around the ->cat_async method. --- lib/PublicInbox/Git.pm | 64 +- t/git.t| 37 2 files changed

[PATCH 29/82] add imapd compression test

2020-06-10 Thread Eric Wong
Include a test for Mail::IMAPTalk, here, since Mail::IMAPClient stalls with compression enabled: https://rt.cpan.org/Ticket/Display.html?id=132720 --- MANIFEST| 1 + xt/cmp-imapd-compress.t | 83 + 2 files changed, 84 insertions(+)

[PATCH 25/82] imap: use git-cat-file asynchronously

2020-06-10 Thread Eric Wong
This ought to improve overall performance with multiple clients. Single client performance suffers a tiny bit due to extra syscall overhead from epoll. This also makes the existing async interface easier-to-use, since calling cat_async_begin is no longer required. --- MANIFEST

[PATCH 26/82] git: idle rbuf for async

2020-06-10 Thread Eric Wong
We do this for the C10K-oriented HTTP/NNTP/IMAP processes, and we may support thousands of git-cat-file processes in the future. --- lib/PublicInbox/Git.pm | 47 ++ lib/PublicInbox/GitAsyncCat.pm | 2 +- 2 files changed, 26 insertions(+), 23 deletions(-) d

[PATCH 36/82] imapclient: wrapper for Mail::IMAPClient

2020-06-10 Thread Eric Wong
We'll be using this wrapper class to workaround some upstream bugs in Mail::IMAPClient. There may also be experiments with new APIs for more performance. --- MANIFEST | 1 + lib/PublicInbox/IMAPClient.pm | 119 ++ t/imapd-tls.t

[PATCH 40/82] imap: case-insensitive mailbox name comparisons

2020-06-10 Thread Eric Wong
IMAP RFC 3501 stipulates case-insensitive comparisons, and so does RFC 977 (NNTP). However, INN (nnrpd) uses case-sensitive comparisons, so we've always used case-sensitive comparisons for NNTP to match nnrpd behavior. Unfortunately, some IMAP clients insist on sending "INBOX" with caps, which ca
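
One way to get case-insensitive lookups while still reporting names in their configured form is to key the table on a lowercased copy. The Python sketch below is an assumption about the general approach, not the patch's implementation (the snippet above is cut off before it describes the fix); the class and method names are hypothetical.

```python
class MailboxTable:
    """Look up mailboxes case-insensitively, preserving configured names."""

    def __init__(self, names):
        # Key on the lowercased name; keep the original for responses.
        self._by_key = {name.lower(): name for name in names}

    def lookup(self, requested):
        """Return the configured name, or None if there is no such mailbox."""
        return self._by_key.get(requested.lower())
```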

[PATCH 39/82] xt/perf-imap-list: time refresh_inboxlist

2020-06-10 Thread Eric Wong
It's useful to know how fast SIGHUP can be handled, too. --- xt/perf-imap-list.t | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/xt/perf-imap-list.t b/xt/perf-imap-list.t index 37640a90f3c..0f00f487991 100644 --- a/xt/perf-imap-list.t +++ b/xt/perf-imap-list.t @@ -9,15

[PATCH 34/82] git: move async_cat reference to PublicInbox::Git

2020-06-10 Thread Eric Wong
Trying to avoid a circular reference by relying on $ibx object here makes no sense, since skipping GitAsyncCat::close will result in an FD leak, anyways. So keep GitAsyncCat contained to git-only operations, since we'll be using it for Solver in the distant future. --- lib/PublicInbox/Git.pm

[PATCH 51/82] imap: start parsing out queries for SQLite and Xapian

2020-06-10 Thread Eric Wong
None of the new cases are wired up, yet, but existing cases still work. --- lib/PublicInbox/IMAP.pm | 142 ++-- t/imap.t| 15 + 2 files changed, 150 insertions(+), 7 deletions(-) diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.p

[PATCH 42/82] imap: start doing iterative config reloading

2020-06-10 Thread Eric Wong
This will be used to prevent reloading a giant config with tens/hundreds of thousands of inboxes from blocking the event loop. --- lib/PublicInbox/Config.pm | 15 ++ lib/PublicInbox/IMAPD.pm | 61 +++ script/public-inbox-imapd | 2 +- 3 files changed,
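
The general idea of reloading a large config in small steps, so that no single step holds the event loop for long, might be sketched as follows. This is an illustrative Python generator, not the patch's Perl code; `reload_iteratively`, `load_one`, and the batch size of 100 are assumptions made for the example.

```python
from itertools import islice

def reload_iteratively(names, load_one, batch_size=100):
    """Load inboxes a batch at a time.

    Yields the running count after each batch, so a caller (for
    example an event loop) can resume the generator once per tick
    and serve other clients between batches.
    """
    it = iter(names)
    loaded = 0
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        for name in batch:
            load_one(name)  # hypothetical per-inbox loader
        loaded += len(batch)
        yield loaded
```

A caller would invoke `next()` on the generator once per loop iteration until it is exhausted. Smaller batches keep latency low for other clients but lengthen the total reload time.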

[PATCH 50/82] imap: avoid uninitialized warnings on incomplete commands

2020-06-10 Thread Eric Wong
No point in spewing "uninitialized" warnings into logs when the cat jumps on the Enter key. --- lib/PublicInbox/IMAP.pm | 7 +-- t/imapd.t | 11 +++ 2 files changed, 16 insertions(+), 2 deletions(-) diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm index e7

[PATCH 47/82] xt/*: show some tunable parameters

2020-06-10 Thread Eric Wong
This will make it easier to show parameters used for testing and potential tweaks to be made. --- xt/eml_check_limits.t | 5 - xt/git_async_cmp.t | 1 + xt/imapd-mbsync-oimap.t | 4 +++- xt/imapd-validate.t | 1 + xt/mem-msgview.t| 1 + 5 files changed, 10 insertions(+), 2 d

[PATCH 46/82] t/config.t: always compare against git bool behavior

2020-06-10 Thread Eric Wong
We'll use the xqx() to avoid losing too much performance compared to normal `backtick` (qx) when testing using "make check-run" + Inline::C. --- t/config.t | 15 ++- 1 file changed, 6 insertions(+), 9 deletions(-) diff --git a/t/config.t b/t/config.t index 1f50bb86a09..3f41c0042a9 100

[PATCH 45/82] imap: omit $UID_END from mailbox name, use index

2020-06-10 Thread Eric Wong
Having two large numbers separated by a dash can make visual comparisons difficult when numbers are in the 3,000,000 range for LKML. So avoid the $UID_END value, since it can be calculated from $UID_MIN. And we can avoid large values of $UID_MIN, too, by instead storing the block index and just m

[PATCH 41/82] imap: break giant inboxes into sub-inboxes of 50K messages

2020-06-10 Thread Eric Wong
This limit on mailbox size should keep users of tools like mbsync (isync) and offlineimap happy, since typical filesystems struggle with giant Maildirs. I chose 50K since it's a bit more than what LKML typically sees in a month and still manages to give acceptable performance on my ancient Centrin
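
The arithmetic behind fixed-size slices can be sketched in a few lines of Python. This is only an illustration: it assumes UIDs start at 1 and that slice N covers UIDs N*50000+1 through (N+1)*50000, which the snippets above do not spell out. `SLICE_SIZE`, `slice_index`, and `slice_bounds` are hypothetical names.

```python
SLICE_SIZE = 50_000  # messages per sub-inbox, per the patch description

def slice_index(uid: int) -> int:
    """Return the zero-based slice holding a UID (UIDs assumed to start at 1)."""
    if uid < 1:
        raise ValueError("UIDs are assumed to start at 1")
    return (uid - 1) // SLICE_SIZE

def slice_bounds(index: int) -> tuple:
    """Return the (first, last) UID covered by a slice, inclusive."""
    first = index * SLICE_SIZE + 1
    last = (index + 1) * SLICE_SIZE
    return first, last
```

Under these assumptions a UID near 3,000,000 falls in slice 59, so a mailbox name only needs to carry a two-digit index; both ends of the UID range follow from that index.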

[PATCH 49/82] imap: EXAMINE/STATUS: return correct counts

2020-06-10 Thread Eric Wong
We can share code between them and account for each 50K mailbox slice. However, we must overreport these for non-zero slices and just return lots of empty data for high-numbered slices because some MUAs still insist on non-UID fetches. --- lib/PublicInbox/IMAP.pm | 53

[PATCH 44/82] imapd: ensure LIST is sorted alphabetically, for now

2020-06-10 Thread Eric Wong
I'm not sure this matters, and it could be a waste of CPU cycles if no real clients care. However, it does make debugging over telnet or s_client a bit easier. --- lib/PublicInbox/IMAPD.pm | 7 ++- t/imapd.t| 14 +- 2 files changed, 19 insertions(+), 2 deletions(-

[PATCH 56/82] search: index UID for IMAP search, too

2020-06-10 Thread Eric Wong
We'll need to support searching UID ranges for IMAP, so make sure it's indexed, too. --- lib/PublicInbox/Search.pm| 1 + lib/PublicInbox/SearchIdx.pm | 1 + t/search.t | 5 + 3 files changed, 7 insertions(+) diff --git a/lib/PublicInbox/Search.pm b/lib/PublicInbox/Search

[PATCH 53/82] imap: allow UID range search on timestamps

2020-06-10 Thread Eric Wong
Since it seems somewhat common for IMAP clients to limit searches by sent Date: or INTERNALDATE, we can rely on the NNTP/WWW-optimized overview DB. For other queries, we'll have to depend on the Xapian DB. --- lib/PublicInbox/DummyInbox.pm | 2 +- lib/PublicInbox/IMAP.pm | 13 -
