[PATCH 2/2] treewide: relax allow >=40 chars for git OID

2020-09-15 Thread Eric Wong
This will help with eventual git SHA-256 transitions. --- lib/PublicInbox/Feed.pm | 4 ++-- lib/PublicInbox/Git.pm | 2 +- lib/PublicInbox/Import.pm | 4 ++-- lib/PublicInbox/ViewDiff.pm | 4 ++-- lib/PublicInbox/WWW.pm | 2 +- t/edit.t| 4 ++-- t/plack.t
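Why ">=40" rather than "==40": a SHA-1 object ID is 40 hex characters and a SHA-256 object ID is 64. The Python sketch below shows the relaxed check. It is not public-inbox's Perl code, and `looks_like_oid` is a hypothetical name.

```python
import re

# Full-length lowercase hex object IDs: 40 chars (SHA-1) or more (e.g. 64 for SHA-256).
# An exact {40} quantifier would reject SHA-256 OIDs.
OID_RE = re.compile(r"[0-9a-f]{40,}")

def looks_like_oid(s: str) -> bool:
    """Return True if the whole string is a hex OID of at least 40 characters."""
    return OID_RE.fullmatch(s) is not None
```

`fullmatch` anchors the pattern at both ends, so a trailing newline or extra junk is rejected.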

[PATCH 0/2] remove more 40 char limits

2020-09-15 Thread Eric Wong
SHA-256 support is coming in git Eric Wong (2): mid: rename MID_MAX to ID_MAX treewide: relax allow >=40 chars for git OID lib/PublicInbox/Feed.pm | 4 ++-- lib/PublicInbox/Git.pm | 2 +- lib/PublicInbox/Import.pm | 4 ++-- lib/PublicInbox/MID.pm | 4 ++-- lib/PublicIn

[PATCH] wwwstream: link to cgit URLs for coderepo

2020-09-15 Thread Eric Wong
Hopefully this reduces the ambiguity between code for the project(s) using public-inbox and the code for public-inbox itself. --- lib/PublicInbox/WwwStream.pm | 21 + 1 file changed, 21 insertions(+) diff --git a/lib/PublicInbox/WwwStream.pm b/lib/PublicInbox/WwwStream.pm inde

[PATCH] git_async_cat: fix outdated comment

2020-09-16 Thread Eric Wong
We replaced Danga::Socket with PublicInbox::DS roughly a year before GitAsyncCat was introduced into our git history. --- lib/PublicInbox/GitAsyncCat.pm | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/PublicInbox/GitAsyncCat.pm b/lib/PublicInbox/GitAsyncCat.pm index e618d36

[ANNOUNCE] public-inbox 1.6.0

2020-09-16 Thread Eric Wong
A big release containing several performance optimizations, a new anonymous IMAP server, and more. It represents an incremental improvement over 1.5 in several areas with more to come in 1.7. The read-only httpd and nntpd daemons no longer block the event loop when retrieving blobs from git, maki

[PATCH] t/indexlevels-mirror: fix improperly skipped test

2020-09-16 Thread Eric Wong
Oops :x --- t/indexlevels-mirror.t | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/t/indexlevels-mirror.t b/t/indexlevels-mirror.t index b12bd3cb..656a9a34 100644 --- a/t/indexlevels-mirror.t +++ b/t/indexlevels-mirror.t @@ -162,7 +162,7 @@ my $import_index_incremental = sub {

[PATCH] git_async_cat: inline + drop redundant batch_prepare call

2020-09-17 Thread Eric Wong
$git->cat_async already calls $git->batch_prepare iff needed, so we can reduce subroutine calls and inline a one-off subroutine to save some memory, here. --- lib/PublicInbox/GitAsyncCat.pm | 14 +- 1 file changed, 5 insertions(+), 9 deletions(-) diff --git a/lib/PublicInbox/GitAsyncC

Re: Epoch roll-over with imap

2020-09-17 Thread Eric Wong
Konstantin Ryabitsev wrote: > Good morning, and congratulations on 1.6.0! Thanks. > I'm starting to play with the new imapd mode (currently using the imap > daemon on public-inbox.org), and I am curious how we can make it obvious > to the clients that there is a new epoch available. For exampl

[PATCH 2/2] doc: txt2pre: more manpage URLs

2020-09-17 Thread Eric Wong
We host our own -imapd manpage, and we started using a few more git commands (fast-import for ages). We'll also need to link to manpages.debian.org and live with long URLs for a few non-standard manpages in software we reference. --- Documentation/txt2pre | 8 1 file changed, 8 insertion

[PATCH 1/2] doc: flow: include -imapd

2020-09-17 Thread Eric Wong
It's another read-only daemon, and it may see more usage than -nntpd as more users have IMAP support than NNTP. --- Documentation/flow.ge | 1 + Documentation/flow.txt | 1 + 2 files changed, 2 insertions(+) diff --git a/Documentation/flow.ge b/Documentation/flow.ge index 27f2bfcb..0cc1c333 1006

[PATCH 0/2] some doc fixes

2020-09-17 Thread Eric Wong
Should've been in 1.6, oh well :x Anyways, https://public-inbox.org/flow.html is updated with -imapd Eric Wong (2): doc: flow: include -imapd doc: txt2pre: more manpage URLs Documentation/flow.ge | 1 + Documentation/flow.txt | 1 + Documentation/txt2pre | 8 3 files change

[PATCH 1/7] gcf2: libgit2-based git cat-file alternative

2020-09-19 Thread Eric Wong
From: Eric Wong Having tens of thousands of inboxes and associated git processes won't work well, so we'll use libgit2 to access the object DB directly. We only care about OID lookups and won't need to rely on per-repo revision names or paths. The Git::Raw XS package won'

[PATCH 0/7] gcf2: libgit2-based cat-file alternative

2020-09-19 Thread Eric Wong
oes detect new epochs and seems mostly working otherwise... Eric Wong (7): gcf2: libgit2-based git cat-file alternative t/gcf2: test changes to alternates add gcf2 client and executable script gcf2: transparently retry on missing OID gcf2*: more descriptive package descriptions gcf2:

[PATCH 5/7] gcf2*: more descriptive package descriptions

2020-09-19 Thread Eric Wong
Hopefully this allows others to more quickly figure out what's going on. --- lib/PublicInbox/Gcf2.pm | 5 +++-- lib/PublicInbox/Gcf2Client.pm | 2 ++ 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/lib/PublicInbox/Gcf2.pm b/lib/PublicInbox/Gcf2.pm index 6ac3aa18..fe76b1fd 1006

[PATCH 4/7] gcf2: transparently retry on missing OID

2020-09-19 Thread Eric Wong
Since we only get OIDs from trusted local data sources (over.sqlite3), we can safely retry within the -gcf2 process without worrying about clients spamming us with requests for invalid OIDs and triggering reopens. --- lib/PublicInbox/Gcf2Client.pm | 11 +-- lib/PublicInbox/Git.pm | 5 ++
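The retry-once-on-miss idea can be sketched in Python as follows. Assumptions: the object database is modelled as a dict snapshot returned by an `opener` callable, and `RetryingStore` is hypothetical. The real code uses libgit2 via Inline::C.

```python
from typing import Callable, Dict, Optional

class RetryingStore:
    """On a missing OID, reopen the object DB once and retry the lookup.

    Only reasonable when OIDs come from a trusted source (e.g. our own index),
    because every miss costs a reopen.
    """

    def __init__(self, opener: Callable[[], Dict[str, bytes]]):
        self._opener = opener
        self._db = opener()
        self.reopens = 0

    def get(self, oid: str) -> Optional[bytes]:
        blob = self._db.get(oid)
        if blob is None:
            # Objects written after we opened the DB are invisible until a reopen.
            self._db = self._opener()
            self.reopens += 1
            blob = self._db.get(oid)
        return blob
```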

[PATCH 7/7] gcf2: wire up read-only daemons and rm -gcf2 script

2020-09-19 Thread Eric Wong
It seems easiest to have a singleton Gcf2Client client object per daemon worker for all inboxes to use. This reduces overall FD usage from pipes. The `public-inbox-gcf2' command + manpage are gone and a `$^X' one-liner is used, instead. This saves inodes for internal commands and hopefully makes

[PATCH 3/7] add gcf2 client and executable script

2020-09-19 Thread Eric Wong
From: Eric Wong This should be able to replace multiple `git cat-file' for blob retrieval, but adjustments may be needed. --- Documentation/public-inbox-gcf2.pod | 63 + MANIFEST| 4 ++ Makefile.PL | 5 +++

[PATCH 6/7] gcf2: require git dir with OID

2020-09-19 Thread Eric Wong
This amortizes the cost of recreating PublicInbox::Gcf2 objects when alternates change in v2 all.git. --- lib/PublicInbox/Gcf2Client.pm | 13 - lib/PublicInbox/Git.pm | 5 +++-- lib/PublicInbox/GitAsyncCat.pm | 24 +++- script/public-inbox-gcf2 | 14

[PATCH 2/7] t/gcf2: test changes to alternates

2020-09-19 Thread Eric Wong
From: Eric Wong Calling ->add_alternate won't pick up new additions to $OBJDIR/info/alternates, unfortunately. Thus v2 inboxes will need to do something to invalidate Gcf2 objects. --- t/gcf2.t | 68 +--- 1 file changed, 60 insertio

Re: thoughts on Git::Raw / libgit2?

2020-09-19 Thread Eric Wong
I noped out of Git::Raw since their manpage explicitly stated it's an unstable API. So I'm using Inline::C for libgit2 support (we've been using Inline::C for years, now): https://public-inbox.org/meta/20200919093714.21776-...@80x24.org/ -- unsubscribe: one-click, see List-Unsubscribe header ar

Re: [ANNOUNCE] public-inbox 1.6.0

2020-09-19 Thread Eric Wong
Leah Neukirchen wrote: > Hi, > > thanks for the release! You're welcome! > > * Upgrading for new features in 1.6 > I did all these steps in this order, NNTP works fine but IMAP shows > all folders as empty. Any ideas how to debug this? Any chance you're hitting "$NEWSGROUP" and not "$NEWSG

[PATCH] doc: post-1.6 updates, start 1.7

2020-09-19 Thread Eric Wong
NAME => 'PublicInbox', + NAME => 'PublicInbox', # n.b. camel-case is not our choice + + # XXX drop "PENDING" in .pod before updating this! VERSION => '1.6.0', + AUTHOR => 'Eric Wong ', ABSTRACT =>

[PATCH] config: warn on multiple values for some fields

2020-09-19 Thread Eric Wong
Our code doesn't support multi-values for these, and having unexpected arrays leads to unexpected results (e.g. showing stuff like "ARRAY(0xDEADBEEFADD12E55)" in user interfaces). So warn and only use the last value (matching git-config(1) behavior without `--get-all'). --- lib/PublicInbox/Config
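Here is a Python sketch of "warn and keep the last value". This is the same value `git config --get` returns when a key is set more than once. `SINGLE_VALUED` and `pick_values` are hypothetical, and the key set is only an illustration, not the list the patch actually checks.

```python
import warnings
from typing import Dict, List, Union

Value = Union[str, List[str]]

# Illustrative keys that our code treats as single-valued.
SINGLE_VALUED = {"inboxdir", "newsgroup"}

def pick_values(raw: Dict[str, Value]) -> Dict[str, Value]:
    """Collapse repeated single-valued keys to their last value, warning for each one."""
    out: Dict[str, Value] = {}
    for key, val in raw.items():
        if key in SINGLE_VALUED and isinstance(val, list):
            warnings.warn(f"{key} has multiple values, using last: {val[-1]!r}")
            val = val[-1]
        out[key] = val  # genuinely multi-valued keys (e.g. address) pass through
    return out
```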

Re: [PATCH] mid: drop repeated ';' in mid_escape() regular expression

2020-09-20 Thread Eric Wong
Thanks, pushed as commit dc93c36eb62d36e649b9500b7f190687a3fbcffd -- unsubscribe: one-click, see List-Unsubscribe header archive: https://public-inbox.org/meta/

[PATCH] mda: match List-Id insensitively

2020-09-21 Thread Eric Wong
Konstantin Ryabitsev wrote: > Hello: > > Attempting to subscribe radio...@radiotap.org has highlighted two > problems with list-id matching. When the email comes in from the mailing > list, the header is set as: > > List-Id: radiotap.NetBSD.org > > Public-inbox doesn't find this because the
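The fix is to compare List-Id values case-insensitively. The Python sketch below does that. The angle-bracket extraction (RFC 2919 places the identifier in `<...>`) is an assumption added for illustration, and `find_inbox` is hypothetical.

```python
import re
from typing import Iterable, Optional

def list_id_value(header: str) -> str:
    """Extract the list identifier from "Name <id>" or from a bare id."""
    m = re.search(r"<([^>]+)>", header)
    return (m.group(1) if m else header).strip()

def find_inbox(header: str, configured: Iterable[str]) -> Optional[str]:
    """Return the configured List-Id matching the header, ignoring case."""
    want = list_id_value(header).lower()
    for lid in configured:
        if lid.lower() == want:
            return lid
    return None
```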

[PATCH 2/2] v2writable: drop outdated {unindex_range} check

2020-09-22 Thread Eric Wong
{unindex_range} only exists in the $sync state nowadays, not in the V2Writable ($self) object. $sync->{unindex_range} won't be populated if $regen_max is zero, either, unless somebody is injecting importable commits into an epoch history, in which case this change will result in no-op indexing doing no w

[PATCH 0/2] more minor indexing fixes

2020-09-22 Thread Eric Wong
Stuff noticed while working on detached index usability and maintainability... Eric Wong (2): idxstack: fix comment about file_char v2writable: drop outdated {unindex_range} check lib/PublicInbox/IdxStack.pm | 2 +- lib/PublicInbox/V2Writable.pm | 2 +- 2 files changed, 2 insertions(+), 2

[PATCH 1/2] idxstack: fix comment about file_char

2020-09-22 Thread Eric Wong
It's `d' for deletes, not `a'. --- lib/PublicInbox/IdxStack.pm | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/PublicInbox/IdxStack.pm b/lib/PublicInbox/IdxStack.pm index b43b8064..ce75b46a 100644 --- a/lib/PublicInbox/IdxStack.pm +++ b/lib/PublicInbox/IdxStack.pm @@ -14,7

[PATCH] searchidx: fix (undocumented) --skip-docdata handling

2020-09-24 Thread Eric Wong
This switch is still undocumented, but we can reduce the scope of our Xapian docdata dependency by moving its only caller to SearchIdx. This reduces the amount of code loaded by read-only code paths. --- lib/PublicInbox/Search.pm| 3 --- lib/PublicInbox/SearchIdx.pm | 26

[PATCH] xt: add eml ->as_string round trip checker

2020-09-24 Thread Eric Wong
Unlike Email::MIME, PublicInbox::Eml::as_string should be able to round trip from the Perl object to a raw scalar and back without changes. --- MANIFEST | 1 + xt/eml_check_roundtrip.t | 43 2 files changed, 44 insertions(+) create mode 10

[PATCH] imap: avoid raising exception if client disconnects

2020-09-26 Thread Eric Wong
This ought to save a few cycles if a client disconnects while in the middle of a (UID) FETCH. This avoids: Can't call method "git" on an undefined value at .../PublicInbox/IMAP.pm errors in stderr. --- lib/PublicInbox/IMAP.pm | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --

Re: [PATCH] imap: avoid raising exception if client disconnects

2020-09-26 Thread Eric Wong
Eric Wong wrote: > diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm > index e7ea7f7f..9327100c 100644 > --- a/lib/PublicInbox/IMAP.pm > +++ b/lib/PublicInbox/IMAP.pm > @@ -619,18 +619,19 @@ sub fetch_run_ops { > sub fetch_blob_cb { # called by git->cat_as

[PATCH] ds: add missing label for systems w/o EPOLLEXCLUSIVE

2020-09-27 Thread Eric Wong
Oops :x --- lib/PublicInbox/DS.pm | 1 + 1 file changed, 1 insertion(+) diff --git a/lib/PublicInbox/DS.pm b/lib/PublicInbox/DS.pm index 9c278307..d0caa5e7 100644 --- a/lib/PublicInbox/DS.pm +++ b/lib/PublicInbox/DS.pm @@ -332,6 +332,7 @@ sub new { _InitPoller(); +retry: if (epoll_c

[PATCH] gcf2: improve error handling and do not ->fail on wbuf

2020-09-27 Thread Eric Wong
For historical reasons, both Danga::Socket::write and PublicInbox::DS::write will return 0 when data is buffered, so Gcf2Client must not call ->fail when DS::write returns 0. We'll also improve robustness by recreating the entire Gcf2Client object if it does die for other reasons, instead of ri
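The return-value convention is easy to misread, so here is a Python model of it: 1 means everything was written, and 0 means the rest was queued, which is not an error. `BufferedWriter` and `submit` are hypothetical, and the "socket" is just a byte budget.

```python
class BufferedWriter:
    """Toy non-blocking writer: returns 1 if fully written, 0 if data was queued."""

    def __init__(self, capacity: int):
        self.capacity = capacity  # bytes the peer will accept right now
        self.sent = bytearray()
        self.wbuf = bytearray()   # pending data, flushed later when writable

    def write(self, data: bytes) -> int:
        if self.wbuf:             # keep ordering: queue behind existing data
            self.wbuf += data
            return 0
        n = min(len(data), self.capacity)
        self.sent += data[:n]
        self.capacity -= n
        if n < len(data):
            self.wbuf += data[n:]
            return 0              # buffered, NOT a failure
        return 1

def submit(w: BufferedWriter, req: bytes) -> str:
    # Calling a fail() handler on 0 here would be the bug the patch fixes.
    return "flushed" if w.write(req) else "queued"
```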

[PATCH] searchidx: index lower-case List-Id value

2020-09-27 Thread Eric Wong
We don't want a List-Id value being confused with a Xapian term prefix, here. Followup-to: 8b06cda3a3af3f0e ("mda: match List-Id insensitively") --- lib/PublicInbox/SearchIdx.pm | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/Sear
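Some background on the collision: by Xapian convention, term prefixes are upper-case. A value that begins with an upper-case letter and is glued onto a prefix therefore reads like a longer prefix. Lower-casing the value avoids that and keeps matching case-insensitive. The sketch below assumes a hypothetical `XL` prefix; the prefix public-inbox actually uses is not shown in this excerpt.

```python
def list_id_term(list_id: str, prefix: str = "XL") -> str:
    """Build an indexable term for a List-Id value.

    Without lower-casing, "XL" + "Foo.example" would look like prefix "XLF...".
    """
    return prefix + list_id.lower()
```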

Re: [PATCH] gcf2: improve error handling and do not ->fail on wbuf

2020-09-27 Thread Eric Wong
Putting words together to write commit messages is hard without enough sleep :< Anyways, revised and pushed as commit 8ba04f214bbadcbe106c94281a0c4c21dd50adb8 gcf2: improve error handling and do not ->fail on wbuf For historical reasons, both Danga::Socket::write and PublicInbox:

[PATCH] v2writable: use "HEAD" to match v1 indexing behavior

2020-09-29 Thread Eric Wong
Users may want to change the default branch used for git epochs in v2 (v1 SearchIdx always used whatever "HEAD" pointed to). --- lib/PublicInbox/V2Writable.pm | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm index 5ff

Re: Thoughts on search-based imap mailboxes

2020-10-02 Thread Eric Wong
Konstantin Ryabitsev wrote: > Hello: > > While discussing something else on the kernel.org users list, the Btw, is this list public? > question of "virtual inbox folders" came up when talking about imap and > public-inbox. Here's how I imagine it could work in a way that doesn't > require an

Re: Thoughts on search-based imap mailboxes

2020-10-03 Thread Eric Wong
Konstantin Ryabitsev wrote: > On Fri, Oct 02, 2020 at 08:08:30PM +0000, Eric Wong wrote: > > A client-side tool is likely required anyways, I'm thinking > > having saved search functionality in a local tool writing to > > Maildir/mbox might be the best way forward as

[PATCH] manifest: favor Cpanel::JSON::XS

2020-10-03 Thread Eric Wong
JSON::MaybeXS already favors Cpanel::JSON::XS (and has for many years, now). Allow users to skip installing JSON::MaybeXS if they want an XS-based JSON implementation. --- There is more JSON on the way... lib/PublicInbox/ManifestJsGz.pm | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) dif

[PATCH] admin: preserve config ordering of `--all' switch

2020-10-12 Thread Eric Wong
When `--all' is passed to -index and similar commands, process them in the same order as what is given in the config file. --- lib/PublicInbox/Admin.pm | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/lib/PublicInbox/Admin.pm b/lib/PublicInbox/Admin.pm index fb88e621..9
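"Config order" can be taken to mean first appearance in `git config -l` output. Python dicts preserve insertion order, which makes that easy to sketch. `inboxes_in_config_order` is hypothetical, and the input format is assumed to be `section.subsection.key=value` lines.

```python
from typing import Dict, List

def inboxes_in_config_order(config_lines: List[str]) -> List[str]:
    """Return inbox names in the order they first appear in `git config -l`-style output."""
    seen: Dict[str, None] = {}  # dicts keep insertion order (Python 3.7+)
    for line in config_lines:
        key = line.partition("=")[0]
        parts = key.split(".")
        # publicinbox.<name>.<key>; two-part keys like publicinbox.css are skipped
        if len(parts) >= 3 and parts[0] == "publicinbox":
            seen.setdefault(".".join(parts[1:-1]), None)
    return list(seen)
```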

Re: [PATCH] admin: preserve config ordering of `--all' switch

2020-10-13 Thread Eric Wong
Eric Wong wrote: > When `--all' is passed to -index and similar commands, process > them in the same order as what is given in the config file. Pushed along with following explanation: This ensures predictable behavior so admins can ensure certain inboxes see updated ind

Re: [PATCH] scripts/dupe-finder: restore $dbh variable

2020-10-15 Thread Eric Wong
Kyle Meyer wrote: > When dupe-finder was switched from ->search->{over_ro} to ->over, the > database handle was dropped. Restore it because a spot downstream > uses it. > > Fixes: 73e3a6ed6e95adc6 (use more idiomatic internal API for ->over access) Thanks, I haven't used this since 2018 :x Btw

[PATCH 04/64] git: async: loop inflight checks for nested callbacks

2020-10-16 Thread Eric Wong
We need to loop the inflight check for nested callback invocations to ensure we don't clog the pipe that feeds `git cat-file'. This bug was obscured by the fact that we're already accounting for 64-char git OIDs with SHA-256 in the pipe space calculation; perhaps we shouldn't do that. --- lib/Pub
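To see why budgeting for 64-character OIDs can hide a clogging bug, consider a rough calculation. It assumes each request is `<oid>\n` and the pipe has the Linux default 64 KiB capacity; both are assumptions, not figures from the patch.

```python
PIPE_CAPACITY = 65536  # assumed: Linux default pipe buffer (16 pages x 4 KiB)

def max_inflight(oid_hex_len: int, capacity: int = PIPE_CAPACITY) -> int:
    """How many "<oid>\\n" request lines fit in the pipe before a writer blocks."""
    return capacity // (oid_hex_len + 1)
```

With SHA-1 (40 chars) 1598 requests fit, but a budget sized for SHA-256 (64 chars) allows only 1008. While SHA-1 OIDs are in use, that leaves slack that can mask a missing inflight check.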

[PATCH 02/64] git: ensure ->destroy clobbers check_async read buffer

2020-10-16 Thread Eric Wong
It's currently not a problem since ->destroy doesn't happen for no reason, but we'll need to ensure future uses of ->destroy correctly discard the check_async buffer. --- lib/PublicInbox/Git.pm | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/lib/PublicInbox/Git.pm b/lib/PublicInbo

[PATCH 03/64] git: *_async: support nested callback invocations

2020-10-16 Thread Eric Wong
For external indices, we'll need to support nested cat_async invocations to deduplicate cross-posted messages. Thus we need to ensure we do not clobber the {inflight*} queues while stepping through and ensure {cat_rbuf} is stored before invoking callbacks. This fixes the ->cat_async-only case, bu
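One way to make nested invocations safe is to remove each request from the queue before running its callback. A callback that issues another request then appends to a consistent queue, and nothing gets clobbered. The Python toy below uses a dict in place of `git cat-file`; `AsyncCat` is hypothetical.

```python
from collections import deque
from typing import Callable, Deque, Dict, Tuple

Callback = Callable[[str, bytes], None]

class AsyncCat:
    """Toy FIFO of (oid, callback) pairs answered from an in-memory object map."""

    def __init__(self, objects: Dict[str, bytes]):
        self.objects = objects
        self.inflight: Deque[Tuple[str, Callback]] = deque()

    def cat_async(self, oid: str, cb: Callback) -> None:
        self.inflight.append((oid, cb))

    def wait(self) -> None:
        # Pop before invoking: iterating the queue and clearing it afterwards
        # would silently drop requests that callbacks enqueue while we run.
        while self.inflight:
            oid, cb = self.inflight.popleft()
            cb(oid, self.objects[oid])
```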

[PATCH 01/64] inbox: add uidvalidity method

2020-10-16 Thread Eric Wong
This will make it easier to deal with ExtSearchIdx, which won't have msgmap. --- lib/PublicInbox/DummyInbox.pm | 4 ++-- lib/PublicInbox/IMAPD.pm | 6 +++--- lib/PublicInbox/Inbox.pm | 2 ++ 3 files changed, 7 insertions(+), 5 deletions(-) diff --git a/lib/PublicInbox/DummyInbox.pm b/li

oops, :x was supposed to be 1/3 for git: stuff

2020-10-16 Thread Eric Wong
Fat-fingered and sent the wrong directory. Anyways, these are some git async fixes which seem to be working well enough, ironed out by yet-to-be-published extsearch indexing code. AFAIK none of the current code in public-inbox.git is affected by these changes.

Re: [PATCH 01/64] inbox: add uidvalidity method

2020-10-16 Thread Eric Wong
Eric Wong wrote: > This will make it easier to deal with ExtSearchIdx, which > won't have msgmap. I was going to send this as a standalone patch (not part of any series, but it's fine, here).

[PATCH] tmpfile: modernize to 5.10.1+, note O_APPEND workaround

2020-10-16 Thread Eric Wong
Once again we'll need O_APPEND on a temporary file, so note that we support it here, since Perl 5.32 is way too new to depend on our users having it. --- Just something I noticed while working on other stuff, since I'll be relying on O_APPEND tmpfile behavior in yet another place. lib/PublicInbox/Tmpf

[PATCH] git: introduce async_wait_all

2020-10-17 Thread Eric Wong
->cat_async and ->check_async may trigger each other (in future callers) while waiting, so we need a unified method to ensure both complete. This doesn't affect current code, but allows us to slightly simplify existing callers. --- lib/PublicInbox/Git.pm| 14 -- lib/PublicInbo
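Either queue's callbacks may enqueue work on the other, so a unified wait has to keep draining both until neither has anything left. A single pass over each queue is not enough. Here is a Python sketch with hypothetical names (`AsyncQueues`, zero-argument callbacks):

```python
from collections import deque
from typing import Callable, Deque

Callback = Callable[[], None]

class AsyncQueues:
    def __init__(self) -> None:
        self.cat_q: Deque[Callback] = deque()
        self.check_q: Deque[Callback] = deque()

    def async_wait_all(self) -> None:
        # Loop until BOTH are empty: finishing cat_q may refill check_q and vice versa.
        while self.cat_q or self.check_q:
            while self.check_q:
                self.check_q.popleft()()
            while self.cat_q:
                self.cat_q.popleft()()
```

A naive "wait on check, then wait on cat" would leave a check request queued by a cat callback unprocessed.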

[REVERT?] xt: add eml ->as_string round trip checker

2020-10-17 Thread Eric Wong
Eric Wong wrote: > Unlike Email::MIME, PublicInbox::Eml::as_string should be able > to round trip from the Perl object to a raw scalar and back > without changes. Well, almost... As long as we don't use ->each_part. Will likely go with this revert: --8<--

[PATCH 00/52] detached external index: mostly

2020-10-27 Thread Eric Wong
t;200GB due to deduplication between cross-posts. -compact isn't working with these indices, yet, but will sometime... More changes on the way, still trying to fix my brain and get through this year... Eric Wong (52): doc/standards: add RFCs for URL schemes search: hoist out _xdb_sharded for

[PATCH 03/52] extsearch: start mocking out

2020-10-27 Thread Eric Wong
This will provide a similar API to PublicInbox::Inbox for read-only WWW, -imapd, and -nntpd interfaces. --- MANIFEST | 2 ++ lib/PublicInbox/ExtSearch.pm | 40 lib/PublicInbox/Search.pm| 4 ++-- t/extsearch.t| 11 ++

[PATCH 02/52] search: hoist out _xdb_sharded for v2 inboxes

2020-10-27 Thread Eric Wong
We'll be using this in detached (ext) Xapian indexes in cross inbox search. --- lib/PublicInbox/Search.pm | 58 +-- 1 file changed, 31 insertions(+), 27 deletions(-) diff --git a/lib/PublicInbox/Search.pm b/lib/PublicInbox/Search.pm index 0321ca93..6346d788 100

[PATCH 01/52] doc/standards: add RFCs for URL schemes

2020-10-27 Thread Eric Wong
We linkify these in the WWW UI, and will support them in other places. These URL schemes may end up being stored in external/detached indices for indexing non-git-based mail stores. --- Documentation/standards.perl | 3 +++ 1 file changed, 3 insertions(+) diff --git a/Documentation/standards.per

[PATCH 06/52] v2writable: make OO calls to last_commit-related methods

2020-10-27 Thread Eric Wong
We'll try to reuse as much V2Writable code as possible for external indices, but the way "last_commit" info is stored must be different as external indices will deal with last_commit info for multiple inboxes. --- lib/PublicInbox/V2Writable.pm | 6 +++--- 1 file changed, 3 insertions(+), 3 deletio

[PATCH 05/52] v2writable: add git method

2020-10-27 Thread Eric Wong
This will make it easier to share code with ExtSearchIdx. --- lib/PublicInbox/V2Writable.pm | 26 +- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm index c04f0c59..9d08549f 100644 --- a/lib/Publi

[PATCH 04/52] searchidx: expose INDEXLEVELS as `our'

2020-10-27 Thread Eric Wong
This will be used by external/detached indices, too. --- lib/PublicInbox/SearchIdx.pm | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm index 2aec2b73..af707ced 100644 --- a/lib/PublicInbox/SearchIdx.pm +++ b/lib/Pub

[PATCH 16/52] inboxwritable: eidx_key for external index

2020-10-27 Thread Eric Wong
This is preferable to open-coding "newsgroup // inboxdir" everywhere. --- lib/PublicInbox/InboxWritable.pm | 2 ++ lib/PublicInbox/SearchIdx.pm | 12 ++-- lib/PublicInbox/SearchIdxShard.pm | 32 --- 3 files changed, 29 insertions(+), 17 deletions(-) diff

[PATCH 15/52] v2: some changes for ExtSearchIdx compatibility

2020-10-27 Thread Eric Wong
We'll be using per-sync-state {ibx} refs instead, so make parts of the v2 indexing code less dependent on $self->{ibx} where $self is a V2Writable object. --- lib/PublicInbox/InboxWritable.pm | 21 ++ lib/PublicInbox/V2Writable.pm| 49 +++- lib/PublicInb

[PATCH 14/52] overidx: introduce changes for external index

2020-10-27 Thread Eric Wong
Since external indices won't have msgmap.sqlite3, we'll need to store last_commit-* metadata in over.sqlite3 instead. This has longer limits to account for path names or newsgroup names stored in keys. We'll also rely on built-in counters for Xapian document IDs, since msgmap.sqlite3 no longer
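Below is a Python `sqlite3` sketch of keeping per-inbox, per-epoch last_commit values in a key/value table. The table name, columns, and key layout are all hypothetical; this excerpt does not show the actual over.sqlite3 schema. Because keys embed inbox names or paths, they need room for long strings.

```python
import sqlite3
from typing import Optional

def open_meta(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    # Hypothetical table; TEXT keys are unbounded, so long path-based keys fit.
    db.execute("CREATE TABLE IF NOT EXISTS meta (key TEXT PRIMARY KEY, val TEXT NOT NULL)")
    return db

def _key(inbox_key: str, epoch: int) -> str:
    return f"last_commit-{inbox_key}-{epoch}"  # hypothetical key layout

def set_last_commit(db: sqlite3.Connection, inbox_key: str, epoch: int, oid: str) -> None:
    db.execute("INSERT OR REPLACE INTO meta (key, val) VALUES (?, ?)", (_key(inbox_key, epoch), oid))

def get_last_commit(db: sqlite3.Connection, inbox_key: str, epoch: int) -> Optional[str]:
    row = db.execute("SELECT val FROM meta WHERE key = ?", (_key(inbox_key, epoch),)).fetchone()
    return row[0] if row else None
```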

[PATCH 10/52] v2writable: hoist out write_alternates

2020-10-27 Thread Eric Wong
We'll be reusing this for external indices and possibly other places. --- lib/PublicInbox/V2Writable.pm | 23 ++- 1 file changed, 14 insertions(+), 9 deletions(-) diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm index eecc702b..aa812a6b 100644 --- a/l

[PATCH 11/52] searchidxshard: allow msgref to be undef

2020-10-27 Thread Eric Wong
We don't need to keep it in code paths which are guaranteed to only see PublicInbox::Eml (and not Email::MIME or PublicInbox::MIME which did not round-trip properly). However, we must set {raw_bytes} since PublicInbox::Eml may add an extra "\n" for rare messages with no bodies. --- lib/PublicInbo

[PATCH 07/52] search: xdb_sharded: make this a public method for ExtSearch

2020-10-27 Thread Eric Wong
We can simplify callers by using $self->{xpfx} instead of passing another arg on the stack. --- lib/PublicInbox/ExtSearch.pm | 2 +- lib/PublicInbox/Search.pm| 10 +- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/lib/PublicInbox/ExtSearch.pm b/lib/PublicInbox/ExtSearc

[PATCH 08/52] searchidx: introduce "xref3" concept

2020-10-27 Thread Eric Wong
This will be used to track cross-posted messages in the external/detached index. --- lib/PublicInbox/SearchIdx.pm | 78 ++- lib/PublicInbox/SearchIdxShard.pm | 53 ++--- lib/PublicInbox/Smsg.pm | 13 ++ t/search.t

[PATCH 13/52] v2writable: count_shards: allow working without {ibx}

2020-10-27 Thread Eric Wong
This will be needed for ExtSearchIdx which doesn't have a persistent PublicInbox::Inbox object. --- lib/PublicInbox/V2Writable.pm | 12 +++- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm index f575ba11..667a11f8

[PATCH 09/52] v2writable: prepare initialization for external indices

2020-10-27 Thread Eric Wong
External indices won't have $self->{ibx} since it needs to deal with multiple inboxes. We can also hoist out ->parallel_init to make it easier to distinguish the non-parallel control flow. --- lib/PublicInbox/V2Writable.pm | 30 ++ 1 file changed, 18 insertions(+), 12

[PATCH 12/52] v2writable: idx_shard: simplify callers

2020-10-27 Thread Eric Wong
This will make it easier to use in ExtSearchIdx. --- lib/PublicInbox/V2Writable.pm | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm index aa812a6b..f575ba11 100644 --- a/lib/PublicInbox/V2Writable.pm

[PATCH 20/52] searchidx: index eidx_key as a boolean term

2020-10-27 Thread Eric Wong
Using `O' (owner) here (according to Xapian omega's termprefixes.rst) since we could say the newsgroup or inbox is the owner of the given message. --- lib/PublicInbox/SearchIdx.pm | 2 ++ 1 file changed, 2 insertions(+) diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm index 0

[PATCH 27/52] v2writable: rename {v2w} field to {self}

2020-10-27 Thread Eric Wong
This will make it easier to reuse some indexing code for ExtSearchIdx. --- lib/PublicInbox/V2Writable.pm | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm index c0306e82..3d3c25ec 100644 --- a/lib/Public

[PATCH 22/52] searchidxshard: special init for eidx

2020-10-27 Thread Eric Wong
Having a special init path for external indices is probably easier than further overloading SearchIdx->new initialization to work without an Inbox object. --- lib/PublicInbox/SearchIdx.pm | 13 + lib/PublicInbox/SearchIdxShard.pm | 7 --- 2 files changed, 17 insertions(+), 3

[PATCH 19/52] extsearchidx: initial implementation

2020-10-27 Thread Eric Wong
It compiles... --- MANIFEST| 1 + lib/PublicInbox/ExtSearchIdx.pm | 311 t/extsearch.t | 1 + 3 files changed, 313 insertions(+) create mode 100644 lib/PublicInbox/ExtSearchIdx.pm diff --git a/MANIFEST b/MANIFEST inde

[PATCH 28/52] v2writable: make *last_commits and sync_prepare OO methods

2020-10-27 Thread Eric Wong
This will allow ExtSearchIdx to override or reuse them more easily. Unfortunately we lose prototype validation, but that seems to be discouraged anyways given the 'signatures' feature in Perl 5.20+. --- lib/PublicInbox/V2Writable.pm | 9 + 1 file changed, 5 insertions(+), 4 deletions(-)

[PATCH 33/52] v2writable: pass oid to uindex_oid

2020-10-27 Thread Eric Wong
We'll be validating against this in the future to stop bugs from creeping in. --- lib/PublicInbox/V2Writable.pm | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm index c8b01a3d..efda7907 100644 --- a/lib/PublicIn

[PATCH 35/52] searchidx: export prepare_stack

2020-10-27 Thread Eric Wong
We'll be needing it in ExtSearchIdx for the next commit. --- lib/PublicInbox/SearchIdx.pm | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm index 33c81ea8..0c0e844a 100644 --- a/lib/PublicInbox/SearchIdx.pm +++ b/lib/Pub

[PATCH 26/52] v2writable: allow OO method references

2020-10-27 Thread Eric Wong
Using `->can(method)' allows subclasses to override `index_oid' and `unindex_oid' methods. --- lib/PublicInbox/V2Writable.pm | 18 +++--- 1 file changed, 11 insertions(+), 7 deletions(-) diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm index 7c7be1bd..c0306e8
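In Perl, `$self->can('index_oid')` resolves the method through the object's class, so a subclass override wins. The Python analogue is looking the method up on `self`. Both class names below are hypothetical stand-ins for V2Writable and ExtSearchIdx.

```python
from typing import List

class V2WriterSketch:
    def index_oid(self, oid: str) -> str:
        return f"v2 index {oid}"

    def sync(self, oids: List[str]) -> List[str]:
        # Look the method up on the instance (like $self->can('index_oid')),
        # so subclasses can override it; calling V2WriterSketch.index_oid directly would not.
        index = getattr(self, "index_oid")
        return [index(o) for o in oids]

class ExtIndexSketch(V2WriterSketch):
    def index_oid(self, oid: str) -> str:
        return f"ext index {oid}"
```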

[PATCH 25/52] v2writable: more generic sync setup code

2020-10-27 Thread Eric Wong
We want to reuse this code for ExtSearchIdx, eventually. --- lib/PublicInbox/V2Writable.pm | 29 - 1 file changed, 16 insertions(+), 13 deletions(-) diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm index a403f22f..7c7be1bd 100644 --- a/lib/Pub

[PATCH 32/52] extsearchidx: remove {unindex_range} field

2020-10-27 Thread Eric Wong
Moved to per-epoch "units". --- lib/PublicInbox/ExtSearchIdx.pm | 1 - 1 file changed, 1 deletion(-) diff --git a/lib/PublicInbox/ExtSearchIdx.pm b/lib/PublicInbox/ExtSearchIdx.pm index 609151e4..5e72d65d 100644 --- a/lib/PublicInbox/ExtSearchIdx.pm +++ b/lib/PublicInbox/ExtSearchIdx.pm @@ -237,7

[PATCH 18/52] v2writable: checkpoint: account for lack of {mm}

2020-10-27 Thread Eric Wong
ExtSearchIdx will not have Msgmap, since it may index non-email blobs in the future (it'll still be usable with IMAP, but not NNTP). --- lib/PublicInbox/V2Writable.pm | 14 -- 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/

[PATCH 29/52] v2writable: move size check init to sync_prepare

2020-10-27 Thread Eric Wong
This will let us use it from ExtSearchIdx. --- lib/PublicInbox/V2Writable.pm | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm index ca60f2a1..d417b125 100644 --- a/lib/PublicInbox/V2Writable.pm +++ b/lib/PublicI

[PATCH 30/52] extsearchidx: more compatibility with V2Writable callers

2020-10-27 Thread Eric Wong
We'll use `index_oid' and `unindex_oid' as our method names so V2Writable methods may use `$self->can' to access them. --- lib/PublicInbox/ExtSearchIdx.pm | 64 + 1 file changed, 34 insertions(+), 30 deletions(-) diff --git a/lib/PublicInbox/ExtSearchIdx.pm b/lib/P

[PATCH 37/52] searchidx: reduce inbox-dependency, wrap ->with_umask

2020-10-27 Thread Eric Wong
This will let us work consistently with both existing inboxes and external indices. --- lib/PublicInbox/SearchIdx.pm | 14 ++ 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm index 0c0e844a..ea884434 100644 --- a

[PATCH 24/52] searchidx: log2stack: simplify callers

2020-10-27 Thread Eric Wong
Since we store {ibx} in $sync state, we no longer have to pass it as an argument to log2stack. --- lib/PublicInbox/SearchIdx.pm | 8 lib/PublicInbox/V2Writable.pm | 2 +- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchI

[PATCH 31/52] v2writable: reduce scope of epoch-aware code

2020-10-27 Thread Eric Wong
And clearly label it. We may try to reuse some of this for v1 indexing code paths. --- lib/PublicInbox/V2Writable.pm | 72 +-- 1 file changed, 35 insertions(+), 37 deletions(-) diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm index d417b1

[PATCH 17/52] v2writable: rename remaining "remote" terminology

2020-10-27 Thread Eric Wong
"remote" used to imply "child process on the same machine" which was somewhat non-sensical, anyways. And OverIdx has been in the same process since v2 was finalized. So use the suffix "aux" for "auxiliary" since it can be safely jettisoned without breaking URLs. --- lib/PublicInbox/V2Writable.pm

[PATCH 34/52] extsearchidx: sync unit updates

2020-10-27 Thread Eric Wong
Now that the V2Writable code is more generic, we can sync with it to use `units' which represent either a v2 epoch or an entire v1 inbox. --- lib/PublicInbox/ExtSearchIdx.pm | 13 + 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/lib/PublicInbox/ExtSearchIdx.pm b/lib/Publ

[PATCH 23/52] searchidx: put {ibx} into $sync state

2020-10-27 Thread Eric Wong
This will allow reusability with ExtSearchIdx --- lib/PublicInbox/SearchIdx.pm | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm index 029b2726..d3c904c7 100644 --- a/lib/PublicInbox/SearchIdx.pm +++ b/l

[PATCH 21/52] searchidx: xref3 delete support

2020-10-27 Thread Eric Wong
Not yet tested, but Perl compiles it! --- lib/PublicInbox/SearchIdx.pm | 50 ++-- 1 file changed, 31 insertions(+), 19 deletions(-) diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm index 5171c610..0458d9c3 100644 --- a/lib/PublicInbox/Search

[PATCH 43/52] searchidx: remove xref3 support for Xapian

2020-10-27 Thread Eric Wong
It doesn't seem worth storing xref3 data in Xapian now that the same info is in over.sqlite3. --- lib/PublicInbox/ExtSearchIdx.pm | 10 +++-- lib/PublicInbox/SearchIdx.pm | 64 +++ lib/PublicInbox/SearchIdxShard.pm | 28 +++--- lib/PublicInbox/Smsg.pm

[PATCH 51/52] extsearchidx: support --batch-size checkpoints

2020-10-27 Thread Eric Wong
This is needed to limit the RSS of processes and ensure the stored data in over.sqlite3 and Xapian DBs are consistent if interrupted. Without checkpoints, indexing lore causes shard workers to take several GB of memory and thrash/OOM smaller systems. --- lib/PublicInbox/ExtSearchIdx.pm | 20 +
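The checkpoint idea can be sketched as follows. Pending work is flushed whenever the buffered message data reaches a size threshold. This bounds memory, and an interrupted run loses at most one uncommitted batch. This is a hypothetical Python illustration, not the Perl in ExtSearchIdx.pm. The class name, the `commit` callback, and the assumption that the threshold counts bytes of raw message data are all illustrative.

```python
class CheckpointingIndexer:
    """Hypothetical sketch: buffer messages and commit them in batches.

    `commit` stands in for whatever makes pending work durable
    (e.g. committing SQLite and Xapian transactions); it is an assumption.
    """

    def __init__(self, batch_size, commit):
        self.batch_size = batch_size  # assumed to be measured in bytes
        self.commit = commit
        self.pending = []
        self.pending_bytes = 0

    def add(self, raw):
        self.pending.append(raw)
        self.pending_bytes += len(raw)
        # Checkpoint once the buffered data reaches the threshold.
        if self.pending_bytes >= self.batch_size:
            self.checkpoint()

    def checkpoint(self):
        if self.pending:
            self.commit(list(self.pending))
            self.pending.clear()
            self.pending_bytes = 0

    def done(self):
        # Flush whatever remains at the end of the run.
        self.checkpoint()


batches = []
idx = CheckpointingIndexer(10, batches.append)
for msg in [b"aaaa", b"bbbbbb", b"cc", b"ddddddddd", b"e"]:
    idx.add(msg)
idx.done()
```

After this run, `batches` holds three commits: `[b"aaaa", b"bbbbbb"]`, `[b"cc", b"ddddddddd"]`, and `[b"e"]`.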

[PATCH 38/52] searchidx: favor $sync->{ibx} (over $self->{ibx})

2020-10-27 Thread Eric Wong
In case we want to reuse code with ExtSearchIdx or V2Writable. --- lib/PublicInbox/SearchIdx.pm | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm index ea884434..32fa16f5 100644 --- a/lib/PublicInbox/SearchIdx.pm +++

[PATCH 41/52] index: eindex wiring

2020-10-27 Thread Eric Wong
This doesn't do anything, yet, but it will once the rest of the eindex stuff works. --- script/public-inbox-index | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/script/public-inbox-index b/script/public-inbox-index index 5dad6ecb..55e4f641 100755 --- a/script/public-inbox-in

[PATCH 40/52] script: add preliminary eindex implementation

2020-10-27 Thread Eric Wong
Not documented, yet, but it runs... --- MANIFEST | 1 + script/public-inbox-eindex | 43 ++ t/extsearch.t | 26 +++ 3 files changed, 70 insertions(+) create mode 100644 script/public-inbox-eindex diff --git a

[PATCH 44/52] t/extsearch.t: verify results and xref3 ordering

2020-10-27 Thread Eric Wong
We want NNTP clients to see consistent Xref: headers to ensure client-side caches don't get confused. --- t/extsearch.t | 13 - 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/t/extsearch.t b/t/extsearch.t index dfec6b6f..108ffaeb 100644 --- a/t/extsearch.t +++ b/t/extsea
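Emitting the cross-references in a deterministic order keeps the header stable across runs, whatever order the rows come back in. This is a hypothetical Python sketch. It assumes each xref3 entry reduces to a `(newsgroup, article_number)` pair and that sorting by that pair is acceptable. The patch summary does not say which ordering public-inbox actually uses. The `Xref: host group:num ...` shape follows the Netnews Xref header field.

```python
def xref_header(host, xrefs):
    """Hypothetical: build an Xref header with a stable entry order.

    `xrefs` is an iterable of (newsgroup, article_number) pairs. Sorting
    is the assumed stabilizing rule, not necessarily public-inbox's.
    """
    locations = [f"{group}:{num}" for group, num in sorted(xrefs)]
    return f"Xref: {host} " + " ".join(locations)


a = xref_header("example.com", [("inbox.b", 7), ("inbox.a", 42)])
b = xref_header("example.com", [("inbox.a", 42), ("inbox.b", 7)])
```

Both calls return `"Xref: example.com inbox.a:42 inbox.b:7"`, so a client caching the header sees the same value regardless of input order.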

[PATCH 50/52] extsearchidx: set current_info in warning callbacks

2020-10-27 Thread Eric Wong
This bit is duplicated with per-Inbox indexing in Admin, undecided if it's the right place for it. --- lib/PublicInbox/ExtSearchIdx.pm | 5 + 1 file changed, 5 insertions(+) diff --git a/lib/PublicInbox/ExtSearchIdx.pm b/lib/PublicInbox/ExtSearchIdx.pm index bfe39891..050c4252 100644 --- a/li

[PATCH 45/52] t/v2writable: remove pointless ->barrier call

2020-10-27 Thread Eric Wong
We don't actually use it anywhere, and may not need it in the future. --- t/v2writable.t | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/t/v2writable.t b/t/v2writable.t index 2f71fafa..358a2bb7 100644 --- a/t/v2writable.t +++ b/t/v2writable.t @@ -274,14 +274,13 @@ EOF

[PATCH 47/52] extsearchidx: handle edits

2020-10-27 Thread Eric Wong
We can now handle cases where messages are edited in one inbox but not another, bifurcating the message. V2Writable::log_range handles some edge-cases which could happen in v2-only code paths, as well, but weren't usually triggered due to default git-gc knobs not pruning immediately --- lib/Publi

[PATCH 48/52] extsearch: wire up remaining Inbox-like methods for WWW

2020-10-27 Thread Eric Wong
This lets us pretend an ExtSearch object is an Inbox object in most of the existing WWW code. --- lib/PublicInbox/Config.pm| 12 + lib/PublicInbox/ExtSearch.pm | 25 ++ lib/PublicInbox/Inbox.pm | 51 ++-- lib/PublicInbox/WWW.pm

[PATCH 52/52] searchidxshard: make warnings with eidx_key less confusing

2020-10-27 Thread Eric Wong
Seeing "Xorg.foo.bar" can be confusing in warnings if the eidx_key is only "org.foo.bar" with no relation to "Xorg" at all. Furthermore, printing "\0" to log or terminal output isn't very nice and could throw off some users/tools. --- lib/PublicInbox/SearchIdxShard.pm | 5 +++-- 1 file changed, 3
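Here is one way to make such warnings readable. Strip the internal term prefix before printing, and never write raw NUL bytes to a log or terminal. This is a hypothetical Python sketch. The layout it assumes (an `X` prefix followed by the eidx_key, with optional NUL-separated fields) is inferred from the commit message. It is not public-inbox's documented encoding.

```python
def warning_label(term, prefix="X"):
    """Hypothetical: turn an internal term into a log-friendly label.

    Assumes terms look like prefix + eidx_key, optionally followed by
    NUL-separated fields; this layout is an illustration only.
    """
    # Only strip the prefix when it is actually present.
    body = term[len(prefix):] if term.startswith(prefix) else term
    # Replace NUL separators so nothing unprintable reaches the log.
    return body.replace("\0", " ")


label = warning_label("Xorg.foo.bar")
```

With this approach a warning names `org.foo.bar`, the key the user configured, rather than `Xorg.foo.bar`.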
