Re: [PATCH 05/23] init: support --newsgroup option

2020-08-20 Thread Eric Wong
Eric Wong wrote:
> +Some of the options documented in L
> +require editing the config file. Old versions lack the
> +C<-n>/C<--newsgroup> parameter

While working on this, I realized -n vs -N could be confusing, so I made the abbreviation --ng instead. So I'll squash this in before pushing: dif

[PATCH 11/23] search: export mdocid subroutine

2020-08-20 Thread Eric Wong
No need to have awkward globrefs for this. --- lib/PublicInbox/IMAP.pm | 3 +-- lib/PublicInbox/Search.pm | 2 ++ 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm index 3d66f930..562c59d4 100644 --- a/lib/PublicInbox/IMAP.pm +++ b/l

[PATCH 10/23] search: improve comments around constants

2020-08-20 Thread Eric Wong
We'll probably be adding more value columns like THREADID to sort on. --- lib/PublicInbox/Search.pm | 63 +-- 1 file changed, 34 insertions(+), 29 deletions(-) diff --git a/lib/PublicInbox/Search.pm b/lib/PublicInbox/Search.pm index 4d02a7c1..593040a8 100644 --

[PATCH 14/23] smsg: reduce utf8::decode call sites

2020-08-20 Thread Eric Wong
Both callers of load_from_data call utf8::decode, so just do utf8::decode in load_from_data. --- lib/PublicInbox/Over.pm | 1 - lib/PublicInbox/Smsg.pm | 2 +- 2 files changed, 1 insertion(+), 2 deletions(-) diff --git a/lib/PublicInbox/Over.pm b/lib/PublicInbox/Over.pm index 2b314882..81b9fca7 1

[PATCH 06/23] init: drop -N alias for --skip-artnum

2020-08-20 Thread Eric Wong
It may be too easily confused for --newsgroup or --ng. This is too rarely used and never made it into a release, so it should be fine. --- Documentation/public-inbox-init.pod | 2 +- script/public-inbox-init| 2 +- t/init.t| 4 ++-- 3 files changed, 4 inser

[PATCH 18/23] extmsg: avoid using Xapian docdata

2020-08-20 Thread Eric Wong
Once again, over.sqlite3 contains everything necessary for Message-ID resolution. Also, Xapian may be completely unnecessary with the advent of over.sqlite3, but that's for another time. --- lib/PublicInbox/ExtMsg.pm | 21 ++--- 1 file changed, 10 insertions(+), 11 deletions(-) d

[PATCH 17/23] searchview: convert nested and Atom display to over.sqlite3

2020-08-20 Thread Eric Wong
git blob retrieval dominates on these, "&x=t" (nested) is roughly the same due to increased overhead for ->get_percent storage balancing out the mass-loading from SQLite. Atom "&x=A" is sped up slightly and uses less memory in the long-lived response. --- lib/PublicInbox/SearchView.pm | 25 ++

[PATCH 12/23] searchquery: split off from searchview

2020-08-20 Thread Eric Wong
Since this was already a separate package, split it off into its own file since SearchView may not handle inbox groups. --- MANIFEST | 1 + lib/PublicInbox/SearchQuery.pm | 53 ++ lib/PublicInbox/SearchView.pm | 53 ++-

[PATCH 08/23] xapcmd: simplify {reindex} parameter passing

2020-08-20 Thread Eric Wong
No need to localize it, here, since we can just refer to it in the `$opt' hashref. Hopefully this improves readability for others like it does for me. I sometimes wonder if the concept of a stack in high-level languages is even necessary... --- lib/PublicInbox/Xapcmd.pm | 20 +---

[PATCH 15/23] searchview: use over.sqlite3 instead of Xapian docdata

2020-08-20 Thread Eric Wong
This is a step towards improving kernel page cache hit rates by relying on over.sqlite3 for document data instead of Xapian. Some micro-optimization to over->get_art was required to maintain performance. --- lib/PublicInbox/Over.pm | 18 +- lib/PublicInbox/SearchView.pm | 10

[PATCH 04/23] init: support --help and -?

2020-08-20 Thread Eric Wong
And speed those up with some lazy loading, too. --- script/public-inbox-init | 79 +--- 1 file changed, 50 insertions(+), 29 deletions(-) diff --git a/script/public-inbox-init b/script/public-inbox-init index 1c8066df..6852f64a 100755 --- a/script/public-inbox-

[PATCH 07/23] search: v2: ensure shards are numerically sorted

2020-08-20 Thread Eric Wong
This seems required to correctly get the NNTP article number from Xapian docid on combined Xapian DBs. The default (ASCII-betical) sorting was only acceptable for -imapd users until somebody hit 11 (or more) shards, which is a rare case. --- lib/PublicInbox/Search.pm | 27
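
The sorting problem described above can be illustrated with a minimal sketch. It is written in Python rather than the project's Perl, and the shard names are hypothetical stand-ins (the actual on-disk layout is not shown in the message). It only demonstrates why string sorting misorders shards once a two-digit shard number exists.

```python
# Hypothetical shard names "0" through "10" (11 shards), as plain strings.
shards = [str(n) for n in range(11)]

# Default string sort compares character by character, so "10" lands before "2".
ascii_sorted = sorted(shards)

# Sorting with an integer key restores the intended shard order.
numeric_sorted = sorted(shards, key=int)

print(ascii_sorted)
print(numeric_sorted)
```

With ten or fewer shards (names "0" through "9") both orderings agree, which matches the note that the default sort only breaks at 11 or more shards.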

[PATCH 21/23] t/nntpd-v2: set PI_TEST_VERSION=2 properly

2020-08-20 Thread Eric Wong
Numbers are hard :< --- t/nntpd-v2.t | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/t/nntpd-v2.t b/t/nntpd-v2.t index 7fc3447e..1dd992a0 100644 --- a/t/nntpd-v2.t +++ b/t/nntpd-v2.t @@ -1,4 +1,4 @@ # Copyright (C) 2019-2020 all contributors # License: AGPL-3.0+

[PATCH 13/23] search: make qparse_new an internal function

2020-08-20 Thread Eric Wong
We'll probably be reusing it from another package in a future commit. --- lib/PublicInbox/Search.pm | 16 ++-- 1 file changed, 6 insertions(+), 10 deletions(-) diff --git a/lib/PublicInbox/Search.pm b/lib/PublicInbox/Search.pm index f98513d3..e6200bfb 100644 --- a/lib/PublicInbox/Sear

[PATCH 23/23] search: add mset_to_artnums method

2020-08-20 Thread Eric Wong
We can avoid importing mdocid() in several places by using this method, simplifying callers. --- lib/PublicInbox/ExtMsg.pm | 4 +--- lib/PublicInbox/IMAP.pm | 4 +--- lib/PublicInbox/Mbox.pm | 7 ++- lib/PublicInbox/Search.pm | 6 ++ lib/PublicInbox/SearchView.pm | 6 ++

[PATCH 20/23] smsg: remove from_mitem

2020-08-20 Thread Eric Wong
We no longer read docdata.glass from anywhere in our code base. Some adjustments were needed to t/search.t to deal with the Xapian::WritableDatabase committing at different times, since our ->query is avoided from PublicInbox::SearchIdx to avoid needing a {over_ro} field. --- lib/PublicInbox/Sear

[PATCH 16/23] searchview: speed up search summary by ~10%

2020-08-20 Thread Eric Wong
Instead of loading one article at-a-time from over.sqlite3, we can use SQL to mass-load IN (?,?, ...) all results with a single SQLite query. Despite SQLite being in-process and having no network latency, the reduction in SQL query executions from loading multiple rows at once speeds things up sig
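
The mass-loading idea above can be sketched as follows. This is an illustration in Python using the standard sqlite3 module, not the patch's Perl code. The table name `msg`, its columns, and both function names are assumptions made up for the example and do not reflect the real over.sqlite3 schema.

```python
import sqlite3

def load_one_at_a_time(db, nums):
    """Run one query per article number (the slower pattern)."""
    rows = []
    for n in nums:
        row = db.execute(
            "SELECT num, subject FROM msg WHERE num = ?", (n,)
        ).fetchone()
        if row is not None:
            rows.append(row)
    return rows

def load_in_bulk(db, nums):
    """Fetch all requested rows with a single IN (?,?,...) query."""
    if not nums:
        return []
    placeholders = ",".join("?" * len(nums))
    sql = f"SELECT num, subject FROM msg WHERE num IN ({placeholders})"
    by_num = {row[0]: row for row in db.execute(sql, list(nums))}
    # IN does not guarantee result order, so restore the caller's order.
    return [by_num[n] for n in nums if n in by_num]

# Hypothetical in-memory table standing in for the real database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE msg (num INTEGER PRIMARY KEY, subject TEXT)")
db.executemany(
    "INSERT INTO msg VALUES (?, ?)",
    [(n, f"subject {n}") for n in range(1, 6)],
)

wanted = [4, 2, 5]
assert load_one_at_a_time(db, wanted) == load_in_bulk(db, wanted)
```

Both functions return the same rows; the bulk version prepares and executes one statement instead of one per result. SQLite limits the number of bound parameters per statement, so a very large list would need to be split into chunks.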

[PATCH 05/23] init: support --newsgroup option

2020-08-20 Thread Eric Wong
We can reduce the need to edit the config file for NNTP group names this way. --- Documentation/public-inbox-config.pod | 2 +- Documentation/public-inbox-init.pod | 25 + script/public-inbox-init | 12 ++-- t/imapd.t | 6

[PATCH 09/23] www: reduce long-lived PublicInbox::Search references

2020-08-20 Thread Eric Wong
While this is unlikely to be a problem in current practice, keeping Xapian DBs open for long responses can interfere with free space recovery after -compact. In the future, it will interfere with inbox search grouping and lead to unexpected results. --- lib/PublicInbox/Inbox.pm | 11

[PATCH 19/23] mbox: avoid Xapian docdata in search results

2020-08-20 Thread Eric Wong
Another place where we can reduce kernel page cache overhead by hitting over.sqlite3 instead of docdata.glass. --- lib/PublicInbox/Mbox.pm | 23 --- 1 file changed, 12 insertions(+), 11 deletions(-) diff --git a/lib/PublicInbox/Mbox.pm b/lib/PublicInbox/Mbox.pm index a83c0356.

[PATCH 22/23] init+index: support --skip-docdata for Xapian

2020-08-20 Thread Eric Wong
Since we no longer read document data from Xapian, allow users to opt-out of storing it. This breaks compatibility with previous releases of public-inbox, but gives us a ~1.5% space savings on Xapian storage (and associated I/O and page cache pressure reduction). --- Documentation/public-inbox-in

[PATCH 01/23] doc: note -compact and -xcpdb are rarely used

2020-08-20 Thread Eric Wong
Slowly improving the learning curve... --- Documentation/public-inbox-compact.pod | 5 + Documentation/public-inbox-xcpdb.pod | 3 +++ 2 files changed, 8 insertions(+) diff --git a/Documentation/public-inbox-compact.pod b/Documentation/public-inbox-compact.pod index 8e463ab1..4e9b6d9f 1006

[PATCH 03/23] compact: support --help/-? and perform lazy loading

2020-08-20 Thread Eric Wong
This probably won't be used much, but --help can still make sense. --- script/public-inbox-compact | 39 +++-- 1 file changed, 29 insertions(+), 10 deletions(-) diff --git a/script/public-inbox-compact b/script/public-inbox-compact index b5fa0086..a6bb62bd 100755 -

[PATCH 02/23] admin: progress shows the inbox being indexed

2020-08-20 Thread Eric Wong
This is helpful with --all, or when multiple inboxes are being indexed. --- lib/PublicInbox/Admin.pm | 3 +++ 1 file changed, 3 insertions(+) diff --git a/lib/PublicInbox/Admin.pm b/lib/PublicInbox/Admin.pm index d99a00b4..f5427af7 100644 --- a/lib/PublicInbox/Admin.pm +++ b/lib/PublicInbox/Admin

[PATCH 00/23] indexing: --skip-docdata + speedups

2020-08-20 Thread Eric Wong
Some miscellaneous help and cleanup things, too. Document data is no longer read from Xapian by read-only daemons; that data is redundant given over.sqlite3 always exists. This should improve page cache hit rates for over.sqlite3 by a small bit. Being able to mass load a bunch of rows fro