Re: [PATCH 03/14] doc: index: more notes about latest changes

2020-08-09 Thread Eric Wong
Kyle Meyer wrote: > Eric Wong writes: > > > For L inboxes, this value is > > multiplied by the number of Xapian shards. Thus a typical v2 > > -inbox with 3 shards will flush every 3 megabytes by default. > > - > > -Default: 1m (one megabyte) > > +inbox with 3 shards will flush every 3 megabyte

Re: [PATCH 03/14] doc: index: more notes about latest changes

2020-08-09 Thread Kyle Meyer
Eric Wong writes: > For L inboxes, this value is > multiplied by the number of Xapian shards. Thus a typical v2 > -inbox with 3 shards will flush every 3 megabytes by default. > - > -Default: 1m (one megabyte) > +inbox with 3 shards will flush every 3 megabytes by default > +when unless paralle
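The multiplication described in the quoted documentation can be sketched with plain shell arithmetic. This is illustration only: `effective_flush_bytes` is a hypothetical helper and not part of public-inbox, "1m" is assumed to mean 1048576 bytes, and the caveat that is cut off at the end of the quoted text is not modeled.

```shell
#!/bin/sh
# Hypothetical helper (not public-inbox code): estimate how many bytes are
# indexed between flushes when the batch size is applied per Xapian shard.
effective_flush_bytes() {
    batch_bytes=$1   # per-shard batch size in bytes (assumed: 1m = 1048576)
    shards=$2        # number of Xapian shards
    echo $((batch_bytes * shards))
}

# The quoted example: a 1m batch size on a v2 inbox with 3 shards
effective_flush_bytes 1048576 3
```

With the values from the quoted example this prints 3145728, which is the "3 megabytes" figure in the documentation text.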

Re: [PATCH 08/14] admin: use a generic veriable name

2020-08-09 Thread Kyle Meyer
Eric Wong writes: > [PATCH 08/14] admin: use a generic veriable name s/veriable/variable/

[PATCH 10/14] searchidx: use singular `$opt' for consistency with v2

2020-08-09 Thread Eric Wong
The rest of our indexing code uses `$opt' instead of `$opts'. --- lib/PublicInbox/SearchIdx.pm | 22 +++--- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm index 7f2447fe..5c39f3d6 100644 --- a/lib/PublicIn

[PATCH 14/14] convert: set No_COW on copied SQLite files

2020-08-09 Thread Eric Wong
We'll use our existing logic and use sqlite_backup_from_file, which appeared in 1.39 (along with sqlite_backup_to_file). --- script/public-inbox-convert | 12 +++- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/script/public-inbox-convert b/script/public-inbox-convert index

[PATCH 12/14] convert: speed up --help

2020-08-09 Thread Eric Wong
Lazy-loading dependencies speeds up --help by several hundred milliseconds and is a huge step towards user-friendliness. --- script/public-inbox-convert | 39 ++--- 1 file changed, 19 insertions(+), 20 deletions(-) diff --git a/script/public-inbox-convert b/script/

[PATCH 13/14] convert: check ARGV more correctly

2020-08-09 Thread Eric Wong
Instead of silently ignoring excessive args, don't let a user specify an extra directory. Furthermore, we'll support the odd case where BOFH wants to name an $INBOX_DIR to be `0' :P --- script/public-inbox-convert | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/script/pub

[PATCH 11/14] convert: support new -index options

2020-08-09 Thread Eric Wong
Converting v1 inboxes to v2 can be a painful experience on HDD. Some of the new options in the CLI or config file make it less painful. --- Documentation/public-inbox-convert.pod | 19 +++ lib/PublicInbox/Admin.pm | 36 script/public-inbox-convert| 77

[PATCH 08/14] admin: use a generic veriable name

2020-08-09 Thread Eric Wong
We parse other options, too, not just --max-size --- lib/PublicInbox/Admin.pm | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/lib/PublicInbox/Admin.pm b/lib/PublicInbox/Admin.pm index af2b3da9..8a9a81c9 100644 --- a/lib/PublicInbox/Admin.pm +++ b/lib/PublicInbox/Admin.pm

[PATCH 02/14] index: --sequential-shard works incrementally

2020-08-09 Thread Eric Wong
We should never reindex all data in Xapian unless --reindex is specified on the command-line. This means users who put publicInbox.indexSequentialShard in their config file won't have to put up with a full reindex at every invocation, only when they specify --reindex. We'll also cleanup the progr

[PATCH 06/14] msgmap: tmp_clone: simplify + meaningful filename

2020-08-09 Thread Eric Wong
Trying to use the newer ->sqlite_backup_to_dbh method doesn't seem worth it, as we'll have to support DBD::SQLite <= 1.60 another decade or more. Dumping 'msgmap-XXX' into $INBOX_DIR can appear a bit confusing to users, so give it a "mm_tmp-$PID-" name to emphasize it's a temporary fil

[PATCH 01/14] index: require --reindex when using --xapian-only

2020-08-09 Thread Eric Wong
This is to avoid user error with a currently undocumented switch, since --xapian-only always goes through the full history at the moment. --- script/public-inbox-index | 3 +++ 1 file changed, 3 insertions(+) diff --git a/script/public-inbox-index b/script/public-inbox-index index 73ca2953..9e0907be 1

[PATCH 04/14] doc: add some notes around -xcpdb / -edit / -purge

2020-08-09 Thread Eric Wong
These rarely-used commands have some caveats that needed expanding on. --- Documentation/public-inbox-edit.pod | 14 ++ Documentation/public-inbox-purge.pod | 14 ++ Documentation/public-inbox-xcpdb.pod | 15 +-- 3 files changed, 41 insertions(+), 2 deletions(-

[PATCH 00/14] more indexing related improvements

2020-08-09 Thread Eric Wong
publicInbox.indexSequentialShard now works incrementally. -convert also learned all the options -index learned, so it can be less painful on HDDs. Eric Wong (14): index: require --reindex when using --xapian-only index: --sequential-shard works incrementally doc: index: some more notes about

[PATCH 07/14] avoid File::Temp::tempfile in more places

2020-08-09 Thread Eric Wong
We can use open(..., undef) natively in Perl in t/import.t. In places where we need a pathname, the File::Temp OO API gives us auto-unlinking for free. --- lib/PublicInbox/V2Writable.pm | 17 + script/public-inbox-init | 9 - t/import.t| 5 ++---

[PATCH 03/14] doc: index: more notes about latest changes

2020-08-09 Thread Eric Wong
With LKML on an HDD, a giant --batch-size of 500m ends up being pretty useful. I was able to index LKML in ~16 hours on a system that had other activity on it. The big downside was it was eating up over 5g of RAM :x. We'll also fix up a duplicated indexBatchSize section, fix formatting around gl

[PATCH 05/14] index+xcpdb: improve SIG{INT,TERM,HUP,PIPE} behavior

2020-08-09 Thread Eric Wong
-index now invokes ->DESTROY like xcpdb does, which is necessary to clean up $INBOX_DIR/msgmap-XXX files. We'll also exit with the expected values for various signals by adding 128 as described in -xcpdb now terminates worker processes and xap
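The "adding 128" convention mentioned above can be sketched in shell. This is a minimal illustration, not the patch's code: `signal_exit_status` is a hypothetical helper, and the signal numbers are the usual values for these four signals, assumed here rather than taken from the patch.

```shell
#!/bin/sh
# Hypothetical helper (not public-inbox code): map a signal name to the
# conventional exit status of 128 plus the signal number.
# Signal numbers are the commonly used values and are an assumption here.
signal_exit_status() {
    case $1 in
        HUP)  n=1 ;;
        INT)  n=2 ;;
        PIPE) n=13 ;;
        TERM) n=15 ;;
        *)    echo "unknown signal: $1" >&2; return 1 ;;
    esac
    echo $((128 + n))
}

# The four signals named in the patch subject
for sig in INT TERM HUP PIPE; do
    printf '%s -> %s\n' "$sig" "$(signal_exit_status "$sig")"
done
```

A caller that checks `$?` after an interrupted run could compare it against these values to tell a signal-driven exit from an ordinary failure.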

[PATCH 09/14] index: cleanup internal variables

2020-08-09 Thread Eric Wong
Move away from hard-to-read alllowercase naming and favor snake_case or separated-by-dashes. We'll keep `--indexlevel' as-is for now, since it's been around for several releases; but we'll support `--index-level' in the CLI and update our documentation in a few months. We'll also clarify that pub