[PATCH 2/2] dir_idle: require Perl 5.22+ for kqueue

2020-08-07 Thread Eric Wong
IO::KQueue requires us to use fileno(DIRHANDLE) for setting up kqueue watches. This use of fileno() is only supported since Perl 5.22, so BSD users on older Perl will have to fall back to the old polling. This currently affects users of -watch, but will soon affect other read-only Xapian users. ---

[PATCH 0/2] Perl <5.22 fixes

2020-08-07 Thread Eric Wong
fileno(DIRHANDLE) didn't work until Perl 5.22, so we'll fall back to using some Inline::C for setting No_COW on btrfs and polling for -watch on *BSD with older Perl.
Eric Wong (2):
  support setting No_COW on Perl <5.22
  dir_idle: require Perl 5.22+ for kqueue
lib/PublicInbox/DirIdle.pm | 3

[PATCH 1/2] support setting No_COW on Perl <5.22

2020-08-07 Thread Eric Wong
fileno(DIRHANDLE) only works on Perl 5.22+, so we need to use dirfd(3) ourselves from Inline::C (or rely on chattr(1) being installed). While we're at it, rename `set_nodatacow' to `nodatacow_fd' for consistency with `nodatacow_dir'.
---
lib/PublicInbox/Msgmap.pm | 2 +-
lib/PublicInbox/NDC_P
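The version dependency described above can be checked at run time. The following is only a minimal sketch, not code from the patch: `opendir_with_fd` is a hypothetical helper, and the fallback (Inline::C or chattr(1)) is left to the caller.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the patch): open a directory and return
# the handle plus its file descriptor. The descriptor is undef when this
# Perl (or platform) cannot supply one.
sub opendir_with_fd {
    my ($path) = @_;
    opendir(my $dh, $path) or die "opendir($path): $!";
    # $] holds the running Perl's version as a number; 5.022 is 5.22.0
    my $fd = $] >= 5.022 ? fileno($dh) : undef;
    return ($dh, $fd);
}

my ($dh, $fd) = opendir_with_fd('.');
if (defined $fd) {
    print "directory fd: $fd\n";
} else {
    print "no fd available; a fallback is needed\n";
}
closedir($dh);
```

Even on Perl 5.22+, fileno() on a directory handle returns undef when the platform lacks dirfd(3) or an equivalent, so the defined() check is still needed.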

Re: [PATCH 1/5] v2writable: fix batch size accounting

2020-08-07 Thread Eric Wong
Eric Wong wrote:
> We need to account for whether shard parallelization is
> enabled or not, since users of parallelization are expected
> to have more RAM.
> ---
> lib/PublicInbox/V2Writable.pm | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/lib/PublicInbox/V

[PATCH 1/5] v2writable: fix batch size accounting

2020-08-07 Thread Eric Wong
We need to account for whether shard parallelization is enabled or not, since users of parallelization are expected to have more RAM.
---
lib/PublicInbox/V2Writable.pm | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2W

[PATCH 0/5] more indexing improvements

2020-08-07 Thread Eric Wong
VERY big batch sizes seem helpful on HDDs. And I also blew up a run because --compact ran in parallel with 32 shards :x And --help should exist for all commands users may run from the CLI.
Eric Wong (5):
  v2writable: fix batch size accounting
  index: --compact respects --sequential-shard in

[PATCH 5/5] index: add built-in --help / -?

2020-08-07 Thread Eric Wong
Eventually, commonly-used commands run by the user will all support --help / -? for user-friendliness. The changes from up-front `use' to lazy `require' speed up `--help' by 3x or so. --- Documentation/public-inbox-index.pod | 4 +-- script/public-inbox-index| 44 +++
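The `use' versus `require' difference mentioned above comes down to when a module is loaded: `use' loads at compile time, `require' when the statement is reached. A rough sketch of the pattern follows; `run` and the usage text are hypothetical, and File::Temp merely stands in for a heavier dependency that --help never needs.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical command body, not the real script/public-inbox-index.
sub run {
    my (@argv) = @_;
    my $help;
    GetOptionsFromArray(\@argv, 'help|?' => \$help) or return 1;
    if ($help) {
        print "usage: example-cmd [--help]\n";
        return 0; # returned before anything expensive was loaded
    }
    require File::Temp; # loaded at run time, only on this path
    my $tmp = File::Temp->new;
    print "working in ", $tmp->filename, "\n";
    return 0;
}

run('--help');
```

With an up-front `use File::Temp;' the module would be compiled on every invocation, including `--help'.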

[PATCH 4/5] searchidx: use Perl truthiness to detect XAPIAN_FLUSH_THRESHOLD

2020-08-07 Thread Eric Wong
XAPIAN_FLUSH_THRESHOLD is a C string in the environment, so users may be tempted to assign an empty string in their shell, e.g. `XAPIAN_FLUSH_THRESHOLD= ' instead of using the `unset' POSIX shell built-in. With either a value of "0" or "" (empty string), Xapian will fall back to its default (1
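Perl treats undef, the empty string, and the string "0" as false, which is why a plain truthiness test covers the unset, empty, and "0" cases at once. A small illustration, assuming a hypothetical helper `flush_threshold_set` that takes a hash reference in place of %ENV:

```perl
use strict;
use warnings;

# Hypothetical helper: true only when the variable holds a value Perl
# considers true (so not unset, not "", and not "0").
sub flush_threshold_set {
    my ($env) = @_;
    return $env->{XAPIAN_FLUSH_THRESHOLD} ? 1 : 0;
}

my @cases = (
    [ 'unset', {} ],
    [ 'empty', { XAPIAN_FLUSH_THRESHOLD => '' } ],
    [ 'zero',  { XAPIAN_FLUSH_THRESHOLD => '0' } ],
    [ 'large', { XAPIAN_FLUSH_THRESHOLD => '100000' } ],
);
for my $case (@cases) {
    my ($name, $env) = @$case;
    printf "%-5s => %d\n", $name, flush_threshold_set($env);
}
```

A defined() or exists() test would instead treat the empty string and "0" as user-supplied values.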

[PATCH 2/5] index: --compact respects --sequential-shard

2020-08-07 Thread Eric Wong
Since the --compact switch works on Xapian shards, it makes sense that --sequential-shard affects our usage of xapian-compact(1).
---
script/public-inbox-index | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/script/public-inbox-index b/script/public-inbox-index index dc9bdd

[PATCH 3/5] index: max out XAPIAN_FLUSH_THRESHOLD if using --batch-size

2020-08-07 Thread Eric Wong
If XAPIAN_FLUSH_THRESHOLD is unset, Xapian will default to 10000. That limits the effectiveness of users specifying extremely large values of --batch-size. While we're at it, localize the changes to globals since -index may be eval-ed in tests (and perhaps production code in the future). --- scr
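Localizing an environment change keeps it from leaking into whatever code eval-ed the script. The sketch below is an assumption-laden illustration rather than the patch itself: `with_flush_threshold` is a hypothetical wrapper and 4294967295 is only a placeholder for "a very large value".

```perl
use strict;
use warnings;

# Hypothetical wrapper: run $cb with XAPIAN_FLUSH_THRESHOLD raised to
# $max unless the user already set a true (non-empty, non-"0") value.
sub with_flush_threshold {
    my ($max, $cb) = @_;
    # the right-hand side is evaluated before `local' takes effect
    local $ENV{XAPIAN_FLUSH_THRESHOLD} =
        $ENV{XAPIAN_FLUSH_THRESHOLD} || $max;
    return $cb->();
}

delete $ENV{XAPIAN_FLUSH_THRESHOLD}; # start from a known state
my $seen = with_flush_threshold(4294967295,
    sub { $ENV{XAPIAN_FLUSH_THRESHOLD} });
print "inside: $seen\n";
print "after: ",
    (exists $ENV{XAPIAN_FLUSH_THRESHOLD} ? 'set' : 'unset'), "\n";
```

When the wrapper returns, `local' restores the previous state of the variable, so a caller that eval-ed this code does not inherit the raised threshold.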

[PATCH] www: avoid warnings on YYYYMMDD-only t= query parameter

2020-08-07 Thread Eric Wong
While we always generate YYYYMMDDhhmmss query parameters ourselves, the regexps in paginate_recent allow YYYYMMDD-only (no hhmmss) timestamps, so don't trigger Time::Local::timegm warnings about numeric comparisons on empty strings when a client starts making up their own URLs. --- lib/Publi
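One way to tolerate a date-only t= value is to default the missing time fields to zero before calling timegm. This is a sketch under that assumption, not the actual paginate_recent code; `parse_t` and its regexp are hypothetical.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical parser: accepts YYYYMMDD or YYYYMMDDhhmmss and returns
# seconds since the epoch (UTC), or undef for anything else.
sub parse_t {
    my ($t) = @_;
    $t =~ /\A([0-9]{4})([0-9]{2})([0-9]{2})
           (?:([0-9]{2})([0-9]{2})([0-9]{2}))?\z/x or return undef;
    my ($Y, $m, $d) = ($1, $2, $3);
    # missing hhmmss captures are undef; treat them as 0 so timegm
    # never compares an empty or undefined value numerically
    my ($H, $M, $S) = map { $_ || 0 } ($4, $5, $6);
    return timegm($S, $M, $H, $d, $m - 1, $Y);
}

print parse_t('20200807'), "\n";
print parse_t('20200807123000'), "\n";
```

timegm dies on out-of-range fields (for example month 13), so a caller handling untrusted URLs would likely wrap the call in eval.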

[PATCH] syscall: support sparc64 (and maybe other big-endian systems)

2020-08-07 Thread Eric Wong
Thanks to the GCC compile farm project, we can wire up syscalls for sparc64 and set system-specific SFD_* constants properly. I've FINALLY figured out how to use POSIX::SigSet to generate a usable buffer for the syscall perlfunc. This is required for endian-neutral behavior and relevant to sparc6