Re: [TESTING] WIP - parallel shards on HDD with sequential flush

2020-08-12 Thread Eric Wong
Eric Wong wrote:
> Waiting to see if it slows down as the Xapian DBs get bigger...

It does :<

--
unsubscribe: one-click, see List-Unsubscribe header
archive: https://public-inbox.org/meta/

[TESTING] WIP - parallel shards on HDD with sequential flush

2020-08-12 Thread Eric Wong
Frequent flushing to save RAM with HDD is horrible with the random writes Xapian tends to do, especially when parallelized. I think just making the Xapian commits in sequence while the random reads + in-memory changes are still parallelized is doable, though... With this, --no-fsync may even be detrim
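
The idea above (parallel reads and in-memory changes, commits one at a time) can be sketched roughly as follows. This is only an illustration in Python, not the code from the WIP patch: it uses threads and a lock as stand-ins for the shard workers, and index_all, index_shard, and the "committed" list are hypothetical names, with a list append standing in for a Xapian commit.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def index_all(shards, jobs):
    """Index every shard with up to `jobs` threads, committing one at a time."""
    commit_lock = threading.Lock()  # serializes the write-heavy commit step
    committed = []                  # stand-in for data flushed to disk
    stats = {"active": 0, "max_active": 0}

    def index_shard(item):
        shard_id, docs = item
        # Parallel phase: reads and in-memory changes for this shard.
        pending = [f"{shard_id}:{doc}" for doc in docs]
        # Sequential phase: only one shard may commit at any moment.
        with commit_lock:
            stats["active"] += 1
            stats["max_active"] = max(stats["max_active"], stats["active"])
            committed.extend(pending)  # stand-in for a Xapian commit
            stats["active"] -= 1
        return len(pending)

    with ThreadPoolExecutor(max_workers=jobs) as pool:
        counts = list(pool.map(index_shard, shards.items()))
    return counts, committed, stats["max_active"]
```

Calling index_all({0: ["a", "b"], 1: ["c"]}, jobs=2) prepares both shards concurrently, but the third return value (the highest number of commits seen running at once) never exceeds 1, which is the property that would keep the disk from seeing interleaved random writes.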

[PATCH 2/6] xcpdb: support --no-fsync from CLI

2020-08-12 Thread Eric Wong
This was omitted in 8b1950055d51d436 :x Fixes: 8b1950055d51d436 ("index+xcpdb: rename `--no-sync' to `--no-fsync'") --- script/public-inbox-xcpdb | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/script/public-inbox-xcpdb b/script/public-inbox-xcpdb index fcd961488..2c91598

[PATCH 6/6] v2writable: remove IdxStack import

2020-08-12 Thread Eric Wong
We use IdxStack via log2stack() from SearchIdx, now. --- lib/PublicInbox/V2Writable.pm | 1 - 1 file changed, 1 deletion(-) diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm index 72198a298..d99e476aa 100644 --- a/lib/PublicInbox/V2Writable.pm +++ b/lib/PublicInbox/V2Writ

[PATCH 3/6] xapcmd: reduce CPU idling when shards exceeds job count

2020-08-12 Thread Eric Wong
In case there's unbalanced shards AND we're limiting parallelism while using many shards, spawn the next task in the queue ASAP once a task is done, instead of waiting for all tasks to finish before spawning the next batch. Unbalanced shards probably isn't a big issue for most users; however many
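
To see why spawning the next task as soon as one finishes helps with unbalanced shards, here is a small simulation. It is a sketch in Python under simplifying assumptions (task durations known up front, no spawn overhead) and is not the Xapcmd.pm implementation; makespan_batched and makespan_eager are hypothetical names.

```python
import heapq


def makespan_batched(durations, jobs):
    """Run tasks in batches of `jobs`; each batch waits for its slowest task."""
    total = 0
    for i in range(0, len(durations), jobs):
        total += max(durations[i:i + jobs])
    return total


def makespan_eager(durations, jobs):
    """Start the next queued task as soon as any running task finishes."""
    running = []  # min-heap of finish times for tasks currently running
    now = 0
    for d in durations:
        if len(running) == jobs:
            # All slots busy: wait for the earliest task to finish.
            now = heapq.heappop(running)
        heapq.heappush(running, now + d)
    return max(running) if running else 0
```

With one large shard among small ones, for example durations [8, 1, 1, 1, 1, 8] and two jobs, the batched schedule takes 17 time units while the eager one takes 12, because the idle slot next to the slow task keeps draining the queue.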

[PATCH 4/6] admin: don't warn when --jobs exceeds shards

2020-08-12 Thread Eric Wong
Established tools like make(1), prove(1) and xargs(1) don't warn when the desired parallelism level can't be met, either. --- lib/PublicInbox/Admin.pm | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/lib/PublicInbox/Admin.pm b/lib/PublicInbox/Admin.pm index ce720beb6..d99a00

[PATCH 5/6] xcpdb: wire up new index options and --help

2020-08-12 Thread Eric Wong
--sequential-shard also disables the copy parallelism (--jobs), so it can be useful for systems unable to handle parallel random I/O but still want many shards. There was a missing "use strict", too, which is fixed. --- Documentation/public-inbox-xcpdb.pod | 19 +++- lib/PublicInbox/Xapcmd.pm

[PATCH 1/6] xapcmd: simplify sub reference

2020-08-12 Thread Eric Wong
We don't need to fully-qualify when referring to subs in the same namespace, nor do we need to make a SCALAR ref only to dereference it. (Yes, still learning Perl :x) --- lib/PublicInbox/Xapcmd.pm | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/PublicInbox/Xapcmd.pm b/lib/Publ

[PATCH 0/6] xcpdb -index improvements

2020-08-12 Thread Eric Wong
Nothing terribly exciting, since xcpdb isn't really used often. But it'd be bad if it flooded the system with many parallel processes on HDD because -index was configured for many small shards. So it now supports --sequential-shard and all the other index options. Eric Wong (6): xapcmd: si