[Wikidata-bugs] [Maniphest] [Commented On] T222985: Provide wikidata JSON dumps compressed with zstd

2019-12-02 Thread ArielGlenn
ArielGlenn added a comment. We need some timing tests on these: is there a happy medium between 'best settings for compression' and 'best settings for speed'? What are we looking at in terms of execution time and space, if we add this step? We'd continue to provide bz2s I guess, since those
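The timing tests asked for above could be run with a small helper like the one sketched here. It reports wall-clock seconds and compressed size for one compressor invocation, so several levels can be compared side by side. The function name `bench_compress`, the input file name `dump.json`, and the choice of levels are assumptions for illustration and are not taken from the thread. `date +%s` gives only whole-second resolution, which should be adequate for multi-gigabyte inputs.

```shell
# bench_compress INPUT CMD [ARGS...]   (hypothetical helper, not from the thread)
# Feeds INPUT to CMD on stdin, counts the bytes CMD writes to stdout,
# and prints "elapsed_seconds compressed_bytes".
bench_compress() {
    input=$1
    shift
    start=$(date +%s)
    # tr strips the padding some wc implementations add around the count
    bytes=$("$@" < "$input" | wc -c | tr -d ' ')
    end=$(date +%s)
    echo "$((end - start)) $bytes"
}

# Example sweep, left commented out because it assumes zstd and bzip2 are
# installed and that dump.json (a hypothetical file name) exists:
#
# for level in 1 3 9 19; do
#     printf 'zstd -%s: ' "$level"
#     bench_compress dump.json zstd -"$level" -T0 -c
# done
# printf 'bzip2 -9: '
# bench_compress dump.json bzip2 -9 -c
```

Counting bytes through a pipe avoids needing disk space for each compressed copy, at the cost of not measuring write speed to the target filesystem.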

2019-07-03 Thread ArielGlenn
ArielGlenn added a comment. I don't want to replace existing compression formats; this would be in addition to what we have. I'll have to look at the graphs to see how we are as far as CPU usage goes. Let's just do the json dump for now, if we do this.

2019-07-03 Thread Smalyshev
Smalyshev added a comment. I tried zstd some time ago and found that with default settings the output is bigger than bz2 and with max settings it's rather slow, so I did not proceed. But I agree that decompression speed might matter too; I did not consider that. So if there's no problem storing

2019-05-17 Thread ArielGlenn
ArielGlenn added a comment. I've run some tests using the (nfs-mounted) filesystem to which our dumps are written in production. ariel@snapshot1008:/mnt/dumpsdata/otherdumps/wikibase/wikidatawiki/20190513$ time (zcat wikidata-20190513-all.json.gz | gzip >

2019-05-15 Thread bennofs
bennofs added a comment. $ time zstdcat -v -d wikidata-20190506-all.json.bz2 | zstd > /dev/null real 4m5.341s user 2m22.452s

2019-05-15 Thread bennofs
bennofs added a comment. But I can do a zstd decompression -> zstd compression test. TASK DETAIL https://phabricator.wikimedia.org/T222985

2019-05-15 Thread bennofs
bennofs added a comment. I don't have enough disk space for a compression test, that's correct.

2019-05-15 Thread bennofs
bennofs added a comment. Now the same with zstd: $ time zstdcat -v -d wikidata-20190506-all.json.bz2 | cat > /dev/null real 3m48.657s user 0m3.792s sys 0m58.768s Here are the sizes: 35G wikidata-20190506-all.json.bz2 39G

2019-05-15 Thread ArielGlenn
ArielGlenn added a comment. Impressive. Would you be willing to do a compression timing test too, or is that prohibitive given your available disk space?

2019-05-14 Thread bennofs
bennofs added a comment. So I tried lbzip2; here's the result (on a VM server with 2 cores at 2.1GHz, where the decompression is CPU-bound): $ time lbzip2 -n2 -v -d -c wikidata-20190506-all.json.bz2 | cat > /dev/null

2019-05-13 Thread ArielGlenn
ArielGlenn added a comment. Have you tried lbzip2? You can specify a number of threads and get some speedup for compression or decompression, even from pipes.
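To see how much the thread count helps on a given machine, the decompression could be timed at a few settings, along the lines of the sketch below. The helper name `time_decompress`, the thread counts, and the `.zst` file name in the commented usage are assumptions for illustration. The `-n`, `-d` and `-c` options of lbzip2 are the ones already used elsewhere in this thread.

```shell
# time_decompress FILE CMD [ARGS...]   (hypothetical helper, not from the thread)
# Runs "CMD ARGS... FILE" with its output discarded and prints the elapsed
# wall-clock time in whole seconds. Returns non-zero if CMD fails.
time_decompress() {
    file=$1
    shift
    start=$(date +%s)
    "$@" "$file" > /dev/null || return 1
    end=$(date +%s)
    echo "$((end - start))"
}

# Example comparison, left commented out because it assumes lbzip2 and zstd
# are installed and that both files exist (the .zst name is hypothetical):
#
# for threads in 1 2 4; do
#     printf 'lbzip2 -n%s: ' "$threads"
#     time_decompress wikidata-20190506-all.json.bz2 lbzip2 -n"$threads" -d -c
# done
# printf 'zstd: '
# time_decompress wikidata-20190506-all.json.zst zstd -d -c
```

On a CPU-bound host, elapsed time would be expected to stop improving once the thread count reaches the number of available cores.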