[Wikidata-bugs] [Maniphest] [Commented On] T207030: wikidata rdf dumps cron job complaining for lexemes phase

2018-10-29 Thread ArielGlenn
ArielGlenn added a comment. In T207030#4703693, @Smalyshev wrote: Ah, ok, didn't see your comment - yes, we probably need to reduce or cancel small file check for lexemes, or eliminate empty shards. I am not sure how easy it is to do the latter - I am on vacation this week so I'd start with the fo

[Wikidata-bugs] [Maniphest] [Commented On] T207030: wikidata rdf dumps cron job complaining for lexemes phase

2018-10-29 Thread ArielGlenn
ArielGlenn added a comment. Job has not started yet, so the change should have made it out in time for this week's run. TASK DETAIL: https://phabricator.wikimedia.org/T207030 EMAIL PREFERENCES: https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: gerritbot, Aklapper, Smalysh

[Wikidata-bugs] [Maniphest] [Commented On] T207030: wikidata rdf dumps cron job complaining for lexemes phase

2018-10-29 Thread gerritbot
gerritbot added a comment. Change 470447 merged by ArielGlenn: [operations/puppet@production] Reduce small file size for lexemes https://gerrit.wikimedia.org/r/470447

[Wikidata-bugs] [Maniphest] [Commented On] T207030: wikidata rdf dumps cron job complaining for lexemes phase

2018-10-29 Thread gerritbot
gerritbot added a comment. Change 470447 had a related patch set uploaded (by Smalyshev; owner: Smalyshev): [operations/puppet@production] Reduce small file size for lexemes https://gerrit.wikimedia.org/r/470447

[Wikidata-bugs] [Maniphest] [Commented On] T207030: wikidata rdf dumps cron job complaining for lexemes phase

2018-10-29 Thread Smalyshev
Smalyshev added a comment. Ah, ok, didn't see your comment - yes, we probably need to reduce or cancel small file check for lexemes, or eliminate empty shards. I am not sure how easy it is to do the latter - I am on vacation this week so I'd start with the former and go back to the latter after I'm

[Wikidata-bugs] [Maniphest] [Commented On] T207030: wikidata rdf dumps cron job complaining for lexemes phase

2018-10-29 Thread Smalyshev
Smalyshev added a comment. Small batches are normal; these are empty or semi-empty shards, I guess. I wonder, though, why the job does not proceed to create the full dump. Maybe the small file check is not correct?

[Wikidata-bugs] [Maniphest] [Commented On] T207030: wikidata rdf dumps cron job complaining for lexemes phase

2018-10-29 Thread ArielGlenn
ArielGlenn added a comment. root@snapshot1008:~# more /var/log/wikidatadump/dumpwikidatattl-wikidata-20181028-lexemes-BETA-main.log File size of is only 518223. Aborting. The file size cutoff is 2000/8 = 250. So that's why no files wind up in the output directory. Also, the error message
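The cutoff arithmetic quoted in the comment above (2000/8 = 250) can be sketched roughly as follows. This is only an illustration, not the actual dump script: the function names `per_shard_cutoff` and `is_too_small` are hypothetical, and the snippet does not state which units the reported file size and the cutoff are in, so the sketch covers only the division across shards and the comparison, not the specific abort shown in the log.

```python
def per_shard_cutoff(total_min_size: int, shards: int) -> int:
    """Split an overall minimum size evenly across shards (integer division).

    Hypothetical helper: mirrors the "2000/8 = 250" arithmetic in the comment.
    """
    if shards <= 0:
        raise ValueError("shards must be positive")
    return total_min_size // shards


def is_too_small(file_size: int, cutoff: int) -> bool:
    """Return True when a shard's output is strictly below the cutoff.

    Assumption: size and cutoff are expressed in the same units.
    """
    return file_size < cutoff


if __name__ == "__main__":
    cutoff = per_shard_cutoff(2000, 8)
    print(cutoff)  # 250
```

With a check of this shape, an empty or nearly empty shard would fall under the cutoff, which is why the thread discusses either lowering the threshold for lexemes or eliminating empty shards.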

[Wikidata-bugs] [Maniphest] [Commented On] T207030: wikidata rdf dumps cron job complaining for lexemes phase

2018-10-29 Thread ArielGlenn
ArielGlenn added a comment. Nope. Something else is wrong. I see no cronspam, no lexeme job running now on the snapshot host, this week's json job has started, but the 'latest' file is Oct 14. There is a 20181028 directory but it is empty. There are a bunch of temp files left in /mnt/dumpsdata/xmld

[Wikidata-bugs] [Maniphest] [Commented On] T207030: wikidata rdf dumps cron job complaining for lexemes phase

2018-10-15 Thread ArielGlenn
ArielGlenn added a comment. This is now deployed on snapshot1008 (where cron jobs run). We'll know next Monday if this took care of the problem; let's leave the task open til then.

[Wikidata-bugs] [Maniphest] [Commented On] T207030: wikidata rdf dumps cron job complaining for lexemes phase

2018-10-15 Thread gerritbot
gerritbot added a comment. Change 467415 merged by ArielGlenn: [operations/puppet@production] Fix lexeme error msgs https://gerrit.wikimedia.org/r/467415

[Wikidata-bugs] [Maniphest] [Commented On] T207030: wikidata rdf dumps cron job complaining for lexemes phase

2018-10-15 Thread Smalyshev
Smalyshev added a comment. I think I found the bug. From what it looks like it shouldn't have influenced the dump.

[Wikidata-bugs] [Maniphest] [Commented On] T207030: wikidata rdf dumps cron job complaining for lexemes phase

2018-10-15 Thread gerritbot
gerritbot added a comment. Change 467415 had a related patch set uploaded (by Smalyshev; owner: Smalyshev): [operations/puppet@production] Fix lexeme error msgs https://gerrit.wikimedia.org/r/467415