[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS

2019-02-20 Thread Smalyshev
Smalyshev added a comment. "It almost sounds like rather than having a live stream of edit events, or at least acting entirely on a live stream of edit events, the updater should instead do internal batching." I thought about it; the problem here is that the starting point can be anything, so batchin

[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS

2019-02-20 Thread Addshore
Addshore added a comment. "No, this has horrible performance impact if several edits happen in a row, since it only fetches the oldid, not the latest one, so instead of one update with the latest ID you get 10 updates with each of the intermediate IDs." I think we have had this exact conversation on

[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS

2019-02-19 Thread Smalyshev
Smalyshev added a comment. "It might make sense to remove this and instead pass in either the oldid or revision param." No, this has a horrible performance impact if several edits happen in a row, since it only fetches the oldid, not the latest one, so instead of one update with the latest ID you get 1
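
The performance concern above can be illustrated with a minimal sketch. This is not the WDQS Updater's code; `EditEvent` and `coalesce_to_latest` are hypothetical names introduced only for illustration. It shows how a burst of edits to one entity collapses into a single update when only the newest revision per entity is kept, instead of one update per intermediate revision.

```python
from typing import Iterable, NamedTuple


class EditEvent(NamedTuple):
    """Hypothetical edit event: an entity ID and the revision the edit produced."""
    entity_id: str
    revision_id: int


def coalesce_to_latest(events: Iterable[EditEvent]) -> dict[str, int]:
    """Keep only the highest revision ID seen for each entity."""
    latest: dict[str, int] = {}
    for event in events:
        current = latest.get(event.entity_id)
        if current is None or event.revision_id > current:
            latest[event.entity_id] = event.revision_id
    return latest


# Ten edits in a row to one entity (made-up IDs), plus one edit to another.
burst = [EditEvent("Q1", rev) for rev in range(100, 110)] + [EditEvent("Q2", 50)]
updates = coalesce_to_latest(burst)
print(len(burst), "events ->", len(updates), "updates:", updates)
```

Fetching by each event's own revision would mean eleven fetches here; coalescing leaves two. The trade-off discussed in the thread still applies: which revision is actually returned depends on the state of the replica being read.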

[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS

2019-02-19 Thread Addshore
Addshore added a comment. In T210044#4767023, @Smalyshev wrote: Maybe the Wikidata replica Updater is reading from is lagging behind the updates and returns an old revision? Not sure if that is possible. If it's possible to read old data after we've got Kafka message with new data, that would be a

[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS

2019-02-18 Thread Smalyshev
Smalyshev added a comment. @Floatingpurr I have fixed some problematic data but still haven't found the root cause. Do you have some specific issues you think are related to this? TASK DETAIL: https://phabricator.wikimedia.org/T210044

[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS

2019-02-08 Thread Floatingpurr
Floatingpurr added a comment. Hey guys! Any news about this problem?

[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS

2019-01-10 Thread doctaxon
doctaxon added a comment. @Smalyshev What's the status of this task? There are still problems: https://www.wikidata.org/wiki/Wikidata:Request_a_query#SPARQL_query_result_erroneous

[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS

2018-12-05 Thread Fnielsen
Fnielsen added a comment. Could the clearing of deleted items be related? https://www.wikidata.org/wiki/Lexeme:L31707 has been deleted since 14 November 2018, but is still in WDQS: https://query.wikidata.org/#DESCRIBE%20wd%3AL31707

[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS

2018-12-04 Thread Smalyshev
Smalyshev added a comment. The query is captured in F27383365.

[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS

2018-12-04 Thread Smalyshev
Smalyshev added a comment. SPARQL dumps show that data is present in SPARQL but not in the database. Filed https://github.com/blazegraph/database/issues/109 with upstream and will dig into it further to see what we can find out there.

[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS

2018-12-04 Thread Stashbot
Stashbot added a comment. Mentioned in SAL (#wikimedia-operations) [2018-12-04T19:35:30Z] Finished deploy [wdqs/wdqs@81dac18]: Install new Updater for T210044 investigation (duration: 10m 36s)

[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS

2018-12-04 Thread Stashbot
Stashbot added a comment. Mentioned in SAL (#wikimedia-operations) [2018-12-04T19:24:53Z] Started deploy [wdqs/wdqs@81dac18]: Install new Updater for T210044 investigation

[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS

2018-12-04 Thread gerritbot
gerritbot added a comment. Change 477429 merged by Gehel: [operations/puppet@production] Enable SPARQL logging to a separate file https://gerrit.wikimedia.org/r/477429

[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS

2018-12-03 Thread gerritbot
gerritbot added a comment. Change 477429 had a related patch set uploaded (by Smalyshev; owner: Smalyshev): [operations/puppet@production] Enable SPARQL logging to a separate file https://gerrit.wikimedia.org/r/477429

[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS

2018-12-03 Thread gerritbot
gerritbot added a comment. Change 477410 merged by Gehel: [operations/puppet@production] Stop RDF dumps https://gerrit.wikimedia.org/r/477410

[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS

2018-12-03 Thread gerritbot
gerritbot added a comment. Change 477410 had a related patch set uploaded (by Smalyshev; owner: Smalyshev): [operations/puppet@production] Stop RDF dumps https://gerrit.wikimedia.org/r/477410

[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS

2018-12-03 Thread Smalyshev
Smalyshev added a comment. RDF dumps confirm that the data is coming through fine. This also happens for single-update items which haven't been touched for a while, so it's not some kind of update race. The weird thing is that it happens on multiple servers in the same way. E.g. a check on Q3601865 reveals: ['w

[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS

2018-11-28 Thread Smalyshev
Smalyshev added a comment. @Lea_Lacroix_WMDE yes, this is possible.

[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS

2018-11-26 Thread gerritbot
gerritbot added a comment. Change 475243 merged by Gehel: [operations/puppet@production] Enable dumping RDF on test & internal https://gerrit.wikimedia.org/r/475243

[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS

2018-11-22 Thread gerritbot
gerritbot added a comment. Change 475241 merged by Gehel: [operations/puppet@production] Enable dumping RDF data for debugging purposes https://gerrit.wikimedia.org/r/475241

[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS

2018-11-21 Thread Smalyshev
Smalyshev added a comment. Timings for the last one:
wdq5: 2018-11-21T23:35:56Z
wdq4: 2018-11-21T23:35:04Z
wdq6: 2018-11-21T23:34:44Z
wdq21: 2018-11-21T23:34:55Z
wdq22: 2018-11-21T23:34:49Z
wdq23: 2018-11-21T23:35:07Z
wdq3: 2018-11-21T23:35:03Z
wdq7: 2018-11-21T23:34:51Z
wdq8: 2018-11-21T23:34:56Z
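
To make the timings above easier to compare, here is a small sketch that parses them and reports which server updated first, which updated last, and the gap between the two. It uses only the values visible in this excerpt (the original comment may list more servers), and the variable names are illustrative.

```python
from datetime import datetime

# Per-server update timestamps as they appear in the excerpt above.
timestamps = {
    "wdq5": "2018-11-21T23:35:56Z",
    "wdq4": "2018-11-21T23:35:04Z",
    "wdq6": "2018-11-21T23:34:44Z",
    "wdq21": "2018-11-21T23:34:55Z",
    "wdq22": "2018-11-21T23:34:49Z",
    "wdq23": "2018-11-21T23:35:07Z",
    "wdq3": "2018-11-21T23:35:03Z",
    "wdq7": "2018-11-21T23:34:51Z",
    "wdq8": "2018-11-21T23:34:56Z",
}

# All values share the same UTC "Z" suffix, so naive datetimes compare correctly.
parsed = {
    server: datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")
    for server, ts in timestamps.items()
}

earliest = min(parsed, key=parsed.get)
latest = max(parsed, key=parsed.get)
spread_seconds = (parsed[latest] - parsed[earliest]).total_seconds()

print(f"earliest: {earliest}, latest: {latest}, spread: {spread_seconds:.0f}s")
```

For the listed values, wdq6 is the earliest, wdq5 the latest, and the spread is 72 seconds.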

[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS

2018-11-21 Thread Smalyshev
Smalyshev added a comment. I also discovered that for some items the data is not at the latest revision: e.g. for Q57529925 all servers except wdq5 are on 795730255, but wdq5 is on 795729753. This seems to be related to bursts of robotic edits on the same entry, which may suggest there's some kind of race c
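
The kind of cross-server check described above can be sketched as follows. This is an illustration only: `find_lagging` is a hypothetical helper, and how the per-server revision is obtained is left out. The sample mapping uses the two revision IDs quoted for Q57529925; wdq4 and wdq6 stand in for "all servers except wdq5".

```python
def find_lagging(revisions_by_server: dict[str, int]) -> dict[str, int]:
    """Return the servers whose stored revision is older than the newest one seen."""
    if not revisions_by_server:
        return {}
    newest = max(revisions_by_server.values())
    return {
        server: revision
        for server, revision in revisions_by_server.items()
        if revision < newest
    }


# Revisions reported for Q57529925; the wdq4/wdq6 entries are illustrative stand-ins.
observed = {
    "wdq4": 795730255,
    "wdq5": 795729753,
    "wdq6": 795730255,
}

print(find_lagging(observed))
```

Note that this only detects disagreement between servers. If every server holds the same stale revision, as reported elsewhere in this thread, the check finds nothing and the result has to be compared against Wikidata itself.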

[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS

2018-11-21 Thread gerritbot
gerritbot added a comment. Change 475243 had a related patch set uploaded (by Smalyshev; owner: Smalyshev): [operations/puppet@production] Enable dumping RDF on test & internal https://gerrit.wikimedia.org/r/475243

[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS

2018-11-21 Thread gerritbot
gerritbot added a comment. Change 475241 had a related patch set uploaded (by Smalyshev; owner: Smalyshev): [operations/puppet@production] Enable dumping RDF data for debugging purposes https://gerrit.wikimedia.org/r/475241

[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS

2018-11-20 Thread Smalyshev
Smalyshev added a comment. Timestamps for data updates:
wdq10: 2018-11-20T05:49:19Z
wdq6: 2018-11-20T05:49:25Z
wdq26: 2018-11-20T05:49:31Z
wdq21: 2018-11-20T05:49:32Z
wdq22: 2018-11-20T05:49:32Z
wdq3: 2018-11-20T05:49:35Z
wdq7: 2018-11-20T05:49:40Z
wdq9: 2018-11-20T05:49:39Z
wdq8: 2018-11-20T05: