[Wikidata-bugs] [Maniphest] [Commented On] T74348: Wikidata dumps contain old-style serialization.

2015-05-08 Thread ArielGlenn
ArielGlenn added a comment. Great news! TASK DETAIL https://phabricator.wikimedia.org/T74348 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: hoo, ArielGlenn Cc: JanZerebecki, Jimkont, Wikidata-bugs, Tobi_WMDE_SW, jayvdb, Svick, ArielGlenn, Ricordis

[Wikidata-bugs] [Maniphest] [Commented On] T74348: Wikidata dumps contain old-style serialization.

2015-05-08 Thread daniel
daniel added a comment. The double-check didn't turn anything up either. The dump seems to be clean.

[Wikidata-bugs] [Maniphest] [Commented On] T74348: Wikidata dumps contain old-style serialization.

2015-05-06 Thread daniel
daniel added a comment. I'm now running the following on tool labs to find "old" serializations: daniel@tools-bastion-01:/public/dumps/public/wikidatawiki/20150330$ bzgrep ',"entity":"[qQpP][0-9]*"\}' wikidatawiki-20150330-pages-meta-history.xml.bz2 | tee ~/wikidatawiki-20150330-pages-meta-h
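For readers without `bzgrep` at hand, the same scan can be sketched in Python. This is a minimal sketch, not the command daniel ran: the names `find_old_style` and `scan_dump` are hypothetical, and only the regular expression is taken from the comment above. Like that command, it matches literal double quotes; if the dump text is XML-escaped, the pattern would need adjusting.

```python
import bz2
import re

# Pattern copied from the bzgrep command quoted in the thread.
OLD_STYLE = re.compile(r',"entity":"[qQpP][0-9]*"\}')


def find_old_style(lines):
    """Yield every line that contains an old-style 'entity' tail."""
    for line in lines:
        if OLD_STYLE.search(line):
            yield line.rstrip("\n")


def scan_dump(path):
    """Stream a .bz2 dump line by line so it never has to fit in memory."""
    with bz2.open(path, "rt", encoding="utf-8") as dump:
        yield from find_old_style(dump)
```

Streaming through `bz2.open` avoids decompressing the full history dump to disk, which is the main cost when checking a file of this size.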

[Wikidata-bugs] [Maniphest] [Commented On] T74348: Wikidata dumps contain old-style serialization.

2015-05-06 Thread daniel
daniel added a comment. @Jimkont: broken serialization of empty lists is a separate issue, unrelated to unconverted old-style serializations.

[Wikidata-bugs] [Maniphest] [Commented On] T74348: Wikidata dumps contain old-style serialization.

2015-05-06 Thread Jimkont
Jimkont added a comment. other examples of old serializations can be found here: https://github.com/dbpedia/extraction-framework/blob/master/core/src/main/scala/org/dbpedia/extraction/wikiparser/impl/json/JsonWikiParser.scala#L62-L67

[Wikidata-bugs] [Maniphest] [Commented On] T74348: Wikidata dumps contain old-style serialization.

2015-05-05 Thread daniel
daniel added a comment. Btw, if someone can tell me where to find a full history dump of wikidata, I'd be happy to check this myself. The annoying part here is to download and store the behemoth...

[Wikidata-bugs] [Maniphest] [Commented On] T74348: Wikidata dumps contain old-style serialization.

2015-05-05 Thread daniel
daniel added a comment. @JanZerebecki: Redirects are serialized like this: {"entity":"Q23","redirect":"Q42"} Old style serialization ends with this: ,"entity":"q207"} So, if you grep for `,"entity"}`, you should find only old style serializations. Also, old style serialization will conta
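The distinction drawn in this comment can be sketched as a small classifier. This is only an illustration under stated assumptions: the function name `classify` and the label `"other"` are hypothetical, and the two shapes it recognizes are exactly the two examples quoted above (a redirect object with `entity` and `redirect` keys, and an old-style serialization that ends with `,"entity":"q207"}`).

```python
import json
import re

# Old-style serializations end with ,"entity":"<id>"} per the comment above.
OLD_STYLE_TAIL = re.compile(r',"entity":"[qQpP][0-9]+"\}\s*$')


def classify(serialization):
    """Return 'old-style', 'redirect', or 'other' for one JSON blob."""
    if OLD_STYLE_TAIL.search(serialization):
        return "old-style"
    data = json.loads(serialization)
    # A redirect carries only the source entity and its target.
    if isinstance(data, dict) and set(data) == {"entity", "redirect"}:
        return "redirect"
    return "other"


print(classify('{"entity":"Q23","redirect":"Q42"}'))
print(classify('{"label":{"en":"x"},"entity":"q207"}'))
```

Checking the tail first mirrors the grep approach in the thread: a redirect starts with `"entity"` rather than ending with it, so the two cases do not collide.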

[Wikidata-bugs] [Maniphest] [Commented On] T74348: Wikidata dumps contain old-style serialization.

2015-04-07 Thread daniel
daniel added a comment. For redirects, the encoding {"entity"} is correct. There is no "old" encoding for redirects; entity redirects didn't exist when we used the old serialization format. So, searching for "entity" is not a good indicator for detecting old-style serialization.

[Wikidata-bugs] [Maniphest] [Commented On] T74348: Wikidata dumps contain old-style serialization.

2015-04-07 Thread hoo
hoo added a comment. @Daniel: Could you have a quick look at this? Looks fixed to me, but I think you're the only one who can tell for sure.

[Wikidata-bugs] [Maniphest] [Commented On] T74348: Wikidata dumps contain old-style serialization.

2015-03-04 Thread ArielGlenn
ArielGlenn added a comment. Is anyone looking at the redirects serialization?

[Wikidata-bugs] [Maniphest] [Commented On] T74348: Wikidata dumps contain old-style serialization.

2015-02-27 Thread ArielGlenn
ArielGlenn added a comment. Um, "with this format" means new redirects are dumped with {"entity" ... etc.

[Wikidata-bugs] [Maniphest] [Commented On] T74348: Wikidata dumps contain old-style serialization.

2015-02-27 Thread ArielGlenn
ArielGlenn added a comment. OK, I no longer feel as stupid. The number of items with the 'entity' format is small in comparison to the total number of qualities, we would expect the opposite if old revisions were being kept as is. And as I said I had checked with local testing that the expor

[Wikidata-bugs] [Maniphest] [Commented On] T74348: Wikidata dumps contain old-style serialization.

2015-02-26 Thread ArielGlenn
ArielGlenn added a comment. ugh, I stared at it for an hour and I'm still blind. Let me look at it for another hour... sorry.

[Wikidata-bugs] [Maniphest] [Commented On] T74348: Wikidata dumps contain old-style serialization.

2015-02-26 Thread hoo
hoo added a comment. In https://phabricator.wikimedia.org/T74348#1069658, @ArielGlenn wrote:
> right. this is what you want; the old style 'entity' is gone, the new style 'descriptions' is present. or am I missing something?
To me it seems like the old style entity is still present.

[Wikidata-bugs] [Maniphest] [Commented On] T74348: Wikidata dumps contain old-style serialization.

2015-02-26 Thread ArielGlenn
ArielGlenn added a comment. right. this is what you want; the old style 'entity' is gone, the new style 'descriptions' is present. or am I missing something?

[Wikidata-bugs] [Maniphest] [Commented On] T74348: Wikidata dumps contain old-style serialization.

2015-02-24 Thread hoo
hoo added a comment. In https://phabricator.wikimedia.org/T74348#1059331, @ArielGlenn wrote:
> Hello? Any wikidata dumps consumers on this ticket? Otherwise I'll ask in xmldatadumps-l.
In https://phabricator.wikimedia.org/T74348#768660, @daniel wrote:
> Bumping to critical, since it may

[Wikidata-bugs] [Maniphest] [Commented On] T74348: Wikidata dumps contain old-style serialization.

2015-02-24 Thread hoo
hoo added a comment. In https://phabricator.wikimedia.org/T74348#1062351, @Lydia_Pintscher wrote:
> @hoo: could you have a look?
Just kicked off the download of a dump, I'll verify some old revisions once that's done (later today).

[Wikidata-bugs] [Maniphest] [Commented On] T74348: Wikidata dumps contain old-style serialization.

2015-02-24 Thread Lydia_Pintscher
Lydia_Pintscher added a comment. @hoo: could you have a look?

[Wikidata-bugs] [Maniphest] [Commented On] T74348: Wikidata dumps contain old-style serialization.

2015-02-23 Thread ArielGlenn
ArielGlenn added a comment. Hello? Any wikidata dumps consumers on this ticket? Otherwise I'll ask in xmldatadumps-l.

[Wikidata-bugs] [Maniphest] [Commented On] T74348: Wikidata dumps contain old-style serialization.

2015-02-12 Thread ArielGlenn
ArielGlenn added a comment. I ran a series of tests locally and also checked production output. I can verify that the transform is actually applied, the output looks good to me for prefetch or from the database, but a consumer of the data should probably look at it for 5 seconds to verify that

[Wikidata-bugs] [Maniphest] [Commented On] T74348: Wikidata dumps contain old-style serialization.

2015-01-12 Thread ArielGlenn
ArielGlenn added a comment. Thanks for the patch! I will check it out in the next couple of days. I'm really sorry for the long delay; I've been out for medical reasons and am now trying to get caught up on everything.

[Wikidata-bugs] [Maniphest] [Commented On] T74348: Wikidata dumps contain old-style serialization.

2014-11-26 Thread ArielGlenn
ArielGlenn added a comment. Old revisions are indeed read from the old dump, as long as the length of the revision text is correct. And indeed this is a necessity; the db servers cannot handle requests for all revisions anew, and even if they could the dumps would take many times longer to gener

[Wikidata-bugs] [Maniphest] [Commented On] T74348: Wikidata dumps contain old-style serialization.

2014-11-26 Thread hoo
hoo added a comment. In T74348#787697, @Lydia_Pintscher wrote:
> Can I please have a status update on this? Do we know why it is happening?
As far as I know the problem is that during dump creation content from the last dump is being scraped in case nothing changed. That's probably fine for