ArielGlenn added a comment.
Great news!
TASK DETAIL
https://phabricator.wikimedia.org/T74348
EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: hoo, ArielGlenn
Cc: JanZerebecki, Jimkont, Wikidata-bugs, Tobi_WMDE_SW, jayvdb, Svick,
ArielGlenn, Ricordis
daniel added a comment.
The double-check didn't turn anything up either. The dump seems to be clean.
daniel added a comment.
I'm now running the following on tool labs to find "old" serializations:
daniel@tools-bastion-01:/public/dumps/public/wikidatawiki/20150330$ bzgrep ',"entity":"[qQpP][0-9]*"\}' wikidatawiki-20150330-pages-meta-history.xml.bz2 | tee ~/wikidatawiki-20150330-pages-meta-h
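The same scan can be sketched in Python by streaming the compressed dump instead of piping it through bzgrep and tee. This is a minimal sketch, not the command that was actually run. The helper name count_old_style is hypothetical, and it counts matches instead of saving them. It also unescapes XML entities on each line before matching, in case the dump stores quotes inside <text> as &quot;. That unescaping step is an assumption added here and is not part of the original pattern.

```python
import bz2
import html
import re

# Same pattern as the bzgrep call: JSON ending in ,"entity":"q207"} (or Q/p/P)
# is taken to be an old-style serialization.
OLD_STYLE = re.compile(r',"entity":"[qQpP][0-9]*"\}')


def count_old_style(dump_path):
    """Stream a .bz2 dump and return (matching_lines, total_lines).

    Hypothetical helper. It reads line by line, so the full-history dump
    never has to be decompressed to disk. Each line is XML-unescaped first
    (an assumption) so the pattern matches whether or not quotes appear
    as &quot; in the dump.
    """
    matches = 0
    total = 0
    with bz2.open(dump_path, "rt", encoding="utf-8") as fh:
        for line in fh:
            total += 1
            if OLD_STYLE.search(html.unescape(line)):
                matches += 1
    return matches, total
```

For example, count_old_style("wikidatawiki-20150330-pages-meta-history.xml.bz2") returns a (matches, total) pair, which allows the number of old-style revisions to be compared against the total.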
daniel added a comment.
@Jimkont: broken serialization of empty lists is a separate issue, unrelated to
unconverted old-style serializations.
REPLY HANDLER ACTIONS
Reply to comment or attach files, or !close, !claim, !unsubscribe or !ass
Jimkont added a comment.
other examples of old serializations can be found here:
https://github.com/dbpedia/extraction-framework/blob/master/core/src/main/scala/org/dbpedia/extraction/wikiparser/impl/json/JsonWikiParser.scala#L62-L67
daniel added a comment.
Btw, if someone can tell me where to find a full history dump of Wikidata, I'd
be happy to check this myself. The annoying part here is downloading and storing
the behemoth...
daniel added a comment.
@JanZerebecki: Redirects are serialized like this:
{"entity":"Q23","redirect":"Q42"}
Old style serialization ends with this:
,"entity":"q207"}
So, if you grep for `,"entity"}`, you should find only old style
serializations.
Also, old style serialization will conta
daniel added a comment.
For redirects, the encoding {"entity"} is correct. There is no "old"
encoding for redirects; entity redirects didn't exist when we used the old
serialization format.
So, searching for "entity" is not a good indicator for detecting
old-style serialization.
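Daniel's two comments point to a more specific check than searching for "entity": redirects carry an "entity" key too, but they also carry "redirect", while the old format ends with a trailing ,"entity":"q207"}. A minimal sketch of that distinction, assuming each revision's text is available as a JSON string. The function name and the three labels are hypothetical, and it only recognizes the two shapes quoted in these comments.

```python
import json
import re

# Old-style blobs end with ,"entity":"q207"} (per the example quoted above).
OLD_STYLE_TAIL = re.compile(r',"entity":"[qQpP][0-9]+"\}\s*$')


def classify_revision(text):
    """Return "redirect", "old-style" or "other" for one revision's JSON text.

    Hypothetical helper: it only knows the two shapes quoted in this thread.
    """
    try:
        data = json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return "other"  # not JSON at all; out of scope for this check
    if isinstance(data, dict) and "redirect" in data:
        # {"entity":"Q23","redirect":"Q42"} is the current redirect encoding,
        # so an "entity" key here is expected, not a sign of the old format.
        return "redirect"
    if OLD_STYLE_TAIL.search(text):
        return "old-style"
    return "other"
```

Checking for the "redirect" key before the trailing pattern keeps redirects from ever being counted as old-style, which is the false positive a plain search for "entity" would produce.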
hoo added a comment.
@Daniel: Could you have a quick look at this? Looks fixed to me, but I think
you're the only one who can tell for sure.
ArielGlenn added a comment.
Is anyone looking at the redirects serialization?
ArielGlenn added a comment.
Um, "with this format" means new redirects are dumped with {"entity"
... etc.
ArielGlenn added a comment.
OK, I no longer feel as stupid. The number of items with the 'entity' format
is small in comparison to the total number of qualities; we would expect the
opposite if old revisions were being kept as is. And as I said, I had checked
with local testing that the expor
ArielGlenn added a comment.
ugh, I've stared at it for an hour and I'm still blind. Let me look at it for
another hour... sorry.
hoo added a comment.
In https://phabricator.wikimedia.org/T74348#1069658, @ArielGlenn wrote:
> right. this is what you want; the old style 'entity' is gone, the new
> style 'descriptions' is present. or am I missing something?
To me it seems like the old style entity is still present.
ArielGlenn added a comment.
right. this is what you want; the old style 'entity' is gone, the new style
'descriptions' is present. or am I missing something?
hoo added a comment.
In https://phabricator.wikimedia.org/T74348#1059331, @ArielGlenn wrote:
> Hello? Any wikidata dumps consumers on this ticket? Otherwise I'll ask in
> xmldatadumps-l.
In https://phabricator.wikimedia.org/T74348#768660, @daniel wrote:
> Bumping to critical, since it may
hoo added a comment.
In https://phabricator.wikimedia.org/T74348#1062351, @Lydia_Pintscher wrote:
> @hoo: could you have a look?
Just kicked off the download of a dump; I'll verify some old revisions once
that's done (later today).
Lydia_Pintscher added a comment.
@hoo: could you have a look?
ArielGlenn added a comment.
Hello? Any wikidata dumps consumers on this ticket? Otherwise I'll ask in
xmldatadumps-l.
ArielGlenn added a comment.
I ran a series of tests locally and also checked production output. I can
verify that the transform is actually applied: the output looks good to me
whether the text comes from prefetch or from the database. But a consumer of the
data should probably look at it for 5 seconds to verify that
ArielGlenn added a comment.
Thanks for the patch! I will check it out in the next couple of days. I'm
really sorry for the long delay; I've been out for medical reasons and am now
trying to get caught up on everything.
ArielGlenn added a comment.
Old revisions are indeed read from the old dump, as long as the length of the
revision text is correct. And this is a necessity: the DB servers cannot
handle requests for all revisions anew, and even if they could, the dumps would
take many times longer to gener
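The prefetch behaviour described here, combined with the report that the transform is applied whether text comes from prefetch or from the database, can be modelled roughly as below. This is a simplified sketch with hypothetical names, not the actual dump code. A revision's text is reused from the previous dump only when its length matches the length recorded for the revision (assumed here to be counted in UTF-8 bytes); otherwise it is fetched from the database. In either case the same serialization transform runs before the text is written.

```python
from typing import Callable, Dict, Optional


def revision_text_for_dump(
    rev_id: int,
    recorded_len: int,
    prefetch: Dict[int, str],
    fetch_from_db: Callable[[int], str],
    transform: Callable[[str], str],
) -> str:
    """Pick the text to write for one revision (hypothetical simplified model).

    prefetch maps revision IDs to text taken from the previous dump;
    recorded_len is the revision's stored length, assumed to be UTF-8 bytes.
    """
    old: Optional[str] = prefetch.get(rev_id)
    if old is not None and len(old.encode("utf-8")) == recorded_len:
        text = old  # cheap path: no database request needed
    else:
        text = fetch_from_db(rev_id)  # length mismatch or not in old dump
    # The transform runs on both paths; otherwise prefetched revisions would
    # keep whatever serialization the old dump happened to contain.
    return transform(text)
```

The design point is the last line: skipping the transform on the prefetch path is exactly how old serializations could survive from one dump into the next.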
hoo added a comment.
In T74348#787697, @Lydia_Pintscher wrote:
> Can I please have a status update on this? Do we know why it is happening?
As far as I know, the problem is that during dump creation, content from the
last dump is being scraped in case nothing changed. That's probably fine for