Milimetric added a comment.

  I translated T238878#5708511 
<https://phabricator.wikimedia.org/T238878#5708511> to Hive to familiarize 
myself with it and get ahead of productionizing it.  As a sanity check, I 
first confirmed that I get similar numbers, then grouped them by the month of 
the page_latest revision's timestamp.  I was wondering whether the numbers 
increase in some nice way that we could report on regardless of the overall 
total.  There's a fairly clear trend towards more structured data.  This is 
not a clean way to show the trend, because we're only looking at each page's 
latest revision rather than all revisions, but maybe it's useful to @Abit as 
she thinks about this metric:
  
  | month   | _c1    |
  | 2019-01 | 50396  |
  | 2019-02 | 90102  |
  | 2019-03 | 108367 |
  | 2019-04 | 112658 |
  | 2019-05 | 140429 |
  | 2019-06 | 442834 |
  | 2019-07 | 142744 |
  | 2019-08 | 399805 |
  | 2019-09 | 757923 |
  | 2019-10 | 255584 |
  | 2019-11 | 531414 |
  
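  A grouping like the one described above could be sketched roughly as 
follows.  This is only an illustration, not the query that produced the table: 
the structured-data condition from T238878#5708511 is not reproduced here, the 
`pages` alias is made up, and it assumes rev_timestamp is stored as a 
yyyyMMddHHmmss string.
  
  ```sql
  use wmf_raw;

  -- Sketch only: count pages by the month of their latest revision.
  -- Assumption: rev_timestamp is a 'yyyyMMddHHmmss' string.
  -- The structured-data condition from T238878#5708511 is NOT included;
  -- it would have to be added to the where clause.
  select concat(substr(r.rev_timestamp, 1, 4), '-',
                substr(r.rev_timestamp, 5, 2)) as month,
         count(*) as pages   -- hypothetical alias; the table above shows _c1

    from mediawiki_page p
             inner join
         mediawiki_revision r    on r.wiki_db = p.wiki_db
                                 and r.rev_page = p.page_id
                                 and r.rev_id = p.page_latest

   where p.snapshot = '2019-11'
     and r.snapshot = '2019-11'

   group by concat(substr(r.rev_timestamp, 1, 4), '-',
                   substr(r.rev_timestamp, 5, 2))
   order by month
  ;
  ```
  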
  Also,
  
  In T238878#5708257 <https://phabricator.wikimedia.org/T238878#5708257>, 
@daniel wrote:
  
  > By the way, if you find rev_deleted != 0 for the current revision, it's a 
bug. The deletion flags for the current revisions will be ignored by the 
storage layer.
  
  I checked this while re-reading, and there are 2500 such pages:
  
    use wmf_raw;
    
    -- Pages whose latest revision has revision-deletion flags set
    select p.wiki_db,
           p.page_id,
           p.page_title,
           p.page_namespace,
           r.rev_deleted
    
      from mediawiki_page p
               inner join
           mediawiki_revision r    on r.wiki_db = p.wiki_db
                                   and r.rev_page = p.page_id
                                   and r.rev_id = p.page_latest
    
     where p.snapshot = '2019-11'
       and r.snapshot = '2019-11'
       and r.rev_deleted <> 0
    
     order by p.wiki_db
    ;
  
  F31468843: pages_with_rev_deletion_on_latest.tsv 
<https://phabricator.wikimedia.org/F31468843>
  
  (side note: queries on these raw tables, as imported from MediaWiki, are 
slow because the tables are not stored in Parquet format)

TASK DETAIL
  https://phabricator.wikimedia.org/T238878

To: Milimetric
Cc: Milimetric, Cparle, nettrom_WMF, Ladsgroup, daniel, Mayakp.wiki, gsingers, 
matthiasmullie, Addshore, kzimmerman, mpopov, Ramsey-WMF, Abit, Nuria, 
4748kitoko, darthmon_wmde, DannyS712, Nandana, JKSTNK, Akovalyov, Lahi, 
PDrouin-WMF, Gq86, E1presidente, Anooprao, SandraF_WMF, GoranSMilovanovic, 
QZanden, Tramullas, Acer, LawExplorer, Salgo60, Silverfish, _jensen, 
rosalieper, Scott_WUaS, Susannaanas, JAllemandou, Jane023, terrrydactyl, 
Wikidata-bugs, Base, aude, Ricordisamoa, Wesalius, Lydia_Pintscher, 
Fabrice_Florin, Raymond, Steinsplitter, Mbch331, jeremyb