[Bug 37291] updateArticleCount.php script is broken

2012-06-12 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=37291

--- Comment #8 from Nemo_bis federicol...@tiscali.it 2012-06-12 07:11:26 UTC ---
(In reply to comment #6)
> The exact definition I'm using for a good article is: non-redirect in a
> content namespace with any kind of internal-style [[wikilink]]: page, category,
> image/file, interlanguage, or interwiki.  AFAIK, that's the definition
> currently in use.

I wouldn't be so sure. As I already told you, interwikis and/or category links
may not be counted now (which makes sense, especially for interwikis).

-- 
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.
You are on the CC list for the bug.

___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 37291] updateArticleCount.php script is broken

2012-06-12 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=37291

Donald Lancon dc...@obkb.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |INVALID

--- Comment #9 from Donald Lancon dc...@obkb.com 2012-06-12 16:02:01 UTC ---
OK... so I finally looked at r88113, which is apparently where all of this
changed radically.

Let's be very precise here.  There are many different kinds of wikilinks (a
fact that has contributed greatly to confusion over this issue):

1. page: e.g., [[link]] or [[Special:Statistics]]
2. category: [[Category:English]]
3. image/file: [[File:Yes.png]]
4. interlanguage: [[:de:]] or [[de:]]
5. interwiki: [[species:]]
6. hidden: <!-- [[don't look at me]] -->
7. deactivated: <nowiki>[[look at me]]</nowiki>
8-14. template-provided versions of, respectively, 1-7

Before r88113, 1-7 (in fact, _any_ instance of [[) were all counted, but not
8-14.  Afterwards, 1 and 8 are counted and no others.  (Even though I can't
check 8-14 with my script, checking for only type 1 links gave counts that
matched {{NUMBEROFARTICLES}} on four wikis I tried it on.  So there ya go.)
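
To make the distinction concrete, here is a rough Python sketch of the two raw-text checks described above. It is not MediaWiki's parser: the prefix sets and function names are hypothetical, a leading colon is simply ignored (as in the list above, although real MediaWiki treats [[:Category:X]] as an ordinary page link), and template-provided links (types 8-14) are invisible to it because templates are never expanded.

```python
import re

# Hypothetical prefix sets for illustration only; a real wiki defines many more.
LANGUAGE_PREFIXES = {"de", "en", "fr"}
INTERWIKI_PREFIXES = {"species", "commons"}

COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)
NOWIKI_RE = re.compile(r"<nowiki>.*?</nowiki>", re.DOTALL | re.IGNORECASE)
# Captures the link target: everything after "[[" up to "|", "[" or "]".
LINK_RE = re.compile(r"\[\[([^\[\]|]+)")


def counted_before_r88113(wikitext):
    """Old rule as described in the thread: any occurrence of '[[' counts."""
    return "[[" in wikitext


def classify(target):
    """Map a raw link target to one of link types 1-5 from the list above."""
    cleaned = target.strip().lstrip(":")  # simplification: leading colon ignored
    prefix, sep, _ = cleaned.partition(":")
    if not sep:
        return "page"
    prefix = prefix.strip().lower()
    if prefix == "category":
        return "category"
    if prefix in ("file", "image"):
        return "file"
    if prefix in LANGUAGE_PREFIXES:
        return "interlanguage"
    if prefix in INTERWIKI_PREFIXES:
        return "interwiki"
    return "page"  # e.g. Special:Statistics or any other namespace


def link_types(wikitext):
    """Set of link types visible in the raw source (types 6 and 7 removed)."""
    visible = NOWIKI_RE.sub("", COMMENT_RE.sub("", wikitext))
    return {classify(match.group(1)) for match in LINK_RE.finditer(visible)}


def has_visible_page_link(wikitext):
    """Approximates the new rule for type 1 only; type 8 needs a real parse."""
    return "page" in link_types(wikitext)
```

For a page whose only link is [[Category:English]], counted_before_r88113() is true while has_visible_page_link() is false, which is the kind of page that drops out of the count under the new rule.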

Unfortunately, this means you can't tell anymore just from the raw page source
whether a page will be an article or not (I mean, say, if it has a template on
it but no page links); it must be parsed first.

Seems to me, this amounts to a fundamental change in the way articles are
counted (the changes in article counts that have resulted are proof enough of
this) that was only ever discussed beforehand by a handful of people in bug
11868 -- and nobody there seemed to actually be discussing _this_ particular
counting method!  (Brion, for example, stated that the new method would
overcount articles, which is the opposite of what has happened!)

IOW, this new state of affairs (which, although over a year old at this
point, has not yet propagated to projects beyond Wikisource and Wiktionary,
because updateArticleCount.php hasn't been run on them) was not arrived at
through any real consensus process.  In fact, Nemo_bis, I see that's
essentially what you said just 3 weeks before the changes were committed by
IAlex https://bugzilla.wikimedia.org/show_bug.cgi?id=24754#c1.

So, anyway... I guess this bug is finished, and I need to start a (now more
informed) discussion about this on Meta.



[Bug 37291] updateArticleCount.php script is broken

2012-06-12 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=37291

--- Comment #10 from Platonides platoni...@gmail.com 2012-06-12 16:08:59 UTC ---
(In reply to comment #9)
> OK... so I finally looked at r88113, which is apparently where all of this
> changed radically.

Wow, I wasn't aware of that.



> Unfortunately, this means you can't tell anymore just from the raw page source
> whether a page will be an article or not (I mean, say, if it has a template on
> it but no page links); it must be parsed first.

For the verification purposes discussed here, you can use pagelinks.sql.gz
though.



[Bug 37291] updateArticleCount.php script is broken

2012-06-12 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=37291

--- Comment #11 from Donald Lancon dc...@obkb.com 2012-06-12 19:13:54 UTC ---
Yeah, I just realized that! [g]

For some reason I was thinking that the way my script was doing it would miss
links provided by templates, but of course that's not true: what my script does
is _exactly_ what the MW code itself does when not triggered by a page edit: it
checks pagelinks.sql for the existence of links originating from the page in
question!

I don't know what I was thinking
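
For anyone who wants to repeat the check, the idea can be sketched in Python roughly as follows. This is only an illustration under assumptions, not my actual script and not the MW code: it assumes the page and pagelinks dumps have already been loaded into a database with the usual column names (page_id, page_namespace, page_is_redirect, pl_from), and it uses a tiny in-memory SQLite database with made-up rows in place of real dump data. The function names are hypothetical.

```python
import sqlite3


def count_articles(conn, content_namespaces=(0,)):
    """Count non-redirect pages in content namespaces with at least one
    outgoing row in pagelinks (the 'link' counting rule discussed above)."""
    placeholders = ",".join("?" for _ in content_namespaces)
    sql = f"""
        SELECT COUNT(*)
        FROM page
        WHERE page_namespace IN ({placeholders})
          AND page_is_redirect = 0
          AND EXISTS (SELECT 1 FROM pagelinks WHERE pl_from = page_id)
    """
    return conn.execute(sql, tuple(content_namespaces)).fetchone()[0]


def demo_connection():
    """Build a toy database; the rows are invented purely for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE page (
            page_id INTEGER PRIMARY KEY,
            page_namespace INTEGER,
            page_is_redirect INTEGER
        );
        CREATE TABLE pagelinks (pl_from INTEGER);
        INSERT INTO page VALUES
            (1, 0, 0),  -- content page with a link: counted
            (2, 0, 1),  -- redirect with a link: not counted
            (3, 0, 0),  -- content page without links: not counted
            (4, 4, 0),  -- project-namespace page with a link: not counted
            (5, 0, 0);  -- content page with two links: counted once
        INSERT INTO pagelinks VALUES (1), (2), (4), (5), (5);
    """)
    return conn


if __name__ == "__main__":
    print(count_articles(demo_connection()))
```

Because pagelinks is filled in after parsing, links that come from templates are included automatically, which is why this approach does not miss them.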



[Bug 37291] updateArticleCount.php script is broken

2012-06-11 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=37291

Platonides platoni...@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |platoni...@gmail.com

--- Comment #3 from Platonides platoni...@gmail.com 2012-06-11 20:33:30 UTC ---
Is your script available somewhere?

Maybe you could point out a small wiki with the count of your script, for
comparing with the numbers provided by updateArticleCount for that wiki?



[Bug 37291] updateArticleCount.php script is broken

2012-06-11 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=37291

Nemo_bis federicol...@tiscali.it changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |federicol...@tiscali.it

--- Comment #4 from Nemo_bis federicol...@tiscali.it 2012-06-11 21:30:22 UTC ---
(In reply to comment #3)
> Is your script available somewhere?

aka, what definition of good article are you using to say that the count is
not correct?



[Bug 37291] updateArticleCount.php script is broken

2012-06-11 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=37291

Nemo_bis federicol...@tiscali.it changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |UNCONFIRMED
     Ever Confirmed|1                           |0



[Bug 37291] updateArticleCount.php script is broken

2012-06-11 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=37291

--- Comment #5 from Platonides platoni...@gmail.com 2012-06-11 22:04:26 UTC ---
See above: «based on the current non-redirect with at least one wikilink
criteria»



[Bug 37291] updateArticleCount.php script is broken

2012-06-11 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=37291

--- Comment #6 from Donald Lancon dc...@obkb.com 2012-06-11 22:22:28 UTC ---
If you haven't already, please see the Meta page I pointed to in my initial
post: http://meta.wikimedia.org/wiki/User:Dcljr/Article_counts

That gives all the information I think someone would need to independently
check my counts.  (In fact, it might be a good idea for someone to try to
count the articles themselves without seeing my code first.  My script is
currently not available anywhere, but I can put it up at Meta if it's really
necessary.)

I've posted stats for 12 Wiktionaries and 15 Wikisources so far (each for a
date before and a date after the running of the maintenance script).  Take your
pick for which one(s) you want to check.

The exact definition I'm using for a good article is: non-redirect in a
content namespace with any kind of internal-style [[wikilink]]: page, category,
image/file, interlanguage, or interwiki.  AFAIK, that's the definition
currently in use.  Which pages contain each of these types of links are gleaned
from the respective database dumps.  For details, see the Meta page.  For the
Wiktionaries, I also show counts using three other sets of criteria (also
explained at the Meta page).
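
As a rough illustration of that definition (a sketch under assumptions, not my actual script): suppose the source page IDs have already been extracted from each link-table dump into sets, one per table (pagelinks, categorylinks, imagelinks, langlinks, iwlinks), and the page dump into (page_id, namespace, is_redirect) tuples. The function name and the sample data in the usage note are hypothetical.

```python
def count_any_link_articles(pages, link_sources, content_namespaces=(0,)):
    """Count non-redirect pages in a content namespace that appear as the
    source of at least one link in any of the given link tables.

    pages: iterable of (page_id, namespace, is_redirect) tuples.
    link_sources: dict mapping a link-table name to the set of page IDs
        that have at least one outgoing link recorded in that table.
    """
    # A page qualifies if it is a link source in any one of the tables.
    linked = set().union(*link_sources.values())
    return sum(
        1
        for page_id, namespace, is_redirect in pages
        if namespace in content_namespaces
        and not is_redirect
        and page_id in linked
    )
```

With pages [(1, 0, False), (2, 0, True), (3, 0, False), (4, 4, False), (5, 0, False)] and link sources {"pagelinks": {1, 2}, "categorylinks": {3, 4}}, only pages 1 and 3 qualify: page 2 is a redirect, page 4 is outside ns0, and page 5 has no links.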



[Bug 37291] updateArticleCount.php script is broken

2012-06-11 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=37291

--- Comment #7 from Donald Lancon dc...@obkb.com 2012-06-12 03:22:13 UTC ---
Me: AFAIK, that's the definition currently in use.

This only applies to the link article-count method, of course -- which all
Wikisources and Wiktionaries are currently using, according to
http://noc.wikimedia.org/conf/highlight.php?file=InitialiseSettings.php
(search for wgArticleCountMethod).



[Bug 37291] updateArticleCount.php script is broken

2012-06-07 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=37291

--- Comment #2 from Donald Lancon dc...@obkb.com 2012-06-07 08:33:21 UTC ---
I see that the updateArticleCount.php script itself does very little. Instead,
it relies on other code to actually count the articles. I followed the
dependencies for a while, but eventually gave up before I found the actual code
that does the counting. Someone more familiar with MW code will have to say
where the problem lies...



[Bug 37291] updateArticleCount.php script is broken

2012-06-01 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=37291

Donald Lancon dc...@obkb.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |analytics
                URL|                            |http://meta.wikimedia.org/wiki/User:Dcljr/Article_counts

--- Comment #1 from Donald Lancon dc...@obkb.com 2012-06-02 04:49:56 UTC ---
BTW, I should point out that the undercounting cannot be because it's not
considering all the content namespaces, because all Wiktionaries use only ns0
for content.
