Re: [Wikitech-l] Understanding the meaning of “List of page titles”

2009-03-15 Thread Aryeh Gregor
On Sun, Mar 15, 2009 at 11:22 AM, Platonides  wrote:
> Note that some prefixes not on the list above, such as WP: or Image:, are
> aliases for namespaces that are on it.

On Sun, Mar 15, 2009 at 12:00 PM, Chad  wrote:
> Media: also comes to mind.

Page titles cannot begin with these prefixes, so I deliberately
omitted them.  What I said is correct.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Understanding the meaning of “List of page titles”

2009-03-15 Thread Platonides
Aryeh Gregor wrote:
> On Fri, Mar 13, 2009 at 6:26 PM, O. O.  wrote:
>> Thanks  Daniel. I had not understood the meaning of NS0. Anyway I found
>> the details of NS0 from http://en.wikipedia.org/wiki/Wikipedia:NS0
>> However this confuses me even more.
> 
> Pages on the English Wikipedia that start with any of the following
> prefixes are *not* in the main namespace (ns0):
> 
> Talk:
> User:
> User talk:
> Wikipedia:
> Wikipedia talk:
> File:
> File talk:
> MediaWiki:
> MediaWiki talk:
> Template:
> Template talk:
> Help:
> Help talk:
> Category:
> Category talk:
> Portal:
> Portal talk:
> Special:
> 
> All pages that do not start with one of these special prefixes are
> automatically in namespace 0.  To check the namespace number of a page
> if you're uncertain, you can view the page source and check the body
> element's classes.  Namespace 0 pages will have the class "ns-0".
> Other pages will have some other number; for instance, "Talk:" pages
> will have "ns-1", because "Talk:" is namespace 1.  "User:" is 2, "User
> talk:" is 3, etc.

Note that some prefixes not on the list above, such as WP: or Image:, are
aliases for namespaces that are on it.
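A minimal sketch of how such aliases could be folded into canonical namespace names before classifying a title. The alias table is a hypothetical, illustrative subset (WP: and Image: from the note above, plus Image talk:); the authoritative list is part of each wiki's configuration, and MediaWiki's own title normalization handles more cases than this.

```python
# Hypothetical, illustrative subset of English Wikipedia namespace aliases.
# The authoritative list lives in the wiki's configuration.
ALIASES = {
    "wp": "Wikipedia",
    "image": "File",
    "image talk": "File talk",
}


def canonical_title(title: str) -> str:
    """Rewrite an aliased namespace prefix (e.g. "WP:") to its canonical name."""
    prefix, sep, rest = title.partition(":")
    if not sep:
        return title  # no colon, so no namespace prefix at all
    # Compare case-insensitively; dump titles use "_" where the UI shows " ".
    key = prefix.replace("_", " ").strip().lower()
    if key in ALIASES:
        return f"{ALIASES[key]}:{rest}"
    return title  # unknown prefix (e.g. "Star Wars"): leave the title untouched


if __name__ == "__main__":
    for t in ["WP:NS0", "Image:Example.png", "Star Wars: Episode IV"]:
        print(t, "->", canonical_title(t))
```

Because an aliased title resolves into the aliased namespace, no ns0 page title can start with one of these prefixes.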


Re: [Wikitech-l] Understanding the meaning of “List of page titles”

2009-03-14 Thread Aryeh Gregor
On Fri, Mar 13, 2009 at 6:26 PM, O. O.  wrote:
> Thanks  Daniel. I had not understood the meaning of NS0. Anyway I found
> the details of NS0 from http://en.wikipedia.org/wiki/Wikipedia:NS0
> However this confuses me even more.

Pages on the English Wikipedia that start with any of the following
prefixes are *not* in the main namespace (ns0):

Talk:
User:
User talk:
Wikipedia:
Wikipedia talk:
File:
File talk:
MediaWiki:
MediaWiki talk:
Template:
Template talk:
Help:
Help talk:
Category:
Category talk:
Portal:
Portal talk:
Special:

All pages that do not start with one of these special prefixes are
automatically in namespace 0.  To check the namespace number of a page
if you're uncertain, you can view the page source and check the body
element's classes.  Namespace 0 pages will have the class "ns-0".
Other pages will have some other number; for instance, "Talk:" pages
will have "ns-1", because "Talk:" is namespace 1.  "User:" is 2, "User
talk:" is 3, etc.
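The rule above can be sketched as a lookup: take the text before the first colon, and if it is one of the listed prefixes, use that namespace's number; otherwise the title is in namespace 0. The numbers other than 1 to 3 (Wikipedia: 4 through Category talk: 15, Portal: 100/101, Special: -1) are the English Wikipedia values but are not stated in this message, and the case-insensitive, underscore-tolerant matching is an assumption about how dump titles are written. The second helper only parses the `ns-N` class from a `<body>` class attribute string you have already extracted; fetching the HTML is not shown.

```python
import re

# Namespace numbers for the prefixes listed above (English Wikipedia).
NAMESPACES = {
    "talk": 1, "user": 2, "user talk": 3,
    "wikipedia": 4, "wikipedia talk": 5,
    "file": 6, "file talk": 7,
    "mediawiki": 8, "mediawiki talk": 9,
    "template": 10, "template talk": 11,
    "help": 12, "help talk": 13,
    "category": 14, "category talk": 15,
    "portal": 100, "portal talk": 101,
    "special": -1,
}


def namespace_of(title: str) -> int:
    """Return the namespace number for a title; 0 if no listed prefix matches."""
    prefix, sep, _ = title.partition(":")
    if not sep:
        return 0
    # Dump titles use "_" for spaces; compare case-insensitively.
    return NAMESPACES.get(prefix.replace("_", " ").strip().lower(), 0)


def namespace_from_body_class(class_attr: str):
    """Return N from an 'ns-N' token in a <body> class attribute, or None."""
    for token in class_attr.split():
        m = re.fullmatch(r"ns-(-?\d+)", token)
        if m:
            return int(m.group(1))
    return None
```

A title such as "Star Wars: Episode IV" contains a colon but no listed prefix, so it stays in namespace 0.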

Re: [Wikitech-l] Understanding the meaning of “List of page titles”

2009-03-14 Thread Daniel Kinzler
> Plotting the number of articles could help an observer "see" the growth of a
> wiki, but it is a bad measure for seeing the "death" of a wiki.

For this kind of analysis, check out WikiXRay and WikiTracer.

-- daniel


Re: [Wikitech-l] Understanding the meaning of “List of page titles”

2009-03-14 Thread Tei
On Sat, Mar 14, 2009 at 8:46 AM, Daniel Kinzler  wrote:
> Andrew Garrett schrieb:
>> On Sat, Mar 14, 2009 at 9:34 AM, O. O.  wrote:
>>> Andrew Garrett wrote:
>>>> On Sat, Mar 14, 2009 at 9:26 AM, O. O.  wrote:
>>>>>        The above link says that “only articles” and no redirects are in the
>>>>> namespace NS0. Also Talk: pages are not included in the NS0.
>>>>> Then, when the current English Wikipedia advertises 2,791,033 Articles,
>>>>> I cannot understand why the list of Titles contains 5716820 Titles? This
>>>>> is a little more than double?
>>>> The larger number includes redirects, the smaller number doesn't.

>>> Then why does this http://en.wikipedia.org/wiki/Wikipedia:NS0 say that
>>> “Redirects” are not considered as Articles and hence are not in NS0?
>>
>> It doesn't say that, it says "Not all pages in the article namespace
>> are considered to be articles", listing redirects as an example.
>
> The terminology is indeed confusing. ns0 is the "main" namespace, which is used
> for "articles". But it also contains redirects. For the statistics, the software
> tries to count "real" or "good" articles, which are defined as pages that are in
> ns0, are not redirects, and contain at least one link. The definition may even be
> changed in the future to exclude disambiguation pages. The title list, however,
> contains all pages in ns0.
>
> Talk pages are in their own namespace, or rather, namespaces. Namespaces come
> in pairs: the namespace itself (even id), and the corresponding talk namespace
> (odd id).

Plotting the number of articles could help an observer "see" the growth of a
wiki, but it is a bad measure for seeing the "death" of a wiki.

But... hey!... maybe all wikis in the MediaWiki project are just growing,
so we don't see this phenomenon right now. Maybe in a few years we
will see some "wasteland wikis": immense amounts of text that no one
can maintain (or is interested in maintaining), left on their own to suffer
continuous degradation. Anyway, all our wikis are in their infancy,
I am thinking 5+ years ahead, and there are lots and lots of urgent
problems right now.

Please ignore this email.




-- 
--
End of message.

Re: [Wikitech-l] Understanding the meaning of “List of page titles”

2009-03-14 Thread Daniel Kinzler
Andrew Garrett schrieb:
> On Sat, Mar 14, 2009 at 9:34 AM, O. O.  wrote:
>> Andrew Garrett wrote:
>>> On Sat, Mar 14, 2009 at 9:26 AM, O. O.  wrote:
>>>> The above link says that “only articles” and no redirects are in the
>>>> namespace NS0. Also Talk: pages are not included in the NS0.
>>>> Then, when the current English Wikipedia advertises 2,791,033 Articles,
>>>> I cannot understand why the list of Titles contains 5716820 Titles? This
>>>> is a little more than double?
>>> The larger number includes redirects, the smaller number doesn't.
>>>
>> Then why does this http://en.wikipedia.org/wiki/Wikipedia:NS0 say that
>> “Redirects” are not considered as Articles and hence are not in NS0?
> 
> It doesn't say that, it says "Not all pages in the article namespace
> are considered to be articles", listing redirects as an example.

The terminology is indeed confusing. ns0 is the "main" namespace, which is used
for "articles". But it also contains redirects. For the statistics, the software
tries to count "real" or "good" articles, which are defined as pages that are in
ns0, are not redirects, and contain at least one link. The definition may even be
changed in the future to exclude disambiguation pages. The title list, however,
contains all pages in ns0.

Talk pages are in their own namespace, or rather, namespaces. Namespaces come
in pairs: the namespace itself (even id), and the corresponding talk namespace
(odd id).
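A minimal sketch of the two rules described above, under stated assumptions: `Page` and its fields (including `link_count`) are hypothetical stand-ins for whatever data you have, not MediaWiki's own code, and disambiguation pages are not treated specially, since that is mentioned only as a possible future change. The pairing helpers rely only on the even/odd convention: a subject namespace with id n has its talk namespace at n + 1.

```python
from dataclasses import dataclass


@dataclass
class Page:
    # Hypothetical record; field names are illustrative only.
    namespace: int
    is_redirect: bool
    link_count: int  # number of wiki links in the page text


def is_counted_article(page: Page) -> bool:
    """'Good' article as described above: in ns0, not a redirect, >= 1 link."""
    return page.namespace == 0 and not page.is_redirect and page.link_count >= 1


def talk_namespace(ns: int) -> int:
    """Talk namespace paired with ns (a talk id maps to itself)."""
    if ns < 0:
        raise ValueError("negative (virtual) namespaces have no talk pair")
    return ns | 1  # set the low bit: even subject id -> odd talk id


def subject_namespace(ns: int) -> int:
    """Subject namespace paired with ns (a subject id maps to itself)."""
    if ns < 0:
        raise ValueError("negative (virtual) namespaces have no subject pair")
    return ns & ~1  # clear the low bit: odd talk id -> even subject id


if __name__ == "__main__":
    pages = [
        Page(0, False, 12),  # ordinary article: counted
        Page(0, True, 1),    # redirect: in ns0, but not counted
        Page(0, False, 0),   # no links: in ns0, but not counted
        Page(1, False, 5),   # talk page: not in ns0 at all
    ]
    print(sum(is_counted_article(p) for p in pages))  # 1
    print(talk_namespace(0), subject_namespace(1))     # 1 0
```

All three ns0 pages in the sample would appear in the title list, but only one counts toward the article statistic.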

-- daniel

Re: [Wikitech-l] Understanding the meaning of “List of page titles”

2009-03-13 Thread Andrew Garrett
On Sat, Mar 14, 2009 at 9:34 AM, O. O.  wrote:
> Andrew Garrett wrote:
>> On Sat, Mar 14, 2009 at 9:26 AM, O. O.  wrote:
>>>        The above link says that “only articles” and no redirects are in the
>>> namespace NS0. Also Talk: pages are not included in the NS0.
>>> Then, when the current English Wikipedia advertises 2,791,033 Articles,
>>> I cannot understand why the list of Titles contains 5716820 Titles? This
>>> is a little more than double?
>>
>> The larger number includes redirects, the smaller number doesn't.
>>
> Then why does this http://en.wikipedia.org/wiki/Wikipedia:NS0 say that
> “Redirects” are not considered as Articles and hence are not in NS0?

It doesn't say that, it says "Not all pages in the article namespace
are considered to be articles", listing redirects as an example.

-- 
Andrew Garrett
Sent from: Sydney New South Wales Australia.

Re: [Wikitech-l] Understanding the meaning of “List of page titles”

2009-03-13 Thread O. O.
Andrew Garrett wrote:
> On Sat, Mar 14, 2009 at 9:26 AM, O. O.  wrote:
>>The above link says that “only articles” and no redirects are in the
>> namespace NS0. Also Talk: pages are not included in the NS0.
>> Then, when the current English Wikipedia advertises 2,791,033 Articles,
>> I cannot understand why the list of Titles contains 5716820 Titles? This
>> is a little more than double?
> 
> The larger number includes redirects, the smaller number doesn't.
> 
Then why does this http://en.wikipedia.org/wiki/Wikipedia:NS0 say that 
“Redirects” are not considered as Articles and hence are not in NS0?

O.O.


Re: [Wikitech-l] Understanding the meaning of “List of page titles”

2009-03-13 Thread Andrew Garrett
On Sat, Mar 14, 2009 at 9:26 AM, O. O.  wrote:
>        The above link says that “only articles” and no redirects are in the
> namespace NS0. Also Talk: pages are not included in the NS0.
> Then, when the current English Wikipedia advertises 2,791,033 Articles,
> I cannot understand why the list of Titles contains 5716820 Titles? This
> is a little more than double?

The larger number includes redirects, the smaller number doesn't.
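The size of the gap is simple arithmetic on the two figures quoted above. Note that they come from different dates (the 20081008 dump versus the live count in March 2009), and that, per the other replies in this thread, the gap covers redirects plus any other ns0 pages the statistics do not count, so treat the result as approximate.

```python
# Figures quoted in this thread.
NS0_TITLES = 5716820           # lines in enwiki-20081008-all-titles-in-ns0
ADVERTISED_ARTICLES = 2791033  # "articles" shown on the live site, March 2009

# ns0 pages not counted as articles (mostly redirects, per this thread).
not_counted = NS0_TITLES - ADVERTISED_ARTICLES
ratio = NS0_TITLES / ADVERTISED_ARTICLES

print(not_counted)     # 2925787
print(f"{ratio:.2f}")  # 2.05
```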

-- 
Andrew Garrett

Re: [Wikitech-l] Understanding the meaning of “List of page titles”

2009-03-13 Thread O. O.
Daniel Kinzler wrote:
> O. O. schrieb:
>> Aryeh Gregor wrote:
>>> On Fri, Mar 13, 2009 at 2:44 PM, O. O.  wrote:
>>>> Hi,
>>>> I am looking at the dump of the English Wikipedia at
>>>> http://download.wikimedia.org/enwiki/20081008/ There is a file called
>>>> “all-titles-in-ns0.gz” which is supposed to contain the List of Page
>>>> Titles.  If I do
>>>>
>>>> cat enwiki-20081008-all-titles-in-ns0 | wc -l
>>>>
>>>> I get 5716820. On the same page, a little above in
>>>> “pages-articles.xml.bz2” we have “enwiki 7649051 pages”.
>>> The description for pages-articles.xml.bz2 says it contains "Articles,
>>> templates, image descriptions, and primary meta-pages."
>>> all-titles-in-ns0.gz contains (as the name suggests) only the titles
>>> in ns0, i.e., the main namespace, articles.  It does not contain
>>> templates, image descriptions, or "primary meta-pages" (whatever those
>>> are).
>> Thanks Ilmari and Aryeh.
>>
>> I am not sure what “primary meta-pages” are; however, “templates” and
>> “image descriptions” do have titles. You can check this in the online
>> version of the English Wikipedia.
> 
> Sure, they have titles. But they are not in ns0 and thus not contained in this
> list, which is ns0 only (that is, the main "article" namespace).
> 
> -- daniel
> 
Thanks  Daniel. I had not understood the meaning of NS0. Anyway I found 
the details of NS0 from http://en.wikipedia.org/wiki/Wikipedia:NS0 
However this confuses me even more.

The above link says that “only articles” and no redirects are in the 
namespace NS0. Also Talk: pages are not included in the NS0.
Then, when the current English Wikipedia advertises 2,791,033 Articles, 
I cannot understand why the list of Titles contains 5716820 Titles? This 
is a little more than double?

Thanks for helping out,
O. O.


Re: [Wikitech-l] Understanding the meaning of “List of page titles”

2009-03-13 Thread Daniel Kinzler
O. O. schrieb:
> Aryeh Gregor wrote:
>> On Fri, Mar 13, 2009 at 2:44 PM, O. O.  wrote:
>>> Hi,
>>>I am looking at the dump of the English Wikipedia at
>>> http://download.wikimedia.org/enwiki/20081008/ There is a file called
>>> “all-titles-in-ns0.gz” which is supposed to contain the List of Page
>>> Titles.  If I do
>>>
>>> cat enwiki-20081008-all-titles-in-ns0 | wc -l
>>>
>>> I get 5716820. On the same page, a little above in
>>> “pages-articles.xml.bz2” we have “enwiki 7649051 pages”.
>> The description for pages-articles.xml.bz2 says it contains "Articles,
>> templates, image descriptions, and primary meta-pages."
>> all-titles-in-ns0.gz contains (as the name suggests) only the titles
>> in ns0, i.e., the main namespace, articles.  It does not contain
>> templates, image descriptions, or "primary meta-pages" (whatever those
>> are).
> 
> Thanks Ilmari and Aryeh.
> 
> I am not sure what “primary meta-pages” are; however, “templates” and
> “image descriptions” do have titles. You can check this in the online
> version of the English Wikipedia.

Sure, they have titles. But they are not in ns0 and thus not contained in this
list, which is ns0 only (that is, the main "article" namespace).

-- daniel

Re: [Wikitech-l] Understanding the meaning of “List of page titles”

2009-03-13 Thread O. O.
Aryeh Gregor wrote:
> On Fri, Mar 13, 2009 at 2:44 PM, O. O.  wrote:
>> Hi,
>>I am looking at the dump of the English Wikipedia at
>> http://download.wikimedia.org/enwiki/20081008/ There is a file called
>> “all-titles-in-ns0.gz” which is supposed to contain the List of Page
>> Titles.  If I do
>>
>> cat enwiki-20081008-all-titles-in-ns0 | wc -l
>>
>> I get 5716820. On the same page, a little above in
>> “pages-articles.xml.bz2” we have “enwiki 7649051 pages”.
> 
> The description for pages-articles.xml.bz2 says it contains "Articles,
> templates, image descriptions, and primary meta-pages."
> all-titles-in-ns0.gz contains (as the name suggests) only the titles
> in ns0, i.e., the main namespace, articles.  It does not contain
> templates, image descriptions, or "primary meta-pages" (whatever those
> are).

Thanks Ilmari and Aryeh.

I am not sure what “primary meta-pages” are; however, “templates” and
“image descriptions” do have titles. You can check this in the online
version of the English Wikipedia.

O. O.



Re: [Wikitech-l] Understanding the meaning of “List of page titles”

2009-03-13 Thread Aryeh Gregor
On Fri, Mar 13, 2009 at 2:44 PM, O. O.  wrote:
> Hi,
>        I am looking at the dump of the English Wikipedia at
> http://download.wikimedia.org/enwiki/20081008/ There is a file called
> “all-titles-in-ns0.gz” which is supposed to contain the List of Page
> Titles.  If I do
>
> cat enwiki-20081008-all-titles-in-ns0 | wc -l
>
> I get 5716820. On the same page, a little above in
> “pages-articles.xml.bz2” we have “enwiki 7649051 pages”.

The description for pages-articles.xml.bz2 says it contains "Articles,
templates, image descriptions, and primary meta-pages."
all-titles-in-ns0.gz contains (as the name suggests) only the titles
in ns0, i.e., the main namespace, articles.  It does not contain
templates, image descriptions, or "primary meta-pages" (whatever those
are).
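To see which namespaces account for the extra pages in pages-articles.xml.bz2, one could tally the dump's `<page>` titles by namespace. This is a minimal sketch, assuming the export's `<siteinfo><namespaces>` header (entries like `<namespace key="10">Template</namespace>`) comes before the pages. It matches prefixes case-insensitively, ignores namespace aliases, and the sample in the `__main__` block is made up for illustration.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag: str) -> str:
    """Drop the '{namespace-uri}' prefix ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def tally_namespaces(source) -> Counter:
    """Count <page> elements per namespace key in a MediaWiki XML export.

    `source` is a filename or a binary file object (e.g. from bz2.open).
    """
    prefixes = {}  # lower-cased namespace name -> numeric key
    counts = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "namespace" and elem.text:  # ns0 has an empty name
            prefixes[elem.text.lower()] = int(elem.get("key"))
        elif name == "page":
            title = next((c.text or "" for c in elem
                          if local_name(c.tag) == "title"), "")
            prefix, sep, _ = title.partition(":")
            counts[prefixes.get(prefix.lower(), 0) if sep else 0] += 1
            elem.clear()  # discard the page's children to limit memory use
    return counts


if __name__ == "__main__":
    sample = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <siteinfo><namespaces>
    <namespace key="0" />
    <namespace key="6">Image</namespace>
    <namespace key="10">Template</namespace>
  </namespaces></siteinfo>
  <page><title>Albert Einstein</title></page>
  <page><title>Star Wars: Episode IV</title></page>
  <page><title>Template:Infobox</title></page>
  <page><title>Image:Example.png</title></page>
</mediawiki>"""
    print(dict(tally_namespaces(io.BytesIO(sample))))
```

For the real dump, pass `bz2.open(path, "rb")` as `source`; parsing a multi-gigabyte file this way will take a while.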

Re: [Wikitech-l] Understanding the meaning of “List of page titles”

2009-03-13 Thread Ilmari Karonen
O. O. wrote:
>   I am looking at the dump of the English Wikipedia at 
> http://download.wikimedia.org/enwiki/20081008/ There is a file called 
> “all-titles-in-ns0.gz” which is supposed to contain the List of Page 
> Titles.  If I do
> 
> cat enwiki-20081008-all-titles-in-ns0 | wc -l
> 
> I get 5716820. On the same page, a little above in 
> “pages-articles.xml.bz2” we have “enwiki 7649051 pages”.
> 
> So why are these two numbers different? Are there pages without a Title?

The description of pages-articles.xml.bz2 says "Articles, templates, 
image descriptions, and primary meta-pages."  Presumably the 1932231 
non-article pages in it are the "templates, image descriptions, and 
primary meta-pages".
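The 1932231 figure is the difference of the two counts, which presumes that every ns0 title is also one of the pages in pages-articles.xml.bz2. A minimal sketch that counts titles straight from the compressed file (equivalent to `zcat ... | wc -l` as long as the file holds one title per line with no blank lines; UTF-8 encoding is assumed) and redoes the subtraction:

```python
import gzip


def count_lines(lines) -> int:
    """Count non-empty lines; assumes one title per line."""
    return sum(1 for line in lines if line.strip())


def count_titles(path: str) -> int:
    """Count titles in a gzipped list such as all-titles-in-ns0.gz (UTF-8 assumed)."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return count_lines(f)


# Figures quoted in this thread for the 20081008 English Wikipedia dump.
PAGES_ARTICLES = 7649051  # pages in pages-articles.xml.bz2
NS0_TITLES = 5716820      # titles in all-titles-in-ns0

non_ns0 = PAGES_ARTICLES - NS0_TITLES
print(non_ns0)  # 1932231
```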

-- 
Ilmari Karonen
