[Wikidata-bugs] [Maniphest] [Commented On] T191639: Wikidata JSON dumps do not have the 'ns' (namespace)

2020-05-04 Thread marcmiquel
marcmiquel added a comment.


  I assumed that Categories and Wikipedia: pages coming from language editions 
would maintain the ns from their origin wiki. We can close this. Now it is 
clear. Thanks, Addshore.

TASK DETAIL
  https://phabricator.wikimedia.org/T191639

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: marcmiquel
Cc: Chicocvenancio, Addshore, marcmiquel, jannee_e, darthmon_wmde, Nandana, 
Lahi, Gq86, GoranSMilovanovic, Lunewa, QZanden, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, gnosygnu, Wikidata-bugs, aude, Svick, Mbch331, jeremyb
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs



2020-05-02 Thread Addshore
Addshore added a comment.


  ns is not part of the data model for Wikidata / Wikibase entities, which is 
why it does not appear in the JSON dumps.
  The docs on that page don't mention ns other than in the example, which comes 
from the API and does include the ns.
  All Items on wikidata.org are in ns 0.
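
  Since every Item is in ns 0, a dump consumer can tell Items apart by the 
entity type instead. A minimal sketch, assuming the usual dump layout (one JSON 
array, one entity per line, a trailing comma on each line). The function name 
iter_items and the sample lines are made up for illustration:

```python
import json
from typing import Iterable, Iterator


def iter_items(lines: Iterable[str]) -> Iterator[dict]:
    """Yield Item entities from dump lines (assumed: one entity per line)."""
    for line in lines:
        line = line.strip().rstrip(",")
        if line in ("[", "]", ""):
            continue  # skip the array brackets that wrap the dump
        entity = json.loads(line)
        # The entity "type" tells Items apart from Properties; no ns needed.
        if entity.get("type") == "item":
            yield entity


# Hypothetical two-entity sample, trimmed to the fields used here.
sample = [
    "[",
    '{"type": "item", "id": "Q42", "sitelinks": {}},',
    '{"type": "property", "id": "P31"}',
    "]",
]
item_ids = [entity["id"] for entity in iter_items(sample)]
```

  For a real dump the lines would come from the (compressed) file, read as a 
stream so the whole dump never has to fit in memory.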




2020-04-30 Thread marcmiquel
marcmiquel added a comment.


  The wikidata dump still does not include the namespace tag.
  
  It is specified in the JSON DataModel, and it would be useful for the same use 
case I explained in this task.
  https://www.mediawiki.org/wiki/Wikibase/DataModel/JSON
  
  Could you give me an update on this?




2019-02-15 Thread marcmiquel
marcmiquel added a comment.
I need all the Wikidata qitems that relate to Wikipedia articles. If I understand it correctly, these are qitems that have namespace 0, although not all qitems with namespace 0 necessarily have sitelinks (they could be just qitems without an article).

The thing is that I'm not sure all Wikidata qitems are in the main namespace (0).

Let me explain what I did.

Since I cannot use the namespace XML tag in the dump to parse only namespace 0 and skip the rest, I used the Wikidata MySQL replica database instead.

I ran this query:
select count(page_namespace), page_namespace from page group by page_namespace order by 1 desc;

This is the result:

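+-----------------------+----------------+
| count(page_namespace) | page_namespace |
+-----------------------+----------------+
|              56986053 |              0 |
|                152250 |           1198 |
|                 45022 |              3 |
|                 42573 |            146 |
|                 36204 |              4 |
|                 33320 |              2 |
|                 16541 |              1 |
|                 10874 |           2600 |
|                  7464 |             10 |
|                  7371 |              5 |
|                  5940 |            121 |
|                  5887 |            120 |
|                  3675 |             14 |
|                  3032 |              8 |
|                  1800 |             12 |
|                   462 |            828 |
|                   298 |              9 |
|                   193 |             11 |
|                   131 |             13 |
|                    66 |            829 |
|                    62 |            147 |
|                    14 |             15 |
|                     3 |              7 |
|                     3 |           1199 |
+-----------------------+----------------+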

So it seems that there are many pages in namespaces 1198, 146, 2600 and others, 
besides 3, 4, 2 and 1, which are user talk, project, user and talk pages.

I don't know how many of these are in the dump, but I only need those in namespace 0. So the solution I found is to retrieve all the qitems with namespace 0 from the Wikidata replica MySQL database and store them in a database.

Then I consult this database when parsing and skip the entities that were not previously inserted. This way the parsing is shorter.
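
The lookup step described above could look roughly like this. It is only a sketch: the table name ns0_items, the helper names and the sample IDs are hypothetical, and an in-memory SQLite database stands in for whatever store is actually used.

```python
import sqlite3


def build_lookup(qids):
    """Store the ns-0 IDs fetched from the replica in a local table."""
    conn = sqlite3.connect(":memory:")  # stand-in for a real database file
    conn.execute("CREATE TABLE ns0_items (qid TEXT PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO ns0_items (qid) VALUES (?)",
        [(qid,) for qid in qids],
    )
    conn.commit()
    return conn


def is_known(conn, qid):
    """Return True if the ID was inserted beforehand."""
    row = conn.execute(
        "SELECT 1 FROM ns0_items WHERE qid = ?", (qid,)
    ).fetchone()
    return row is not None


conn = build_lookup(["Q1", "Q2"])
# While parsing, entities whose ID is not in the table are skipped.
kept = [qid for qid in ["Q1", "P31", "Q2", "Q999"] if is_known(conn, qid)]
```

The primary key gives an indexed lookup per entity, so the check stays cheap even with tens of millions of IDs.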

Do you think there is any other way to do it? 
Thanks.



2019-02-08 Thread Addshore
Addshore added a comment.

In T191639#4937630, @marcmiquel wrote:
> The use case is to process the dumps and filter out qitems which do not relate to articles, this is why we put NS0.

That sounds like you are referring to the namespace of the sitelinks of the entity?

On wikidata.org all "qitems" are in the main namespace, which is namespace 0.
The sitelinks held within those items can be in any number of different namespaces on the Wikidata clients.

> The JSON dump sample says there is ns field but in the final dump there is no such field.

This is the namespace of the item itself, not of the sitelinks.
Could you link to those docs? It could be that they are only meant for the API serialization.
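
To illustrate the distinction: a sitelink title carries the namespace prefix used on the client wiki, so the sitelinks are where article and non-article pages can be told apart. A rough sketch, assuming a hand-written prefix table; the table contents, the function name and the sample entity are hypothetical:

```python
# Hypothetical, incomplete prefix table: namespace names are localized
# per wiki, so a real list would have to come from each wiki's siteinfo.
NON_ARTICLE_PREFIXES = {
    "enwiki": ("Category:", "Wikipedia:", "Template:"),
    "cawiki": ("Categoria:", "Viquipèdia:", "Plantilla:"),
}


def article_sitelinks(entity: dict) -> dict:
    """Return the sitelinks whose title carries no known namespace prefix."""
    kept = {}
    for site, link in entity.get("sitelinks", {}).items():
        prefixes = NON_ARTICLE_PREFIXES.get(site, ())
        # str.startswith accepts a tuple; an empty tuple never matches.
        if not link["title"].startswith(prefixes):
            kept[site] = link["title"]
    return kept


# Made-up entity, trimmed to the fields used here.
entity = {
    "id": "Q1",
    "sitelinks": {
        "enwiki": {"site": "enwiki", "title": "Category:Physics"},
        "cawiki": {"site": "cawiki", "title": "Física"},
    },
}
links = article_sitelinks(entity)
```

An item would then count as relating to an article when this returns at least one sitelink. Sites missing from the table are kept as-is, which is a simplification.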



2019-02-08 Thread marcmiquel
marcmiquel added a comment.
The use case is to process the dumps and filter out qitems which do not relate to articles; this is why we put NS0. The JSON dump sample says there is an ns field, but in the final dump there is no such field.



2019-02-08 Thread Addshore
Addshore added a comment.
Hmm, what's the use case here? Is this for Wikidata dumps? Right now Items being in NS 0 is a pretty safe assumption; they don't appear anywhere else.