Re: [Analytics] Researcher Student

2016-04-12, Andrew Gray
I believe there's already a parsed dump that covers this: https://dumps.wikimedia.org/itwiki/20160407/itwiki-20160407-geo_tags.sql.gz It seems to have ~260k items with 'earth' coordinates, which is about one in five pages on itwp (Italian Wikipedia). You can use this to skip the first step and go straight to
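To make "go straight to" concrete, here is a minimal sketch of reading the earth coordinates out of that geo_tags dump with only the standard library. It assumes the file is ordinary mysqldump output (a `CREATE TABLE` block, then single-line `INSERT INTO ... VALUES (...),(...);` statements with backslash escapes) and that the table has columns named `gt_page_id`, `gt_globe`, `gt_lat` and `gt_lon`, as in the GeoData extension schema. Column positions are read from the `CREATE TABLE` statement rather than hard-coded, so the sketch does not depend on column order. The function names are illustrative, not part of any existing tool.

```python
import re
from typing import Iterable, Iterator, List, Tuple, Union

Value = Union[int, float, str, None]

# Matches a column definition inside CREATE TABLE, e.g. "  `gt_page_id` int(10) ..."
COLUMN_RE = re.compile(r"^\s*`(\w+)`")
# MySQL backslash escapes that do not map to the escaped character itself.
ESCAPES = {"n": "\n", "r": "\r", "t": "\t", "0": "\0"}


def to_number(token: str) -> Union[int, float]:
    """Convert an unquoted SQL literal such as 10 or 45.46420000."""
    try:
        return int(token)
    except ValueError:
        return float(token)


def parse_values(line: str) -> Iterator[List[Value]]:
    """Yield each (...) tuple of a single-line mysqldump INSERT statement."""
    i = line.index(" VALUES ") + len(" VALUES ")
    n = len(line)
    row: List[Value] = []
    while i < n:
        c = line[i]
        if c == "(":
            row = []
            i += 1
        elif c == ")":
            yield row
            i += 1
        elif c == "'":
            # Quoted string: read up to the closing quote, honouring backslash escapes.
            i += 1
            buf = []
            while line[i] != "'":
                if line[i] == "\\":
                    i += 1
                    buf.append(ESCAPES.get(line[i], line[i]))
                else:
                    buf.append(line[i])
                i += 1
            row.append("".join(buf))
            i += 1
        elif c in ",; \r\n":
            i += 1
        else:
            # Unquoted token: a number or NULL, ending at the next ',' or ')'.
            j = i
            while line[j] not in ",)":
                j += 1
            token = line[i:j]
            row.append(None if token == "NULL" else to_number(token))
            i = j


def earth_coordinates(lines: Iterable[str]) -> Iterator[Tuple[int, float, float]]:
    """Yield (page_id, lat, lon) for every geo_tags row whose globe is 'earth'."""
    columns: List[str] = []
    in_create = False
    for line in lines:
        if line.startswith("CREATE TABLE"):
            columns, in_create = [], True
        elif in_create:
            m = COLUMN_RE.match(line)
            if m:
                columns.append(m.group(1))
            elif line.startswith(")"):
                in_create = False
        elif line.startswith("INSERT INTO"):
            # Look up column positions from the dump's own CREATE TABLE.
            pos = {name: k for k, name in enumerate(columns)}
            for row in parse_values(line):
                if row[pos["gt_globe"]] == "earth":
                    yield row[pos["gt_page_id"]], row[pos["gt_lat"]], row[pos["gt_lon"]]
```

Usage would be along the lines of `with gzip.open("itwiki-20160407-geo_tags.sql.gz", "rt", encoding="utf-8", errors="replace") as f: page_ids = {pid for pid, _lat, _lon in earth_coordinates(f)}`. Collecting page IDs into a set matters because a page can carry more than one coordinate row (primary and secondary), so the number of rows is not necessarily the number of pages.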

Re: [Analytics] Researcher Student

2016-04-12, Joseph Allemandou
I agree with Kevin's understanding of the problem. I think one approach could be:
- Parse the current version of the Italian Wikipedia dump (there's no need to go through the revision history; the current version should be enough) and extract the info (id and title) of pages that contain GPS info (since I don't know how
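The first step above can be sketched as a single streaming pass over a pages-articles XML dump, so the whole file never has to fit in memory. This is a minimal sketch, not a complete extractor: the rule for "contains GPS info" is a hypothetical heuristic (the wikitext calls a `{{Coord|...}}` template), and it will miss pages whose coordinates only appear as infobox latitude/longitude parameters. It keeps only namespace 0 (articles) and relies on the MediaWiki export layout, where a page's own `<id>` closes before the `<id>` elements of its revision and contributor.

```python
import io
import re
import xml.etree.ElementTree as ET
from typing import IO, Iterator, Optional, Tuple

# Hypothetical heuristic: a page "contains GPS info" if its wikitext calls a
# {{Coord|...}} template. Infobox latitude/longitude parameters are NOT detected.
COORD_RE = re.compile(r"\{\{\s*[Cc]oord\s*\|")


def local_name(tag: str) -> str:
    """Drop the '{http://www.mediawiki.org/xml/export-0.N/}' namespace prefix."""
    return tag.rsplit("}", 1)[-1]


def pages_with_coordinates(source: IO[bytes]) -> Iterator[Tuple[int, str]]:
    """Stream a pages-articles XML dump and yield (page_id, title) of matching articles."""
    title: Optional[str] = None
    ns: Optional[str] = None
    page_id: Optional[int] = None
    has_coord = False
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "ns":
            ns = elem.text
        elif name == "id" and page_id is None:
            # The page <id> closes before the revision and contributor <id>s.
            page_id = int(elem.text)
        elif name == "text":
            has_coord = has_coord or bool(COORD_RE.search(elem.text or ""))
        elif name == "page":
            if ns == "0" and has_coord and page_id is not None and title is not None:
                yield page_id, title
            title, ns, page_id, has_coord = None, None, None, False
            elem.clear()  # release the finished page's children to bound memory use


SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>Milano</title><ns>0</ns><id>10</id>
    <revision><id>500</id><text>{{Coord|45.46|N|9.19|E}} Milano ...</text></revision>
  </page>
  <page><title>Pizza</title><ns>0</ns><id>11</id>
    <revision><id>501</id><text>La pizza ...</text></revision>
  </page>
  <page><title>Discussione:Milano</title><ns>1</ns><id>12</id>
    <revision><id>502</id><text>{{coord|45|N|9|E}}</text></revision>
  </page>
</mediawiki>"""

if __name__ == "__main__":
    # Only Milano qualifies: Pizza has no template and page 12 is not in namespace 0.
    for page_id, title in pages_with_coordinates(io.BytesIO(SAMPLE)):
        print(page_id, title)
```

On a real dump the source would be a compressed file object, for example `bz2.open("itwiki-20160407-pages-articles.xml.bz2", "rb")` (the filename here is an assumption based on the usual dump naming). Streaming with `iterparse` and clearing each finished `<page>` is the design choice that keeps a multi-gigabyte dump tractable.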