You look up the linked page in linktarget, then count links to that using pagelinks.
If you want to look the most linked articles by articles, you could do: SELECT lt_title, COUNT(pl_from) AS count FROM linktarget JOIN pagelinks ON (lt_id=pl_target_id) WHERE pl_from_namespace =0 AND lt_namespace=0 GROUP BY pl_target_id ORDER BY count DESC LIMIT 100; which for enwiki returns the below result. Beware that some templates present on many articles may bias the output (depending on what you want to measure). For instance, it is obvious that the high number of links to identifiers (ISBN, GPS, Doi...) are related to articles including coordinates, including ISBN in their references... not necessarily *discussing* GPS. Depending on what kind of you conclusions you seek, that may be useful or noise. Also note that this doesn't include links through the category system, which would need to be accounted separatedly. Regards +-------------------------------------------------+---------+ | ISBN_(identifier) | 1557626 | | Geographic_coordinate_system | 1242077 | | Doi_(identifier) | 646084 | | Wayback_Machine | 566217 | | Wikidata | 473357 | | Taxonomy_(biology) | 469525 | | Time_zone | 444403 | | Global_Biodiversity_Information_Facility | 441781 | | United_States | 436111 | | ISSN_(identifier) | 435609 | | ASCII | 418733 | | Typographic_ligature | 416779 | | Greek_alphabet | 416495 | | Diacritical_mark | 416336 | | Open_Tree_of_Life | 380712 | | IMDb_(identifier) | 370081 | | Binomial_nomenclature | 366398 | | Animal | 351794 | | Catalogue_of_Life | 351292 | | Eukaryote | 342025 | | Interim_Register_of_Marine_and_Nonmarine_Genera | 305561 | | INaturalist | 283754 | | Daylight_saving_time | 276830 | | Encyclopedia_of_Life | 264940 | | France | 258105 | | S2CID_(identifier) | 254469 | | Wikispecies | 248856 | | Association_football | 243125 | | Record_label | 235444 | | OCLC_(identifier) | 232519 | | National_Center_for_Biotechnology_Information | 230716 | | Arthropod | 220520 | | Music_genre | 216694 | | PMID_(identifier) | 214443 | | Germany | 208326 | | United_Kingdom | 202985 | | Insect | 196545 | | List_of_sovereign_states | 196378 | | Record_producer | 193718 | | India | 185757 | | Italy | 183172 | | Surname | 182182 | | World_War_II | 181185 | | Australia | 175278 | | Synonym_(taxonomy) | 174819 | | The_New_York_Times | 166959 | | Integrated_Taxonomic_Information_System | 166733 | | Japan | 164887 | | Canada | 164580 | | Russia | 161388 | | Poland | 157299 | | Spain | 156856 | | England | 156133 | | Bibcode_(identifier) | 146095 | | JSTOR_(identifier) | 142740 | | Common_name | 141434 | | New_York_City | 135577 | | Biological_nomenclature | 133976 | | PMC_(identifier) | 132931 | | China | 132623 | | Central_European_Time | 129616 | | London | 127722 | | Central_European_Summer_Time | 126416 | | Brazil | 125904 | | Netherlands | 124340 | | Album | 124049 | | Single_(music) | 121842 | | Sweden | 117744 | | Barcode_of_Life_Data_System | 116870 | | English_language | 112510 | | Switzerland | 111163 | | U.S._state | 106763 | | Russian_language | 105655 | | Given_name | 104402 | | Iran | 101650 | | Billboard_(magazine) | 101132 | | Lepidoptera | 100443 | | Village | 99583 | | Turkey | 96563 | | Alma_mater | 96208 | | The_Guardian | 95350 | | Belgium | 94848 | | National_Register_of_Historic_Places | 94507 | | County_seat | 92805 | | World_Register_of_Marine_Species | 92677 | | Austria | 90980 | | World_War_I | 90910 | | Public_domain | 90410 | | Midfielder | 90404 | | South_Africa | 90243 | | National_Park_Service | 89946 | | AllMusic | 89904 | | Argentina | 89179 | | Mexico | 88627 | | California | 88542 | | IUCN_Red_List | 88078 | | Soviet_Union | 86969 | | Plant | 86862 | | YouTube | 86425 | | H:S | 86326 | +-------------------------------------------------+---------+ 100 rows in set (1 hour 2 min 7.110 sec) On Sat, 1 Mar 2025 at 04:23, Abraham Israeli <[email protected]> wrote: > Yes, this is very helpful! Thanks a lot! > So, if I get it right, I actually need some mapping from the link target > table to the page table in order to have both IDs to be "consistent". > Maybe I'm doing something too complicated here -- what I'm actually > looking for is a way to know the "in degree" of wiki articles. This is the > number of articles that point to a specific article. I'm only interested in > existing articles from namespace 0. Any better approach to do it? > > thanks again! > > On Fri, Feb 28, 2025 at 2:09 PM Amir Sarabadani <[email protected]> > wrote: > >> Hi, >> Please note that the source and the target point to ids in different >> tables. pl_from points to id in page table (page_id) but pl_target_id >> points to id in linktarget table (lt_id). Also note that pages can link to >> non-existent pages (that's one of the reasons the target doesn't point to >> page_id but to a different table). >> >> Hope that helps, let me know if it doesn't fix your problem. >> >> Best >> >> Am Fr., 28. Feb. 2025 um 19:45 Uhr schrieb Abraham Israeli < >> [email protected]>: >> >>> Hi, >>> I am trying to work with the Pagelinks file in order to extract the >>> number of articles that link to a specific Wiki page (the same idea as can >>> be seen in the "What-links-here" tool: >>> https://en.wikipedia.org/wiki/Help:What_links_here). >>> >>> However, when I loop over this SQL-like file, I find very weird cases of >>> links that don't seem to exist in Wikipedia. >>> >>> For example, the first line in the file indicates that there is a link >>> between page ids 1939 and 2. PageID 2 doesn't even seem to exist. >>> >>> Even when I look at pages that do exist, the link indicated in the file >>> does not exist on the actual wiki page. >>> >>> Am I missing anything? >>> _______________________________________________ >>> Xmldatadumps-l mailing list -- [email protected] >>> To unsubscribe send an email to [email protected] >>> >> >> >> -- >> Amir (he/him) >> >> > > -- > Best, > Abraham > --------- > > Abraham I. > Postdoc Researcher > University of Michigan | School of Information > pronouns: he/him > abraham.com <https://www.avrahami-israeli.com/> > _______________________________________________ > Xmldatadumps-l mailing list -- [email protected] > To unsubscribe send an email to [email protected] >
_______________________________________________ Xmldatadumps-l mailing list -- [email protected] To unsubscribe send an email to [email protected]
