Yes, that's the idea more or less, but I'm not sure that our search engine is able to search for headings, though I might be wrong. I suspect, however, that we would have to process the dumps article by article (or at least a random sample of articles), and in big projects this could be extremely time-consuming. But maybe there's a faster way that I'm not aware of?
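For what it's worth, the per-article pass could look roughly like the Python sketch below. It is only an illustration under stated assumptions: `count_section_titles` and the `articles` iterable (one wikitext string per article) are hypothetical names, reading the actual dump files is not shown, and the patterns cover only balanced "==" to "====" headings and lines that start with ";", which is simpler than what the real parser accepts.

```python
import re
from collections import Counter

# Balanced headings from "== Title ==" to "==== Title ====".
# The lookarounds reject deeper levels such as "===== Title =====".
HEADING_RE = re.compile(r"^(={2,4})(?!=)\s*(.+?)\s*(?<!=)\1\s*$")
# Lines that start with ";" are treated here as pseudo-headings.
PSEUDO_RE = re.compile(r"^;\s*(.+?)\s*$")


def count_section_titles(articles):
    """Count heading-like lines across an iterable of wikitext strings.

    `articles` is an assumed input: any iterable that yields the raw
    wikitext of one article at a time (for example, from a dump reader).
    """
    counts = Counter()
    for text in articles:
        for line in text.splitlines():
            heading = HEADING_RE.match(line)
            if heading:
                counts[heading.group(2)] += 1
                continue
            pseudo = PSEUDO_RE.match(line)
            if pseudo:
                counts[pseudo.group(1)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "== History ==\nSome text.\n=== Early life ===\n; Notes\n",
        "==History==\n== References ==\n",
    ]
    for title, n in count_section_titles(sample).most_common():
        print(n, title)
```

Because the function consumes articles one at a time, it would work the same on a full dump or on a random sample, so the cost is mostly the time needed to read the dump itself.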
--
Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
http://aharoni.wordpress.com
“We're living in pieces,
I want to live in peace.” – T. Moore

2015-07-13 23:41 GMT+03:00 Pine W <wiki.p...@gmail.com>:

> Would it be possible to run a search on the full text of Wikipedias for
> lines that start and end with "==", "===", "====", and lines that start
> with ";", then make a list of those strings, and count the number of times
> that each title appears in the list?
>
> Pine
> On Jul 13, 2015 10:29 AM, "Jonathan Morgan" <jmor...@wikimedia.org> wrote:
>
>> Cross-posting this request to wiki-research-l. Anyone have data on
>> frequently used section titles in articles (any language), or know of
>> datasets/publications that examined this?
>>
>> I'm not aware of any off the top of my head, Amir.
>>
>> - Jonathan
>>
>> ---------- Forwarded message ----------
>> From: Amir E. Aharoni <amir.ahar...@mail.huji.ac.il>
>> Date: Sat, Jul 11, 2015 at 3:29 AM
>> Subject: [Wikitech-l] statistics about frequent section titles
>> To: Wikimedia developers <wikitec...@lists.wikimedia.org>
>>
>>
>> Hi,
>>
>> Did anybody ever try to collect statistics about frequent section titles
>> in Wikimedia projects?
>>
>> For Wikipedia, for example, titles such as "Biography", "Early life",
>> "Bibliography", "External links", "References", "History", etc., appear in
>> a lot of articles, and their counterparts appear in a lot of languages.
>>
>> There are probably similar things in Wikivoyage, Wiktionary and possibly
>> other projects.
>>
>> Did anybody ever try to collect statistics of the most frequent section
>> titles in each language and project?
>>
>> --
>> Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
>> http://aharoni.wordpress.com
>> “We're living in pieces,
>> I want to live in peace.” – T. Moore
>> _______________________________________________
>> Wikitech-l mailing list
>> wikitec...@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>>
>>
>>
>> --
>> Jonathan T. Morgan
>> Senior Design Researcher
>> Wikimedia Foundation
>> User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>
>>
>>
>> _______________________________________________
>> Wiki-research-l mailing list
>> Wiki-research-l@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l