You can get section titles (and hierarchy) directly from the API, though I
don't know if this approach scales the way you need it to:
https://en.wikipedia.org/w/api.php?action=parse&page=Albania&prop=sections&format=jsonfm
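
For anyone who wants to try this on more than one page, here is a minimal
sketch of how the section list could be pulled and tallied. It assumes the
response has the usual shape for action=parse with prop=sections (a "parse"
object holding a "sections" list whose entries carry "level" and "line"), and
it uses format=json rather than jsonfm, since jsonfm is the pretty-printed
HTML view meant for reading in a browser. The function names, the User-Agent
string and the one-page sample list are illustrative, not part of any existing
tool.

```python
import json
import urllib.parse
import urllib.request
from collections import Counter

API_URL = "https://en.wikipedia.org/w/api.php"


def section_titles(parse_response):
    """Return (level, title) pairs from an action=parse&prop=sections response."""
    sections = parse_response.get("parse", {}).get("sections", [])
    # "level" is the heading depth as a string ("2" for == Heading ==);
    # "line" is the heading text and may contain inline HTML such as <i>.
    return [(int(s["level"]), s["line"]) for s in sections]


def fetch_sections(page, api_url=API_URL):
    """Fetch the section list for one page (one HTTP request per page)."""
    query = urllib.parse.urlencode(
        {"action": "parse", "page": page, "prop": "sections", "format": "json"}
    )
    request = urllib.request.Request(
        f"{api_url}?{query}",
        # Hypothetical User-Agent; replace with one that identifies your tool.
        headers={"User-Agent": "section-title-stats-sketch/0.1 (example)"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def count_titles(responses):
    """Tally how often each section title appears across several responses."""
    counts = Counter()
    for parsed in responses:
        counts.update(title for _level, title in section_titles(parsed))
    return counts


if __name__ == "__main__":
    pages = ["Albania"]  # sample list; a real run would need many more pages
    counts = count_titles(fetch_sections(p) for p in pages)
    for title, n in counts.most_common(10):
        print(n, title)
```

Because this makes one request per article, it is probably only practical for
a sample of pages, which is the scaling concern mentioned above.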

On Mon, Jul 13, 2015 at 1:52 PM, Amir E. Aharoni <
amir.ahar...@mail.huji.ac.il> wrote:

> Yes, that's the idea more or less, but I'm not sure that our search engine
> is able to search for headings, though I might be wrong. I suspect,
> however, that it will be required to process dumps article by article (or
> at least a random sample), and in big projects this could be extremely time
> consuming. But maybe there's a faster way of which I am not aware?
>
>
> --
> Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
> http://aharoni.wordpress.com
> ‪“We're living in pieces,
> I want to live in peace.” – T. Moore‬
>
> 2015-07-13 23:41 GMT+03:00 Pine W <wiki.p...@gmail.com>:
>
>> Would it be possible to run a search on the full text of Wikipedias for
>> lines that start and end with "==", "===", "====", and lines that start
>> with ";", then make a list of those strings, and count the number of times
>> that each title appears in the list?
>>
>> Pine
>> On Jul 13, 2015 10:29 AM, "Jonathan Morgan" <jmor...@wikimedia.org>
>> wrote:
>>
>>> Cross-posting this request to wiki-research-l. Anyone have data on
>>> frequently used section titles in articles (any language), or know of
>>> datasets/publications that examined this?
>>>
>>> I'm not aware of any off the top of my head, Amir.
>>>
>>> - Jonathan
>>>
>>> ---------- Forwarded message ----------
>>> From: Amir E. Aharoni <amir.ahar...@mail.huji.ac.il>
>>> Date: Sat, Jul 11, 2015 at 3:29 AM
>>> Subject: [Wikitech-l] statistics about frequent section titles
>>> To: Wikimedia developers <wikitec...@lists.wikimedia.org>
>>>
>>>
>>> Hi,
>>>
>>> Did anybody ever try to collect statistics about frequent section titles
>>> in Wikimedia projects?
>>>
>>> For Wikipedia, for example, titles such as "Biography", "Early life",
>>> "Bibliography", "External links", "References", "History", etc., appear
>>> in a lot of articles, and their counterparts appear in a lot of languages.
>>>
>>> There are probably similar things in Wikivoyage, Wiktionary and possibly
>>> other projects.
>>>
>>> Did anybody ever try to collect statistics of the most frequent section
>>> titles in each language and project?
>>>
>>> --
>>> Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
>>> http://aharoni.wordpress.com
>>> ‪“We're living in pieces,
>>> I want to live in peace.” – T. Moore‬
>>> _______________________________________________
>>> Wikitech-l mailing list
>>> wikitec...@lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>>>
>>>
>>>
>>> --
>>> Jonathan T. Morgan
>>> Senior Design Researcher
>>> Wikimedia Foundation
>>> User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>
>>>
>>>
>>> _______________________________________________
>>> Wiki-research-l mailing list
>>> Wiki-research-l@lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>>>
>>>


-- 
Jonathan T. Morgan
Senior Design Researcher
Wikimedia Foundation
User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>
