We discussed a potential approach to scaling content extraction on
Wiktionary across languages in https://phabricator.wikimedia.org/T138709.
The idea was to define HTML microformats that can be integrated into
templates to provide a uniform marker for specific bits of content.
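To make the idea concrete, here is a minimal sketch; the
'mf-pronunciation' class name and the extractMarked helper are invented
for illustration, not anything defined in T138709. It assumes the page
HTML has been parsed into a DOM, e.g. server-side with a library such
as domino:

    // A template could wrap its output in a stable marker, e.g.:
    //   <span class="mf-pronunciation">/ˈwɪkʃənɛri/</span>
    // A consumer then selects on the marker class, not on page layout:
    function extractMarked(doc: Document, markerClass: string): string[] {
      return Array.from(doc.querySelectorAll('.' + markerClass))
        .map(el => (el.textContent || '').trim());
    }

    // extractMarked(doc, 'mf-pronunciation') would then work on any
    // wiki whose templates emit the same marker, in any language.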
On Jan 12, 2017, 1:17 PM:
Hey all, just wanted to drop in some thoughts on this thread…
I would say the premise of this email is absolutely right: parsing out this
data by hand-coding against specific HTML structures is untenable.
On Wikipedia we have a wealth of community-curated information: featured
articles, in the news, …
Thanks, Florian.
I see your point about ease of forking. Would you have time later, perhaps
off thread, to detail the challenges you faced?
Regarding the data endpoint topic of this thread, it isn't app-specific
despite being part of 'mediawiki/services/mobileapps'. We'll probably want
a more general…
What is also on my mind: the app wasn't easy to fork and use on other (third-party) projects before, either; however, the track in this direction (specific parsing of content, which applies to Wikimedia projects only) makes it, in reality, impossible to fork (and maintain) the app projects to…
> I see little effort on finding solutions potentially able to scale to all
> our projects and languages
See my reply to your initial comment on that ticket. This was just a first
hack at implementing this functionality. If you had simply asked if there
were plans to expand this to other projects/languages…
Is it considered acceptable now to produce a service or API that
hardcodes wiki-specific parsing of certain wikitext or HTML patterns in
certain wiki pages (such as the "On this day" section of the main page
of one wiki)?
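For contrast, a minimal sketch of what that kind of hardcoding tends to
look like; the extractOnThisDay helper and the '#mp-otd' selector below
are illustrative only, not taken from the actual service (that id is how
one wiki's main page happens to label its "On this day" section):

    // Brittle: keyed to one wiki's main-page markup, so it silently
    // returns nothing on every other wiki and breaks on layout changes.
    function extractOnThisDay(doc: Document): string[] {
      const section = doc.querySelector('#mp-otd');
      if (!section) return []; // any other wiki, or a redesign
      return Array.from(section.querySelectorAll('li'))
        .map(li => (li.textContent || '').trim());
    }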
I'm confused by the status of things, and after my comment
https://phabr…