Have you seen the HTML structure of Parsoid's output? E.g. https://en.wikipedia.org/api/rest_v1/page/html/Dog
--
Bawolff

On Monday, January 10, 2022, Adam Sobieski <adamsobie...@hotmail.com> wrote:

> Wikitech-l,
>
> Hello. I have a question about the HTML output of wiki parsers. I wonder
> about how simple or complex that it would be for a wiki parser to output,
> instead of a flat document structure inside of a <div> element, an
> <article> element containing nested <section> elements?
>
> Recently, in the Community Wishlist Survey Sandbox
> <https://meta.wikimedia.org/wiki/Community_Wishlist_Survey/Sandbox>, the
> speech synthesis of Wikipedia articles
> <https://meta.wikimedia.org/wiki/Community_Wishlist_Survey/Sandbox#Spoken_articles>
> was broached. The proposer of these ideas indicated that, for best results,
> some content, e.g., “See also” sections, should not be synthesized.
>
> In response to these interesting ideas, I mentioned some ideas from EPUB,
> referencing pronunciation lexicons from HTML
> <https://www.w3.org/publishing/epub3/epub-contentdocs.html#sec-pls> and SSML
> attributes in HTML
> <https://www.w3.org/publishing/epub3/epub-contentdocs.html#sec-xhtml-ssml-attrib>,
> the CSS Speech Module <https://www.w3.org/TR/css-speech-1/>, and that
> output HTML content could be styled using the CSS Speech Module’s speak
> property.
>
> In these regards, I started thinking about how one might extend wikitext
> syntax to be able to style sections, e.g.,:
>
> == See also == {style="speak:never"}
>
> Next, I inspected the HTML of some Wikipedia articles and realized that,
> due to the structure of the output HTML documents, it isn’t simple to style
> or to add attributes to sections. There are only <h2>, <h3>, <h4> (et
> cetera) elements inside of a containing <div> element; sections are not
> yet structured elements.
>
> The gist is that, instead of outputting HTML like:
>
> <div class="mw-parser-output">
>   <h2><span class="mw-headline" id="Heading">Heading</span></h2>
>   <p>Paragraph 1</p>
>   <p>Paragraph 2</p>
>   <h3><span class="mw-headline" id="Subheading">Subheading</span></h3>
>   <p>Paragraph 3</p>
>   <p>Paragraph 4</p>
> </div>
>
> could a wiki parser output HTML5 like:
>
> <article class="mw-parser-output">
>   <section id="Heading">
>     <header><h2><span class="mw-headline">Heading</span></h2></header>
>     <p>Paragraph 1</p>
>     <p>Paragraph 2</p>
>     <section id="Subheading">
>       <header><h3><span class="mw-headline">Subheading</span></h3></header>
>       <p>Paragraph 3</p>
>       <p>Paragraph 4</p>
>     </section>
>   </section>
> </article>
>
> Initial thoughts regarding the latter HTML5 include that it is better
> structured, more semantic, more styleable, and potentially more accessible.
> If there is any interest, I could write up some lengthier discussion about
> one versus the other, why one might be better – and more useful – than the
> other.
>
> Is this the correct mailing list to discuss any of these wiki technology,
> wiki parsing, wikitext, document model, and HTML5 output topics?
>
> Best regards,
> Adam
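
For illustration only, the flat-to-nested transformation described in the quoted message could be sketched roughly as below. This is a minimal sketch under stated assumptions, not how MediaWiki or Parsoid actually implements section wrapping: the function name `nest_sections` and the `("h", level, title)` / `("p", text)` input format are hypothetical, and deriving the section `id` by replacing spaces with underscores is a simplification.

```python
from html import escape


def nest_sections(blocks):
    """Wrap a flat run of headings and paragraphs in nested <section> elements.

    `blocks` is a hypothetical input format: a list of ("h", level, title)
    and ("p", text) tuples in document order.
    """
    out = ['<article class="mw-parser-output">']
    open_levels = []  # heading levels of the <section> elements currently open

    for block in blocks:
        if block[0] == "h":
            _, level, title = block
            # A new heading closes every open section at the same or a deeper level.
            while open_levels and open_levels[-1] >= level:
                open_levels.pop()
                out.append("</section>")
            open_levels.append(level)
            # Simplified id generation (assumption): spaces become underscores.
            section_id = escape(title.replace(" ", "_"), quote=True)
            out.append(f'<section id="{section_id}">')
            out.append(
                f'<header><h{level}><span class="mw-headline">'
                f"{escape(title)}</span></h{level}></header>"
            )
        else:
            out.append(f"<p>{escape(block[1])}</p>")

    # Close whatever sections are still open at the end of the document.
    out.extend("</section>" for _ in open_levels)
    out.append("</article>")
    return "\n".join(out)


if __name__ == "__main__":
    example = [
        ("h", 2, "Heading"),
        ("p", "Paragraph 1"),
        ("p", "Paragraph 2"),
        ("h", 3, "Subheading"),
        ("p", "Paragraph 3"),
        ("p", "Paragraph 4"),
    ]
    print(nest_sections(example))
```

The stack of open heading levels is what turns the flat sequence into a tree: a subheading stays inside its parent section, while a heading at the same or a higher level closes the sections before it.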