Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

Subramanya Sastry Sat, 17 May 2014 08:52:28 -0700

(Top posting to quickly summarize what I gathered from the discussion and what would be required for Parsoid to expand pages with these transclusions).

Parsoid currently relies on the MediaWiki API to preprocess transclusions and return wikitext (it uses action=expandtemplates for this), which it then parses using the native Parsoid pipeline. Parsoid processes extension tags via action=parse and weaves the result back into the top-level content of the page.
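To make the two calls concrete, here is a minimal sketch of the requests involved. It only builds the URLs and does not contact a server. The endpoint URL and the helper names are assumptions for illustration; the parameter names follow the public MediaWiki API documentation and may differ on older installations.

```python
from urllib.parse import urlencode

# Assumed endpoint, for illustration only.
API_URL = "https://example.org/w/api.php"


def expandtemplates_url(wikitext, title):
    """Build a request that asks MediaWiki to preprocess transclusions
    in `wikitext` and return the expanded wikitext."""
    params = {
        "action": "expandtemplates",
        "format": "json",
        "title": title,      # page that provides the expansion context
        "text": wikitext,    # e.g. "{{T}}"
        "prop": "wikitext",
    }
    return API_URL + "?" + urlencode(params)


def parse_url(wikitext, title):
    """Build a request that asks MediaWiki to render `wikitext`
    (for example an extension tag) to HTML."""
    params = {
        "action": "parse",
        "format": "json",
        "title": title,
        "text": wikitext,
        "prop": "text",
    }
    return API_URL + "?" + urlencode(params)
```

A client in Parsoid's position would call the first helper for each transclusion and the second for each extension tag, then splice the results into the page.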

As per your original email, I am assuming that T is a page with a special content model that generates HTML, and that another page P has a transclusion {{T}}.

So, when Parsoid encounters {{T}}, it should be able to replace {{T}} with the HTML to generate the right parse output for P.

So, I am listing below 4 possible ways action=expandtemplates can process {{T}}:


1. Your newest implementation (that just returns back {{T}}):

* If Parsoid gets back {{T}}, one of two things can happen:

--- Parsoid, as usual, tries to parse it as wikitext and gets stuck in an infinite loop (query the MW API for the expansion of {{T}}, get back {{T}}, parse it as {{T}}, query the MW API for the expansion of {{T}}, ...). So, this will definitely not work.

--- Parsoid adds a special-case check to see whether the API sent back {{T}} and, if so, asks a different API endpoint (action=expandtohtml maybe?) for the HTML expansion, based on an assumption about the output of expandtemplates. This would work, and would require the new endpoint to be implemented, but feels hacky.
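The two outcomes can be sketched as follows. This is an illustration only, not Parsoid code: `echo_api` stands in for an expandtemplates call that returns {{T}} unchanged, and all function names are hypothetical.

```python
def echo_api(wikitext):
    """Stand-in for expandtemplates under option 1: text that cannot be
    expanded to wikitext comes back exactly as it was sent."""
    return wikitext


def expand_naive(wikitext, fetch, max_rounds=10):
    """Keep expanding while transclusions remain. Without the round cap
    this would never terminate when `fetch` echoes its input."""
    rounds = 0
    while "{{" in wikitext:
        if rounds == max_rounds:
            raise RuntimeError("expansion did not terminate")
        wikitext = fetch(wikitext)
        rounds += 1
    return wikitext


def expand_with_check(wikitext, fetch):
    """Special-case check: if the API returned the transclusion unchanged,
    report that an HTML expansion has to be requested some other way."""
    expanded = fetch(wikitext)
    if expanded == wikitext and "{{" in wikitext:
        return ("needs_html", wikitext)
    return ("wikitext", expanded)
```

With `echo_api`, `expand_naive("{{T}}", echo_api)` hits the round cap, while `expand_with_check("{{T}}", echo_api)` reports that the HTML has to come from elsewhere, which is the step that would need the new endpoint.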

So, going back to your original implementation, here are at least 3 ways I see this working:

2. action=expandtemplates returns <html>...</html> for the expansion of {{T}}, but also provides an additional API response header that tells Parsoid that T was a special content model page and that the raw HTML it received should not be sanitized.

3. action=expandtemplates returns <html>...</html> for the expansion of {{T}} and gives no other indication of whether T is a special content model page. However, if Parsoid (and other clients) are to always trust this HTML output without sanitization, the expandtemplates implementation should conditionally sanitize <html> tags encountered in wikitext to prevent XSS. As far as I understand, expandtemplates (on master, not your patch) does not do this tag sanitization. But, independent of that, what Parsoid and other clients need is a guarantee that it is safe to blindly splice in the contents of any <html>...</html> received for any {{T}}, no matter what content model T implements.

4. Parsoid first queries the MW API to find out the content model of T for every transclusion {{T}} it encounters on the page P and, based on the content-model info, knows how to process the output of action=expandtemplates.

Clearly 4. is expensive and 3. seems hacky, but if it can be made to work, we can work with that.
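The cost of option 4 can be seen in a rough sketch: one extra lookup per transclusion before any expansion happens. The endpoint URL, helper names, and the simplified regular expression are assumptions for illustration. The sketch also treats each target as a full page title and skips namespace resolution. It relies on action=query with prop=info, which reports a page's content model.

```python
import re
from urllib.parse import urlencode

# Assumed endpoint, for illustration only.
API_URL = "https://example.org/w/api.php"


def transclusion_targets(wikitext):
    """Very rough extraction of {{Target}} names. Ignores nesting,
    parser functions, and namespace resolution."""
    return re.findall(r"\{\{\s*([^|{}]+?)\s*(?:\|[^{}]*)?\}\}", wikitext)


def content_model_queries(wikitext):
    """Build one content-model lookup per transclusion on the page."""
    urls = []
    for target in transclusion_targets(wikitext):
        params = {
            "action": "query",
            "format": "json",
            "prop": "info",   # page info includes the content model
            "titles": target,
        }
        urls.append(API_URL + "?" + urlencode(params))
    return urls
```

A page with many transclusions would trigger as many lookups as it has targets, on top of the expandtemplates calls themselves.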

But both Gabriel and I think that solution 2. is the cleanest solution that would work for now. The PHP parser (in your patch to handle {{T}}) already has information about the content model of T when it is expanding {{T}}, and it seems simplest and cleanest to return this information to clients for expansions of pages with a non-default content model. That gives clients like Parsoid the cleanest way of handling these.
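From the client side, solution 2. could look roughly like the sketch below. The `content_model` field is hypothetical: it stands in for whatever extra indicator (response header or field) the API would add, and the class and function names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Expansion:
    """Result of expanding one transclusion."""
    text: str
    # Hypothetical indicator supplied by the API alongside the expansion.
    content_model: str = "wikitext"


def handle_expansion(exp):
    """Decide how a client should treat the expansion of {{T}}."""
    if exp.content_model == "wikitext":
        # Default case: hand the text to the normal wikitext pipeline.
        return ("parse_as_wikitext", exp.text)
    # Non-default content model: the API vouches for the HTML, so the
    # client strips the <html> wrapper and splices the body in unsanitized.
    body = exp.text
    if body.startswith("<html>") and body.endswith("</html>"):
        body = body[len("<html>"):-len("</html>")]
    return ("splice_html", body)
```

The point of the design is that the decision rests on an explicit signal from the API rather than on guessing from the shape of the returned text.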

If I am missing something or this is unclear, or if this is getting into too much back and forth on email and it is simpler to discuss on IRC, I can hop onto any IRC channel on Monday, or we can do this on #mediawiki-parsoid, and one of us could later summarize the discussion back onto this thread.


Thanks,
Subbu.


On 05/17/2014 02:54 AM, Daniel Kinzler wrote:

On 16.05.2014 21:07, Gabriel Wicke wrote:

On 05/15/2014 04:42 PM, Daniel Kinzler wrote:

The one thing that will not work on wikis with
$wgRawHtml disabled is parsing the output of expandtemplates.

Yes, which means that it won't work with Parsoid, Flow, VE and other users.

And it has been fixed now. In the latest version, expandtemplates will just
return {{Foo}} as it was if {{Foo}} can't be expanded to wikitext.

I do think that we can do better, and I pointed out possible ways to do so
in my earlier mail:

My preference
would be to let the consumer directly ask for pre-expanded wikitext *or*
HTML, without overloading action=expandtemplates. Even indicating the
content type explicitly in the API response (rather than inline with an HTML
tag) would be a better stop-gap as it would avoid some of the security and
compatibility issues described above.

I don't quite understand what you are asking for... action=parse returns HTML,
action=expandtemplates returns wikitext. The issue was with "mixed" output, that
is, representing the expansion of templates that generate HTML in wikitext. The
solution I'm going for now is to simply not expand them.

-- daniel



