[Bug 17486] Parser generates malformed HTML with a list inside a table

2013-06-09 bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=17486

--- Comment #31 from Daniel Friesen  ---
(In reply to comment #30)
> My comment was on topic simply because the malformed output is caused by
> incorrect specification about how distinct content elements can be safely
> embedded into each other.
> 
> And the whole topic is about this issue: the basic wiki syntax interacts very
> badly with the HTML (or XML) syntax based on *explicit* closure of tags (or
> wiki syntaxes). The current parsing rules contradict each other, and we
> constantly have to find tricks to avoid these issues and incorrect output
> (which may parse as valid HTML5 but was in fact not the one intended and will
> be wrong XHTML5 anyway).

Specification and mixing custom WikiText syntaxes with HTML is irrelevant.
We're supposed to fail silently when bad WikiText is used and output valid HTML
even when given crap, not output malformed markup.

This WikiText:
* List item 1. 

 Cell 1. 


* List item 2.

Outputs this:
 List item 1. 


 Cell 1. 


 List item 2.


There's a  right after the  it leaves 
and  elements outside of a table, that's invalid.

This bug has nothing to do with integrating the WikiText list syntax and HTML
table markup. The fix for this issue is simply making sure that the garbage we
output for this invalid input is still well-formed markup.
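For illustration only, here is a minimal Python sketch of what "still well-formed" can mean mechanically: a stack-based tag balancer that drops end tags with no matching open element and closes anything left open, innermost first. This is not MediaWiki's code (MediaWiki's parser is written in PHP); `balance_tags` and `VOID_ELEMENTS` are hypothetical names, and comments or doctypes are simply dropped.

```python
from html.parser import HTMLParser

# HTML void elements, which never get an end tag.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class _TagBalancer(HTMLParser):
    """Re-emits markup while keeping a stack of currently open elements."""

    def __init__(self):
        # Keep entity/character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.stack = []

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # original text, attributes intact
        if tag not in VOID_ELEMENTS:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # e.g. <br/>: nothing to track

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray end tag with no open element: drop it
        # Close everything opened after `tag`, innermost first, then `tag` itself.
        while self.stack:
            top = self.stack.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def balance_tags(markup: str) -> str:
    """Return `markup` with stray end tags removed and open tags closed."""
    parser = _TagBalancer()
    parser.feed(markup)
    parser.close()  # flush any buffered trailing text
    while parser.stack:  # close whatever is still open, innermost first
        parser.out.append(f"</{parser.stack.pop()}>")
    return "".join(parser.out)
```

The result is well-formed nesting, not necessarily valid HTML: the sketch won't relocate content that sits in the wrong place inside a table the way an HTML5 parser would, which matches the "well-formed first, validity is a separate bug" distinction made here.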

Try inserting that garbage output back into a wiki page:
 List item 1. 


 Cell 1. 


 List item 2.


This is essentially the same garbage that the user gives us. But this time the
parser outputs:
 List item 1. 


 Cell 1. 


 List item 2.



While there is a minor validity issue in the fact that we have a string of text
inside of a  but outside of a cell -- fixing that would probably be a
separate bug -- that aside, the markup is still well-formed XML. Tags are
properly paired up, there is the same number of each, and they are closed in the
correct order. When output into an XHTML5 page parsed with an XML parser, this
will work and won't give you an XML parse error.
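The "well-formed XML" property described above is easy to check mechanically. A minimal sketch using Python's standard `xml.etree.ElementTree`: the fragment is wrapped in a made-up `<root>` element because parser output is a fragment that may have several top-level nodes, and `is_well_formed` is a hypothetical helper name. The sample strings are invented, since the markup in this archived comment lost its tags.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if `fragment` parses as XML once wrapped in a single root."""
    try:
        # Parser output is a fragment that may have several top-level nodes,
        # while an XML document needs exactly one root element.
        ET.fromstring(f"<root>{fragment}</root>")
    except ET.ParseError:
        return False
    return True


# Invented samples (the archived comment lost its real tags):
print(is_well_formed("<ul><li>a<table><tr><td>b</td></tr></table></li></ul>"))  # True
print(is_well_formed("<ul><li>a<table><tr><td>b</li></ul>"))  # False: </li> while <td> is open
```

One caveat for XHTML5 output: XML predefines only `&amp;`, `&lt;`, `&gt;`, `&quot;` and `&apos;`, so HTML named entities such as `&nbsp;` fail a strict XML check unless written as numeric references.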

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


2013-06-09 bugzilla-daemon

--- Comment #30 from Philippe Verdy  ---
My comment was on topic simply because the malformed output is caused by
incorrect specification about how distinct content elements can be safely
embedded into each other.

And the whole topic is about this issue: the basic wiki syntax interacts very
badly with the HTML (or XML) syntax based on *explicit* closure of tags (or
wiki syntaxes). The current parsing rules contradict each other, and we
constantly have to find tricks to avoid these issues and incorrect output
(which may parse as valid HTML5 but was in fact not the one intended and will
be wrong XHTML5 anyway).

Note that I was not discussing XHTML 1.0, but HTML5 is still intended to
have a valid XHTML representation, so XHTML5 should be parseable by *both* an
XML parser and an HTML5 parser (generating a compatible DOM structure with
either parser).

All our issues are in fact created when inserting content from utility
templates (this reduces their reusability or forces them to use very ugly
tricks or ugly parameters where they are used, and this does not make them
simpler to use in articles).

I maintain that wiki syntaxes should be fully integrated with the HTML syntax
under the same content model (offering users the choice between them, using
HTML where the wiki syntax is too limited, but without breaking parsing rules;
the wiki syntax should then only be a purely *local* shorthand for the HTML
syntax, everything being generated with knowledge of the HTML DOM, even if the
generated syntax is also compatible with XML/XHTML parsers).



2013-06-09 bugzilla-daemon

--- Comment #29 from Daniel Friesen  ---
(In reply to comment #28)
> Note that the intent now is to deprecate the support for valid XHTML, but we
> still have a problem for HTML5 with the (more lenient) HTML5 parsing rules.

XHTML 1.0 is dead; there is no deprecation, we do not support XHTML 1.0 at all
anymore.

However, we are NOT deprecating well-formed XML output. We still intend for
parser and interface output to be well-formed XML when
`$wgWellFormedXml = true;` is set. We also try to support XHTML5 when you set
`$wgMimeType = 'application/xhtml+xml';`. And even when well-formed XML is
false, we still want to output non-malformed HTML.

And THIS bug is about mixed table/list WikiText creating invalid output that
isn't tidied up by the parser when Tidy is not enabled.

NOT about some syntax improvements to WikiText you want. Please stop talking
about those here and create a real bug for them. You've been warned about this
already.



2013-06-09 bugzilla-daemon

--- Comment #28 from Philippe Verdy  ---
Note that the intent now is to deprecate the support for valid XHTML, but we
still have a problem for HTML5 with the (more lenient) HTML5 parsing rules.

We still need to find a way to ease the integration of templates possibly
generating tables (or similar) within our old wiki syntax based on single-line
prefixes (this concerns our syntax for numbered lists, bulleted lists,
definition lists/indented blocks, tables, and horizontal rules, based on the
first characters of lines in "{|!-;:#*", as well as double newlines for
creating new paragraphs).
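To make the line-oriented nature of this syntax concrete, here is a rough Python sketch that classifies a single WikiText line by its leading characters. The category names are invented for illustration and the rules are simplified: the sketch ignores whether a table is actually open (outside a table a leading `|` or `!` is plain text), nested list prefixes such as `*#`, and everything after the prefix.

```python
# Checked in order: longer or more specific prefixes must come first.
PREFIXES = [
    ("{|", "table-start"),
    ("|}", "table-end"),
    ("|+", "table-caption"),
    ("|-", "table-row"),
    ("|", "table-cell"),
    ("!", "table-header-cell"),
    ("----", "horizontal-rule"),
    ("*", "bulleted-item"),
    ("#", "numbered-item"),
    (";", "definition-term"),
    (":", "definition-or-indent"),
    (" ", "preformatted"),  # a leading space starts preformatted text
]


def classify_line(line: str) -> str:
    """Rough classification of one WikiText line by its prefix (simplified)."""
    if line.strip() == "":
        return "blank"  # blank lines separate paragraphs
    for prefix, kind in PREFIXES:
        if line.startswith(prefix):
            return kind
    return "text"
```

Because every decision hinges on the first characters of a line, any template that emits a newline in the wrong place changes how the following text is classified, which is the fragility described above.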

I still think that we should have an alternate way to avoid the syntactic
limitations introduced by newlines and special parsing at the beginning of
lines, to allow more flexibility and fewer parsing problems.

Our current syntax for tables is the most problematic one, forcing us to use
ugly syntax in templates and creating nightmares when we want to integrate them
(e.g. in navigation templates and infoboxes).

But bulleted lists and numbered lists still suffer from the lack of support for
adding attributes (e.g. in numbered lists we still cannot set the initial
number, and we cannot specify classes or styles).

We should be able to use:

* item1
*|attributes...| item2
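The `*|attributes...| item2` form above is a proposal, not existing WikiText. Purely as a hypothetical sketch of how such a line could be split, assuming the attributes sit between the first pair of pipes right after the list markers: `parse_item` and `item_to_html` are made-up names, and the sketch neither whitelists nor escapes the attributes or the text, which a real parser would have to do.

```python
import re

# Hypothetical grammar: list markers, then an optional |attrs| block, then text.
ITEM_RE = re.compile(r"^([*#]+)(?:\|([^|]*)\|)?\s*(.*)$")


def parse_item(line: str):
    """Return (markers, attrs, text) for a proposed list-item line, or None."""
    m = ITEM_RE.match(line)
    if m is None:
        return None
    markers, attrs, text = m.groups()
    return markers, (attrs or "").strip(), text.strip()


def item_to_html(line: str):
    """Render one proposed item line as an <li>; attrs are NOT sanitized here."""
    parsed = parse_item(line)
    if parsed is None:
        return None
    _, attrs, text = parsed
    open_tag = f"<li {attrs}>" if attrs else "<li>"
    return f"{open_tag}{text}</li>"
```

Items without a `|...|` block render exactly as today, so in this sketch the extension would be backward compatible with existing list lines.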

and also allow generation using an explicit list initiator:

{*attributes for the list...
|| item1
|attributes for item2| item2 || item3 ||
item4
}

as if it were a table containing a single row where each item is a cell, except
that newlines are treated here like any other whitespace, so it is equivalent to:

{*attributes for the list...
|| item1
|attributes for item2| item2
|| item3
|| item4
}

or to:

{*|attribute...||item1|attributes for item2|item2||item3||item4}

This last form shows how easy the syntax would be to use in templates. Such
syntax would remain embeddable in another list, an indented block or a table
cell, for example here with embedded numbered lists:

{#
||item1
||item2
  {#
  ||item2.1
  ||item2.2
  }
||item3
||item4
}

In such syntax, all newlines are treated like whitespace, and whitespace is
trimmed, allowing free-form indentation in wiki sources and easier syntax for
templates. The previous example could just as well be compacted into a single
line, with all "cells" fully trimmed:

{#||item1||item2{#||item2.1||item2.2}||item3||item4}

And optional attributes are specifiable everywhere if needed (between the
doubled pipes in this example).
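Again purely hypothetical (the `{*...}`/`{#...}` syntax is only a proposal in this comment), a small recursive-descent sketch in Python shows why it would be template-friendly: newlines are just whitespace, so the multi-line and one-line forms above produce identical output. Assumptions: items are separated by `||`, text before the first `||` is taken as list attributes (unsanitized), each text run is whitespace-collapsed and trimmed, empty items are dropped, and the per-item `|attrs|` form is not handled. `parse_list` and `_flush` are invented names.

```python
def _flush(buf: list[str], parts: list[str]) -> None:
    """Move buffered text into `parts`, treating newlines as plain whitespace."""
    text = " ".join("".join(buf).split())  # collapse runs of whitespace and trim
    if text:
        parts.append(text)
    buf.clear()


def parse_list(src: str, i: int = 0) -> tuple[str, int]:
    """Parse one {*...} or {#...} block starting at src[i].

    Returns (html, index just past the closing brace).
    """
    if not (src.startswith("{*", i) or src.startswith("{#", i)):
        raise ValueError(f"expected '{{*' or '{{#' at index {i}")
    tag = "ul" if src[i + 1] == "*" else "ol"
    i += 2
    head: list[str] = []         # text before the first "||" = list attributes
    items: list[list[str]] = []  # one list of parts (text or nested HTML) per item
    parts, buf = head, []
    while i < len(src):
        if src.startswith("||", i):        # item separator: start a new item
            _flush(buf, parts)
            parts = []
            items.append(parts)
            i += 2
        elif src[i] == "}":                # end of this list
            _flush(buf, parts)
            attrs = " ".join(head)
            open_tag = f"<{tag} {attrs}>" if attrs else f"<{tag}>"
            body = "".join(f"<li>{''.join(p)}</li>" for p in items if p)  # skip empty items
            return f"{open_tag}{body}</{tag}>", i + 1
        elif src.startswith("{*", i) or src.startswith("{#", i):
            _flush(buf, parts)             # nested list: recurse
            nested, i = parse_list(src, i)
            parts.append(nested)
        else:
            buf.append(src[i])
            i += 1
    raise ValueError("unterminated list: missing '}'")


print(parse_list("{#||item1||item2{#||item2.1||item2.2}||item3||item4}")[0])
```

Because every list is explicitly opened and closed by braces, the output in this sketch is always properly nested, regardless of how the source is indented or where a template inserts it.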

The alternative is to just allow the HTML5 syntax and improve its parsing in
MediaWiki so that it can really be used everywhere our old syntax has
problems (HTML5 does not force us to close all tags, which is good for wiki
editing; even if it is a bit verbose, that should not be a problem for
creating complex templates like infoboxes, designed by a few competent
contributors). But MediaWiki still does not treat HTML tags like its old
"simplified" syntax for equivalent tags.

Example of the limitation: these are not recognized as one list:


* item1
* item2


A second example with a similar limitation:

* item1
 item2
* item3  item4

If those limitations were solved, we would have fewer problems generating
content by mixing the best of the HTML syntax, when it solves problems in
templates, with the "simpler" inherited wiki syntax. We would also no longer
have to suffer the current nightmare of newlines.



2013-06-08 bugzilla-daemon

Daniel Friesen  changed:

   What|Removed |Added

 Blocks||49337



2013-06-08 bugzilla-daemon

Daniel Friesen  changed:

   What|Removed |Added

Summary|Parser generates malformed XML with a list inside a table|Parser generates malformed HTML with a list inside a table

--- Comment #27 from Daniel Friesen  ---
This is really invalid HTML too, so I'm changing the summary.
