On Wed, 8 Feb 2012 15:20:41 +0100, Mihaly Heder <[email protected]> wrote:
> If wikitext is going to be replaced the new language should be
> designed on an abstract level first.
This is correct, but if we're talking about a universal DOM that can
represent all potential syntax and has room for extensions (nodes of
new types can be safely added in the future), then the new markup can
be discussed in general terms before the DOM itself.

It doesn't really matter until we start correlating the DOM and the
markup - then it will be time for BNFs.

> So the real question is whether a new-gen wiki syntax will be
> compatible with a consensual data model we might have in the future.
I don't think it's a good idea to design the wiki DOM and the new wiki
syntax separately; otherwise we'll end up in the same trouble current
wikitext is stuck in.

The real problem is whether the core devs are interested in a new
markup at all. I don't see anything particularly difficult in
designing a new DOM apart from a few tricky places (templates,
inclusions), and it definitely should not take another year to
complete.

On Wed, 8 Feb 2012 07:42:33 -0700, Stanton McCandlish
<[email protected]> wrote:
> I'm a "geek" and do not "dislike" or "despise" XML/[X]HTML or WYSIWYG or
> wikimarkup. They all have their uses for different users and even the
> same user in different situations for different purposes.
Indeed, but my point was that XML is hardly more usable in
text-editing environments than some convenient wiki markup. You seem
to agree with this later in your text.

> applying a 'class="work-title"', would produce factually incorrect output
> (i.e., in one context at least, outright *corrupt data*) that said that a
> comma-space was part of the title of the work.
This is because wikitext allows dealing with the underlying/resulting
HTML at a low level. A proper markup must abstract all of that away
from the user, so that he can't just insert a tag wherever he feels it
is pertinent. If a user does need to insert a tag, the markup is not
well planned and must be corrected.

This will increase security (XSS prevention, etc.), uniformity (one
person writes <b>, another <strong>) and portability - the last one in
particular, because the low-level HTML that must be transformed on
every upgrade is exactly what makes current wikitext so problematic.

> It's crucial that I be able to tweak stuff at the
> character-by-character level, and alter the markup around that content in
> any way I need to.
Good point. Also, in text-only environments, text tools like Search &
Replace can be used - and not only to edit the text itself, but its
markup as well.

> But for actual article drafting, in prose sentences and paragraphs, as
> opposed to tweaking, I vastly prefer WYSIWYG.  I seriously doubt I'm
> alone in any of this, even in the combination of preferences I've
> outlined.
This might be true. It only seconds the point everybody seems to agree
on - having both markup and WYSIWYG editors in one place.

On Wed, 8 Feb 2012 16:06:55 +0100, "Oren Bochman" <[email protected]> wrote:
> I disagree that that xhtml is a geek only storage format or that the
> current Wikisyntax has a lower learning curve.
This is exactly the problem with current wikitext. I would compare it
to C++, and an "ideal" wiki markup to Pascal or even BASIC.

> I think that an xml subset is the ideal should be the underlying format.
An underlying format is NOT meant for direct human interaction - not
by non-geeks, anyway. This is what I meant by "storage format".

> This could provide interoperability with other wikis format and a
> friendlier variant of the existing wiki markup.
Good point.

> easy to parse (unambiguous, won't require context or semantics to parse)
This definition should be extended to "context-specific", because some
items might be ambiguous but used in different places. For example,
anything inside a code block is unprocessed and can be as ambiguous as
the editor desires - that is the point of a code block. It only needs
a proper end token.
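To illustrate, here is a minimal sketch of such a context-specific
split: everything between the opening and closing tokens is left
opaque. The token strings and function name are hypothetical, not any
existing wiki's syntax:

```python
def split_code_blocks(text, open_tok="<code>", close_tok="</code>"):
    """Split wiki text into ('text', ...) and ('code', ...) chunks.
    Inside a code block everything stays unprocessed -- however
    ambiguous -- until the closing token appears."""
    chunks, pos = [], 0
    while True:
        start = text.find(open_tok, pos)
        if start == -1:
            if pos < len(text):
                chunks.append(("text", text[pos:]))
            return chunks
        if start > pos:
            chunks.append(("text", text[pos:start]))
        end = text.find(close_tok, start + len(open_tok))
        if end == -1:  # unterminated block: swallow the rest as code
            chunks.append(("code", text[start + len(open_tok):]))
            return chunks
        chunks.append(("code", text[start + len(open_tok):end]))
        pos = end + len(close_tok)
```

Note that the link syntax `[[x]]` inside the code chunk survives
untouched; only the end token matters there.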

> Would be fully learnable in a couple of hours...
A starting editor should be able to learn the new markup in 5 minutes
- or have all of its basic formatting listed in a small help box.

> If we put our heads together and come up with something like that we will
> make some real progress.
This is what I'm trying to push here. The one thing that keeps me from
starting this myself and presenting the results is whether the
research would actually be used by the MediaWiki team - Gabriel says
it's "all planned", which I read as "things just won't get worse".

On Wed, 8 Feb 2012 16:27:57 +0100, Mihaly Heder <[email protected]> wrote:
> But then there are millions of pages already written in legacy
> wikitext and those must be editable with the new editor. So right now
> instead the rational approach, an empirical one should be taken - they
> have to rather ''find'' than invent a good enough model for those old
> articles, and also store everything in the old format.
This is bad practice. I agree that the amount of pages written in
"outdated" markup is overwhelming; however, this only means that the
migration layer must be thoroughly written and well-tested, nothing
else. If you "find" a "good enough" model, you will end up with the
same millions of pages (or more by that time) that will hopefully use
slightly better markup.

After all, even if some hundreds of wiki pages cannot be converted
fully automatically, Wikipedia/WMF has enough staff to fix them within
a reasonable period.

At this point a decisive action must be taken to eliminate the old
syntax completely, once and for all. Otherwise the same discussion
will arise several years later (after several more years of "searching
for a model").

> I just mean that noone standed up and proposed "Hey, this would look
> better in this different way".
Did none of the millions of people who edited pages think this
actually looks wrong? I doubt it very much; perhaps there was just no
place to voice their thoughts, or no one to listen because "this is
fine for amateurs".

> Anyone can create new templates, with any name and parameters he wishes.
Templates are a powerful but widely abused feature, since they can be
used to hide parser/markup bugs. I even think templates should only be
created by devs after discussion; otherwise it results in what we see
now.

>> 3. {{About "Something, something and something", of kind}}
>> As you can see, no character is banned from the title (...)
> What about the separator? Eg. [[The character "]]
Nothing, it's fine. The parser has two options:
1. Either it treats every " as starting a new context, and thus [[The
character "]]"]] actually creates a link with the caption <The
character "]]">.
2. Or it treats ]] as the ultimate token, and a standalone " is output as-is.
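The second option can be sketched in a few lines; the function name
and the [[...]] grammar here are hypothetical illustrations, not any
existing parser's code:

```python
def parse_link_body(src, pos):
    """Parse the inside of a [[...]] link under option 2:
    ']]' is the ultimate end token, so a lone '"' is plain text.
    Returns (body, position after the closing token)."""
    end = src.find("]]", pos)
    if end == -1:
        raise ValueError("unterminated link")
    return src[pos:end], end + 2

# The problematic example from the quote: the stray quote survives
# as plain text because only ']]' ends the context.
body, rest = parse_link_body('[[The character "]] tail', 2)
# body == 'The character "'
```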

>> Right, and pipes should not appear in templates either. It's too special
>> symbol.
> Why so? So far the only reason you gave is that it's not on all keyboard
> layouts.
And it is not used in most languages, yes. Isn't that reason enough?
Why choose it for an international project like MediaWiki if there are
alternatives?

>>   * remote links can also get their title from <title> after fetching
>> first 4 KiB of that page or something
> No way. That can be good for preloading a title on link insertion and
> storing it indefinitely, but not for doing it every time.
Of course not every time; the engine might maintain a cache of remote
links or alleviate the traffic in some other way. And it can be
disabled, in which case the parser will use some other means of
generating titles for titleless external links.
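The caching idea could look roughly like this. Every name here is a
hypothetical sketch, and the head-fetching step is injected as a
callback so the cache logic stands on its own without network access:

```python
import re

def extract_title(html_head, fallback):
    """Pull <title> out of the first few KiB of a fetched page,
    falling back to the given string (e.g. the URL itself)
    for titleless pages."""
    m = re.search(r"<title[^>]*>(.*?)</title>", html_head, re.I | re.S)
    return m.group(1).strip() if m else fallback

_title_cache = {}  # hypothetical per-wiki cache keyed by URL

def remote_link_title(url, fetch_head):
    # fetch_head(url) would return the first ~4 KiB of the remote
    # page; cached so each external link is fetched at most once.
    if url not in _title_cache:
        _title_cache[url] = extract_title(fetch_head(url), url)
    return _title_cache[url]
```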

> Only if pages with no spaces are more common than pages with spaces in
> the name.
> Taking enwiki articles as a sample:
> * 7746101 articles with space.
> * 1416235 articles without space.
Thanks for the statistics. Well, then my point about "half of the
cases" isn't fair; however, this doesn't change the fact that the pipe
isn't as universal as a double equals sign, which can still be typed
at the same speed and is less prone to mistyping - because it's
doubled, it has less chance of appearing in running text.

> However, you could take advantage of the space-is-underscore, and use
> [[Some_page this page]] (but still not 'clean')
Yes, this is not very clean and relies on parser/engine behavior. It
should be fine to have "Some page" and "Some_page" as two different
pages.

> Nitpicking, first heading has just one equal sign [at each side] :)
And this is the problem. Even DokuWiki uses no fewer than two "=" for
headings. The first-level heading appears so rarely in a document that
it can afford "==". Actually, since a document has just one
first-level heading, all the others (level 2+) can use two "==" as
well, because there's no sense in creating a second-level heading
before the document title (first level).

I think MediaWiki currently even lets the user create a 6th-level
heading before the document title?

>> Standardizing is fine unless it starts looking unnatural. The following
>> example might be argued but I can't think of another one quickly:
>> tar -czf file.tar.gz .
> Not a bad example, as that's one of those utilities with odd parameters
> "... The different styles were developed at different times ..."
Yes, you've got my idea.

> That's a source of problems. It's fine having dumb programs that you
> need to walk-through. When the programs are smart, if they don't go up
> to the leve, that's an issue.
This is true, but it just requires a more conscientious developer.
Nobody will argue that writing smart programs is harder than writing
dumb ones that follow certain preexisting conventions (e.g. the
cryptic *nix command-line interface, which could explain everything,
or nearly so).

When designing a text markup why should we follow bad guidelines?

> No. Clean syntax of
> 1. Foo
> 2. Bar
> 3. Baz
This is the syntax I suggested for ordered lists earlier. "1. 1. 1."
only complements it.

>> Not at all because we are talking about context-specific grammar.
>> Addresses in links can hold no formatting and thus all but context
>> ending tokens (]], space and ==) are ignored there.
> Oh, you're not autolinking urls.
I didn't really understand that.

Well, it seems like the thread ends here.

Signed,
P. Tkachenko

_______________________________________________
Wikitext-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitext-l