In the interest of gathering slightly larger statistics, I manually
reviewed 200 VE entries on recent changes.

I am classifying these as

* "Good" edit
* Test edits / newbie errors likely to happen in either editor (not VE's fault)
* Obvious vandal edit (not VE's fault)
* Damaged source that renders correctly (e.g. VE altering source
formatting that most source editors like to maintain, mostly involving
missing newlines)
* Test edits / newbie errors that occur with VE but were unlikely to
occur previously (e.g. people using lots of extra spaces / newlines to
visually "format" the positioning of content on a page)
* Corrupted page content that appears to be caused by the unfamiliar
UI (e.g. <nowiki>[[Foo]]</nowiki>)
* Corrupted page content that appears to be directly caused by VE /
Parsoid bugs (e.g. dirty diffs)

For the sake of simplicity, I'm counting any edit that is well-formed
and not obviously of destructive intent as "good" without
consideration of whether it satisfies Wikipedia content policies such
as NPOV, OR, etc.  There are inevitably many edits whose content isn't
appropriate or needs to be cleaned up, but I'm not worrying about
those.  In the rare case that an edit might fall in multiple
categories, I'm counting it towards the most severe of the categories
listed above (i.e. lower on the list).  This only applied a handful of
times.
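
The tie-breaking rule above can be sketched in a few lines. This is only
an illustration: the short category labels and the `classify` helper are
names made up for the sketch, not part of any tool used for the review.

```python
# Categories in the order listed above, from least to most severe.
# The labels are hypothetical shorthand for the bullet points.
CATEGORIES = [
    "good",
    "test_generic",
    "vandal",
    "damaged_source",
    "test_ve",
    "corrupt_ui",
    "corrupt_ve_parsoid",
]


def classify(applicable):
    """Return the most severe category (lowest on the list) that applies."""
    if not applicable:
        raise ValueError("an edit must match at least one category")
    # A higher index in CATEGORIES means lower on the list, i.e. more severe.
    return max(applicable, key=CATEGORIES.index)
```

For example, an edit that is both obvious vandalism and contains a
UI-induced nowiki would be counted under "corrupt_ui".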

Altogether, I rated the edits as follows:

* "Good": 157 (78.5%)
* Test / newbie edits (generic): 13 (6.5%)
* Obvious vandal: 7 (3.5%)
* Damaged source: 7 (3.5%)
* Test / newbie edits (VE oriented): 3 (1.5%)
* Corrupted page due to unfamiliar UI: 11 (5.5%)
* VE / Parsoid errors: 1 (0.5%) [This involved the editor spewing out
extra garbage when trying to save a page that started with malformed
table syntax.]
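
The percentages above take the 200 reviewed edits as the denominator.
As a quick check of the arithmetic, here is a small sketch; the
dictionary keys are shortened labels chosen for the example. Note that
the listed counts add up to 199, so one of the 200 reviewed edits is not
reflected in the breakdown.

```python
# Counts copied from the breakdown above; keys are shortened labels.
COUNTS = {
    "good": 157,
    "test/newbie (generic)": 13,
    "obvious vandal": 7,
    "damaged source": 7,
    "test/newbie (VE oriented)": 3,
    "corrupted (unfamiliar UI)": 11,
    "VE/Parsoid error": 1,
}
REVIEWED = 200  # number of edits reviewed, as stated at the top


def percentages(counts, reviewed):
    """Share of the reviewed sample in each category, in percent."""
    return {label: 100 * n / reviewed for label, n in counts.items()}


for label, pct in percentages(COUNTS, REVIEWED).items():
    print(f"{label}: {COUNTS[label]} ({pct:.1f}%)")

# Sanity check: compare the listed counts against the sample size.
print("listed:", sum(COUNTS.values()), "reviewed:", REVIEWED)
```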

That's better than I would have guessed before going into this
exercise.  In particular, Subbu appears to be correct that the true
error rate for VE / Parsoid is relatively small (though I'd prefer the
error rate be too small to notice in a test like this).

The tendency of users to add nowikis and create other corruptions due
to the unfamiliar UI is still a concern though.  This is not at all
helped by the fact that we have dozens of help pages that teach wiki syntax
like "[[Foo]]" which are now directly unhelpful to people using VE.
Over time, that will diminish somewhat as experienced users learn the
new system, though if it is truly to be useful for new users we
probably want to look carefully at the various ways new users may
misunderstand the new UI.  Also, I'm not thrilled by the examples of
VE manipulating the use of newlines so that template parameters,
categories, and other things sometimes run together in the source.
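
For anyone who wants to spot the nowiki pattern mechanically, a rough
heuristic is sketched below. It is an assumption-laden illustration, not
how VE or Parsoid handles anything: the `find_nowiki_links` name and the
regular expression are made up here, and legitimate uses of nowiki
(e.g. on help pages) would be flagged too.

```python
import re

# Matches a wikilink wrapped in nowiki tags, e.g. <nowiki>[[Foo]]</nowiki>.
# Heuristic only: it does not try to handle nested or multi-line markup.
NOWIKI_LINK = re.compile(
    r"<nowiki>\s*\[\[[^\[\]]+\]\]\s*</nowiki>",
    re.IGNORECASE,
)


def find_nowiki_links(wikitext):
    """Return every nowiki-wrapped wikilink found in the given wikitext."""
    return NOWIKI_LINK.findall(wikitext)
```

Running this over the text added by an edit, rather than the whole page,
would avoid flagging nowikis that were already present.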

If the immediate goal is to ensure that using VE is not harmful for
people who know how to use it, then the developers are probably
getting pretty close.  Given the nowikis and such, VE probably still
causes higher levels of accidental harm when operated by unfamiliar
users than I would personally be comfortable with.  And of course,
there is still a long way to go in terms of feature completeness and
usability before we can really discuss Erik's dream of having VE be
perceived as better than source editing.

-Robert Rohde

On Tue, Jul 23, 2013 at 9:06 AM, Subramanya Sastry
<ssas...@wikimedia.org> wrote:
> Hi John and Risker,
>
> First off, I do want to once again clarify that my intention in the previous
> post was not to claim that VE/Parsoid is perfect.  It was more that we've
> fixed sufficient bugs at this point that the most significant "bugs" (bugs,
> not missing features) that need fixing (and are being fixed) are those that
> have to do with usability tweaks.  My intention in that post was also not
> one to put some distance between us and the complaints, just to clarify that
> we are fixing things as fast as we can and it can be seen in the recent
> changes stream.
>
> John: specific answers to the edit diffs you highlighted in your post.  I
> acknowledge your intention to make sure we dont make false claims about
> VE/Parsoid's usability.   Thanks for taking the time for digging them up.
> My answers below are made with an intention of figuring out what the issues
> are so they can be fixed where they need to be.
>
>
> On 07/23/2013 02:50 AM, John Vandenberg wrote:
>>
>> On Tue, Jul 23, 2013 at 4:32 PM, Subramanya Sastry
>> <ssas...@wikimedia.org> wrote:
>>>
>>> On 07/22/2013 10:44 PM, Tim Starling wrote:
>>>>
>>>> Round-trip bugs, and bugs which cause a given wikitext input to give
>>>> different HTML in Parsoid compared to MW, should have been detected
>>>> during automated testing, prior to beta deployment. I don't know why
>>>> we need users to report them.
>>>
>>>
>>> 500+ edits are being done per hour using Visual Editor [1] (less at this
>>> time given that it is way past midnight -- I have seen about 700/hour at
>>> times).  I did go and click on over 100 links and examined the diffs.  I
>>> did that twice in the last hour.  I am happy to report clean diffs on all
>>> edits I checked both times.
>>>
>>> I did run into a couple of nowiki-insertions which
>>> is, strictly speaking not erroneous and based on user input, but is more
>>> a usability issue.
>>
>> What is a dirty diff?  One that inserts junk unexpectedly, unrelated
>> to the user's input?
>
>
> That is correct.  Strictly speaking, yes, any changes to the wikitext markup
> that arose from what the user didn't change.
>
>> The broken table injection bugs are still happening.
>>
>>
>> https://en.wikipedia.org/w/index.php?title=Sai_Baba_of_Shirdi&curid=144175&diff=565442800&oldid=565354286
>>
>> If the parser isnt going to be fixed quickly to ignore tables it
>> doesnt understand, we need to find the templates and pages with these
>> broken tables - preferably using SQL and heuristics and fix them.  The
>> same needs to be done for all the other wikis, otherwise they are
>> going to have the same problems happening randomly, causing lots of
>> grief.
>
>
> This maybe related to this:
> https://bugzilla.wikimedia.org/show_bug.cgi?id=51217  and I have a tentative
> fix for it as of y'day.
>
> VE and Parsoid devs have put in a lot and lot of effort to recognize broken
> wikitext source, fix it or isolate it, and protect it across edits, and
> roundtrip it back in original form to prevent corruption.  I think we have
> been largely successful but we still have more cases to go that are being
> exposed here which we will fix.  But, occasionally, these kind of errors do
> show up -- and we ask for your patience as we fix these.  Once again, this
> is not a claim to perfection, but a claim that this is not a significant
> source of corrupt edits.  But, yes even a 0.1% error rate does mean a big
> number in the absolute when thousands of pages are being edited -- and we
> will continue to pare this down.
>
>
>> I presume this is also a dirty diff
>>
>>
>> https://en.wikipedia.org/w/index.php?title=Dubbing_%28filmmaking%29&curid=8860&diff=565438776&oldid=565408739
>
>
> Yes, this is a dirty diff where Parsoid reformatted a 2-line image wikitext
> source into one by removing a line break.  Again, relative to the # of edits
> being made, these are not frequent -- that was all that I claimed, which I
> think is still true.  But that said, this is a relatively minor and harmless
> change.
>
>
>> In addition to nowikis, there are also wikilinks that are not what the
>> user intended
>>
>>
>> https://en.wikipedia.org/w/index.php?title=Ben_Tre&curid=1822927&diff=565439119&oldid=561995413
>>
>> https://en.wikipedia.org/w/index.php?title=Celton_Manx&curid=28176434&diff=565439020&oldid=565436056
>
>
> You are correct, but this is not a dirty diff.  I dont want to claim this is
> an user error entirely  -- but a combination of user and software error.
>
>
>> Here is three edits to try to add a section header and a sentence,
>> with a wikilink in the section header.
>> (In the process they added other junk into the page, probably
>> unintentionally.)
>>
>>
>> https://en.wikipedia.org/w/index.php?title=Port_of_Davao&action=history&offset=2013072307&limit=4
>
>
> What is the problem here exactly?  (that is a question, not a challenge).
> The user might have entered those newlines as well.
>
>
>> A leading line feed in a parameter - what the?
>>
>>
>> https://en.wikipedia.org/w/index.php?title=List_of_Sam_%26_Cat_episodes&curid=39469556&diff=565437324&oldid=565416618
>
>
> This is something I'll have to investigate.
>
>
>> External links which should be converted to internal links:
>>
>>
>> https://en.wikipedia.org/w/index.php?title=Krishna_%28TV_series%29&diff=565437699&oldid=564621100
>
>
> This could be an enhancement to Parsoid.  Thanks for the bug report :-).
>
>
>> That is all in the last hour, and I've only checked ~100 diffs.
>>
>> I appreciate that some of these are a result of user input, and the RC
>> feed of non-VE edits will have similar problems exhibited by newbies,
>> albeit different because it is the source editor.  And it is great to
>> see so many constructive VE edits.
>
>
> Thank you for acknowledging this!
>
>
>> But you're not going to get much
>> love by claiming that it is now stable and not causing broken diffs.
>
>
> I did not make that specific claim, but it is possible that my tone was more
> aggressive than I intended it to be.  Just so there is no confusion,
> VE/Parsoid can still cause dirty/broken diffs, but I think the claim is that
> at this time, the vast majority of ongoing edits do not corrupt wikitext
> source and where there are diffs, we have narrowed that down to a couple of
> causes (where VE/Parsoid inserts nowiki wrappers) which are being fixed.
>
>
>> In addition, VE can crash a Google Chrome tab, and it can cause
>> (unsaved changes) dataloss in most browser configurations.
>
>
> Subbu.
>
>
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
