[Wikitech-l] Big problem to solve: good WYSIWYG on WMF wikis

2010-12-28 Thread David Gerard
[crossposted to foundation-l and wikitech-l]


"There has to be a vision though, of something better. Maybe something
that is an actual wiki, quick and easy, rather than the template
coding hell Wikipedia's turned into." - something Fred Bauder just
said on wikien-l.


Our current markup is one of our biggest barriers to participation.

AIUI, edit rates are about half what they were in 2005, even as our
fame has gone from "popular" through "famous" to "part of the
structure of the world". I submit that this is not a good or healthy
thing in any way and needs fixing.

People who can handle wikitext really just do not understand how
off-putting the computer guacamole is to people who can cope with text
they can see.

We know this is a problem; WYSIWYG that works is something that's been
wanted here forever. There are various hideous technical nightmares in
its way, that make this a big and hairy problem, of the sort where the
hair has hair.

However, I submit that it's important enough we need to attack it with
actual resources anyway.


This is just one data point, where a Canadian government office got
*EIGHT TIMES* the participation in their intranet wiki by putting in a
(heavily locally patched) copy of FCKeditor:


   http://lists.wikimedia.org/pipermail/mediawiki-l/2010-May/034062.html

I have to disagree with you given my experience. In one government
department where MediaWiki was installed we saw the active user base
spike from about 1000 users to about 8000 users within a month of having
enabled FCKeditor. FCKeditor definitely has its warts, but it very
closely matches the experience non-technical people have gotten used to
while using Word or WordPerfect. Leveraging skills people already have
cuts down on training costs and allows them to be productive almost
immediately.

   http://lists.wikimedia.org/pipermail/mediawiki-l/2010-May/034071.html

Since a plethora of intelligent people with no desire to learn WikiCode
can now add content, the quality of posts has been in line with the
adoption of wiki use by these people. Thus one would say it has gone up.

In the beginning there were some hard-core users that learned WikiCode;
for the most part they have indicated that when the WYSIWYG fails, they
are able to switch to WikiCode mode to address the problem. This usually
occurs with complex table nesting, which is something that few of the
users do anyway. Most document layouts are kept simple. Additionally,
we have a multilingual English/French wiki. As a result the browser
spell-check is insufficient for the most part (not to mention it has
issues with WikiCode). To address this a second spellcheck button was
added to the interface so that both English and French spellcheck could
be available within the same interface (via an aspell backend).
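
To make the aspell setup described above concrete, here is a minimal
sketch, assuming a server-side helper that shells out to aspell per
language. The function name and wiring are hypothetical; only the
"aspell --lang=... list" command-line usage is standard aspell behaviour.

    # Hypothetical helper behind a second, per-language spellcheck button.
    # Only the aspell CLI call is real; the rest is illustrative.
    import subprocess

    def misspelled_words(text: str, lang: str) -> list[str]:
        """Return the words aspell flags as misspelled for the given language."""
        # `aspell --lang=<code> list` reads text on stdin and prints one
        # misspelled word per line.
        result = subprocess.run(
            ["aspell", f"--lang={lang}", "list"],
            input=text, capture_output=True, text=True, check=True,
        )
        return result.stdout.split()

    if __name__ == "__main__":
        sample = "Ceci est un exemple with some Englsh words mixd in."
        print("en:", misspelled_words(sample, "en"))
        print("fr:", misspelled_words(sample, "fr"))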


So, the payoffs could be ridiculously huge: eight times the number of
smart and knowledgeable people even being able to *fix typos* on
material they care about.

Here are some problems. (Off the top of my head; please do add more,
all you can think of.)


- The problem:

* Fidelity with the existing body of wikitext. No conversion flag day.
The current body exploits every possible edge case in the regular
expression guacamole we call a parser. Tim said a few years ago that
any solution has to account for the existing body of text.

* Two-way fidelity. Those who know wikitext will demand to keep it and
will bitterly resist any attempt to take it away from them.

* FCKeditor (now CKEditor) in MediaWiki is all but unmaintained.

* There is no specification for wikitext. Well, there almost is -
compiled as C, it runs a bit slower than the existing PHP compiler.
But it's a start!
http://lists.wikimedia.org/pipermail/wikitext-l/2010-August/000318.html


- Attempting to solve it:

* The best brains around Wikipedia, MediaWiki and WMF have dashed
their foreheads against this problem for at least the past five years
and have got *nowhere*. Tim has a whole section in the SVN repository
for new parser attempts. Sheer brilliance isn't going to solve this
one.

* Tim doesn't scale. Most of our other technical people don't scale.
*We have no resources and still run on almost nothing*.

($14m might sound like enough money to run a popular website, but for
comparison, I work as a sysadmin at a tiny, tiny publishing company
with more money and staff just in our department than that to do
*almost nothing* compared to what WMF achieves. WMF is an INCREDIBLY
efficient organisation.)


- Other attempts:

* Starting from a clear field makes it ridiculously easy. The
government example quoted above is one. Wikia wrote a good WYSIWYG
that works really nicely on new wikis (I'm speaking here as an
experienced wikitext user who happily fixes random typos on Wikia). Of
course, I noted that we can't start from a clear field - we have an
existing body of wikitext.


So, specification of the problem:

* We need good WYSIWYG. The government example suggests that a simple
word-processor-like 

Re: [Wikitech-l] dataset1, xml dumps

2010-12-28 Thread Ed Summers
On Wed, Dec 15, 2010 at 4:56 PM, Ariel T. Glenn ar...@wikimedia.org wrote:
 We want people besides us to host it.  We expect to put a copy at the
 new data center (at least), as well.

Does anyone know if the Wikipedia XML Data AWS Public Dataset [1] is
being routinely updated? It's showing a last update of September 29,
2009 1:09 AM GMT, but perhaps that's just the last update to the
dataset metadata? I guess I could mount the EBS volume to check
myself... It might be nice if the database dumps were included as well
I guess.

//Ed

[1] http://aws.amazon.com/datasets/2506



[Wikitech-l] Sentence-level editing usability videos

2010-12-28 Thread Jan Paul Posma
Hi all,

A short note for everyone interested in usability testing: the videos of the 
usability testing of sentence-level editing are available on Wikimedia Commons. 
Videos are in Dutch, transcripts/notes are in English. 
http://commons.wikimedia.org/wiki/Category:Sentence-level_editing

Best wishes,
Jan Paul


Re: [Wikitech-l] [Foundation-l] Big problem to solve: good WYSIWYG on WMF wikis

2010-12-28 Thread David Gerard
On 28 December 2010 16:06, Victor Vasiliev vasi...@gmail.com wrote:

 I have thought about a WYSIWYG editor for Wikipedia and found it
 technically impossible. The main and key problem of WYSIWYG is
 templates. You have to understand that templates are not a single
 element of Wikipedia syntax; they are an integral part of page markup.
 You do not insert an infobox template, you insert the infobox *itself*, and
 from what I heard the templates were the main concern of many editors
 who were scared of wikitext.
 Now think of how many templates there are in Wikipedia, how frequently
 they are changed and how much time it would take to implement their
 editing.


Yes. So how do we sensibly - usably - deal with templates in a
word-processor-like layout? Is there a way that passes usability
muster for non-geeks? How do others do it? Do their methods actually
work?

e.g. Wikia has WYSIWYG editing and templates. They have a sort of
solution to template editing in WYSIWYG. It's not great, but people
sort of cope. How did they get there? What can be done to make it
better, *conceptually*?

What I'm saying there is that we don't start from the assumption that
we know nothing and have to start from scratch, forming our answers
only from pure application of personal brilliance; we should start
from the assumption that we know actually quite a bit, if we only know
who to ask and where. Does it require throwing out all previous work?
etc., etc. And this is the sort of question that requires actual
expense on resources to answer.

Given that considerable work has gone on already, what would we do
with resources to apply to the problem?


- d.



Re: [Wikitech-l] [Foundation-l] Big problem to solve: good WYSIWYG on WMF wikis

2010-12-28 Thread Brion Vibber
On Tue, Dec 28, 2010 at 8:43 AM, David Gerard dger...@gmail.com wrote:

 e.g. Wikia has WYSIWYG editing and templates. They have a sort of
 solution to template editing in WYSIWYG. It's not great, but people
 sort of cope. How did they get there? What can be done to make it
 better, *conceptually*?

 What I'm saying there is that we don't start from the assumption that
 we know nothing and have to start from scratch, forming our answers
 only from pure application of personal brilliance; we should start
 from the assumption that we know actually quite a bit, if we only know
 who to ask and where. Does it require throwing out all previous work?
 etc., etc. And this is the sort of question that requires actual
 expense on resources to answer.

 Given that considerable work has gone on already, what would we do
 with resources to apply to the problem?


My primary interest at the moment in this area is to reframe the question a
bit; rather than "how do we make good WYSIWYG that works on the way
Wikipedia pages' markup and templates are structured now" -- which we know
has been extremely hard to get going -- to instead consider "how do we make
good WYSIWYG that does the sorts of things we currently use markup and
templates for, plus the things we wish we could do that we can't?"

We have indeed learned a *huge* amount from the last decade of Wikipedia and
friends, among them:

* authors and readers crave advanced systems for data & format-sharing (eg
putting structured info into infoboxes) and interactive features (even just
sticking a marker on a map!)
* most authors prefer simplicity of editing (keep the complicated stuff out
of the way until you need it)
* some authors will happily dive into hardcore coding to create the tools
they need (templates, user/site JS, gadgets)
* many other authors will very happily use those tools once they're created
* the less the guts of those tools are exposed, the easier it is for other
people to reuse them


The incredible creativity of Wikimedians in extending the frontend
capabilities of MediaWiki through custom JavaScript, and the markup system
through templates, has been blowing my mind for years. I want to find a way
to point that creativity straight forward, as it were, and use it to kick
some ass. :)


Within the Wikimedia ecosystem, we can roughly divide the world into
"Wikipedia" and "all the other projects". MediaWiki was created for
Wikipedia, based on previous software that had been adapted to the needs of
Wikipedia; and while the editing and template systems are sometimes awkward,
they work.

Our other projects like Commons, Wiktionary, Wikibooks, Wikiversity, and
Wikinews have *never* been as well served. The freeform markup model --
which works very well for body text on Wikipedia even if it's icky for
creating tables, diagrams and information sets -- has been a poorer fit, and
little effort has been spent on actually creating ways to support them well.

Commons needs better tools for annotating and grouping media resources.

Wiktionary needs structured data with editing and search tools geared
towards it.

Wikibooks needs a structure model that's based on groups of pages and media
resources, instead of just standalone freetext articles which may happen to
link to each other.

Wikiversity needs all those, and more interactive features and the ability
for users to group themselves socially and work together.


Getting anything done that would work on the huge, well-developed,
wildly-popular Wikipedia has always been a non-starter because it has to
deal with 10 years of backwards-compatibility from the get-go. I think it's
going to be a *lot* easier to get things going on those smaller projects
which are now so poorly served that most people don't even know they exist.
:)

This isn't a problem specific to Wikimedia; established organizations of all
sorts have a very difficult time getting new ideas over that hump from "not
good enough for our core needs" to "*bam*, slap it everywhere". By
concentrating on the areas that aren't served at all well by the current
system, we can make much greater headway in the early stages of development;
Clayton Christensen's "The Innovator's Dilemma" calls this "competing
against non-consumption".


For the Wikipedia case, we need to incubate the next generation of
templating up to the point that it can actually undercut and replace
today's wikitext templates, or I worry we're just going to be sitting around
going "gosh, I wish we could replace these templates and have markup that
works cleanly in WYSIWYG" forever.


My current thoughts are to concentrate on a few areas:
1) create a widget/gadget/template/extension/plugin model built around
embedding blocks of information within a larger context...
2) ...where the data and rendering can be reasonably separate... (eg, not
having to pull tricks where you manually mix different levels of table
templates to make the infobox work right)
3) ...and the rendering can be as simple, or as fancy as complex, as your
imagination and HTML5 allow.
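
To make the data/rendering separation in points 1-3 above concrete, here
is a minimal sketch in Python with invented names (none of this is a
MediaWiki API): the embedded block carries only structured values, and
rendering is a separate, swappable function, so nobody has to hand-mix
table-template levels to make an infobox come out right.

    # The block is pure data; how it is rendered is someone else's problem.
    from dataclasses import dataclass, field
    from html import escape

    @dataclass
    class InfoboxBlock:
        """An embeddable block: structured data, no layout tricks."""
        title: str
        fields: dict[str, str] = field(default_factory=dict)

    def render_html(block: InfoboxBlock) -> str:
        """One possible renderer; a gadget could supply a fancier one."""
        rows = "".join(
            f"<tr><th>{escape(k)}</th><td>{escape(v)}</td></tr>"
            for k, v in block.fields.items()
        )
        return (f'<table class="infobox"><caption>{escape(block.title)}'
                f"</caption>{rows}</table>")

    box = InfoboxBlock("Ada Lovelace", {"Born": "1815", "Field": "Mathematics"})
    print(render_html(box))     # HTML for the article page
    print(box.fields["Born"])   # the same data, reusable without scraping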

[Wikitech-l] StringFunctions on enwiki?

2010-12-28 Thread Carl (CBM)
At some point in the past, it was determined that the StringFunctions
extension (now part of the ParserFunctions extension) would be
disabled on enwiki. I know I saw a comment to the effect of: "if
StringFunctions was turned on, it would only encourage people to start
writing parsers in wikicode."

Maybe other people were already aware, but I wasn't, that we have a set
of hacked-up string functions on enwiki, for example [[Template:Str
len]]. There's a whole category of them at [[Category:String
manipulation templates]].
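
For anyone who hasn't looked at the extension: the operations it
provides are roughly of this order of complexity. The parser-function
names #len, #pos and #sub below are from memory, and the Python is only
an illustration of what the templates are emulating, not MediaWiki code.

    # Rough Python equivalents of the sort of thing StringFunctions offers,
    # to show how simple these are compared with template-based emulation.
    def pf_len(s: str) -> int:
        return len(s)                      # {{#len:string}}

    def pf_pos(s: str, needle: str) -> int:
        return s.find(needle)              # {{#pos:string|needle}}, -1 if absent

    def pf_sub(s: str, start: int, length: int) -> str:
        return s[start:start + length]     # {{#sub:string|start|length}}

    print(pf_len("Wikitext"))              # 8
    print(pf_pos("Wikitext", "text"))      # 4
    print(pf_sub("Wikitext", 0, 4))        # "Wiki"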

I'd like to know the current opinion of the server ops about these
things. Is there any chance of StringFunctions being enabled? If not,
should we feel free to work around it as these templates do?

I'm writing to this list so it will be possible to link from enwiki to
the mailing list archives, so responses on-list would be best.

- Carl



Re: [Wikitech-l] StringFunctions on enwiki?

2010-12-28 Thread Alex Brollo
I too don't understand precisely why string functions are so discouraged. I
have seen extremely complex templates built just to do (with a high server
load, I suppose in my ignorance...) what could be obtained with an extremely
simple string function.

Alex


Re: [Wikitech-l] StringFunctions on enwiki?

2010-12-28 Thread OQ
On Tue, Dec 28, 2010 at 3:14 PM, Alex Brollo alex.bro...@gmail.com wrote:
 I too don't understand precisely why string functions are so discouraged. I
 saw extremely complex templates built just to do (with a high server load I
 suppose in my ignorance...) what could be obtained with an extremely simple
 string function.


This seems like it comes up every few months. I think the prevailing
opinion on why StringFuncs wasn't ever going to be enabled was that
wikimarkup has been bastardized enough as it is, and StringFuncs would
send the wiki into the next circle of markup syntax hell, as it would
be giving editors more rope to hang themselves with.



Re: [Wikitech-l] StringFunctions on enwiki?

2010-12-28 Thread MZMcBride
Alex Brollo wrote:
 I too don't understand precisely why string functions are so discouraged. I
 saw extremely complex templates built just to do (with a high server load I
 suppose in my ignorance...) what could be obtained with an extremely simple
 string function.

https://bugzilla.wikimedia.org/show_bug.cgi?id=6455#c92 (and subsequent
comments)

MZMcBride





Re: [Wikitech-l] [Foundation-l] Big problem to solve: good WYSIWYG on WMF wikis

2010-12-28 Thread David Gerard
On 28 December 2010 16:54, Stephanie Daugherty sdaughe...@gmail.com wrote:

 Not only is the current markup a barrier to participation, it's a barrier to
 development. As I argued on Wikien-l, starting over with a markup that can
 be syntactically validated, preferably one that is XML-based, would reap huge
 rewards in the safety and effectiveness of automated tools - authors of
 tools like AWB have just as much trouble making software handle the corner
 cases in wikitext markup as new editors have understanding it.


In every discussion so far, throwing out wikitext and replacing it
with something that isn't a crawling horror has been considered a
non-starter, given ten years and terabytes of legacy wikitext.

If you think you can swing throwing out wikitext and barring the
actual code from human editing - XML is not safely human editable in
any circumstances - then good luck to you, but I don't like your
chances.


- d.



Re: [Wikitech-l] [Foundation-l] Big problem to solve: good WYSIWYG on WMF wikis

2010-12-28 Thread George Herbert
On Tue, Dec 28, 2010 at 3:43 PM, David Gerard dger...@gmail.com wrote:
 On 28 December 2010 16:54, Stephanie Daugherty sdaughe...@gmail.com wrote:

 Not only is the current markup a barrier to participation, it's a barrier to
 development. As I argued on Wikien-l, starting over with a markup that can
 be syntactically validated, preferably one that is XML-based, would reap huge
 rewards in the safety and effectiveness of automated tools - authors of
 tools like AWB have just as much trouble making software handle the corner
 cases in wikitext markup as new editors have understanding it.


 In every discussion so far, throwing out wikitext and replacing it
 with something that isn't a crawling horror has been considered a
 non-starter, given ten years and terabytes of legacy wikitext.

 If you think you can swing throwing out wikitext and barring the
 actual code from human editing - XML is not safely human editable in
 any circumstances - then good luck to you, but I don't like your
 chances.

That is true - "We can't do away with Wikitext" has always been the
intermediate conclusion (in between "My god, we need to do something
about this problem" and "This is hopeless, we give up again").

Perhaps it's time to start some exercises in non-Euclidean Wiki
development, and just assume the opposite and see what happens.


-- 
-george william herbert
george.herb...@gmail.com



Re: [Wikitech-l] Big problem to solve: good WYSIWYG on WMF wikis

2010-12-28 Thread Happy-melon
There are some things that we know:

1) as Brion says, MediaWiki currently only presents content in one way: as 
wikitext run through the parser.  He may well be right that there is a 
bigger fish which could be caught than WYSIWYG editing by saying that MW 
should present data in other new and exciting ways, but that's actually a 
separate question.  *If* you wish to solve WYSIWYG editing, your baseline is 
wikitext and the parser.

2) "guacamole" is one of the more unusual descriptors I've heard for the
parser, but it's far from the worst.  We all agree that it's horribly messy
and most developers treat it like either a sleeping dragon or a *very*
grumpy neighbour.  I'd say that the two biggest problems with it are that a)
it's buried so deep in the codebase that literally the only way to get your
wikitext parsed is to fire up the whole of the rest of MediaWiki around it
to give it somewhere comfy to live in, and b) there is, as David says, no way
of explaining what it's supposed to be doing except saying "follow the code;
whatever it does is what it's supposed to do".  It seems to be generally
accepted that it is *impossible* to represent everything the parser does in
any standard grammar.

Those are all standard gripes, and nothing new or exciting.  There are also, 
to quote a much-abused former world leader, some known unknowns:

1) we don't know how to explain What You See when you parse wikitext except 
by prodding an exceedingly grumpy hundred thousand lines of PHP and *asking 
What it thinks* You Get.

2) We don't know how to create a WYSIWYG editor for wikitext.

Now, I'd say we have some unknown unknowns.

1) *is* it because of wikitext's idiosyncrasies that WYSIWYG is so 
difficult?  Is wikitext *by its nature* not amenable to WYSIWYG editing?

2) would a wikitext which *was* representable in a standard grammar be 
amenable to WYSIWYG editing?

3) would a wikitext which had an alternative parser, one that was not buried 
in the depths of MW (perhaps a full JS library that could be called in 
real-time on the client), be amenable to WYSIWYG editing?

4) are questions 2 and 3 synonymous?

--HM


David Gerard dger...@gmail.com wrote in 
message news:aanlktimthux-undo1ctnexcrqbpp89t2m-pvha6fk...@mail.gmail.com...
 [crossposted to foundation-l and wikitech-l]


 There has to be a vision though, of something better. Maybe something
 that is an actual wiki, quick and easy, rather than the template
 coding hell Wikipedia's turned into. - something Fred Bauder just
 said on wikien-l.


 Our current markup is one of our biggest barriers to participation.

 AIUI, edit rates are about half what they were in 2005, even as our
 fame has gone from popular through famous to part of the
 structure of the world. I submit that this is not a good or healthy
 thing in any way and needs fixing.

 People who can handle wikitext really just do not understand how
 offputting the computer guacamole is to people who can cope with text
 they can see.

 We know this is a problem; WYSIWYG that works is something that's been
 wanted here forever. There are various hideous technical nightmares in
 its way, that make this a big and hairy problem, of the sort where the
 hair has hair.

 However, I submit that it's important enough we need to attack it with
 actual resources anyway.


 This is just one data point, where a Canadian government office got
 *EIGHT TIMES* the participation in their intranet wiki by putting in a
 (heavily locally patched) copy of FCKeditor:


   http://lists.wikimedia.org/pipermail/mediawiki-l/2010-May/034062.html

 I have to disagree with you given my experience. In one government
 department where MediaWiki was installed we saw the active user base
 spike from about 1000 users to about 8000 users within a month of having
 enabled FCKeditor. FCKeditor definitely has it's warts, but it very
 closely matches the experience non-technical people have gotten used to
 while using Word or WordPerfect. Leveraging skills people already have
 cuts down on training costs and allows them to be productive almost
 immediately.

   http://lists.wikimedia.org/pipermail/mediawiki-l/2010-May/034071.html

 Since a plethora of intelligent people with no desire to learn WikiCode
 can now add content, the quality of posts has been in line with the
 adoption of wiki use by these people. Thus one would say it has gone up.

 In the beginning there were some hard core users that learned WikiCode,
 for the most part they have indicated that when the WYSIWYG fails, they
 are able to switch to WikiCode mode to address the problem. This usually
 occurs with complex table nesting which is something that few of the
 users do anyways. Most document layouts are kept simple. Additionally,
 we have a multilingual english/french wiki. As a result the browser
 spell-check is insufficient for the most part (not to mention it has
 issues with WikiCode). To address this a second spellcheck button was
 added to the interface so that 

Re: [Wikitech-l] Big problem to solve: good WYSIWYG on WMF wikis

2010-12-28 Thread Krinkle
Hi,

When this topic was raised a few years ago (I don't remember which time;
it's been continually discussed throughout the years) I found an idea
especially interesting, but it got buried in the mass.

From memory and imagination:

The idea is to write a new parser that is not buried deep in MediaWiki, can
therefore be used apart from MediaWiki, and is fairly easy to translate to,
for example, JavaScript.

This parser accepts similar input to what we have now (i.e. '''bold''',
{{template}}, [[link|text]] etc.), however totally rewritten and with more
logical behaviour. Call it a 2.0 parser, without any worries about
compatibility or old wikitext which (ab)uses the edge cases of the current
parser.

This would become the default in MediaWiki for newly created pages,
indicated by an int in the revision table (i.e. rev_pv (parser version)).
A WYSIWYG editor can be written for this in JavaScript, and it's great.

So what about articles with the old parser (i.e. rev_pv=NULL / rev_pv=1)?
No problem: the old parser sticks around for a while, and such articles
simply don't have a WYSIWYG editor.

Editing articles with the old parser will show a small notice on top (like
the one for pages larger than x bytes due to old browser limits) offering
an option to 'switch'. That would result in previewing the page's wikitext
with the new parser. The user can then make adjustments as needed to make
it look good again (if necessary at all) and save the page (which saves the
new revision with rev_pv=2, as it would do for new articles).

Since there are lots of articles which will likely have the same HTML
output and require no modification whatsoever, a script could be written
(either as a user bot or as a maintenance script) that would automatically
check all pages that have the old rev_pv, compare them to the output of the
new parser, and automatically update the rev_pv field if it matches. All
others would be visible on a special page listing pages whose last revision
has an older version of the parser, with a link to an MW.org page giving an
overview of a few things that regulars may want to know (i.e. the most
common differences).
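
A minimal sketch of the maintenance pass described above, with an
in-memory page store and stand-in render functions rather than real
MediaWiki internals; it only illustrates the "promote when output is
identical" rule.

    # Promote pages to the new parser version only when both parsers
    # produce identical HTML; everything else goes on a review list.
    from typing import Callable, Dict, List

    Page = Dict[str, object]   # {"title": str, "wikitext": str, "rev_pv": int}

    def migrate_parser_version(
        pages: List[Page],
        render_old: Callable[[str], str],
        render_new: Callable[[str], str],
    ) -> List[str]:
        """Bump rev_pv where safe; return titles that need human review."""
        needs_review = []
        for page in pages:
            if page["rev_pv"] != 1:
                continue
            if render_old(page["wikitext"]) == render_new(page["wikitext"]):
                page["rev_pv"] = 2                   # identical output: safe
            else:
                needs_review.append(page["title"])   # list on a special page
        return needs_review

    # Toy demonstration with deliberately trivial "parsers":
    pages = [{"title": "A", "wikitext": "plain text", "rev_pv": 1},
             {"title": "B", "wikitext": "'''bold'''", "rev_pv": 1}]
    old = lambda wt: wt.replace("'''", "<b>")        # sloppy old behaviour
    new = lambda wt: wt.replace("'''", "<b>", 1).replace("'''", "</b>")
    print(migrate_parser_version(pages, old, new))   # ['B'] needs review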

Just an idea :)
--
Krinkle



Re: [Wikitech-l] [Foundation-l] Big problem to solve: good WYSIWYG on WMF wikis

2010-12-28 Thread Rob Lanphier
Hi Brion,

Thanks for laying out the problem so clearly!  I agree wholeheartedly
that we need to avoid thinking about this problem too narrowly as a
user interface issue on top of existing markup+templates.  More
inline:

On Tue, Dec 28, 2010 at 9:27 AM, Brion Vibber br...@pobox.com wrote:
 This isn't a problem specific to Wikimedia; established organizations of all
 sorts have a very difficult time getting new ideas over that hump from not
 good enough for our core needs to *bam* slap it everywhere. By
 concentrating on the areas that aren't served at all well by the current
 system, we can make much greater headway in the early stages of development;
 Clayton Christensen's The Innovator's Dilemma calls this competing
 against non-consumption.

Thankfully, at least we're not trying to defend a business model
and cost structure that's fundamentally incompatible with making a
change here.  However, I know that's not the part that you're
highlighting, and I agree that Christensen's "competing against
non-consumption" concept is well worth learning about in this
context[1], as well as the concepts of "disruptive innovation" vs
"continuous innovation"[2].  As you've said, we've learned a lot in
the past decade of Wikipedia about how people use the technology.
A new editing model that incorporates that learning will almost
certainly take a while to reach full parity in flexibility, power, and
performance.  The current editor base of English Wikipedia probably
won't be patient with any changes that result in a loss of
flexibility, power and performance.  Furthermore, many (perhaps even
most) things we'd be inclined to try would *not* have a measurable and
traceable impact on new editor acquisition and retention, which will
further diminish patience.  A mature project like Wikipedia is a hard
place to hunt for willing guinea pigs.

 For the Wikipedia case, we need to incubate the next generation of
 templating up to the point that they can actually undercut and replace
 today's wikitext templates, or I worry we're just going to be sitting around
 going gosh I wish we could replace these templates and have markup that
 works cleanly in wysiwyg forever.

 My current thoughts are to concentrate on a few areas:
 1) create a widget/gadget/template/extension/plugin model built around
 embedding blocks of information within a larger context...
 2) ...where the data and rendering can be reasonably separate... (eg, not
 having to pull tricks where you manually mix different levels of table
 templates to make the infobox work right)
 3) ...and the rendering can be as simple, or as fancy as complex, as your
 imagination and HTML5 allow.

Let me riff on what you're saying here (partly just to confirm that I
understand fully what you're saying).  It'd be very cool to have the
ability to declare a single article, or probably more helpfully, a
single revision of an article to use a completely different syntax.
There's already technically a kludgy model for that now:  wrap the
whole thing in a tag, and put the parser for the new syntax in a tag
extension.  That said, it would probably exacerbate our problems if we
allowed intermixing of old syntax and new syntax in a single revision.
 The goal should be to move articles irreversibly toward a new model,
and I don't think it'd be possible to do this without the tools to
prevent us from backsliding (for example, tools that allow editors to
convert an article from old syntax to new syntax, and also tools that
allow administrators to lock down the syntax choice for an article
without locking down the article).

Still, it's pretty alluring to think about the upgrade of syntax as an
incremental problem within an article.   We could figure out how to
solve one little corner of the data/rendering separation problem and
then move on to the next.  For example, we could start with citations
and make sure it's possible to insert citations easily and cleanly,
and to extract citations from an article without relying on scraping
the HTML to get them. Or maybe we do that for certain types of infoboxes
instead, and then gradually get more general.  We can take advantage
of the fact that we've got millions of articles to help us choose
which particular types of data will benefit from a targeted approach,
and tailor extensions to very specific data problems, and then
generalize after we sort out what works/doesn't work with a few
specific cases.
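
As a sketch of the citations example (the data shape and function names
below are invented, not an existing MediaWiki format): store citations as
structured data next to the text, render them for readers, and hand the
same data to tools without any HTML scraping.

    # Citations kept as data, so rendering and extraction are both trivial.
    import json

    article = {
        "text": "Paris is the capital of France.[1]",
        "citations": [
            {"id": 1, "title": "Example source on France",
             "url": "https://example.org/france"},
        ],
    }

    def render_references(article: dict) -> str:
        """Render the reference list for readers."""
        return "\n".join(f'[{c["id"]}] {c["title"]} <{c["url"]}>'
                         for c in article["citations"])

    def export_citations(article: dict) -> str:
        """Hand the same citations to tools as machine-readable JSON."""
        return json.dumps(article["citations"], indent=2)

    print(render_references(article))
    print(export_citations(article))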

So, which problem first?

Rob

[1]  Those with an aversion to business-speak will require steely
fortitude to even click on the url, let alone actually read the
article, but it's still worth extracting the non-business points from
this article:
http://businessinnovationfactory.com/weblog/christensen_worldinnovationforum
[2]  While there is a Wikipedia article describing this[3], a better
description of the important bits is here:
http://www.mail-archive.com/haskell@haskell.org/msg18498.html
[3]  Whee, footnote to a footnote!

Re: [Wikitech-l] [Foundation-l] Big problem to solve: good WYSIWYG on WMF wikis

2010-12-28 Thread Billinghurst
(Lying on the ground in the foetal position sobbing gently ... poor poor
Wikisource, forgotten again.)

Wikisource - we have tried to get the source and structure by regulating the
spaces that we can; however, formalising template fields into forms would be
great ...

* extension for DynamicPageList (previously rejected)
* search engines that work with transcluded text
* extension for music notation (Lilypond?)
* pdf text extraction tool to be implemented
* good metadata tools
* bibliographic tools, especially tools that allow sister cross-references
* book-making tools that work with transcluded text
* tools that allow "What links here" across all of WMF
...

Hell, I could even see that text from WS references could be framed and 
transcluded to WP, 
and provide a ready link back to the works at the sites.  Same for WQ to 
transclude quotes 
from a WS reference text, ready links from Wiktionary to usage in WS books.  
That should 
be the value of a wiki and sister sites.

Regards, Andrew


On 28 Dec 2010 at 9:27, Brion Vibber wrote:

 On Tue, Dec 28, 2010 at 8:43 AM, David Gerard dger...@gmail.com wrote:
 
  e.g. Wikia has WYSIWYG editing and templates. They have a sort of
  solution to template editing in WYSIWYG. It's not great, but people
  sort of cope. How did they get there? What can be done to make it
  better, *conceptually*?
 
  What I'm saying there is that we don't start from the assumption that
  we know nothing and have to start from scratch, forming our answers
  only from pure application of personal brilliance; we should start
  from the assumption that we know actually quite a bit, if we only know
  who to ask and where. Does it require throwing out all previous work?
  etc., etc. And this is the sort of question that requires actual
  expense on resources to answer.
 
  Given that considerable work has gone on already, what would we do
  with resources to apply to the problem?
 
 
 My primary interest at the moment in this area is to reframe the question a
 bit; rather than how do we make good WYSIWYG that works on the way
 Wikipedia pages' markup and templates are structured now -- which we know
 has been extremely hard to get going -- to instead consider how do we make
 good WYSIWYG that does the sorts of things we currently use markup and
 templates for, plus the things we wish we could do that we can't?
 
 We have indeed learned a *huge* amount from the last decade of Wikipedia and
 friends, among them:
 
 * authors and readers crave advanced systems for data  format-sharing (eg
 putting structured info into infoboxes) and interactive features (even just
 sticking a marker on a map!)
 * most authors prefer simplicity of editing (keep the complicated stuff out
 of the way until you need it)
 * some authors will happily dive into hardcore coding to create the tools
 they need (templates, user/site JS, gadgets)
 * many other authors will very happily use those tools once they're created
 * the less the guts of those tools are exposed, the easier it is for other
 people to reuse them
 
 
 The incredible creativity of Wikimedians in extending the frontend
 capabilities of MediaWiki through custom JavaScript, and the markup system
 through templates, has been blowing my mind for years. I want to find a way
 to point that creativity straight forward, as it were, and use it to kick
 some ass. :)
 
 
 Within the Wikimedia ecosystem, we can roughly divide the world into
 Wikipedia and all the other projects. MediaWiki was created for
 Wikipedia, based on previous software that had been adapted to the needs of
 Wikipedia; and while the editing and template systems are sometimes awkward,
 they work.
 
 Our other projects like Commons, Wiktionary, Wikibooks, Wikiversity, and
 Wikinews have *never* been as well served. The freeform markup model --
 which works very well for body text on Wikipedia even if it's icky for
 creating tables, diagrams and information sets -- has been a poorer fit, and
 little effort has been spent on actually creating ways to support them well.
 
 Commons needs better tools for annotating and grouping media resources.
 
 Wiktionary needs structured data with editing and search tools geared
 towards it.
 
 Wikibooks needs a structure model that's based on groups of pages and media
 resources, instead of just standalone freetext articles which may happen to
 link to each other.
 
 Wikiversity needs all those, and more interactive features and the ability
 for users to group themselves socially and work together.
 
 
 Getting anything done that would work on the huge, well-developed,
 wildly-popular Wikipedia has always been a non-starter because it has to
 deal with 10 years of backwards-compatibility from the get-go. I think it's
 going to be a *lot* easier to get things going on those smaller projects
 which are now so poorly served that most people don't even know they exist.
 :)
 
 This isn't a problem specific to Wikimedia; established organizations of all
 sorts 

[Wikitech-l] Does anybody have the 20080726 dump version?

2010-12-28 Thread Monica shu
Hi all,

I have looked through the web for the 20080726 version of the dump file
pages-articles.xml.bz2.
But I can't find it anywhere.
Can anybody provide me a download link? Thanks a lot!

Following is a summary of the other versions of this file that I have found
so far. I hope they are useful to you.

2010-10-11
http://download.wikimedia.org/enwiki/20101011/
2010-09-16
http://download.wikimedia.org/enwiki/20100916/
2010-09-04
http://download.wikimedia.org/enwiki/20100904/
2010-08-17
http://download.wikimedia.org/enwiki/20100904/
enwiki-20100817-pages-articles.xml.bz2 (6.06 GiB) on monova.org:
http://www.monova.org/details/3873361/enwiki-20100817-pages-articles.xml.bz2.html
2010-07-30
enwiki-20100730-pages-articles.xml.bz2 (6.07 GiB) on monova.org:
http://www.monova.org/details/3869561/enwiki-2010730-pages-articles.xml.bz2.html
2010-05-14
enwiki-20100514-pages-articles.xml.bz2 (5.87 GiB) on monova.org:
http://www.monova.org/details/3780808/enwiki-20100514-pages-articles.xml.bz2.html
http://dumps.wikimedia.org/archive/enwiki/20100514/
2010-03-12
http://dumps.wikimedia.org/archive/enwiki/20100312/
2010-01-30
http://download.wikimedia.org/enwiki/20100130/
2009-10-09
http://jeffkubina.org/data/download.wikimedia.org/enwiki/20091009/
2009-06-18
http://download.wikimedia.org/enwiki/20100130/
Pirate Bay (http://thepiratebay.org/torrent/4978482) has
enwiki-20090618-pages-articles.xml.bz2, 4.9 GiB (5258589574 Bytes):
http://torrents.thepiratebay.org/4978482/enwiki-20090618-pages-articles.xml.bz2.4978482.TPB.torrent
2008-10-08
http://jeffkubina.org/data/download.wikimedia.org/enwiki/20081008/
2008-06-21
http://www.torrentportal.com/details/4621368/Wikipedia+Wiki+Static+HTML+Dump+-+English+-+2008-06-21+-+wikipedia-en-static-html.tar.7z.html
2008-01-03
http://jeffkubina.org/data/download.wikimedia.org/enwiki/20080103/
English Wikipedia dump from 2008-01-03:
http://www.archive.org/details/enwiki-20080103

Best,

Monica


Re: [Wikitech-l] Does anybody have the 20080726 dump version?

2010-12-28 Thread Chad
On Wed, Dec 29, 2010 at 12:16 AM, Monica shu monicashu...@gmail.com wrote:
 Hi all,

 I have looked through the web for the 20080726 version of the dump file
 pages-articles.xml.bz2.
 But I can't find any result.
 Can anybody provide me a download link? Thank a lot!


True story: I used to have a copy of the 20080726 dump. I
deleted it like a year ago because I didn't need it anymore
and I didn't know it had gone missing at the time.

I should ask next time :(

-Chad



[Wikitech-l] How would you disrupt Wikipedia?

2010-12-28 Thread Neil Kandalgaonkar
I've been inspired by the discussion David Gerard and Brion Vibber 
kicked off, and I think they are headed in the right direction.

But I just want to ask a separate, but related question.

Let's imagine you wanted to start a rival to Wikipedia. Assume that you 
are motivated by money, and that venture capitalists promise you can be 
paid gazillions of dollars if you can do one, or many, of the following:

1 - Become a more attractive home to the WP editors. Get them to work on 
your content.

2 - Take the free content from WP, and use it in this new system. But 
make it much better, in a way Wikipedia can't match.

3 - Attract even more readers, or perhaps a niche group of 
super-passionate readers that you can use to build a new community.

In other words, if you had no legacy, and just wanted to build something 
from zero, how would you go about creating an innovation that was 
disruptive to Wikipedia, in fact something that made Wikipedia look like 
Friendster or Myspace compared to Facebook?

And there's a followup question to this -- but you're all smart people 
and can guess what it is.

-- 
Neil Kandalgaonkar (   ne...@wikimedia.org



Re: [Wikitech-l] Big problem to solve: good WYSIWYG on WMF wikis

2010-12-28 Thread Andrew Dunbar
On 29 December 2010 02:07, Happy-melon happy-me...@live.com wrote:
 There are some things that we know:

 1) as Brion says, MediaWiki currently only presents content in one way: as
 wikitext run through the parser.  He may well be right that there is a
 bigger fish which could be caught than WYSIWYG editing by saying that MW
 should present data in other new and exciting ways, but that's actually a
 separate question.  *If* you wish to solve WYSIWYG editing, your baseline is
 wikitext and the parser.

Specifically, it only presents content as HTML. It's not really a
parser because it doesn't create an AST (Abstract Syntax Tree). It's a
wikitext to HTML converter. The flavour of the HTML can be somewhat
modulated by the skin but it could never output directly to something
totally different like RTF or PDF.
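
To illustrate the difference an AST makes (the node shapes below are
invented for illustration and are not MediaWiki code): once there is a
tree in the middle, the same parse can feed HTML, plain text, RTF, or
anything else.

    # A tiny AST with two independent renderers over the same tree.
    from dataclasses import dataclass
    from typing import List, Union

    @dataclass
    class Text:
        value: str

    @dataclass
    class Bold:
        children: List["Node"]

    Node = Union[Text, Bold]

    def to_html(nodes: List[Node]) -> str:
        out = []
        for n in nodes:
            if isinstance(n, Text):
                out.append(n.value)
            else:
                out.append("<b>" + to_html(n.children) + "</b>")
        return "".join(out)

    def to_plain(nodes: List[Node]) -> str:
        return "".join(n.value if isinstance(n, Text) else to_plain(n.children)
                       for n in nodes)

    doc = [Text("MediaWiki is "), Bold([Text("not")]), Text(" just HTML.")]
    print(to_html(doc))    # MediaWiki is <b>not</b> just HTML.
    print(to_plain(doc))   # MediaWiki is not just HTML.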

 2) guacamole is one of the more unusual descriptors I've heard for the
 parser, but it's far from the worst.  We all agree that it's horribly messy
 and most developers treat it like either a sleeping dragon or a *very*
 grumpy neighbour.  I'd say that the two biggest problems with it are that a)
 it's buried so deep in the codebase that literally the only way to get your
 wikitext parsed is to fire up the whole of the rest of MediaWiki around it
 to give it somewhere comfy to live in,

I have started to advocate the isolation of the parser from the rest
of the innards of MediaWiki for just this reason:
https://bugzilla.wikimedia.org/show_bug.cgi?id=25984

Free it up so that anybody can embed it in their code and get exactly
the same rendering that Wikipedia et al get, guaranteed.

We have to find all the edges where the parser calls other parts of
MediaWiki and all the edges where other parts of MediaWiki call the
parser. We then define these edges as interfaces so that we can drop
an alternative parser into MediaWiki and drop the current parser into
say an offline viewer or whatever.
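
A minimal sketch of what such an edge-as-interface could look like
(invented for illustration; this is not the actual MediaWiki Parser
class, and the two implementations are stand-ins):

    # Callers see only the interface, so either parser can be dropped in,
    # including one embedded in an offline viewer.
    from abc import ABC, abstractmethod

    class WikitextParser(ABC):
        """The narrow edge the rest of the code is allowed to see."""
        @abstractmethod
        def parse(self, wikitext: str, page_title: str) -> str:
            """Return rendered HTML for the given wikitext."""

    class LegacyParser(WikitextParser):
        def parse(self, wikitext: str, page_title: str) -> str:
            # stand-in for the existing parser wrapped behind the interface
            return f"<!-- legacy --><p>{wikitext}</p>"

    class ExperimentalParser(WikitextParser):
        def parse(self, wikitext: str, page_title: str) -> str:
            # stand-in for a new, embeddable implementation
            return f"<!-- v2 --><p>{wikitext}</p>"

    def render_page(parser: WikitextParser, wikitext: str, title: str) -> str:
        return parser.parse(wikitext, title)

    print(render_page(LegacyParser(), "Hello ''world''", "Sandbox"))
    print(render_page(ExperimentalParser(), "Hello ''world''", "Sandbox"))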

With a freed up parser more people will hack on it, more people will
come to grok it and come up with strategies to address some of its
problems. It should also be a boon for unit testing.

(I have a very rough prototype working by the way with lots of stub classes)

 and b) there is as David says no way
 of explaining what it's supposed to be doing except saying follow the code;
 whatever it does is what it's supposed to do.  It seems to be generally
 accepted that it is *impossible* to represent everything the parser does in
 any standard grammar.

I've thought a lot about this too. It certainly is not any type of
standard grammar. But on the other hand it is a pretty common kind of
nonstandard grammar. I call it a recursive text replacement grammar.

Perhaps this type of grammar has some useful characteristics we can
discover and document. It may be possible to follow the code flow and
document each text replacement in sequence as a kind of parser spec
rather than trying and failing again to shoehorn it into a standard
LALR grammar.
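
A toy example of what a "spec as an ordered list of replacement passes"
might look like; the two or three rules below cover only bold and links
and are nothing like the real pass order, but they show how such a spec
could be written down and re-implemented in another language.

    # The "grammar" is just an ordered, documented list of replacement passes.
    import re

    PASSES = [
        ("bold",  re.compile(r"'''(.+?)'''"), r"<b>\1</b>"),
        ("link",  re.compile(r"\[\[([^|\]]+)\|([^\]]+)\]\]"),
                  r'<a href="/wiki/\1">\2</a>'),
        ("link2", re.compile(r"\[\[([^\]]+)\]\]"),
                  r'<a href="/wiki/\1">\1</a>'),
    ]

    def render(wikitext: str) -> str:
        for name, pattern, replacement in PASSES:
            wikitext = pattern.sub(replacement, wikitext)  # one documented pass
        return wikitext

    print(render("'''Bold''' and a [[Main Page|link]] and [[Another]]."))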

If it is possible to extract such a spec it would then be possible to
implement it in other languages.

Some research may even find that it is possible to transform such a
grammar deterministically into an LALR grammar...

But even if not, I'm certain it would demystify what happens in the
parser so that problems and edge cases would be easier to locate.

Andrew Dunbar (hippietrail)

 Those are all standard gripes, and nothing new or exciting.  There are also,
 to quote a much-abused former world leader, some known unknowns:

 1) we don't know how to explain What You See when you parse wikitext except
 by prodding an exceedingly grumpy hundred thousand lines of PHP and *asking
 What it thinks* You Get.

 2) We don't know how to create a WYSIWYG editor for wikitext.

 Now, I'd say we have some unknown unknowns.

 1) *is* it because of wikitext's idiosyncracies that WYSIWYG is so
 difficult?  Is wikitext *by its nature* not amenable to WYSIWYG editing?

 2) would a wikitext which *was* representable in a standard grammar be
 amenable to WYSIWYG editing?

 3) would a wikitext which had an alternative parser, one that was not buried
 in the depths of MW (perhaps a full JS library that could be called in
 real-time on the client), be amenable to WYSIWYG editing?

 4) are questions 2 and 3 synonymous?

 --HM


 David Gerard dger...@gmail.com wrote in
 message news:aanlktimthux-undo1ctnexcrqbpp89t2m-pvha6fk...@mail.gmail.com...
 [crossposted to foundation-l and wikitech-l]


 There has to be a vision though, of something better. Maybe something
 that is an actual wiki, quick and easy, rather than the template
 coding hell Wikipedia's turned into. - something Fred Bauder just
 said on wikien-l.


 Our current markup is one of our biggest barriers to participation.

 AIUI, edit rates are about half what they were in 2005, even as our
 fame has gone from popular through famous to part of the
 structure of the world. I submit that this is not a good or healthy
 thing in any way and needs fixing.

 

Re: [Wikitech-l] Does anybody have the 20080726 dump version?

2010-12-28 Thread Monica shu
@_...@...

Thanks anyway :)

Anyone else? Hands up?

On Wed, Dec 29, 2010 at 3:18 PM, Chad innocentkil...@gmail.com wrote:

 On Wed, Dec 29, 2010 at 12:16 AM, Monica shu monicashu...@gmail.com
 wrote:
  Hi all,
 
  I have looked through the web for the 20080726 version of the dump file
  pages-articles.xml.bz2.
  But I can't find any result.
  Can anybody provide me a download link? Thank a lot!
 

 True story: I used to have a copy of the 20080726 dump. I
 deleted it like a year ago because I didn't need it anymore
 and I didn't know it had gone missing at the time.

 I should ask next time :(

 -Chad
