Re: [Wikitech-l] I love Parsoid but it doesn't want me

2015-09-17 Thread Ricordisamoa
Stephen Niedzielski: "it seems like, as soon as you get the HTML the 
first thing you want to do, perhaps a little bit ironically because it's 
called Parsoid, it's parse the output a little bit more"

https://www.youtube.com/watch?v=3WJID_WC7BQ&t=35m14s

On 23/07/2015 22:02, C. Scott Ananian wrote:

HTML5+RDFa is a machine-readable format.  But I think what you are asking
for is either better documentation of the template-related stuff (did you
read through the slides in https://phabricator.wikimedia.org/T105175 ?) or
HTML template parameter support (https://phabricator.wikimedia.org/T52587)
which is in the codebase but not enabled by default in production.
  --scott
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l




Re: [Wikitech-l] I love Parsoid but it doesn't want me

2015-09-17 Thread Subramanya Sastry

On 09/17/2015 07:44 PM, Ricordisamoa wrote:
Stephen Niedzielski: "it seems like, as soon as you get the HTML the 
first thing you want to do, perhaps a little bit ironically because 
it's called Parsoid, it's parse the output a little bit more"

https://www.youtube.com/watch?v=3WJID_WC7BQ&t=35m14s


That is somewhat of a misunderstanding that Scott clarified.

Parsing is involved whenever you want to convert a string format to an 
object format.


So, unless you want to figure out a way to transfer DOM objects between 
server and client, you are going to continue transferring HTML strings 
which you then parse in the browser to rebuild the DOM representation.


Subbu.


Re: [Wikitech-l] I love Parsoid but it doesn't want me

2015-09-17 Thread C. Scott Ananian
And as I responded there, if I gave you a JSON string instead, the first
thing you'd need to do is parse the JSON to turn it into something you can
use.

The difference is that JSON and html5 parsers are standard components in
every programming language.  Html5 even has a standard object
representation and manipulation library (DOM) which is available in every
major programming language.
  --scott

Re: [Wikitech-l] I love Parsoid but it doesn't want me

2015-08-15 Thread Ricordisamoa

Great!
As a further improvement, it should be a separate package using the 
service's REST API.


On 14/08/2015 00:20, C. Scott Ananian wrote:

Good news: https://doc.wikimedia.org/Parsoid/master/#!/guide/jsapi now
documents the new friendlier API for Parsoid.
  --scott

On Tue, Aug 11, 2015 at 3:03 PM, Ricordisamoa ricordisa...@openmailbox.org
wrote:

On 03/08/2015 22:08, C. Scott Ananian wrote:

On Sat, Aug 1, 2015 at 2:23 AM, Ricordisamoa ricordisa...@openmailbox.org
wrote:

On 31/07/2015 21:08, C. Scott Ananian wrote:

I agree that we have not (to date) spent a lot of time on APIs supporting
direct editing of the Parsoid DOM.  I tend to do things directly using the
low-level DOM methods myself (and that's how I presented my Parsoid
tutorial at Wikimania this year) but I can see the attractiveness of the
`mwparserfromhell` API in abstracting some of the details of the
representation.

Thankfully you can have it both ways!  Over the past week I've cloned the
`mwparserfromhell` API, built on top of the Parsoid DOM.  The initial
patches have been merged, but there's a little work to do to get the API
docs up on docs.wikimedia.org properly.  Once that's done I'll post here
with pointers.

Thanks!

Unfortunately, that still requires using Node.js and depending on the
parsoid package.

Clearly you're just trying to bait me into porting my code to Python.

I'm baiting you into exposing a mwparserfromhell-like AST from RESTBase.
Then I can deal with a Python client, a PHP one, etc. :-)

I assure you there is nothing JavaScript-specific about this; there are HTML
DOM-manipulation libraries available in all major programming languages.
HTML *is* an AST (in this case, at least).
   --scott



Re: [Wikitech-l] I love Parsoid but it doesn't want me

2015-08-14 Thread Derk-Jan Hartman
This is awesome, thank you Scott

DJ



Re: [Wikitech-l] I love Parsoid but it doesn't want me

2015-08-13 Thread C. Scott Ananian
Good news: https://doc.wikimedia.org/Parsoid/master/#!/guide/jsapi now
documents the new friendlier API for Parsoid.
 --scott


Re: [Wikitech-l] I love Parsoid but it doesn't want me

2015-08-11 Thread Ricordisamoa

On 03/08/2015 22:08, C. Scott Ananian wrote:

On Sat, Aug 1, 2015 at 2:23 AM, Ricordisamoa ricordisa...@openmailbox.org
wrote:


On 31/07/2015 21:08, C. Scott Ananian wrote:


I agree that we have not (to date) spent a lot of time on APIs supporting
direct editing of the Parsoid DOM.  I tend to do things directly using the
low-level DOM methods myself (and that's how I presented my Parsoid
tutorial at wikimania this year) but I can see the attractiveness of the
`mwparserfromhell` API in abstracting some of the details of the
representation.

Thankfully you can have it both ways!  Over the past week I've cloned the
`mwparserfromhell` API, built on top of the Parsoid DOM.  The initial
patches have been merged, but there's a little work to do to get the API
docs up on docs.wikimedia.org properly.  Once that's done I'll post here
with pointers.


Thanks!
Unfortunately, that still requires using Node.js and depending on the
parsoid package.


Clearly you're just trying to bait me into porting my code to python.


I'm baiting you into exposing a mwparserfromhell-like AST from RESTBase.
Then I can deal with a Python client, a PHP one, etc. :-)


I assure you there is nothing JavaScript-specific about this; there are HTML
DOM-manipulation libraries available in all major programming languages.
HTML *is* an AST (in this case, at least).
  --scott




Re: [Wikitech-l] I love Parsoid but it doesn't want me

2015-08-03 Thread C. Scott Ananian
On Sat, Aug 1, 2015 at 2:23 AM, Ricordisamoa ricordisa...@openmailbox.org
wrote:

 On 31/07/2015 21:08, C. Scott Ananian wrote:

 I agree that we have not (to date) spent a lot of time on APIs supporting
 direct editing of the Parsoid DOM.  I tend to do things directly using the
 low-level DOM methods myself (and that's how I presented my Parsoid
 tutorial at wikimania this year) but I can see the attractiveness of the
 `mwparserfromhell` API in abstracting some of the details of the
 representation.

 Thankfully you can have it both ways!  Over the past week I've cloned the
 `mwparserfromhell` API, built on top of the Parsoid DOM.  The initial
 patches have been merged, but there's a little work to do to get the API
 docs up on docs.wikimedia.org properly.  Once that's done I'll post here
 with pointers.


 Thanks!
 Unfortunately, that still requires using Node.js and depending on the
 parsoid package.


Clearly you're just trying to bait me into porting my code to python.  I
assure you there is nothing JavaScript-specific about this; there are HTML
DOM-manipulation libraries available in all major programming languages.
HTML *is* an AST (in this case, at least).
 --scott

-- 
(http://cscott.net)

Re: [Wikitech-l] I love Parsoid but it doesn't want me

2015-08-03 Thread C. Scott Ananian
On Sat, Aug 1, 2015 at 3:39 AM, Ricordisamoa ricordisa...@openmailbox.org
wrote:

 You are right that there are some redundancies in information
 representation (because of having to serve multiple needs), but as far as I
 know, it is mostly around image attributes. If there is anything else
 specific (beyond image attributes) that is bothering you, can you flag that?



 https://www.mediawiki.org/wiki/Parsoid/MediaWiki_DOM_spec#Transclusion_content
 All template parameters are in data-mw but not parsed. Parameters ending
 up in the 'final' wikitext are parsed separately.


Parsed template parameters require the `addHTMLTemplateParameters`
parameter to Parsoid.  We are actively discussing how to expose this sort
of functionality via the Parsoid API.  But it's not strictly required: you
can just recursively invoke Parsoid on the template arguments.  I'll try to
whip up an example of this soon.
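
In the meantime, a rough sketch of the first half of that idea: pulling the unparsed parameter wikitext out of `data-mw`, so that each value could then be handed back to Parsoid. It assumes the `parts` / `template` / `params` / `wt` layout described in the DOM spec; the sample markup is hand-written rather than real Parsoid output, and `TransclusionFinder` and `template_params` are hypothetical helpers. The recursive call to Parsoid itself is not shown.

```python
import json
from html.parser import HTMLParser


class TransclusionFinder(HTMLParser):
    """Collect the decoded data-mw JSON of every mw:Transclusion element."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        types = (attributes.get("typeof") or "").split()
        if "mw:Transclusion" in types and attributes.get("data-mw"):
            self.found.append(json.loads(attributes["data-mw"]))


def template_params(data_mw):
    """Yield (template name, {param: raw wikitext}) for each template part."""
    for part in data_mw.get("parts", []):
        # Parts may also be plain strings between templates; skip those.
        if isinstance(part, dict) and "template" in part:
            template = part["template"]
            params = {k: v["wt"] for k, v in template["params"].items()}
            yield template["target"]["wt"], params


# Hand-written sample, assumed to follow the DOM spec's data-mw layout.
SAMPLE = (
    "<span typeof=\"mw:Transclusion\" about=\"#mwt1\" data-mw='"
    '{"parts":[{"template":{"target":{"wt":"foo"},'
    '"params":{"1":{"wt":"bar"},"2":{"wt":"baz"},"eggs":{"wt":"spam"}},"i":0}}]}'
    "'>output</span>"
)

finder = TransclusionFinder()
finder.feed(SAMPLE)
finder.close()
result = [pair for data_mw in finder.found for pair in template_params(data_mw)]
print(result)
```

Each `wt` value is still wikitext, which is exactly what would be passed to a second Parsoid invocation.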


 I see huge demand for alternative wikignome-style editors. The more
 Parsoid's DOM is predictable, concise and documented, the more users you
 get.


 I think Parsoid's DOM is predictable :-) but, can you say more about what
 prompted you to say that?


 For example, to find images I have to search elements where typeof is one
 of mw:Image, mw:Image/Thumb, mw:Image/Frame, mw:Image/Frameless, then see
 if it's a figure or a span, and expect either a figcaption or data-mw
 accordingly. Add that the img tag's parent can be a or span...
 Instead, this is what I'd expect a proper structure to look like:



The CSS selector `figure, [typeof~="mw:Image"]` will capture all of the
image elements.  Similarly, `figure > *:last-child, [typeof~="mw:Image"] >
*:last-child` will always capture the caption element (more or less).  The
structure is actually pretty locked down.  (And my mwparserfromhell clone
has some image-related helpers to make it even easier.)
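
For anyone without a CSS selector engine at hand, the first selector can be approximated with the Python standard library. This is only a sketch: `ImageFinder` is a hypothetical helper and the sample markup is hand-written to resemble the block and inline image forms, not copied from Parsoid.

```python
from html.parser import HTMLParser


class ImageFinder(HTMLParser):
    """Record elements matching `figure, [typeof~="mw:Image"]`."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        typeof = dict(attrs).get("typeof") or ""
        # `~=` matches one whole whitespace-separated token, so a value like
        # "mw:Image/Thumb" is only picked up through the `figure` branch.
        if tag == "figure" or "mw:Image" in typeof.split():
            self.images.append((tag, typeof))


# Hand-written sample markup (an assumption, not real Parsoid output).
SAMPLE = """
<figure typeof="mw:Image/Thumb"><a href="./File:A.jpg"><img src="a.jpg"></a>
<figcaption>A caption</figcaption></figure>
<p>Text <span typeof="mw:Image"><a href="./File:B.png"><img src="b.png"></a></span></p>
"""

finder = ImageFinder()
finder.feed(SAMPLE)
finder.close()
print(finder.images)
```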

Part of the problem here is that media-related markup in wikitext is quite
fiendishly complicated, with lots of interlocking parts.  The presence of
one sort of option can completely change the meaning of others.  The
Parsoid DOM is designed to try to simplify this complexity, rather than
directly mirror the wikitext craziness.
  --scott

-- 
(http://cscott.net)

Re: [Wikitech-l] I love Parsoid but it doesn't want me

2015-08-01 Thread Ricordisamoa

On 31/07/2015 21:08, C. Scott Ananian wrote:

I agree that we have not (to date) spent a lot of time on APIs supporting
direct editing of the Parsoid DOM.  I tend to do things directly using the
low-level DOM methods myself (and that's how I presented my Parsoid
tutorial at wikimania this year) but I can see the attractiveness of the
`mwparserfromhell` API in abstracting some of the details of the
representation.

Thankfully you can have it both ways!  Over the past week I've cloned the
`mwparserfromhell` API, built on top of the Parsoid DOM.  The initial
patches have been merged, but there's a little work to do to get the API
docs up on docs.wikimedia.org properly.  Once that's done I'll post here
with pointers.


Thanks!
Unfortunately, that still requires using Node.js and depending on the 
parsoid package.
Were the mwparserfromhell-like 'AST' exposed by RESTBase directly, 
there'd easily be lots of thin manipulation libraries in different 
programming languages.




Eventually I'd like to put the pieces together and implement something like
a `pywikibot` clone based on this API and using the RESTBase APIs for
read/write access to the wiki.  As has been mentioned, the RESTBase API for
saving edits is not yet quite complete (
https://phabricator.wikimedia.org/T101501); once that is done there should
be no problem connecting the dots.  (In the meantime you can use the API I
just implemented to reserialize the wikitext and then use the standard PHP
APIs, but that's a little bit clunky.)
  --scott

Re: [Wikitech-l] I love Parsoid but it doesn't want me

2015-08-01 Thread Ricordisamoa

On 01/08/2015 01:20, Subramanya Sastry wrote:

On 07/31/2015 12:55 PM, Ricordisamoa wrote:


Hi Subbu,
thank you for this thoughtful insight.


And thank you for starting this thread. :-)

HTML is not a barrier by itself. The problem seems to be Parsoid 
being built primarily with VisualEditor in mind.


While we want the DOM to be VE-friendly, we definitely don't want the 
DOM to be VE-centric and that has been the intention from the very 
beginning. Flow, CX also use the Parsoid DOM for their functionality. 
There are other users too [1].


VE, Flow, CX all take advantage of HTML. And I can't make any sense out 
of editProtectedHelper.js 
https://en.wikipedia.org/wiki/User:Jackmcbarn/editProtectedHelper.js :'(


We definitely want Parsoid's output to be useful and usable more 
broadly as the canonical output representation of wikitext and are 
open to fixing whatever prevents that.


As Scott noted in the other email on the thread, inspired (and maybe
challenged by :-) ) by mwparserfromhell's utilities, he has already
whipped up a layer that provides an easier interface for manipulating
the DOM.


It is not clear to me how can a single DOM serving both view and edit 
modes avoid redundancy.


You are right that there are some redundancies in information 
representation (because of having to serve multiple needs), but as far 
as I know, it is mostly around image attributes. If there is anything 
else specific (beyond image attributes) that is bothering you, can you 
flag that?


https://www.mediawiki.org/wiki/Parsoid/MediaWiki_DOM_spec#Transclusion_content
All template parameters are in data-mw but not parsed. Parameters ending 
up in the 'final' wikitext are parsed separately.




I see huge demand for alternative wikignome-style editors. The more 
Parsoid's DOM is predictable, concise and documented, the more users 
you get. 


I think Parsoid's DOM is predictable :-) but, can you say more about 
what prompted you to say that? 


For example, to find images I have to search elements where typeof is 
one of mw:Image, mw:Image/Thumb, mw:Image/Frame, mw:Image/Frameless, 
then see if it's a figure or a span, and expect either a figcaption or 
data-mw accordingly. Add that the img tag's parent can be a or span...

Instead, this is what I'd expect a proper structure to look like:

Image
.src = title, internal or external link?
.repository?
.page = number or null
.language = code or null
.format = thumb etc.
.caption = wikitext parsed recursively
.link = internal or external link or null
.size
 .original
  .width = 1234
  .height = 4321
 .specified
  .width = 2468
 .computed
  .width = 2468
  .height = 8642
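
The same outline written as Python dataclasses, purely as a sketch of the shape I have in mind. None of these class or field names exist in Parsoid or RESTBase; they just mirror the list above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Dimensions:
    width: Optional[int] = None
    height: Optional[int] = None


@dataclass
class ImageSize:
    original: Dimensions = field(default_factory=Dimensions)
    specified: Dimensions = field(default_factory=Dimensions)
    computed: Dimensions = field(default_factory=Dimensions)


@dataclass
class Image:
    """Hypothetical flat view of an image; mirrors the outline, not an API."""

    src: str                          # title, internal or external link
    repository: Optional[str] = None
    page: Optional[int] = None
    language: Optional[str] = None
    format: Optional[str] = None      # "thumb" etc.
    caption: Optional[str] = None     # would hold recursively parsed wikitext
    link: Optional[str] = None
    size: ImageSize = field(default_factory=ImageSize)


image = Image(
    src="File:Example.jpg",
    format="thumb",
    size=ImageSize(
        original=Dimensions(1234, 4321),
        specified=Dimensions(width=2468),
        computed=Dimensions(2468, 8642),
    ),
)
print(image.size.computed.height)
```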

As for documentation, we document the DOM we generate and its 
semantics here [2]. 


It seems that some sections need updates, e.g. noinclude / includeonly / 
onlyinclude 
https://www.mediawiki.org/wiki/Parsoid/MediaWiki_DOM_spec#noinclude_.2F_includeonly_.2F_onlyinclude


As for size, I just looked at the Barack Obama page and here are some 
size numbers.


By concise I meant an antonym for redundant, not lengthy :-)



1540407 /tmp/Barack_Obama.parsoid.html
1197318 /tmp/Barack_Obama.parsoid.no-data-mw.html
1045161 /tmp/Barack_Obama.php-parser.output.footer-stripped.html

Right now, because we inline template and other editable information
(as inline JSON attributes of the DOM), it is a bit bulky. However, we
have always had plans to move the data-mw attribute into its own
bucket, which we might do at some point, in which case the size will be
closer to the current PHP parser output. If we moved page properties
and other metadata out, it would shrink a little bit more.


For views that don't need to support editing or any other manipulation 
or analyses, we can more aggressively strip more from the HTML without 
affecting the rendering


Stripping HTML altogether would be a huge step forward. :-)

and get close to or even shrink the size below the PHP parser output 
size (there might be use cases where that might be the appropriate thing
to do). I could get this down to under 1M by stripping rel attributes, 
element ids, and about ids for identifying template output.


But, for editing (not just in VE) use cases, because of additional 
markup in place on the page (element ids, other markup for 
transclusions, extensions, links, etc.), the output will probably be 
somewhat larger than the corresponding PHP parser output. If we can 
keep it under 1.1x of php parser output size, I think we are good.



I hope we can meet in the middle :-)


Please file bugs and continue to report things that get in the way of 
using Parsoid.


Subbu.

[1] https://www.mediawiki.org/wiki/Parsoid/Users
[2] http://www.mediawiki.org/wiki/Parsoid/MediaWiki_DOM_spec

Re: [Wikitech-l] I love Parsoid but it doesn't want me

2015-07-31 Thread Ricordisamoa

On 24/07/2015 15:53, Subramanya Sastry wrote:

On 07/23/2015 01:07 PM, Ricordisamoa wrote:

On 23/07/2015 15:28, Antoine Musso wrote:

On 23/07/2015 08:15, Ricordisamoa wrote:

Are there any stable APIs for an application to get a parse tree in
machine-readable format, manipulate it and send the result back without
touching HTML?
I'm sorry if this question doesn't make any sense.

You might want to explain what you are trying to do and which wall you
have hit when attempting to use Parsoid :-)



For example, adding a template transclusion as new parameter in 
another template.

XHTML5+RDFa is the wall :-(
Can't Parsoid's deserialization be caught at some point to get a
higher-level structure like that of mwparserfromhell
(https://github.com/earwig/mwparserfromhell)?


Parsoid and mwparserfromhell have different design goals and hence do 
things differently.


Parsoid is meant to support HTML editing and hence provides semantic 
information as annotations over the HTML document. It effectively 
maintains a bidirectional/reversible mapping between segments of 
wikitext and DOM trees. You can manipulate the DOM trees and get back 
wikitext that represents the edited tree. As for useless information 
and duplicate information -- I think if you looked at the Parsoid DOM 
spec [1], you will know what to look for and what to manipulate. The 
information on the DOM is meant to (a) render accurately, (b) support
the various bots / clients / gadgets that look for specific kinds of
information, and (c) be editable easily. If that spec has holes or
needs updates or fixing, we are happy to do that. Do let us know.


mwparserfromhell is an entirely wikitext-centric library as far as I 
can tell. It is meant to manipulate wikitext directly. It is a neat 
library which provides a lot of utilities and makes it easy to do 
wikitext transformations. It doesn't know about or care about HTML 
because it doesn't need to. It also seems to effectively give you
some kind of wikitext-centric AST. These are all impressions based on 
a quick scan of its docs -- so pardon any misunderstandings.


Parsoid does not provide you a wikitext AST directly since it doesn't 
construct one. All wikitext information shows up indirectly as DOM 
annotations (either attributes or JSON information in attributes). As 
Scott showed, you can still do document (wikitext) manipulations 
using DOM libraries, CSS-style queries, or directly by walking the 
DOM. There are lots of ways you can edit mediawiki pages without 
knowing about wikitext and using the vast array of HTML libraries. 
That happens to be our tagline: we deal with wikitext so you don't 
have to.
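
As a rough illustration of the use case raised earlier in the thread (adding a template transclusion as a new parameter of another template), the sketch below edits the JSON held in a `data-mw` attribute. It assumes the `parts` / `template` / `params` / `wt` layout from the DOM spec and that the edited attribute is what gets serialized back to wikitext; `add_template_param` is a hypothetical helper, and the sample JSON is hand-written.

```python
import json


def add_template_param(data_mw_json, name, wikitext):
    """Return data-mw JSON with `name` set to `wikitext` on the first template."""
    data_mw = json.loads(data_mw_json)
    for part in data_mw["parts"]:
        # Parts may also be plain strings; only dict parts hold a template.
        if isinstance(part, dict) and "template" in part:
            part["template"]["params"][name] = {"wt": wikitext}
            break
    return json.dumps(data_mw, separators=(",", ":"))


# Hand-written sample in the shape the DOM spec describes (an assumption).
before = '{"parts":[{"template":{"target":{"wt":"foo"},"params":{"1":{"wt":"bar"}},"i":0}}]}'
after = add_template_param(before, "extra", "{{baz|eggs=spam}}")
print(after)
```

The returned string would then be written back to the element's `data-mw` attribute before sending the HTML for conversion to wikitext.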


But, you are right. It can indeed seem cumbersome if you want to 
directly manipulate wikitext without the DOM getting in between or 
having to deal with DOM libraries. But that is not the use case we 
target. There are a vastly greater number of libraries in all kinds of 
languages (and developers) that know about HTML and can render, 
handle, and manipulate HTML easily than know how to (or want to) 
manipulate wikitext programmatically. Kind of the difference between 
the wikitext editor and the visual editor. They each have their 
constituencies and roles.


All that said, as Scott noted, it is possible to develop a
mwparserfromhell-like layer on top of the Parsoid DOM annotations if
you want a wikitext-centric view (as opposed to a DOM-centric view 
that most editing clients seem to want). But, since that is not a use 
case that we target, that hasn't been on our radar. If someone does 
want to take that on, and thinks it would be useful, we are happy to 
provide assistance. It should not be too difficult.


Does that help summarize this issue and clarify the differences and 
approaches of these two tools? I am on vacation :-)  so responses 
will be delayed.


Subbu.

[1] http://www.mediawiki.org/wiki/Parsoid/MediaWiki_DOM_spec



Hi Subbu,
thank you for this thoughtful insight.
HTML is not a barrier by itself. The problem seems to be Parsoid being 
built primarily with VisualEditor in mind. It is not clear to me how can 
a single DOM serving both view and edit modes avoid redundancy.
I see huge demand for alternative wikignome-style editors. The more 
Parsoid's DOM is predictable, concise and documented, the more users you 
get. I hope we can meet in the middle :-)



Re: [Wikitech-l] I love Parsoid but it doesn't want me

2015-07-31 Thread C. Scott Ananian
I agree that we have not (to date) spent a lot of time on APIs supporting
direct editing of the Parsoid DOM.  I tend to do things directly using the
low-level DOM methods myself (and that's how I presented my Parsoid
tutorial at wikimania this year) but I can see the attractiveness of the
`mwparserfromhell` API in abstracting some of the details of the
representation.

Thankfully you can have it both ways!  Over the past week I've cloned the
`mwparserfromhell` API, built on top of the Parsoid DOM.  The initial
patches have been merged, but there's a little work to do to get the API
docs up on docs.wikimedia.org properly.  Once that's done I'll post here
with pointers.

Eventually I'd like to put the pieces together and implement something like
a `pywikibot` clone based on this API and using the RESTBase APIs for
read/write access to the wiki.  As has been mentioned, the RESTBase API for
saving edits is not yet quite complete (
https://phabricator.wikimedia.org/T101501); once that is done there should
be no problem connecting the dots.  (In the meantime you can use the API I
just implemented to reserialize the wikitext and then use the standard PHP
APIs, but that's a little bit clunky.)
 --scott

Re: [Wikitech-l] I love Parsoid but it doesn't want me

2015-07-31 Thread Subramanya Sastry

On 07/31/2015 12:55 PM, Ricordisamoa wrote:


Hi Subbu,
thank you for this thoughtful insight.


And thank you for starting this thread. :-)

HTML is not a barrier by itself. The problem seems to be Parsoid being 
built primarily with VisualEditor in mind.


While we want the DOM to be VE-friendly, we definitely don't want the 
DOM to be VE-centric and that has been the intention from the very 
beginning. Flow, CX also use the Parsoid DOM for their functionality. 
There are other users too [1]. We definitely want Parsoid's output to be 
useful and usable more broadly as the canonical output representation of 
wikitext and are open to fixing whatever prevents that.


As Scott noted in the other email on the thread, inspired (and maybe
challenged by :-) ) by mwparserfromhell's utilities, he has already
whipped up a layer that provides an easier interface for manipulating
the DOM.


It is not clear to me how can a single DOM serving both view and edit 
modes avoid redundancy.


You are right that there are some redundancies in information 
representation (because of having to serve multiple needs), but as far 
as I know, it is mostly around image attributes. If there is anything 
else specific (beyond image attributes) that is bothering you, can you 
flag that?


I see huge demand for alternative wikignome-style editors. The more 
Parsoid's DOM is predictable, concise and documented, the more users 
you get. 


I think Parsoid's DOM is predictable :-) but, can you say more about 
what prompted you to say that? As for documentation, we document the DOM 
we generate and its semantics here [2]. As for size, I just looked at 
the Barack Obama page and here are some size numbers.


1540407 /tmp/Barack_Obama.parsoid.html
1197318 /tmp/Barack_Obama.parsoid.no-data-mw.html
1045161 /tmp/Barack_Obama.php-parser.output.footer-stripped.html

Right now, because we inline template and other editable information (as
inline JSON attributes of the DOM), it is a bit bulky. However, we have
always had plans to move the data-mw attribute into its own bucket, which
we might do at some point, in which case the size will be closer to the
current PHP parser output. If we moved page properties and other
metadata out, it would shrink a little bit more.


For views that don't need to support editing or any other manipulation 
or analyses, we can more aggressively strip more from the HTML without 
affecting the rendering and get close to or even shrink the size below 
the PHP parser output size (there might be use cases where that might be
the appropriate thing to do). I could get this down to under 1M by stripping
rel attributes, element ids, and about ids for identifying template output.


But, for editing (not just in VE) use cases, because of additional 
markup in place on the page (element ids, other markup for 
transclusions, extensions, links, etc.), the output will probably be 
somewhat larger than the corresponding PHP parser output. If we can keep 
it under 1.1x of php parser output size, I think we are good.
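
For reference, the ratios implied by the three numbers above can be worked out as follows, assuming the figures are byte counts. The full Parsoid output comes to roughly 1.47x the PHP parser output, and the version without data-mw to roughly 1.15x, so still slightly above the 1.1x target.

```python
PHP_PARSER = 1045161          # php-parser output, footer stripped
PARSOID_FULL = 1540407        # Parsoid HTML including data-mw
PARSOID_NO_DATA_MW = 1197318  # Parsoid HTML with data-mw removed

full_ratio = PARSOID_FULL / PHP_PARSER
stripped_ratio = PARSOID_NO_DATA_MW / PHP_PARSER
target_bytes = int(1.1 * PHP_PARSER)  # size that would meet the 1.1x goal

print(f"with data-mw:    {full_ratio:.2f}x")
print(f"without data-mw: {stripped_ratio:.2f}x")
print(f"1.1x target:     {target_bytes} bytes")
```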



I hope we can meet in the middle :-)


Please file bugs and continue to report things that get in the way of 
using Parsoid.


Subbu.

[1] https://www.mediawiki.org/wiki/Parsoid/Users
[2] http://www.mediawiki.org/wiki/Parsoid/MediaWiki_DOM_spec


Re: [Wikitech-l] I love Parsoid but it doesn't want me

2015-07-24 Thread Marko Obrovac
On 24 July 2015 at 07:34, Ricordisamoa ricordisa...@openmailbox.org wrote:

 On 24/07/2015 06:35, C. Scott Ananian wrote:

 Well, it's really just a different way of thinking about things.  Instead
 of:
 ```
 import mwparserfromhell
 text = "I has a template! {{foo|bar|baz|eggs=spam}} See it?"
 wikicode = mwparserfromhell.parse(text)
 # Returns a list of Template nodes found in the wikitext.
 templates = wikicode.filter_templates()
 ```
 you would write:
 ```
 var Parsoid = require('parsoid');
 var text = "I has a template! {{foo|bar|baz|eggs=spam}} See it?";
 Parsoid.parse(text, { document: true }).then(function(res) {
   // res.out is the parsed DOM document; select the transclusion nodes.
   var templates = res.out.querySelectorAll('[typeof~="mw:Transclusion"]');
   console.log(templates);
 }).done();
 ```

 That said, it wouldn't be hard to clone the API of

 http://mwparserfromhell.readthedocs.org/en/latest/api/mwparserfromhell.html


 Parsoid's expressiveness seems to convey useless information, overlook
 important details, or duplicate them in different places.
 If I want to resize an image, am I supposed to change data-file-width
 and data-file-height? width and height? Or src?
 I think what I'm looking for is sort of an 'enhanced wikitext' rather than
 'annotated HTML'.

  and that would probably be a great addition to the parsoid package API.

 HTML is just a tree structured data representation.  Think of it as XML if
 it makes you happier.  It just happens to come with well-defined semantics
 and lots of manipulation libraries.

 I don't know about edits tagged as VisualEditor.  That seems like something
 that should only be done by VE.


 All edits made via visualeditoredit
 https://www.mediawiki.org/w/api.php?action=help&modules=visualeditoredit
 are tagged.

  I take it you would like an easy work flow to
 fetch a page, make edits, and then write the new revision back?


 Right.


RESTBase could help you there. With one API call, you can get the (stored)
latest HTML revision of a page in Parsoid format [1], without having to
wait for Parsoid to parse it (if the latest revision is in RESTBase's
storage). There is also section API support (you can get individual HTML
fragments of a page by ID, and send only those back for transformation into
wikitext [2]). There is also support for page editing (aka saving), but
these endpoints have not yet been enabled for WMF wikis in production due
to security concerns.

[1]
https://en.wikipedia.org/api/rest_v1/?doc#!/Page_content/page_html__title__get
[2]
https://en.wikipedia.org/api/rest_v1/?doc#!/Transforms/transform_sections_to_wikitext__title___revision__post
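
To make the "one API call" concrete, here is a small Python sketch. The `/page/html/{title}` path is inferred from the documentation link in [1]; the function names and the User-Agent string are placeholders for this example. The network call is defined but not executed here.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

# Base URL taken from the documentation links above.
REST_BASE = "https://en.wikipedia.org/api/rest_v1"


def page_html_url(title, base=REST_BASE):
    """Build the URL of the stored Parsoid HTML for a page title."""
    # Spaces become underscores; everything else unsafe is percent-encoded,
    # including "/" so that subpage-like titles stay in one path segment.
    return "{}/page/html/{}".format(base, quote(title.replace(" ", "_"), safe=""))


def fetch_page_html(title):
    """Fetch the HTML over the network (defined here, not called)."""
    # The User-Agent value is a placeholder; use one that identifies your tool.
    request = Request(page_html_url(title),
                      headers={"User-Agent": "parsoid-example/0.1"})
    with urlopen(request) as response:
        return response.read().decode("utf-8")


print(page_html_url("Barack Obama"))
```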

Cheers,
Marko




mwparserfromhell doesn't actually seem to have that functionality


 It is actually pretty easy to do with Pywikibot.
 But since Parsoid happens to work server-side, it makes sense to request
 and send back the structured tree directly.

  , but it
 would also be nice to facilitate that use case if we can.
--scott

 ​


 Thanks for your time.





-- 
Marko Obrovac, PhD
Senior Services Engineer
Wikimedia Foundation

Re: [Wikitech-l] I love Parsoid but it doesn't want me

2015-07-24 Thread James Forrester
On 23 July 2015 at 22:34, Ricordisamoa ricordisa...@openmailbox.org wrote:

 On 24/07/2015 06:35, C. Scott Ananian wrote:

 I don't know about edits tagged as VisualEditor.  That seems like something
 that should only be done by VE.


 All edits made via visualeditoredit
 https://www.mediawiki.org/w/api.php?action=help&modules=visualeditoredit
 are tagged.


Yes. That's because that is the *private* API for VisualEditor. It
absolutely should not ever be used by anyone else. It's not like any of
the 'real' APIs in MediaWiki – it is designed for exactly one use case
(VisualEditor), makes huge assumptions about the world and what is needed
(like tagging edits), and we make breaking changes all the time.
Unfortunately, the request to badge internal APIs got turned into flagging
it and similar APIs in MediaWiki as "This module is internal or unstable.",
which isn't strong enough on just how bad an idea it is to use it. I would
extremely strongly suggest that you do not use it, ever.

As Marko, Subbu and Scott point out, we have actual public APIs for this
kind of stuff, in the forms of RESTBase and Parsoid, and that's what you
should use.

Yours,
-- 
James D. Forrester
Lead Product Manager, Editing
Wikimedia Foundation, Inc.

jforres...@wikimedia.org | @jdforrester

Re: [Wikitech-l] I love Parsoid but it doesn't want me

2015-07-24 Thread C. Scott Ananian
As a proof of concept, I started to build a `mwparserfromhell`-like
interface to the Parsoid DOM.

You can see it at https://gerrit.wikimedia.org/r/226734

I started by translating the template examples from the mwparserfromhell
documentation, which means I'm really jumping in at the deep end.  Most
non-template manipulations should be much easier!
 --scott
​

Re: [Wikitech-l] I love Parsoid but it doesn't want me

2015-07-24 Thread Ricordisamoa

On 24/07/2015 15:56, C. Scott Ananian wrote:

On Fri, Jul 24, 2015 at 12:34 AM, Ricordisamoa ricordisa...@openmailbox.org

wrote:
Parsoid's expressiveness seems to convey useless information, overlook
important details, or duplicate them in different places.
If I want to resize an image, am I supposed to change data-file-width
and data-file-height? width and height? Or src?


These are great points, and reports from folks like you will help to
improve our documentation.  My goal for Parsoid's DOM[1] is that every bit
of information from the wikitext is represented exactly *once* in the
result.


Be it so!



In your example, `data-file-width` and `data-file-height` represent the
*unscaled* size of the *source* image.  Many image scaling operations want
to know this, so we include it in the DOM.  It is ignored when you convert
back to wikitext.

The `width` and `height` attributes are what you should modify if you want
to resize an image, just like you would do for any naive html editor.


AFAICS there's still no way to know exactly how an image's size was 
specified in the original wikitext.




The `src` attribute is again mostly ignored (sigh); the 'resource'
attribute specifies the url of the unscaled image.  Of course if 'resource'
is missing we'll try to make do with `src`; we really try hard to do
something reasonable with whatever we're given.
   --scott

[1] There is a tension between "don't repeat yourself" and the use of
Parsoid DOM for read views.  Certain attributes (like alt and title)
get duplicated by default by the PHP parser.  So far I think we've been
mostly successful in not letting this sort of thing infect the Parsoid DOM,
but there may be corner cases we accommodate for the sake of ease-of-use for
viewers.





Re: [Wikitech-l] I love Parsoid but it doesn't want me

2015-07-24 Thread Gabriel Wicke
On Fri, Jul 24, 2015 at 10:58 AM, Ricordisamoa ricordisa...@openmailbox.org
 wrote:


 RESTBase could help you there. With one API call, you can get the (stored)
 latest HTML revision of a page in Parsoid format~[1], but without the need
 to wait for Parsoid to parse it (if the latest revision is in RESTBase's
 storage).


 What if it isn't?



If it is not in storage, then it will be generated transparently. This
should happen only occasionally, when you request a revision less than a
handful of seconds after it was saved.


  There is also section API support (you can get individual HTML
 fragments of a page by ID, and send only those back for transformation
 into
 wikitext~[2]). There is also support for page editing (aka saving), but
 these endpoints have not yet been enabled for WMF wikis in production due
 to security concerns.


 Then I guess HTML would have to be converted into wikitext before saving?
 +1 API call


As Marko mentioned, the HTML save endpoint is not yet enabled in
production. Once it is, you will be able to directly POST modified HTML to
save it, without adding a VisualEditor tag or having to perform extra API
requests.

Gabriel

Re: [Wikitech-l] I love Parsoid but it doesn't want me

2015-07-24 Thread Subramanya Sastry

On 07/23/2015 01:07 PM, Ricordisamoa wrote:

On 23/07/2015 15:28, Antoine Musso wrote:

On 23/07/2015 08:15, Ricordisamoa wrote:

Are there any stable APIs for an application to get a parse tree in
machine-readable format, manipulate it and send the result back without
touching HTML?
I'm sorry if this question doesn't make any sense.

You might want to explain what you are trying to do and which wall you
have hit when attempting to use Parsoid :-)



For example, adding a template transclusion as a new parameter in 
another template.

XHTML5+RDFa is the wall :-(
Can't Parsoid's deserialization be caught at some point to get a 
higher-level structure like that of mwparserfromhell 
(https://github.com/earwig/mwparserfromhell)?


Parsoid and mwparserfromhell have different design goals and hence do 
things differently.


Parsoid is meant to support HTML editing and hence provides semantic 
information as annotations over the HTML document. It effectively 
maintains a bidirectional/reversible mapping between segments of 
wikitext and DOM trees. You can manipulate the DOM trees and get back 
wikitext that represents the edited tree. As for useless information and 
duplicate information -- I think if you look at the Parsoid DOM spec 
[1], you will know what to look for and what to manipulate. The 
information on the DOM is meant to (a) render accurately, (b) support the 
various bots / clients / gadgets that look for specific kinds of 
information, and (c) be editable easily. If that spec has holes or needs 
updates or fixing, we are happy to do that. Do let us know.


mwparserfromhell is an entirely wikitext-centric library as far as I can 
tell. It is meant to manipulate wikitext directly. It is a neat library 
which provides a lot of utilities and makes it easy to do wikitext 
transformations. It doesn't know about or care about HTML because it 
doesn't need to. It also seems to effectively give you some kind of 
wikitext-centric AST. These are all impressions based on a quick scan of 
its docs -- so pardon any misunderstandings.


Parsoid does not provide you a wikitext AST directly since it doesn't 
construct one. All wikitext information shows up indirectly as DOM 
annotations (either attributes or JSON information in attributes). As 
Scott showed, you can still do document (wikitext) manipulations using 
DOM libraries, CSS-style queries, or directly by walking the DOM. There 
are lots of ways you can edit MediaWiki pages without knowing about 
wikitext and using the vast array of HTML libraries. That happens to be 
our tagline: we deal with wikitext so you don't have to.
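
To show what "JSON information in attributes" can look like in practice, here is a minimal Python sketch using only the standard library. The sample HTML is hand-written for this example; the shape of the data-mw blob follows my reading of the DOM spec [1] and should be treated as an assumption, as should the `TransclusionFinder` name.

```python
import json
from html.parser import HTMLParser


class TransclusionFinder(HTMLParser):
    """Collect the data-mw JSON of elements marked typeof~="mw:Transclusion"."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # typeof is a space-separated list of types, hence the split().
        types = (attrs.get("typeof") or "").split()
        if "mw:Transclusion" in types and attrs.get("data-mw"):
            self.found.append(json.loads(attrs["data-mw"]))


# Hypothetical annotation for {{foo|bar|baz|eggs=spam}}.
data_mw = {"parts": [{"template": {
    "target": {"wt": "foo", "href": "./Template:Foo"},
    "params": {"1": {"wt": "bar"}, "2": {"wt": "baz"}, "eggs": {"wt": "spam"}},
    "i": 0}}]}
html = (
    "<p>I has a template! <span typeof='mw:Transclusion' data-mw='{}'>"
    "foo output</span> See it?</p>"
).format(json.dumps(data_mw))

finder = TransclusionFinder()
finder.feed(html)
for item in finder.found:
    template = item["parts"][0]["template"]
    params = {name: value["wt"] for name, value in template["params"].items()}
    print(template["target"]["wt"], params)
```

This is the Python counterpart of the querySelectorAll example earlier in the thread: find the annotated elements first, then read the wikitext-level details out of data-mw.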


But, you are right. It can indeed seem cumbersome if you want to 
directly manipulate wikitext without the DOM getting in between or 
having to deal with DOM libraries. But that is not the use case we 
target. There are a vastly greater number of libraries in all kinds of 
languages (and developers) that know about HTML and can render, handle, 
and manipulate HTML easily than know how to (or want to) manipulate 
wikitext programmatically. Kind of the difference between the wikitext 
editor and the visual editor. They each have their constituencies and roles.


All that said, as Scott noted, it is possible to develop a 
mwparserfromhell-like layer on top of the Parsoid DOM annotations if you 
want a wikitext-centric view (as opposed to a DOM-centric view that most 
editing clients seem to want). But, since that is not a use case that we 
target, that hasn't been on our radar. If someone does want to take that 
on, and thinks it would be useful, we are happy to provide assistance. 
It should not be too difficult.
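
A very small Python sketch of what such a layer might look like, working on an already-decoded data-mw blob. Everything here is hypothetical: `TemplateView`, `filter_templates` and the sample JSON are invented for illustration, and whether Parsoid serializes a nested transclusion stored in `wt` exactly as shown is an assumption to verify against the DOM spec.

```python
import json


class TemplateView:
    """Wikitext-centric view over one template entry of a data-mw blob."""

    def __init__(self, template):
        self._template = template

    @property
    def name(self):
        return self._template["target"]["wt"].strip()

    def get(self, param):
        return self._template["params"][str(param)]["wt"]

    def add(self, param, wikitext):
        # Mutates the underlying dict, so the change survives re-serialization.
        self._template["params"][str(param)] = {"wt": wikitext}


def filter_templates(data_mw):
    """Return a TemplateView per template part, skipping plain-string parts."""
    return [TemplateView(part["template"])
            for part in data_mw["parts"]
            if isinstance(part, dict) and "template" in part]


# Hypothetical decoded data-mw for {{foo|bar}}.
data_mw = json.loads('{"parts": [{"template": {"target": {"wt": "foo"}, '
                     '"params": {"1": {"wt": "bar"}}, "i": 0}}]}')
tpl = filter_templates(data_mw)[0]
tpl.add("eggs", "{{spam}}")  # a transclusion as a new parameter value
new_attribute = json.dumps(data_mw)  # value to write back into data-mw
print(tpl.name, new_attribute)
```

The wrapper keeps a reference to the original dict rather than a copy, so the edited blob can be written straight back to the element's data-mw attribute before sending the HTML for conversion to wikitext.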


Does that help summarize this issue and clarify the differences and 
approaches of these two tools? I am on vacation :-)  so responses will 
be delayed.


Subbu.

[1] http://www.mediawiki.org/wiki/Parsoid/MediaWiki_DOM_spec


Re: [Wikitech-l] I love Parsoid but it doesn't want me

2015-07-24 Thread C. Scott Ananian
On Fri, Jul 24, 2015 at 12:34 AM, Ricordisamoa ricordisa...@openmailbox.org
 wrote:

 Parsoid's expressiveness seems to convey useless information, overlook
 important details, or duplicate them in different places.
 If I want to resize an image, am I supposed to change data-file-width
 and data-file-height? width and height? Or src?


These are great points, and reports from folks like you will help to
improve our documentation.  My goal for Parsoid's DOM[1] is that every bit
of information from the wikitext is represented exactly *once* in the
result.

In your example, `data-file-width` and `data-file-height` represent the
*unscaled* size of the *source* image.  Many image scaling operations want
to know this, so we include it in the DOM.  It is ignored when you convert
back to wikitext.

The `width` and `height` attributes are what you should modify if you want
to resize an image, just like you would do for any naive html editor.

The `src` attribute is again mostly ignored (sigh); the 'resource'
attribute specifies the url of the unscaled image.  Of course if 'resource'
is missing we'll try to make do with `src`; we really try hard to do
something reasonable with whatever we're given.
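
For instance, a resize along those lines could be sketched in Python as follows. The markup is a hand-written, simplified fragment (not copied from real Parsoid output), it is assumed to be well-formed XML, and `resize_image` is a made-up helper name.

```python
import xml.etree.ElementTree as ET


def resize_image(xhtml, new_width):
    """Set width/height on each <img>, keeping the source aspect ratio."""
    root = ET.fromstring(xhtml)
    for img in root.iter("img"):
        # data-file-* describe the unscaled source image; read them only.
        full_width = int(img.get("data-file-width"))
        full_height = int(img.get("data-file-height"))
        img.set("width", str(new_width))
        img.set("height", str(round(new_width * full_height / full_width)))
        # src and resource are left untouched, as described above.
    return ET.tostring(root, encoding="unicode")


# Hypothetical, simplified image markup.
sample = (
    '<span typeof="mw:Image"><a href="./File:Foo.jpg">'
    '<img resource="./File:Foo.jpg" src="//example.org/thumb/220px-Foo.jpg" '
    'data-file-width="1000" data-file-height="800" '
    'width="220" height="176"/></a></span>'
)
resized = resize_image(sample, 100)
print(resized)
```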
  --scott

[1] There is a tension between "don't repeat yourself" and the use of
Parsoid DOM for read views.  Certain attributes (like alt and title)
get duplicated by default by the PHP parser.  So far I think we've been
mostly successful in not letting this sort of thing infect the Parsoid DOM,
but there may be corner cases we accommodate for the sake of ease-of-use for
viewers.

-- 
(http://cscott.net)

Re: [Wikitech-l] I love Parsoid but it doesn't want me

2015-07-24 Thread Ricordisamoa

Thanks Marko. Replies inline

On 24/07/2015 15:07, Marko Obrovac wrote:

On 24 July 2015 at 07:34, Ricordisamoa ricordisa...@openmailbox.org wrote:


On 24/07/2015 06:35, C. Scott Ananian wrote:


Well, it's really just a different way of thinking about things.  Instead
of:
```
import mwparserfromhell
text = "I has a template! {{foo|bar|baz|eggs=spam}} See it?"
wikicode = mwparserfromhell.parse(text)
templates = wikicode.filter_templates()
```
you would write:
```
var Parsoid = require('parsoid');
var text = "I has a template! {{foo|bar|baz|eggs=spam}} See it?";
Parsoid.parse(text, { document: true }).then(function(res) {
  var templates = res.out.querySelectorAll('[typeof~="mw:Transclusion"]');
  console.log(templates);
}).done();
```

That said, it wouldn't be hard to clone the API of

http://mwparserfromhell.readthedocs.org/en/latest/api/mwparserfromhell.html


Parsoid's expressiveness seems to convey useless information, overlook
important details, or duplicate them in different places.
If I want to resize an image, am I supposed to change data-file-width
and data-file-height? width and height? Or src?
I think what I'm looking for is sort of an 'enhanced wikitext' rather than
'annotated HTML'.

  and that would probably be a great addition to the parsoid package API.

HTML is just a tree structured data representation.  Think of it as XML if
it makes you happier.  It just happens to come with well-defined semantics
and lots of manipulation libraries.

I don't know about edits tagged as VisualEditor.  That seems like something
that should only be done by VE.


All edits made via visualeditoredit
https://www.mediawiki.org/w/api.php?action=help&modules=visualeditoredit
are tagged.

  I take it you would like an easy work flow to

fetch a page, make edits, and then write the new revision back?


Right.


RESTBase could help you there. With one API call, you can get the (stored)
latest HTML revision of a page in Parsoid format~[1], but without the need
to wait for Parsoid to parse it (if the latest revision is in RESTBase's
storage).


What if it isn't?


There is also section API support (you can get individual HTML
fragments of a page by ID, and send only those back for transformation into
wikitext~[2]). There is also support for page editing (aka saving), but
these endpoints have not yet been enabled for WMF wikis in production due
to security concerns.


Then I guess HTML would have to be converted into wikitext before 
saving? +1 API call




[1]
https://en.wikipedia.org/api/rest_v1/?doc#!/Page_content/page_html__title__get
[2]
https://en.wikipedia.org/api/rest_v1/?doc#!/Transforms/transform_sections_to_wikitext__title___revision__post

Cheers,
Marko




mwparserfromhell doesn't actually seem to have that functionality
It is actually pretty easy to do with Pywikibot.
But since Parsoid happens to work server-side, it makes sense to request
and send back the structured tree directly.

  , but it

would also be nice to facilitate that use case if we can.
--scott

​


Thanks for your time.









Re: [Wikitech-l] I love Parsoid but it doesn't want me

2015-07-24 Thread Ricordisamoa

On 24/07/2015 17:18, James Forrester wrote:

On 23 July 2015 at 22:34, Ricordisamoa ricordisa...@openmailbox.org wrote:


On 24/07/2015 06:35, C. Scott Ananian wrote:


I don't know about edits tagged as VisualEditor.  That seems like something
that should only be done by VE.

All edits made via visualeditoredit
https://www.mediawiki.org/w/api.php?action=help&modules=visualeditoredit
are tagged.


Yes. That's because that is the *private* API for VisualEditor. It
absolutely should not ever be used by anyone else. It's not like any of
the 'real' APIs in MediaWiki – it is designed for exactly one use case
(VisualEditor), makes huge assumptions about the world and what is needed
(like tagging edits), and we make breaking changes all the time.
Unfortunately, the request to badge internal APIs got turned into flagging
it and similar APIs in MediaWiki as "This module is internal or unstable.",
which isn't strong enough on just how bad an idea it is to use it. I would
extremely strongly suggest that you do not use it, ever.


Oops. https://test.wikipedia.org/w/index.php?title=Tablez&action=history



As Marko, Subbu and Scott point out, we have actual public APIs for this
kind of stuff, in the forms of RESTBase and Parsoid, and that's what you
should use.

Yours,




Re: [Wikitech-l] I love Parsoid but it doesn't want me

2015-07-23 Thread Antoine Musso
On 23/07/2015 08:15, Ricordisamoa wrote:
 Are there any stable APIs for an application to get a parse tree in
 machine-readable format, manipulate it and send the result back without
 touching HTML?
 I'm sorry if this question doesn't make any sense.

You might want to explain what you are trying to do and which wall you
have hit when attempting to use Parsoid :-)

-- 
Antoine hashar Musso



Re: [Wikitech-l] I love Parsoid but it doesn't want me

2015-07-23 Thread Ricordisamoa

On 24/07/2015 06:35, C. Scott Ananian wrote:

Well, it's really just a different way of thinking about things.  Instead
of:
```
import mwparserfromhell
text = "I has a template! {{foo|bar|baz|eggs=spam}} See it?"
wikicode = mwparserfromhell.parse(text)
templates = wikicode.filter_templates()
```
you would write:
```
var Parsoid = require('parsoid');
var text = "I has a template! {{foo|bar|baz|eggs=spam}} See it?";
Parsoid.parse(text, { document: true }).then(function(res) {
  var templates = res.out.querySelectorAll('[typeof~="mw:Transclusion"]');
  console.log(templates);
}).done();
```

That said, it wouldn't be hard to clone the API of
http://mwparserfromhell.readthedocs.org/en/latest/api/mwparserfromhell.html


Parsoid's expressiveness seems to convey useless information, overlook 
important details, or duplicate them in different places.
If I want to resize an image, am I supposed to change data-file-width 
and data-file-height? width and height? Or src?
I think what I'm looking for is sort of an 'enhanced wikitext' rather 
than 'annotated HTML'.



and that would probably be a great addition to the parsoid package API.

HTML is just a tree structured data representation.  Think of it as XML if
it makes you happier.  It just happens to come with well-defined semantics
and lots of manipulation libraries.

I don't know about edits tagged as VisualEditor.  That seems like something
that should only be done by VE.


All edits made via visualeditoredit
https://www.mediawiki.org/w/api.php?action=help&modules=visualeditoredit
are tagged.



I take it you would like an easy work flow to
fetch a page, make edits, and then write the new revision back?


Right.


  mwparserfromhell doesn't actually seem to have that functionality


It is actually pretty easy to do with Pywikibot.
But since Parsoid happens to work server-side, it makes sense to request 
and send back the structured tree directly.



, but it
would also be nice to facilitate that use case if we can.
   --scott

​


Thanks for your time.

Re: [Wikitech-l] I love Parsoid but it doesn't want me

2015-07-23 Thread Ricordisamoa
The slides are interesting, but for now it seems VisualEditor-focused 
and not nearly as powerful as mwparserfromhell.

I don't care about presentation. I don't want HTML.
And I hate getting all edits tagged as VisualEditor.

On 23/07/2015 22:02, C. Scott Ananian wrote:

HTML5+RDFa is a machine-readable format.  But I think what you are asking
for is either better documentation of the template-related stuff (did you
read through the slides in https://phabricator.wikimedia.org/T105175 ?) or
HTML template parameter support (https://phabricator.wikimedia.org/T52587)
which is in the codebase but not enabled by default in production.
  --scott
​




Re: [Wikitech-l] I love Parsoid but it doesn't want me

2015-07-23 Thread C. Scott Ananian
Well, it's really just a different way of thinking about things.  Instead
of:
```
import mwparserfromhell
text = "I has a template! {{foo|bar|baz|eggs=spam}} See it?"
wikicode = mwparserfromhell.parse(text)
templates = wikicode.filter_templates()
```
you would write:
```
var Parsoid = require('parsoid');
var text = "I has a template! {{foo|bar|baz|eggs=spam}} See it?";
Parsoid.parse(text, { document: true }).then(function(res) {
  var templates = res.out.querySelectorAll('[typeof~="mw:Transclusion"]');
  console.log(templates);
}).done();
```

That said, it wouldn't be hard to clone the API of
http://mwparserfromhell.readthedocs.org/en/latest/api/mwparserfromhell.html
and that would probably be a great addition to the parsoid package API.

HTML is just a tree structured data representation.  Think of it as XML if
it makes you happier.  It just happens to come with well-defined semantics
and lots of manipulation libraries.

I don't know about edits tagged as VisualEditor.  That seems like something
that should only be done by VE.  I take it you would like an easy workflow to
fetch a page, make edits, and then write the new revision back?
mwparserfromhell doesn't actually seem to have that functionality, but it
would also be nice to facilitate that use case if we can.
  --scott

​

Re: [Wikitech-l] I love Parsoid but it doesn't want me

2015-07-23 Thread C. Scott Ananian
HTML5+RDFa is a machine-readable format.  But I think what you are asking
for is either better documentation of the template-related stuff (did you
read through the slides in https://phabricator.wikimedia.org/T105175 ?) or
HTML template parameter support (https://phabricator.wikimedia.org/T52587)
which is in the codebase but not enabled by default in production.
 --scott
​

Re: [Wikitech-l] I love Parsoid but it doesn't want me

2015-07-23 Thread Ricordisamoa

On 23/07/2015 15:28, Antoine Musso wrote:

On 23/07/2015 08:15, Ricordisamoa wrote:

Are there any stable APIs for an application to get a parse tree in
machine-readable format, manipulate it and send the result back without
touching HTML?
I'm sorry if this question doesn't make any sense.

You might want to explain what you are trying to do and which wall you
have hit when attempting to use Parsoid :-)



For example, adding a template transclusion as a new parameter in another 
template.

XHTML5+RDFa is the wall :-(
Can't Parsoid's deserialization be caught at some point to get a 
higher-level structure like that of mwparserfromhell 
(https://github.com/earwig/mwparserfromhell)?
