[Wikitech-l] Re: Stream of recent changes diffs

2021-07-09 Thread Subramanya Sastry
Parsoid has a linter extension which is well suited for something like
this and was effectively developed with something like this in mind. It
is currently enabled on *all* parses, but in the future, depending on
how expensive lints become, we may find alternate ways of running this.


As it turns out, Parsoid's extension API exposes a lintHandler entry
point for extensions to run their lints. So, of course, you can only
support this in the context of Parsoid's parses. The other caveat is
that we haven't fully thought through all the details, but if you are
interested, this would be a good use case with which to explore them.
For an example, see the implementation for Cite's ref tag. That
implementation effectively calls Parsoid's default handlers on the
wikitext encountered in the <ref> tag, but extensions that deal with
their own wikitext might do other processing here without calling the
default handler.


This is my recommended approach, rather than messing with change streams,
diffs, etc.


Subbu.

On 7/1/21 7:16 AM, Physikerwelt wrote:

Dear all,

we have developed a tool that is (in some cases) capable of checking
whether formulae in <math>-tags in the context of a wikitext fragment
are likely to be correct or not. We would like to test the tool on the
recent changes. From

https://www.mediawiki.org/wiki/API:Recent_changes_stream

we can get the stream of recent changes. However, I did not find a way
to get the diff (either in HTML or Wikitext) to figure out how the
content was changed. The only option I see is to request the revision
text separately. This would be a few unnecessary requests, since most
of the changes do not change <math>-tags. I assume that others, e.g.,
ORES

https://www.mediawiki.org/wiki/ORES ,

compute the diffs anyhow, and I wonder if there is an easier way to get
the diffs from the recent changes stream without additional requests.


All the best
Physikerwelt (Moritz Schubotz)



___
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/

[Wikitech-l] Re: Stream of recent changes diffs

2021-07-09 Thread Giuseppe Lavagetto
On Thu, Jul 1, 2021 at 3:10 PM Andrew Otto  wrote:

> This isn't helpful now, but your use case is relevant to something I hope
> to pursue in the future: comprehensive mediawiki change events, including
> content.  I don't have a great place yet for collecting these use cases, so
> I added it to the Modern Event Platform parent ticket so I don't
> forget. :)
>
>
I don't think this is the use-case at all. As someone else already pointed
out, diffs don't always give you the context and might be unparsable
wikitext. So what you can do is either:
1) Always send the full content of the changed page in the stream, along
with the diff. This is IMHO extremely wasteful, but it's also easy to
implement.
2) Find a way to analyze the edits and emit specialized event tags that
define what has changed. This is the correct way to go forward, IMHO, but
it requires much more engineering time.

I don't think there is really a big value in adding the full content of the
page to every edit event. I'd rather suggest that people fetch the parsoid
HTML from the API, and ensure we do good edge-side caching.


Cheers,

Giuseppe
P.S. Please note that I'm only referring to streams offered to tools and in
general to the public internet. Internally to the production cluster the
use of content in events might (or might not) prove directly useful in some
cases.


-- 
Giuseppe Lavagetto
Principal Site Reliability Engineer, Wikimedia Foundation

[Wikitech-l] Re: Stream of recent changes diffs

2021-07-08 Thread Roy Smith
On that topic, I'll share some of my experience.

First, parsing wikitext is way more difficult than you probably imagine.  
People are often tempted to do a poor-man's job of it with regular expressions 
and the like.  Down that path lies madness.  Don't go there.

There are only two rational ways I know of to parse wikitext.

Parsoid is one.  It's complicated to get your head around, but it is the one 
true officially supported way.

The other is mwparserfromhell.  It 
has the advantage of being much simpler to use.  It has the disadvantage of not 
getting every possible edge case correct.  It also is only usable in Python, 
which is fine if you're using Python and a problem otherwise.

In either case, once you've got parsed versions of two revisions, you'll then 
be faced with the problem of diffing them.  That's going to be non-trivial.
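
The diffing step gets simpler if each revision has already been reduced to
an ordered list of formula sources. What follows is only a minimal sketch
under that assumption: the extraction itself is taken as done by a real
parser (Parsoid or mwparserfromhell), the comparison uses Python's standard
difflib, and the function name changed_formulas and the sample formulas are
hypothetical.

```python
from difflib import SequenceMatcher


def changed_formulas(old, new):
    """Compare two ordered lists of formula sources.

    Returns a list of (operation, old_slice, new_slice) tuples for every
    region that is not identical in both revisions. The operation is one
    of difflib's opcode tags: "replace", "delete" or "insert".
    """
    # autojunk is switched off so that frequently repeated formulas are
    # never treated as noise by difflib's popularity heuristic.
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, old[i1:i2], new[j1:j2]))
    return changes


# Hypothetical formula lists taken from two revisions of the same page.
old_revision = ["a^2 + b^2 = c^2", "E = mc^2", "x + y"]
new_revision = ["a^2 + b^2 = c^2", "E = mc^3", "x + y"]

for change in changed_formulas(old_revision, new_revision):
    print(change)
```

Only the formulas reported here would need to be passed on to a checker;
unchanged formulas are skipped without any further work.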


> On Jul 8, 2021, at 7:01 PM, David Lynch wrote:
> 
> The best I can say about this for your purposes is that using the parsoid 
> HTML would relieve you of having to parse wikitext to work out whether the 
> contents of a math tag were what changed.




[Wikitech-l] Re: Stream of recent changes diffs

2021-07-08 Thread David Lynch
I'm afraid that the visual differ isn't helpfully set up for this. Its
approach is to fetch the parsoid HTML for the revisions to compare, and
then generate the comparison output client-side. It's not terribly reusable
outside of the VisualEditor context -- all of the describing of changes
that it does leans heavily on interrogating VisualEditor's data model for
information about the things that changed.

The best I can say about this for your purposes is that using the parsoid
HTML *would* relieve you of having to parse wikitext to work out whether
the contents of a math tag were what changed.
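
A rough sketch of that idea, using only Python's standard library. It
assumes that Parsoid marks each formula with typeof="mw:Extension/math" and
keeps the original source under body.extsrc in the data-mw attribute; that
is worth verifying against real Parsoid output. The names
MathSourceCollector, math_sources and math_changed, and the sample HTML,
are made up for illustration.

```python
import json
from html.parser import HTMLParser


class MathSourceCollector(HTMLParser):
    """Collect the source of every math extension element in a document."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # typeof may hold several space-separated values.
        types = (attributes.get("typeof") or "").split()
        if "mw:Extension/math" not in types:
            return
        # Assumption: data-mw is JSON with the formula under body.extsrc.
        data_mw = json.loads(attributes.get("data-mw") or "{}")
        self.sources.append(data_mw.get("body", {}).get("extsrc", ""))


def math_sources(html):
    """Return the formula sources found in one revision's HTML, in order."""
    collector = MathSourceCollector()
    collector.feed(html)
    collector.close()
    return collector.sources


def math_changed(old_html, new_html):
    """True if the formulas of the two revisions differ in any way."""
    return math_sources(old_html) != math_sources(new_html)


# Hypothetical, heavily simplified HTML for two revisions of a page.
OLD_HTML = (
    '<p>Energy: <span typeof="mw:Extension/math" '
    'data-mw=\'{"name":"math","attrs":{},"body":{"extsrc":"E = mc^2"}}\'>'
    "</span></p>"
)
NEW_HTML = OLD_HTML.replace("mc^2", "mc^3")

print(math_sources(OLD_HTML))
print(math_changed(OLD_HTML, NEW_HTML))
```

This still costs two HTML fetches per edit, but no wikitext parsing is
needed on the client side.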

If you do want to dig into this further, check out:
https://doc.wikimedia.org/VisualEditor/master/js/source/ve.init.mw.DiffLoader.html

~David


[Wikitech-l] Re: Stream of recent changes diffs

2021-07-05 Thread Physikerwelt
Hi all,

thank you for your feedback.

@Roy this is a good point, I honestly did not think about it before.
However, in the back of my mind, I remembered this problem had been
solved before. There is a beta feature called visual diffs.

If you enable this feature and navigate to
https://en.wikipedia.org/w/index.php?title=User:RoySmith/sandbox&type=revision&diff=1031413484&oldid=1031413445&diffmode=visual

you see the text "math formula changed". This is exactly what I was
looking for. Unfortunately, I was not yet able to figure out if there
is an API to get the visual diffs. I had expected it to be in
RESTbase, but nothing there.

If I can get to that API the problem would be simple enough to start
with the implementation.

All the best
Moritz


[Wikitech-l] Re: Stream of recent changes diffs

2021-07-01 Thread Amir Sarabadani
Note on ORES as one of its maintainers:
ORES doesn't use recent changes for getting content and scoring edits. It
hits the API.

HTH


-- 
Amir (he/him)

[Wikitech-l] Re: Stream of recent changes diffs

2021-07-01 Thread Roy Smith
I'm not sure diffs are going to be useful here.  For example, this diff 
ostensibly introduces an error in the math markup, but due to the way I've 
formatted the wikisource, it's not obvious from the diff that this is within 
<math>...</math> tags.
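
The effect can be reproduced with a small sketch using Python's standard
difflib. The wikitext lines are hypothetical, and context lines are
suppressed (n=0) to mimic a formula whose tags sit farther from the change
than the diff context reaches.

```python
import difflib

# Hypothetical wikitext: the formula body is on its own line, separate
# from the opening and closing math tags.
old_lines = [
    "Text before the formula.",
    "<math>",
    "\\frac{a}{b}",
    "</math>",
    "Text after the formula.",
]
# The edit drops the closing brace, which breaks the formula.
new_lines = list(old_lines)
new_lines[2] = "\\frac{a}{b"

diff_lines = list(difflib.unified_diff(old_lines, new_lines, n=0, lineterm=""))

# Keep only added and removed lines, skipping the two file header lines.
changed = [
    line
    for line in diff_lines
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]

for line in changed:
    print(line)
```

Neither changed line mentions a math tag, so a tool that looks only at the
diff cannot tell that the edit touched a formula.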

You might end up having to do this using the database dumps, which is going 
to entail looking at a lot more data (extreme understatement) than the 
recent changes stream.


[Wikitech-l] Re: Stream of recent changes diffs

2021-07-01 Thread Robin Hood
I’m no expert, but I believe the only way to get a diff via the API is through 
https://www.mediawiki.org/wiki/API:Compare. I haven’t worked with it to any 
great degree, though, so I’m afraid I can’t help beyond pointing you in that 
direction.
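
For what it's worth, here is a minimal sketch of how an event from the
recent changes stream could be turned into an API:Compare request. It
assumes the event carries the fields type, server_url, server_script_path,
revision.old and revision.new, and that the wiki's API lives at api.php
under the script path; the function name compare_url and the sample event
are hypothetical.

```python
from urllib.parse import urlencode


def compare_url(event):
    """Build an API:Compare URL for an edit event, or None for non-edits."""
    if event.get("type") != "edit":
        return None
    revision = event.get("revision") or {}
    if "old" not in revision or "new" not in revision:
        return None
    params = {
        "action": "compare",
        "fromrev": revision["old"],
        "torev": revision["new"],
        "format": "json",
    }
    # Assumption: "/w" is the script path when the event does not state one.
    script_path = event.get("server_script_path", "/w")
    return event["server_url"] + script_path + "/api.php?" + urlencode(params)


# Hypothetical event, reduced to the fields used above.
sample_event = {
    "type": "edit",
    "server_url": "https://en.wikipedia.org",
    "server_script_path": "/w",
    "revision": {"old": 1031413445, "new": 1031413484},
}

print(compare_url(sample_event))
```

This still means one extra request per edit, so it does not remove the
overhead described in the original question.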


[Wikitech-l] Re: Stream of recent changes diffs

2021-07-01 Thread Andrew Otto
This isn't helpful now, but your use case is relevant to something I hope
to pursue in the future: comprehensive mediawiki change events, including
content.  I don't have a great place yet for collecting these use cases, so
I added it to the Modern Event Platform parent ticket so I don't
forget. :)


