[Wikitech-l] Re: Stream of recent changes diffs

2021-07-08 Thread Roy Smith
On that topic, I'll share some of my experience.

First, parsing wikitext is way more difficult than you probably imagine.  
People are often tempted to do a poor-man's job of it with regular expressions 
and the like.  Down that path lies madness.  Don't go there.

There are only two rational ways I know of to parse wikitext.

Parsoid is one.  It's complicated to get your head around, but it is the one 
true officially supported way.

The other is mwparserfromhell.  It 
has the advantage of being much simpler to use.  It has the disadvantage of not 
getting every possible edge case correct.  It also is only usable in Python, 
which is fine if you're using Python and a problem otherwise.

In either case, once you've got parsed versions of two revisions, you'll then 
be faced with the problem of diffing them.  That's going to be non-trivial.
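
If all you care about is formulae, one way to shrink the diffing problem is to compare only the extracted <math> sources instead of the full parse trees. Below is a rough sketch using difflib from the Python standard library. The function name and the list-of-strings input are assumptions for illustration, and it is a simplification: it reports an edited formula as one removal plus one addition, and it says nothing about changes in the surrounding text.

```python
from difflib import SequenceMatcher


def changed_formulae(old, new):
    """Return (removed, added) formula sources between two revisions.

    `old` and `new` are lists of formula source strings in document order.
    """
    removed, added = [], []
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            # "replace", "delete" and "insert" all land here.
            removed.extend(old[i1:i2])
            added.extend(new[j1:j2])
    return removed, added


if __name__ == "__main__":
    print(changed_formulae(["a^2", "b"], ["a^3", "b", "c"]))
```

If both returned lists are empty, the edit did not touch any formula and can be skipped.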


> On Jul 8, 2021, at 7:01 PM, David Lynch wrote:
> 
> The best I can say about this for your purposes is that using the parsoid 
> HTML would relieve you of having to parse wikitext to work out whether the 
> contents of a math tag were what changed. 🤷



___
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/

[Wikitech-l] Re: Stream of recent changes diffs

2021-07-08 Thread David Lynch
I'm afraid that the visual differ isn't helpfully set up for this. Its
approach is to fetch the parsoid HTML for the revisions to compare, and
then generate the comparison output client-side. It's not terribly reusable
outside of the VisualEditor context -- all of the describing of changes
that it does leans heavily on interrogating VisualEditor's data model for
information about the things that changed.

The best I can say about this for your purposes is that using the parsoid
HTML *would* relieve you of having to parse wikitext to work out whether
the contents of a math tag were what changed. 🤷
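
As a rough illustration of that point, the sketch below collects the formula sources from a Parsoid HTML document using only the Python standard library. It rests on an assumption about the markup: that Parsoid marks each formula with typeof="mw:Extension/math" and keeps the original source in the data-mw attribute under body.extsrc. Please check that against real Parsoid output before relying on it. The class and function names are made up for this example.

```python
import json
from html.parser import HTMLParser


class MathSourceCollector(HTMLParser):
    """Collect formula sources from elements marked as math extensions."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        # typeof may hold several space-separated values.
        types = (attr.get("typeof") or "").split()
        if "mw:Extension/math" not in types:
            return
        # Assumed layout: {"name": "math", "body": {"extsrc": "..."}}
        data_mw = json.loads(attr.get("data-mw") or "{}")
        self.sources.append(data_mw.get("body", {}).get("extsrc", ""))


def math_sources_from_html(html):
    collector = MathSourceCollector()
    collector.feed(html)
    return collector.sources
```

Comparing math_sources_from_html(old_html) with math_sources_from_html(new_html) then tells you whether any formula changed, without touching wikitext.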

If you do want to dig into this further, check out:
https://doc.wikimedia.org/VisualEditor/master/js/source/ve.init.mw.DiffLoader.html

~David

On Mon, Jul 5, 2021 at 11:48 AM Physikerwelt wrote:

> Hi all,
>
> thank you for your feedback.
>
> @Roy this is a good point, I honestly did not think about it before.
> However, in the back of my mind, I remembered this problem had been
> solved before. There is a beta feature called visual diffs.
>
> If you enable this feature and navigate to
>
> https://en.wikipedia.org/w/index.php?title=User:RoySmith/sandbox&type=revision&diff=1031413484&oldid=1031413445&diffmode=visual
>
> you see the text "math formula changed", which is exactly what I was
> looking for. Unfortunately, I was not yet able to figure out if there
> is an API to get the visual diffs. I had expected it to be in
> RESTbase, but nothing there.
>
> If I can get to that API the problem would be simple enough to start
> with the implementation.
>
> All the best
> Moritz
>
> On Thu, Jul 1, 2021 at 3:39 PM Amir Sarabadani wrote:
> >
> > Note on ORES as one of its maintainers:
> > ORES doesn't use recent changes for getting content and scoring edits.
> > It hits the API.
> >
> > HTH
> >
> > On Thu, Jul 1, 2021 at 3:18 PM Robin Hood wrote:
> >>
> >> I’m no expert, but I believe the only way to get a diff via the API is
> >> through https://www.mediawiki.org/wiki/API:Compare. I haven’t worked with
> >> it to any great degree, though, so I’m afraid I can’t help beyond pointing
> >> you in that direction.
> >>
> >>
> >>
> >> From: Physikerwelt 
> >> Sent: July 1, 2021 8:17 AM
> >> To: Wikimedia developers 
> >> Cc: andre.greiner-petter; Aaron Halfaker
> >> Subject: [Wikitech-l] Stream of recent changes diffs
> >>
> >>
> >>
> >> Dear all,
> >>
> >>
> >>
> >> we have developed a tool that is (in some cases) capable of checking if
> >> formulae in <math>-tags in the context of a wikitext fragment are likely
> >> to be correct or not. We would like to test the tool on the recent changes.
> >> From
> >>
> >>
> >>
> >> https://www.mediawiki.org/wiki/API:Recent_changes_stream
> >>
> >>
> >>
> >> we can get the stream of recent changes. However, I did not find a way
> >> to get the diff (either in HTML or Wikitext) to figure out how the content
> >> was changed. The only option I see is to request the revision text
> >> separately. This would mean a lot of unnecessary requests, since most of the
> >> changes do not change <math>-tags. I assume that others, e.g., ORES
> >>
> >>
> >>
> >> https://www.mediawiki.org/wiki/ORES,
> >>
> >>
> >>
> >> compute the diffs anyhow and wonder if there is an easier way to get
> >> the diffs from the recent changes stream without additional requests.
> >>
> >>
> >>
> >> All the best
> >>
> >> Physikerwelt (Moritz Schubotz)
> >>
> >>
> >>
> >>
> >>
> >
> >
> >
> > --
> > Amir (he/him)
> >

[Wikitech-l] Two open developer positions for OpenRefine (paid contractors, part time, remote)

2021-07-08 Thread sandra . fauconnier
Hi everyone,

I'm very happy to announce:

OpenRefine [1] has two Junior Developer job openings (paid contractor 
positions; part-time, fully remote) for building Structured Data on Wikimedia 
Commons [2] functionalities.

Needless to say, we would love to receive applications from Wikimedians :-)

* Junior Developer - Wikimedia Development [3] (6 months, from September 2021 
till February 2022)
* Junior Developer - OpenRefine Development [4] (8 months, from November 2021 
till June 2022)

All the best!
Sandra (User:Spinster / User:SFauconnier)

[1] https://openrefine.org
[2] https://w.wiki/UR
[3] 
https://openrefine.org/blog/2021/07/07/Wikimedia-Commons-reconciliation-batch-developer.html
[4] https://openrefine.org/blog/2021/07/07/OpenRefine-SDC-developer.html


[Wikitech-l] Re: [discovery] Upcoming Search Platform Office Hours—July 14th, 2021

2021-07-08 Thread Federico Leva (Nemo)

On 07/07/21 22:47, Trey Jones wrote:

Google Meet link: https://meet.google.com/vyc-jvgq-dww

Join by phone in the US: +1 786-701-6904 PIN: 262 122 849#


I'm pretty sure it's possible to dial in by phone from an EU number too; 
I did so in a WMF-hosted meeting only a few months ago. I forgot how to 
find the number, despite the instructions:

https://support.google.com/meet/answer/9518557
https://support.google.com/meet/answer/9683440

I think I found it in a Calendar invitation back then, could someone 
maybe share it from there?


Thanks,
Federico