[Wikitech-l] Small change to mediawiki.page_change.v1 event stream schema

2024-07-16 Thread Andrew Otto
Hello,

tl;dr

The performer field on events in the mediawiki.page_change.v1
<https://stream.wikimedia.org/?doc#/streams/get_v2_stream_mediawiki_page_change_v1>
stream is no longer required.  It will be omitted in certain cases.


If you have any code that expects the performer field to be present, please
adapt accordingly.
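
For illustration, here is a minimal Python sketch of a consumer that treats
performer as optional (field names like page.page_title and performer.user_text
follow the mediawiki/page/change schema linked above, but the processing itself
is just an example):

    import json

    def handle_page_change(raw_event: str) -> None:
        """Process one mediawiki.page_change.v1 event without assuming performer exists."""
        event = json.loads(raw_event)

        # performer may now be omitted (e.g. RevisionDelete with suppression),
        # so fall back to None instead of indexing into a missing key.
        performer = event.get("performer") or {}
        user_text = performer.get("user_text")

        title = event["page"]["page_title"]
        if user_text is None:
            print(f"{title}: change by a suppressed or unknown performer")
        else:
            print(f"{title}: change by {user_text}")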

---

Last fall, the Wikimedia Data Engineering team fixed a bug
<https://phabricator.wikimedia.org/T342487> in the mediawiki.page_change.v1
(and other) streams. Since that fix, the performer field is omitted from
events in which a wiki admin performs a RevisionDelete
<https://www.mediawiki.org/wiki/Help:RevisionDelete> with suppression.

This bug fix was not properly applied to the mediawiki/page/change
<https://schema.wikimedia.org/#!//primary/jsonschema/mediawiki/page/change>
event schema, causing validation errors for these cases
<https://phabricator.wikimedia.org/T367923>.

Since mid-November 2023, mediawiki.page_change.v1 events caused by admins
using ‘revision suppression’ have been dropped.

We did not notice these errors, as the mediawiki.page_change.v1 stream only
contains RevisionDelete (AKA “revision visibility”) related changes if
those changes are made to the current revision of a page.  Using
RevisionDelete on the current revision of a page is usually rare.  It
recently began happening more often on certain wikis, which is how we noticed.

To work around the validation errors, we have made the performer field
optional in the mediawiki/page/change
<https://gerrit.wikimedia.org/r/c/schemas/event/primary/+/1047549> schema.

If you have any questions or issues, please create a Phabricator
<https://phabricator.wikimedia.org/> ticket and tag Data-Engineering.

Thank you,

Andrew Otto

Wikimedia Data Engineering
___
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/

[Wikitech-l] Re: Scheduling a patch for a MediaWiki backport window just got easier

2024-06-05 Thread Andrew Otto
٩( ๑╹ ꇴ╹)۶

On Wed, Jun 5, 2024 at 11:27 AM Dreamy Jazz 
wrote:

> Hi,
>
> I already used this feature before you sent the email and found it very
> useful. I especially like the link from gerrit.
>
> Thanks,
>
> Dreamy Jazz
>
> English Wikipedia CheckUser, Admin and Arb Clerk.
> Software Engineer working at the Wikimedia Foundation
>
>
> On Wed, 5 Jun 2024 at 16:14, Bryan Davis  wrote:
>
>> Last week I saw this in a WMF internal chat: "Now we just need to make
>> it a w easier to edit the Deployments calendar.  (or…is there an
>> easier way than squinting at wikitext tables and copy/pasting
>> templates?)"
>>
>> I'm sure that a number of y'all can relate to this. The
>>  page is pretty nice
>> to read as a human and not too bad for bots. Editing it though can be
>> a bit painful as that pull quote implies. I decided I would try to do
>> something about that. The result is a tool at
>> .
>>
>> The new "Wikimedia Deployment Scheduler" tool tries to make adding
>> your Gerrit change to a backport window as simple as possible. All it
>> needs from you is the Gerrit change number, your IRC nick, and the
>> backport window you want to use. Using some python magic, including
>> the always useful mwparserfromhell library, it finds the right place
>> in [[wikitech:Deployments]] to insert your request for deployment.
>>
>> To make things even easier, Gerrit will now show you a "Schedule
>> backport of this change" link underneath the commit message for
>> changes that are eligible for a backport deployment. What changes are
>> those? Any open, unmerged change on the master branch of
>> operations/mediawiki-config.git or changes on "wmf/*" branches in
>> mediawiki/core.git, mediawiki/extensions/*.git, or
>> mediawiki/skins/*.git.
>>
>> Thanks to Antoine Musso and Tyler Cipriani for their help and
>> encouragement in building this tool. If you are interested in seeing
>> what the Gerrit integration needed, check out
>> <
>> https://gerrit.wikimedia.org/r/plugins/gitiles/operations/software/gerrit/+/7ea913b
>> ^!/>
>>
>> Bryan
>> --
>> Bryan Davis   Wikimedia Foundation
>> Principal Software Engineer   Boise, ID USA
>> [[m:User:BDavis_(WMF)]]  irc: bd808
>> ___
>> Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
>> To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
>>
>> https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
>
> ___
> Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
> To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
> https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
___
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/

[Wikitech-l] Re: Enabling canary events for all MediaWiki event streams

2023-12-11 Thread Andrew Otto
Hi everyone, I just enabled canary events for the mentioned streams.

Please comment here or on the task if you encounter any issues.

Thank you!
-Andrew Otto & the WMF Data Engineering team
<https://wikitech.wikimedia.org/wiki/Data_Engineering>


On Tue, Nov 7, 2023 at 1:57 PM Andrew Otto  wrote:

> tl;dr
>
> Ignore this email if you do not use MediaWiki event streams.
>
> On Monday December 11 2023, all MediaWiki related event streams will have 
> artificial
> canary events
> <https://wikitech.wikimedia.org/wiki/Event_Platform/EventStreams#Canary_Events>
> injected into them.  If you use any of these streams, you should discard
> these canary events.
>
> Add code to your consumers that discards events where meta.domain ==
> "canary".
> Canary Events
>
> At WMF, we use artificial 'canary' AKA 'heartbeat' events
> <https://wikitech.wikimedia.org/wiki/Event_Platform/Stream_Configuration#canary_events_enabled>
> to differentiate between a broken event stream and an empty event stream.
> Canary events should be produced at least once an hour.  If there are no
> events in a stream for an hour, then something is likely broken with that
> stream.
>
> These artificial canary events can be identified by the fact that their
> meta.domain field is set to "canary".  If you use any of the streams
> listed below, you will need to add code that discards any events where 
> meta.domain
> == "canary".
>
> Back in 2020, we began producing canary events into all new streams, but
> we never got around to enabling these for streams that already existed.  We
> needed to ensure that all consumers of these streams filtered out the
> canary events.  We're just finally getting around to enabling canary events
> for all streams.
>
> We will enable canary event production
> <https://phabricator.wikimedia.org/T266798> for the following streams on
> Monday, December 11th, 2023:
>
> - mediawiki.recentchange
>
> - mediawiki.page-create
>
> - mediawiki.page-delete
>
> - mediawiki.page-links-change
>
> - mediawiki.page-move
>
> - mediawiki.page-properties-change
>
> - mediawiki.page-restrictions-change
>
> - mediawiki.page-suppress
>
> - mediawiki.page-undelete
>
> - mediawiki.revision-create
>
> - mediawiki.revision-visibility-change
>
> - mediawiki.user-blocks-change
>
> - mediawiki.centralnotice.campaign-change
>
> - mediawiki.centralnotice.campaign-create
>
> - mediawiki.centralnotice.campaign-delete
>
> If you consume any of these streams, either external to WMF networks using
> EventStreams, or internally using Kafka, please ensure that your consumer
> logic discards events where meta.domain == "canary" before this date.
> (Note that not all of these streams are exposed publicly at
> stream.wikimedia.org <https://stream.wikimedia.org/?doc#/streams>.)
>
> Thank you,
>
> -Andrew Otto & the WMF Data Engineering team
> <https://wikitech.wikimedia.org/wiki/Data_Engineering>
>
> References
>
> - T266798 - Enable canary events for all MediaWiki streams
> <https://phabricator.wikimedia.org/T266798>
>
> - T251609 - Automate ingestion and refinement into Hive of event data
> from Kafka using stream configs and canary/heartbeat events
> <https://phabricator.wikimedia.org/T251609>
>
>
___
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/

[Wikitech-l] Re: Fwd: Enabling canary events for all MediaWiki event streams

2023-11-09 Thread Andrew Otto
> What is the content of these canary events?
Good question! I just updated EventStreams docs here
<https://wikitech.wikimedia.org/wiki/Event_Platform/EventStreams#Canary_Events>
with
an answer:

The content of most canary event fields is copied directly from the first
example event in the event's schema. E.g. mediawiki/recentchange example
<https://github.com/wikimedia/schemas-event-primary/blob/master/jsonschema/mediawiki/recentchange/1.0.1.yaml#L159>
, mediawiki/revision/create example
<https://github.com/wikimedia/schemas-event-primary/blob/master/jsonschema/mediawiki/revision/create/2.0.0.yaml#L288>.
These examples can also be seen in the OpenAPI docs for the streams
<https://stream.wikimedia.org/?doc#/streams>, e.g. mediawiki.page-move
example value
<https://stream.wikimedia.org/?doc#/streams/get_v2_stream_mediawiki_page_move>.
The code that creates canary events can be found here
<https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia-event-utilities/%2B/refs/heads/master/eventutilities/src/main/java/org/wikimedia/eventutilities/monitoring/CanaryEventProducer.java#118>
(as
of 2023-11).


On Thu, Nov 9, 2023 at 11:14 AM Siddharth VP  wrote:

> Hi Andrew,
>
> What is the content of these canary events? Do they have a data section or
> is it just the metadata? If I already have filtering to process only
> interesting events (say data.wiki === 'enwiki'), do I still need to add
> additional filtering to discard canary events?
>
> On Thu, 9 Nov 2023 at 21:23, Andrew Otto  wrote:
>
>> tl;dr
>>
>> Ignore this email if you do not use MediaWiki event streams.
>>
>> On Monday December 11 2023, all MediaWiki related event streams will have 
>> artificial
>> canary events
>> <https://wikitech.wikimedia.org/wiki/Event_Platform/EventStreams#Canary_Events>
>> injected into them.  If you use any of these streams, you should discard
>> these canary events.
>>
>> *Add code to your consumers that discards events where* meta.domain ==
>> "canary".
>> Canary Events
>>
>> At WMF, we use artificial 'canary' AKA 'heartbeat' events
>> <https://wikitech.wikimedia.org/wiki/Event_Platform/Stream_Configuration#canary_events_enabled>
>> to differentiate between a broken event stream and an empty event stream.
>> Canary events should be produced at least once an hour.  If there are no
>> events in a stream for an hour, then something is likely broken with that
>> stream.
>>
>> These artificial canary events can be identified by the fact that their
>> meta.domain field is set to "canary".  If you use any of the streams
>> listed below, you will need to add code that discards any events where 
>> meta.domain
>> == "canary".
>>
>> Back in 2020, we began producing canary events into all new streams, but
>> we never got around to enabling these for streams that already existed.  We
>> needed to ensure that all consumers of these streams filtered out the
>> canary events.  We're just finally getting around to enabling canary events
>> for all streams.
>>
>> We will enable canary event production
>> <https://phabricator.wikimedia.org/T266798> for the following streams on
>> Monday, December 11th, 2023:
>>
>> - mediawiki.recentchange
>>
>> - mediawiki.page-create
>>
>> - mediawiki.page-delete
>>
>> - mediawiki.page-links-change
>>
>> - mediawiki.page-move
>>
>> - mediawiki.page-properties-change
>>
>> - mediawiki.page-restrictions-change
>>
>> - mediawiki.page-suppress
>>
>> - mediawiki.page-undelete
>>
>> - mediawiki.revision-create
>>
>> - mediawiki.revision-visibility-change
>>
>> - mediawiki.user-blocks-change
>>
>> - mediawiki.centralnotice.campaign-change
>>
>> - mediawiki.centralnotice.campaign-create
>>
>> - mediawiki.centralnotice.campaign-delete
>>
>> If you consume any of these streams, either external to WMF networks
>> using EventStreams, or internally using Kafka, please ensure that your
>> consumer logic discards events where meta.domain == "canary" before this
>> date. (Note that not all of these streams are exposed publicly at
>> stream.wikimedia.org <https://stream.wikimedia.org/?doc#/streams>.)
>>
>> Thank you,
>>
>> -Andrew Otto & the WMF Data Engineering team
>> <https://wikitech.wikimedia.org/wiki/Data_Engineering>
>>
>> References
>>
>> - T266798 - Enable canary events for all MediaWiki streams
>> <https://phabrica

[Wikitech-l] Fwd: Enabling canary events for all MediaWiki event streams

2023-11-09 Thread Andrew Otto
tl;dr

Ignore this email if you do not use MediaWiki event streams.

On Monday December 11 2023, all MediaWiki related event streams will
have artificial
canary events
<https://wikitech.wikimedia.org/wiki/Event_Platform/EventStreams#Canary_Events>
injected into them.  If you use any of these streams, you should discard
these canary events.

*Add code to your consumers that discards events where* meta.domain ==
"canary".
Canary Events

At WMF, we use artificial 'canary' AKA 'heartbeat' events
<https://wikitech.wikimedia.org/wiki/Event_Platform/Stream_Configuration#canary_events_enabled>
to differentiate between a broken event stream and an empty event stream.
Canary events should be produced at least once an hour.  If there are no
events in a stream for an hour, then something is likely broken with that
stream.

These artificial canary events can be identified by the fact that their
meta.domain field is set to "canary".  If you use any of the streams listed
below, you will need to add code that discards any events where meta.domain
== "canary".

Back in 2020, we began producing canary events into all new streams, but we
never got around to enabling these for streams that already existed.  We
needed to ensure that all consumers of these streams filtered out the
canary events.  We're just finally getting around to enabling canary events
for all streams.

We will enable canary event production
<https://phabricator.wikimedia.org/T266798> for the following streams on
Monday, December 11th, 2023:

- mediawiki.recentchange

- mediawiki.page-create

- mediawiki.page-delete

- mediawiki.page-links-change

- mediawiki.page-move

- mediawiki.page-properties-change

- mediawiki.page-restrictions-change

- mediawiki.page-suppress

- mediawiki.page-undelete

- mediawiki.revision-create

- mediawiki.revision-visibility-change

- mediawiki.user-blocks-change

- mediawiki.centralnotice.campaign-change

- mediawiki.centralnotice.campaign-create

- mediawiki.centralnotice.campaign-delete

If you consume any of these streams, either external to WMF networks using
EventStreams, or internally using Kafka, please ensure that your consumer
logic discards events where meta.domain == "canary" before this date. (Note
that not all of these streams are exposed publicly at stream.wikimedia.org
<https://stream.wikimedia.org/?doc#/streams>.)

Thank you,

-Andrew Otto & the WMF Data Engineering team
<https://wikitech.wikimedia.org/wiki/Data_Engineering>

References

- T266798 - Enable canary events for all MediaWiki streams
<https://phabricator.wikimedia.org/T266798>

- T251609 - Automate ingestion and refinement into Hive of event data from
Kafka using stream configs and canary/heartbeat events
<https://phabricator.wikimedia.org/T251609>
___
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/

[Wikitech-l] Re: Python requests broken by urllib3 version 2.x

2023-05-11 Thread Andrew Otto
> But now I'm curious about how conda enables running docker safely in
production. :)

It doesn't enable running docker, we just use packed conda envs instead of
docker images.  This only really works because we build and run the conda
envs on the same OS.   See conda-pack <https://conda.github.io/conda-pack/>.

We'd prefer to use docker if we could.
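
For the curious, a minimal sketch of what the packing step looks like via
conda-pack's Python API (the environment name and output path are made up for
illustration, and the exact keyword arguments should be checked against the
conda-pack docs):

    import conda_pack

    # Pack a named conda environment into a relocatable tarball that can be
    # shipped to production hosts running the same OS and unpacked there,
    # instead of building and deploying a docker image.
    conda_pack.pack(
        name="my_analytics_env",           # hypothetical environment name
        output="my_analytics_env.tar.gz",  # hypothetical output path
    )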

On Thu, May 11, 2023 at 3:19 AM Slavina Stefanova 
wrote:

> > I'm working on an "Essential Tools for Managing Python Development
>> Environments
>> <https://docs.google.com/document/d/1pl6QCDWGebGjRrHixgQNES8qRf-8PCzwLSsFj7jI1HY/edit#heading=h.i5ou2ywgmb09>"
>> tutorial
>> Awesome!   Did you consider conda envs?  FWIW, we rely on conda envs
>> <https://gitlab.wikimedia.org/repos/data-engineering/workflow_utils#building-project-conda-distribution-environments>
>>  in
>> the Data Engineering world to work around the lack of ability to
>> securely run docker images in production.
>> We didn't try pyenv, mainly because conda gets us more than just python :)
>
>
> I tried conda a few years ago, but for my own use cases, I found it mostly
> got in my way. I know it's big with data and science folks though, so I
> will add it to an "other tools to be aware of" section. But now I'm curious
> about how conda enables running docker safely in production. :)
>
> For anyone interested in the (bewildering complexity of the current)
> Python packaging ecosystem, I can recommend this blog post by
> PyPA member and PSF fellow Pradyun Gedam:
> https://pradyunsg.me/blog/2023/01/21/thoughts-on-python-packaging/
> <https://pradyunsg.me/blog/2023/01/21/thoughts-on-python-packaging/>
>
>
> --
> Slavina Stefanova (she/her)
> Software Engineer - Technical Engagement
>
> Wikimedia Foundation
>
>
> On Mon, May 8, 2023 at 3:03 PM Andrew Otto  wrote:
>
>> > For Java, we run an instance of Archiva: https://archiva.wikimedia.org/
>> > It's not a perfect approach but I think we can and should move in that
>> direction with all our other ecosystems
>>
>> Gitlab package registries may help us here!
>>
>>
>>
>> On Mon, May 8, 2023 at 8:59 AM Andrew Otto  wrote:
>>
>>> > Tangent: is it worthwhile to establish a consensus for best practices
>>> with package pinning and package management for Python projects in the
>>> Wikimedia ecosystem?
>>> Yes! That would be awesome. I have spent a lot of time floundering in
>>> this area trying to make decisions; it'd be nice if we had a good guideline
>>> established.
>>>
>>> > I'm working on an "Essential Tools for Managing Python Development
>>> Environments
>>> <https://docs.google.com/document/d/1pl6QCDWGebGjRrHixgQNES8qRf-8PCzwLSsFj7jI1HY/edit#heading=h.i5ou2ywgmb09>"
>>> tutorial
>>> Awesome!   Did you consider conda envs?  FWIW, we rely on conda envs
>>> <https://gitlab.wikimedia.org/repos/data-engineering/workflow_utils#building-project-conda-distribution-environments>
>>> in the Data Engineering world to work around the lack of ability to
>>> securely run docker images in production.  We didn't try pyenv, mainly
>>> because conda gets us more than just python :)
>>>
>>> > Poetry is a modern lockfile-based packaging and dependency management
>>> tool worth looking into
>>> Poetry was nice, but I found that it wasn't as comprehensive (yet?) as
>>> ye old setuptools based stuff.  I can't quite recall what, but I think it
>>> was around support for datafiles and scripts? But, this was 1.5 years ago
>>> so maybe things are better now.
>>>
>>> Thank you!
>>>
>>> On Fri, May 5, 2023 at 11:32 AM Slavina Stefanova <
>>> sstefan...@wikimedia.org> wrote:
>>>
>>>> Tangent: is it worthwhile to establish a consensus for best practices
>>>>> with package pinning and package management for Python projects in the
>>>>> Wikimedia ecosystem? When I last worked on a python project (
>>>>> https://wikitech.wikimedia.org/wiki/Add_Link) I found it confusing
>>>>> that we have so many different tools and approaches for doing these 
>>>>> things,
>>>>> and seems like we'd benefit from having a standard, supported way. (Or
>>>>> maybe that already exists and I haven't found it?)
>>>>
>>>>
>>>> I'm working on an "Essential Tools for Managing Python Development
>>>> Environments
>>>> <https://docs.google.com/document/d/1pl6QCDWGebGjRrHixgQNES8qRf-8PCzwLS

[Wikitech-l] Re: Python requests broken by urllib3 version 2.x

2023-05-08 Thread Andrew Otto
> For Java, we run an instance of Archiva: https://archiva.wikimedia.org/
> It's not a perfect approach but I think we can and should move in that
direction with all our other ecosystems

Gitlab package registries may help us here!



On Mon, May 8, 2023 at 8:59 AM Andrew Otto  wrote:

> > Tangent: is it worthwhile to establish a consensus for best practices
> with package pinning and package management for Python projects in the
> Wikimedia ecosystem?
> Yes! That would be awesome. I have spent a lot of time floundering in this
> area trying to make decisions; it'd be nice if we had a good guideline
> established.
>
> > I'm working on an "Essential Tools for Managing Python Development
> Environments
> <https://docs.google.com/document/d/1pl6QCDWGebGjRrHixgQNES8qRf-8PCzwLSsFj7jI1HY/edit#heading=h.i5ou2ywgmb09>"
> tutorial
> Awesome!   Did you consider conda envs?  FWIW, we rely on conda envs
> <https://gitlab.wikimedia.org/repos/data-engineering/workflow_utils#building-project-conda-distribution-environments>
> in the Data Engineering world to work around the lack of ability to
> securely run docker images in production.  We didn't try pyenv, mainly
> because conda gets us more than just python :)
>
> > Poetry is a modern lockfile-based packaging and dependency management
> tool worth looking into
> Poetry was nice, but I found that it wasn't as comprehensive (yet?) as ye
> old setuptools based stuff.  I can't quite recall what, but I think it was
> around support for datafiles and scripts? But, this was 1.5 years ago so
> maybe things are better now.
>
> Thank you!
>
> On Fri, May 5, 2023 at 11:32 AM Slavina Stefanova <
> sstefan...@wikimedia.org> wrote:
>
>> Tangent: is it worthwhile to establish a consensus for best practices
>>> with package pinning and package management for Python projects in the
>>> Wikimedia ecosystem? When I last worked on a python project (
>>> https://wikitech.wikimedia.org/wiki/Add_Link) I found it confusing that
>>> we have so many different tools and approaches for doing these things, and
>>> seems like we'd benefit from having a standard, supported way. (Or maybe
>>> that already exists and I haven't found it?)
>>
>>
>> I'm working on an "Essential Tools for Managing Python Development
>> Environments
>> <https://docs.google.com/document/d/1pl6QCDWGebGjRrHixgQNES8qRf-8PCzwLSsFj7jI1HY/edit#heading=h.i5ou2ywgmb09>"
>> tutorial that will be published to the wikis when ready. Maybe that could
>> be expanded upon? In my experience though, it can be hard to get people to
>> agree on following a standard, especially when there are so many different
>> options and many folks already have their favorite tools and workflows. But
>> it would be nice to have a set of recommendations to reduce the cognitive
>> load.
>>
>> --
>> Slavina Stefanova (she/her)
>> Software Engineer - Technical Engagement
>>
>> Wikimedia Foundation
>>
>>
>> On Fri, May 5, 2023 at 5:18 PM Kosta Harlan 
>> wrote:
>>
>>> Tangent: is it worthwhile to establish a consensus for best practices
>>> with package pinning and package management for Python projects in the
>>> Wikimedia ecosystem? When I last worked on a python project (
>>> https://wikitech.wikimedia.org/wiki/Add_Link) I found it confusing that
>>> we have so many different tools and approaches for doing these things, and
>>> seems like we'd benefit from having a standard, supported way. (Or maybe
>>> that already exists and I haven't found it?)
>>>
>>> Kosta
>>>
>>> On 5. May 2023, at 13:51, Slavina Stefanova 
>>> wrote:
>>>
>>> Poetry is a modern lockfile-based packaging and dependency management
>>> tool worth looking into. It also supports exporting dependencies into a
>>> requirements.txt file, should you need that (nice if you want to
>>> containerize an app without bloating the image with Poetry, for instance).
>>>
>>> https://python-poetry.org/  <https://python-poetry.org/>
>>>
>>> --
>>> Slavina Stefanova (she/her)
>>> Software Engineer - Technical Engagement
>>>
>>> Wikimedia Foundation
>>>
>>>
>>> On Fri, May 5, 2023 at 1:38 PM Sebastian Berlin <
>>> sebastian.ber...@wikimedia.se> wrote:
>>>
>>>> A word of warning: using `pip freeze` to populate requirements.txt can
>>>> result in a hard to read (very long) file and other issues:
>>>> https://medium.com/@tomagee/pip-freeze-requirements-txt-considered

[Wikitech-l] Re: Python requests broken by urllib3 version 2.x

2023-05-08 Thread Andrew Otto
> Tangent: is it worthwhile to establish a consensus for best practices
with package pinning and package management for Python projects in the
Wikimedia ecosystem?
Yes! That would be awesome. I have spent a lot of time floundering in this
area trying to make decisions; it'd be nice if we had a good guideline
established.

> I'm working on an "Essential Tools for Managing Python Development
Environments
"
tutorial
Awesome!   Did you consider conda envs?  FWIW, we rely on conda envs

in the Data Engineering world to work around the lack of ability to
securely run docker images in production.  We didn't try pyenv, mainly
because conda gets us more than just python :)

> Poetry is a modern lockfile-based packaging and dependency management
tool worth looking into
Poetry was nice, but I found that it wasn't as comprehensive (yet?) as ye
old setuptools based stuff.  I can't quite recall what, but I think it was
around support for datafiles and scripts? But, this was 1.5 years ago so
maybe things are better now.

Thank you!

On Fri, May 5, 2023 at 11:32 AM Slavina Stefanova 
wrote:

> Tangent: is it worthwhile to establish a consensus for best practices with
>> package pinning and package management for Python projects in the Wikimedia
>> ecosystem? When I last worked on a python project (
>> https://wikitech.wikimedia.org/wiki/Add_Link) I found it confusing that
>> we have so many different tools and approaches for doing these things, and
>> seems like we'd benefit from having a standard, supported way. (Or maybe
>> that already exists and I haven't found it?)
>
>
> I'm working on an "Essential Tools for Managing Python Development
> Environments
> "
> tutorial that will be published to the wikis when ready. Maybe that could
> be expanded upon? In my experience though, it can be hard to get people to
> agree on following a standard, especially when there are so many different
> options and many folks already have their favorite tools and workflows. But
> it would be nice to have a set of recommendations to reduce the cognitive
> load.
>
> --
> Slavina Stefanova (she/her)
> Software Engineer - Technical Engagement
>
> Wikimedia Foundation
>
>
> On Fri, May 5, 2023 at 5:18 PM Kosta Harlan  wrote:
>
>> Tangent: is it worthwhile to establish a consensus for best practices
>> with package pinning and package management for Python projects in the
>> Wikimedia ecosystem? When I last worked on a python project (
>> https://wikitech.wikimedia.org/wiki/Add_Link) I found it confusing that
>> we have so many different tools and approaches for doing these things, and
>> seems like we'd benefit from having a standard, supported way. (Or maybe
>> that already exists and I haven't found it?)
>>
>> Kosta
>>
>> On 5. May 2023, at 13:51, Slavina Stefanova 
>> wrote:
>>
>> Poetry is a modern lockfile-based packaging and dependency management
>> tool worth looking into. It also supports exporting dependencies into a
>> requirements.txt file, should you need that (nice if you want to
>> containerize an app without bloating the image with Poetry, for instance).
>>
>> https://python-poetry.org/  
>>
>> --
>> Slavina Stefanova (she/her)
>> Software Engineer - Technical Engagement
>>
>> Wikimedia Foundation
>>
>>
>> On Fri, May 5, 2023 at 1:38 PM Sebastian Berlin <
>> sebastian.ber...@wikimedia.se> wrote:
>>
>>> A word of warning: using `pip freeze` to populate requirements.txt can
>>> result in a hard to read (very long) file and other issues:
>>> https://medium.com/@tomagee/pip-freeze-requirements-txt-considered-harmful-f0bce66cf895
>>> .
>>>
>>> *Sebastian Berlin*
>>> Utvecklare/*Developer*
>>> Wikimedia Sverige (WMSE)
>>>
>>> E-post/*E-Mail*: sebastian.ber...@wikimedia.se
>>> Telefon/*Phone*: (+46) 0707 - 92 03 84
>>>
>>>
>>> On Fri, 5 May 2023 at 13:17, Amir Sarabadani 
>>> wrote:
>>>
 You can also create an empty virtual env, install all requirements and
 then do
 pip freeze > requirements.txt

 That should take care of pinning

 Am Fr., 5. Mai 2023 um 13:11 Uhr schrieb Lucas Werkmeister <
 lucas.werkmeis...@wikimedia.de>:

> For the general case of Python projects, I’d argue that a better
> solution is to adopt the lockfile pattern (package-lock.json,
> composer.lock, Cargo.lock, etc.) and pin *all* dependencies, and only
> update them when the new versions have been tested and are known to work.
> pip-tools  can help with that,
> for example (requirements.in specifies “loose” dependencies;
> pip-compile creates a pinned requirements.txt; pip-sync installs it; 
> 

[Wikitech-l] Re: major upgrade of PageProperties extension (a proof of concept for the use of SLOTS)

2023-01-12 Thread Andrew Otto
> This also reminds me, that like for namespaces and content handlers, we
should probably keep a list of known rvslot names, to avoid potential
conflicts.

Would be nice if extensions could register with mediawiki what rvslot names
they provide/manage. :)

On Thu, Jan 12, 2023 at 10:18 AM Derk-Jan Hartman <
d.j.hartman+wmf...@gmail.com> wrote:

> Oh Thomas. I notice that you use "json" as the content model, but if I'm
> not mistaken, the content model is supposed to describe "what" you are
> storing, not how you are storing it. For instance: commons metadata uses
> "wikibase-mediainfo" as its content model.
>
> This also reminds me, that like for namespaces and content handlers, we
> should probably keep a list of known rvslot names, to avoid potential
> conflicts.
>
> https://www.mediawiki.org/wiki/Extension_default_namespaces
> https://www.mediawiki.org/wiki/Content_handlers
>
> So probably just somewhere here?
> https://www.mediawiki.org/wiki/Multi-Content_Revisions
>
>
> DJ
>
>
> On Thu, Jan 12, 2023 at 4:00 PM Derk-Jan Hartman <
> d.j.hartman+wmf...@gmail.com> wrote:
>
>> This is looking really nice !
>>
>> For those wondering, here is an example of the raw api output for this
>> extra slot when requested:
>>
>> https://wikienterprise.org/w/api.php?action=query=json=revisions=Main%20Page=ids%7Ctimestamp%7Cflags%7Ccomment%7Cuser%7Ccontent%7Ccontentmodel%7Cslotsize=*
>>
>> This could be very useful ! Glad to finally see slots getting some more
>> usage.
>>
>> DJ
>>
>> On Thu, Jan 12, 2023 at 10:23 AM  wrote:
>>
>>>
>>> Hello, I've released a major upgrade of PageProperties extension
>>> and completely rewritten the extension page
>>>
>>> https://www.mediawiki.org/wiki/Extension:PageProperties
>>>
>>> most of the recent work is intended for the use in conjunction with
>>> Semantic MediaWiki
>>> (it now offers a complete GUI based on OOUI to interactively create/edit
>>> properties, create/edit forms and even
>>> to perform a semantic import from csv fields to local properties
>>> registered in the Wiki).
>>>
>>> However at the same time the extension can be considered a proof of
>>> concept for the use of
>>> Slots, since all the data-structure is completely based on SLOTS
>>> (properties associated to a page are recorded within a slot with JSON
>>> content model)
>>> and they can also be navigated through the interface.
>>>
>>> The extension makes also significant use of the OOUI library, and I want
>>> to thank the authors for their wonderful work.
>>>
>>> If you want to check out the extension, I've set up this wiki
>>>
>>> https://wikienterprise.org/wiki/Main_Page
>>>
>>> where you can freely test it.
>>>
>>> I also plan to contribute whenever possible to the development
>>> of MCR/slots for the coming versions of Mediawiki, since currently
>>> the extension uses at least one hack, and the edit of slots
>>> in addition to the first could in my opinion be strongly simplified
>>> possibly with a
>>> a few changes in the code base.
>>>
>>> best
>>> (Thomas)
>>>
>>>
>>>
>>> ___
>>> Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
>>> To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
>>>
>>> https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
>>
>> ___
> Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
> To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
> https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
___
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/

[Wikitech-l] Re: [Wiki-research-l] Wikimedia Enterprise HTML dumps available for public download

2021-10-19 Thread Andrew Otto
Wow very cool!

On Tue, Oct 19, 2021 at 10:57 AM Ariel Glenn WMF 
wrote:

> I am pleased to announce that Wikimedia Enterprise's HTML dumps [1] for
> October 17-18th are available for public download; see
> https://dumps.wikimedia.org/other/enterprise_html/ for more information.
> We
> expect to make updated versions of these files available around the 1st/2nd
> of the month and the 20th/21st of the month, following the cadence of the
> standard SQL/XML dumps.
>
> This is still an experimental service, so there may be hiccups from time to
> time. Please be patient and report issues as you find them. Thanks!
>
> Ariel "Dumps Wrangler" Glenn
>
> [1] See https://www.mediawiki.org/wiki/Wikimedia_Enterprise for much more
> about Wikimedia Enterprise and its API.
> ___
> Wiki-research-l mailing list -- wiki-researc...@lists.wikimedia.org
> To unsubscribe send an email to wiki-research-l-le...@lists.wikimedia.org
>
___
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/

[Wikitech-l] Re: Enabling translating RCFeed log entry messages(IRC log entry messages)

2021-09-23 Thread Andrew Otto
I can't help you with your problem, but:

> The main cause of this is probably that the messages are only for
irc.wikimedia.org and irc.wikimedia.org will be replaced with EventStreams?

I don't think this is the cause.  We'd love to deprecate irc.wikimedia.org,
but doing so is probably practically impossible.  I'm not aware of any work
to remove irc.wikimedia.org.  There has been talk about replacing its
backend, but the frontend IRC
channels would remain.

On Thu, Sep 23, 2021 at 3:01 PM lens0021 lens0021 
wrote:

> TLDR: I don't think the system messages for IRC are just for IRC. So I
> hope to change it.
>
> Hi,
> I am an extension developer and I am recently developing an extension[1]
> that provides a custom RCFeedEngine and an RCFeedFormatter. The purpose of
> the extension is to stream the recent changes in a wiki to a given Discord
> webhook.*
>
> The RC feed engines send messages in a freely configurable format to an
> engine set in $wgRCEngines[2] and every RC log entry has a system message
> for RC feed output. For instance, logs for moving a page have a message
> named "1movedto2" which is represented "moved [[$1]] to [[$2]]" in English.
>
> Disappointingly, I have found [[MediaWiki:1movedto2/qqq]] and similar
> messages on translatewiki include {{ignored}} template that says "This
> message is ignored on export for MediaWiki. Translating it is a waste of
> your effort!" and it seems to be true. The main cause of this is probably
> that the messages are only for irc.wikimedia.org and irc.wikimedia.org
> will be replaced with EventStreams? I couldn't find the exact reason.
>
> In my opinion, the log entry messages are not just for IRC. IRC is an
> implementation of RCFeed and RCFeed is a general interface and can be
> extended in many ways as even the core includes multiple RCFeedEngines. So,
> if I'm not wrong, I'd like to create a ticket for enabling translation and
> request your opinions.
>
> Regards.
> -User:Lens0021
>
> * There are a few extensions with a similar purpose. But the extensions
> are not RCFeedEngines and define their own system messages and use them
> instead of using the log entry messages.
>
> ---
> [1] https://www.mediawiki.org/wiki/Extension:DiscordRCFeed
> [2] https://www.mediawiki.org/wiki/Manual:$wgRCEngines
> ___
> Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
> To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
> https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
>
___
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/

[Wikitech-l] Re: Stream of recent changes diffs

2021-07-01 Thread Andrew Otto
This isn't helpful now, but your use case is relevant to something I hope
to pursue in the future: comprehensive mediawiki change events, including
content.  I don't have a great place yet for collecting these use cases, so
I added it to Modern Event Platform parent ticket
 so I don't forget. :)
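
In the meantime, a rough Python sketch of the workaround you describe: filter
the recentchange stream first, and only request a diff via action=compare for
the few edits that matter (the sseclient package and the enwiki-only filter are
illustrative assumptions):

    import json
    import requests
    from sseclient import SSEClient as EventSource

    STREAM = "https://stream.wikimedia.org/v2/stream/recentchange"

    def fetch_diff(api_url: str, old_rev: int, new_rev: int) -> str:
        """Fetch the HTML diff between two revisions using action=compare."""
        resp = requests.get(api_url, params={
            "action": "compare",
            "fromrev": old_rev,
            "torev": new_rev,
            "format": "json",
        })
        return resp.json()["compare"]["*"]

    for message in EventSource(STREAM):
        if not message.data:
            continue
        change = json.loads(message.data)
        # Only edits carry old/new revision ids; skip log entries, new pages, etc.
        if change.get("type") != "edit":
            continue
        # Illustrative pre-filter so the expensive diff request stays rare.
        if change.get("wiki") != "enwiki":
            continue
        diff_html = fetch_diff("https://en.wikipedia.org/w/api.php",
                               change["revision"]["old"], change["revision"]["new"])
        print(change["title"], len(diff_html))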



On Thu, Jul 1, 2021 at 8:17 AM Physikerwelt  wrote:

> Dear all,
>
> we have developed a tool that is (in some cases) capable of checking if
> formulae in <math>-tags in the context of a wikitext fragment are likely
> to be correct or not. We would like to test the tool on the recent changes.
> From
>
> https://www.mediawiki.org/wiki/API:Recent_changes_stream
>
> we can get the stream of recent changes. However, I did not find a way to
> get the diff (either in HTML or Wikitext) to figure out how the content was
> changed. The only option I see is to request the revision text manually
> additionally. This would be a few unnecessary requests since most of the
> changes do not change <math>-tags. I assume that others, i.e., ORES
>
> https://www.mediawiki.org/wiki/ORES,
>
> compute the diffs anyhow and wonder if there is an easier way to get the
> diffs from the recent changes stream without additional requests.
>
> All the best
> Physikerwelt (Moritz Schubotz)
>
>
> ___
> Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
> To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
> https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
___
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/

Re: [Wikitech-l] New CI Promote Step for Service Developers

2020-09-24 Thread Andrew Otto
Hey wow this is really great!  Excited to use this.  THANK YOU!

On Thu, Sep 24, 2020 at 1:59 PM Jeena Huneidi 
wrote:

> Hi Everyone,
>
> I'd like to announce a new CI feature for developers of services deployed
> to kubernetes!
>
> It's now possible for your helmfile values files to be automatically
> updated with new image versions after changes are merged to your service.
> You can read more about it and how to enable it here:
> https://phabricator.wikimedia.org/phame/post/view/208/ci_now_updates_your_deployment-charts/
>
> If you have any questions or run into problems, please feel free to reach
> out to Release Engineering directly or on IRC at #wikimedia-releng.
>
> --
> Jeena Huneidi
> Software Engineer, Release Engineering
> Wikimedia Foundation
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Replacement for Helm chart repository

2020-08-03 Thread Andrew Otto
GREAT STUFF! Thank you!

On Mon, Aug 3, 2020 at 11:02 AM Janis Meybohm 
wrote:

> Hello,
>
> We are replacing our current repository for Helm charts
> (https://releases.wikimedia.org/charts/) with an instance of
> ChartMuseum. The new repository index can be found at:
> https://helm-charts.wikimedia.org/stable/index.yaml
>
> Users may add/update the Helm repository config using "helm repo
> add wmf-stable https://helm-charts.wikimedia.org/stable/".
>
> Developers may now stop the process of packaging helm charts manually,
> rebuilding the index and pushing all that to git. As of now, increasing
> the chart's version number in Chart.yaml is sufficient to have the chart
> packaged and uploaded to ChartMuseum automatically.
>
> We will remove the chart tgz archives from git as well as the old
> repository from releases.wikimedia.org in the next week. If you
> experience any issues, please feel free to reach out in
> #wikimedia-serviceops.
>
> For further details, please see:
> https://wikitech.wikimedia.org/wiki/ChartMuseum
>
>
> Cheers,
>
> Janis Meybohm
>
>
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Change to Wikitech logins: Username now case-sensitive

2019-04-16 Thread Andrew Otto
Great!  Is this just for Wikitech itself or all ldap/wikitech
authentication?

On Mon, Apr 15, 2019 at 7:56 PM Bryan Davis  wrote:

> A change was deployed to the Wikitech config 2019-04-15T23:16 UTC
> which prevents users from logging into the wiki with a username that
> differs in case from the 'cn' value for their developer account.
>
> This change is not expected to cause problems for most users, but
> there may be some people who have historically entered a username with
> mismatched case (for example "bryandavis" instead of "BryanDavis") and
> relied on MediaWiki and the LdapAuthentication plugin figuring things
> out. This will no longer happen automatically. These users will need
> to update their password managers (or brains if they are not using a
> password manager) to supply the username with correct casing.
>
> The "wrongpassword" error message on Wikitech has been updated with a
> local override to help people discover this problem. See
>  for more details.
>
> Bryan, on behalf of the Cloud Services team
> --
> Bryan Davis  Wikimedia Foundation
> [[m:User:BDavis_(WMF)]] Manager, Technical Engagement   Boise, ID USA
> irc: bd808   v:415.839.6885 x6855
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] stats.wikimedia.org maintenance downtime

2018-09-05 Thread Andrew Otto
This has been done, thanks all!

On Tue, Aug 28, 2018 at 12:53 PM Andrew Otto  wrote:

> Hi all,
>
> On Wednesday September 5th at around 13:30 UTC we will be taking
> stats.wikimedia.org and analytics.wikimedia.org offline for a server
> upgrade.  We expect this downtime to take about an hour.
>
> You can follow along (and report any issues) at
> https://phabricator.wikimedia.org/T192641.
>
> Thanks!
> -Andrew Otto
>  Systems Engineer
>  Wikimedia Foundation
>
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] stats.wikimedia.org maintenance downtime

2018-08-28 Thread Andrew Otto
Hi all,

On Wednesday September 5th at around 13:30 UTC we will be taking
stats.wikimedia.org and analytics.wikimedia.org offline for a server
upgrade.  We expect this downtime to take about an hour.

You can follow along (and report any issues) at
https://phabricator.wikimedia.org/T192641.

Thanks!
-Andrew Otto
 Systems Engineer
 Wikimedia Foundation
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] TechCom Radar 2018-08-08

2018-08-10 Thread Andrew Otto
Is there a reason there is no TechCom IRC meeting next week? I’d love to
have one!
https://phabricator.wikimedia.org/T201643

:)

On Fri, Aug 10, 2018 at 4:01 PM Kate Chapman  wrote:

> Hi All,
>
> Here are the minutes from this week's TechCom meeting:
>
> * IRC meeting scheduled for 22 August at 2pm PST (21:00 UTC, 23:00 CET)
> in #wikimedia-office: Introduce a new namespace for collaborative
> judgments about wiki entities 
>
> * Hosted IRC consultation meeting on RFC: Block users by page/namespace
> 
> * Log:
> <
> https://tools.wmflabs.org/meetbot/wikimedia-office/2018/wikimedia-office.2018-08-08-21.00.log.html
> >
> * Minutes:
> <
> https://tools.wmflabs.org/meetbot/wikimedia-office/2018/wikimedia-office.2018-08-08-21.00.html
> >
>
> * mwlog1001 log server (formerly fluorine) has had a flame graph being
> generated on it and is going to move to its own server; the move to
> production is planned for this week or next.
>
> * Last Call closing 22 August at 2pm PST (22:00 UTC, 23:00 CET): RFC:
> Modern Event Platform - Choose Schema Tech
> . The agreement was to move
> forward with using JSON as the serialization method.
>
> * Harmonise the identification of requests across our stack
>  was discussed and accepted
> as an RFC
>
> * RFC: Spec for representing multiple content objects per revision (MCR)
> in XML dumps  was discussed
> and accepted as an RFC
>
> * No TechCom internal meeting next week (2018-08-15)
>
> * No TechCom IRC meeting next week (2018-08-15)
>
> You can also find our meeting minutes at
> 
>
> See also the TechCom RFC board
> .
>
> If you prefer you can subscribe to our newsletter here:
> 
>
> Thanks,
>
> Kate
> --
> Kate Chapman TechCom Facilitator (Contractor)
>
>
>
>
>
>
>
>
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] RevisionInsertComplete vs. RevisionRecordInserted

2018-02-01 Thread Andrew Otto
This is the first I’ve heard of it!  So, we don’t have a plan to change it,
but I suppose we should if RevisionInsertComplete is deprecated.  I haven’t
looked at RevisionRecordInserted yet so I can’t answer questions about
schema changes, but I doubt it would change anything.

Just created https://phabricator.wikimedia.org/T186228, thanks.

On Mon, Jan 29, 2018 at 4:19 PM, Stas Malyshev 
wrote:

> Hi!
>
> I've noticed that RevisionInsertComplete hook is now deprecated in favor
> of RevisionRecordInserted. However, EventBus still uses
> RevisionInsertComplete. Is this going to change soon? If so, will the
> underlying event/topic change too? I couldn't find anything in
> Phabricator about this - is there plan to change it or still use old
> hook for now and foreseeable future?
>
> Thanks,
> --
> Stas Malyshev
> smalys...@wikimedia.org
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] ganglia.wikimedia.org has been retired

2018-01-02 Thread Andrew Otto
NICE!

On Fri, Dec 22, 2017 at 6:52 PM, Daniel Zahn  wrote:

> This is a notice that as part of "T177195 Reduce technical debt in metrics
> monitoring " the service
>
> ganglia.wikimedia.org has been retired and removed from DNS.
>
> It had already been in a deprecation period and served a notice about it
> for a while.
>
> The replacement service is https://grafana.wikimedia.org/
>
> If you see references to it in wiki or code, please feel free to update
> and/or create/request new grafana dashboards.
>
> ---
> refs:
>
> https://phabricator.wikimedia.org/T177225
>
> https://gerrit.wikimedia.org/r/#/q/topic:kill-ganglia+(
> status:open+OR+status:merged)
>
> https://phabricator.wikimedia.org/T177195
>
> --
> Daniel Zahn 
> Operations Engineer
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Announcing MediaWiki code search

2017-12-21 Thread Andrew Otto
Super cool!

On Thu, Dec 21, 2017 at 9:31 AM, zppix e  wrote:

> Thank you very much Kunal!
>
> --
> Zppix
> Volunteer Wikimedia Developer
> Volunteer Wikimedia GCI2017 Mentor
> enwp.org/User:Zppix
> **Note: I do not work for Wikimedia Foundation, or any of its chapters.**
>
> > On Dec 21, 2017, at 8:25 AM, Bahodir Mansurov 
> wrote:
> >
> > It's very fast!
> >
> > Also, as far as I know, it's not easy to search all extensions at
> > once on Github, which makes this tool even more valuable.
> >
> > Thanks for sharing.
> >
> > Amir Ladsgroup  writes:
> >
> >> Kunal, you rock!
> >>
> >> Best
> >>
> >> ‪On Thu, Dec 21, 2017 at 1:50 PM ‫יגאל חיטרון‬‎  >
> >> wrote:‬
> >>
> >>> Wow, thanks a lot!
> >>> Igal (User:IKhitron)
> >>>
> >>>
> >>> 2017-12-21 14:09 GMT+02:00 Florian Schmidt <
> >>> florian.schmidt.wel...@t-online.de>:
> >>>
>  Kunal….. that is simply awesome! Big thanks for this new tool; this
> will make finding usages of deprecated methods that we would like to
> remove much easier!
> 
> 
> 
>  Best,
> 
>  Florian
> 
> 
> 
>  From: Wikitech-l [mailto:wikitech-l-boun...@lists.wikimedia.org] On
>  Behalf Of Kunal Mehta
>  Sent: Thursday, 21 December 2017 05:15
>  To: wikitech-l 
>  Subject: [Wikitech-l] Announcing MediaWiki code search
> 
> 
> 
>  Hi,
> 
>  MediaWiki code search is a fully free software tool that lets you
>  easily search through all of MediaWiki core, extensions, and skins
>  that are hosted on Gerrit. You can limit your search to specific
>  repositories, or types of repositories too. Regular expressions are
>  supported in both the search string, and when filtering by path.
> 
>  Try it out: https://codesearch.wmflabs.org/search/
> 
>  I started working on this because the only other options to searching
>  the entire MediaWiki codebase was either cloning everything locally
>  (takes up space, and need to manually keep it up to date) or using
>  Github (not free software, has extraneous repositories). The backend
>  is powered by hound, a code search tool written by etsy, based on
>  Google's Code Search.
> 
>  Please let me know what you think! More documentation and links are
>  at: .
> 
>  -- Legoktm
> 
> 
>  ___
>  Wikitech-l mailing list
>  Wikitech-l@lists.wikimedia.org
>  https://lists.wikimedia.org/mailman/listinfo/wikitech-l
> 
> >>> ___
> >>> Wikitech-l mailing list
> >>> Wikitech-l@lists.wikimedia.org
> >>> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
> >> ___
> >> Wikitech-l mailing list
> >> Wikitech-l@lists.wikimedia.org
> >> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
> >
> > ___
> > Wikitech-l mailing list
> > Wikitech-l@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Sunsetting Trending Edits Service before the holiday

2017-12-12 Thread Andrew Otto
> This is a little inferior to the production version as it is unable to use
production kafka and if it has any outages it will lose data.

​EventStreams isn’t as good as using Kafka, but an outage shouldn’t be a
reason to lose data.  Store the Last-Event-ID
 that EventStreams
gives you, and you can use it when the service starts back up to start from
where you left off.

​>  and maybe, as others have mentioned, a good reason to get production
Kafka events flowing into Cloud VPS backed projects.

Def not opposed to a Kafka cluster in Cloud mirroring from Prod. :)

BTW, this is a wee relevant:
https://wikitech.wikimedia.org/wiki/User:Ottomata/Stream_Data_Platform

This is a draft! I’m shopping this around as a program for next FY.  We
will see!






On Tue, Dec 12, 2017 at 4:16 PM, Gergo Tisza  wrote:

> On Tue, Dec 12, 2017 at 12:12 PM, Jon Robson  wrote:
>
> > This is a little inferior to the production version as it is unable to
> use
> > production kafka and if it has any outages it will lose data.
> >
>
> Hopefully that gets fixed soon, Cloud VPS / Toolforge is the foundation for
> our volunteer tool developer community who really shouldn't be treated as
> second-class citizens.
>
> Other than that, moving to the Cloud is not a bad thing for an experimental
> project IMO. It makes it easier to experiment with minimal risk, and it
> makes it easy to add co-collaborators to your project without having to get
> prod access for them.
>
> I'm hoping to get this onto IFTTT  with help
> > from Stephen Laporte in my volunteer time, as I think this feature is a
> > pretty powerful one which has failed to find its use case in the wiki
> > world. As Kaldari points out it's incredibly good at detecting edit wars
> > and I personally have learned a lot about what our editors see as
> important
> > and notable in the world (our editors really seem to like wrestling). I
> > think there are ample and exciting things people could build on top of
> this
> > api.
> >
>
> Yeah a "give me push notifications about ongoing edit wars" tool for admins
> sounds really cool. Although you'd probably want to look at revert trends
> (or both edit and revert trends) for that.
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Question Pertaining to the "stats.grok.se" Page Containing Pre-2015 Page-View Data

2017-09-26 Thread Andrew Otto
That thread links to
https://meta.wikimedia.org/wiki/Community_Tech/Pageview_stats_tool, which
has some good info about the history and status.

On Mon, Sep 25, 2017 at 5:22 PM, Toby Negrin  wrote:

> Hi Karl, Daniel --
>
> Erik doesn't support stats.grok.se.
>
> There's some information on the reliability of stats.grok.se and
> alternatives on this thread on the analytics list:
>
> https://lists.wikimedia.org/pipermail/analytics/2017-August/005978.html
>
> good luck!
>
> -Toby
>
> On Mon, Sep 25, 2017 at 2:10 PM, Daniel Zahn  wrote:
>
> > Karl, you should probably ask Erik Zachte about grok.se
> >
> > https://www.wired.com/2013/12/erik-zachte-wikistats/
> >
> > https://en.wikipedia.org/wiki/User:Erik_Zachte
> > ___
> > Wikitech-l mailing list
> > Wikitech-l@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
> >
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Can we drop revision hashes (rev_sha1)?

2017-09-15 Thread Andrew Otto
> can it be a dataset generated from each revision and then published
separately?

Perhaps it could be generated asynchronously via a job?  Either stored in
the revision table or in a separate table.
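
If it helps frame the cost, here is a rough Python sketch of what such a job
would compute per revision. MediaWiki stores rev_sha1 as a base-36 encoded
SHA-1 of the content; the 31-character zero padding is from memory, so treat
that detail as an assumption.

    import hashlib

    def rev_sha1(content: bytes) -> str:
        """Compute a MediaWiki-style rev_sha1: SHA-1 of the content, base-36 encoded."""
        digest = int(hashlib.sha1(content).hexdigest(), 16)
        alphabet = "0123456789abcdefghijklmnopqrstuvwxyz"
        encoded = ""
        while digest:
            digest, rem = divmod(digest, 36)
            encoded = alphabet[rem] + encoded
        # Pad to a fixed width, as MediaWiki does (31 chars, if memory serves).
        return encoded.rjust(31, "0")

    print(rev_sha1(b"Hello, world"))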

On Fri, Sep 15, 2017 at 4:06 PM, Andrew Otto <o...@wikimedia.org> wrote:

> > As a random idea - would it be possible to calculate the hashes when data
> is transitioned from SQL to Hadoop storage?
>
> We take monthly snapshots of the entire history, so every month we’d have
> to pull the content of every revision ever made :o
>
>
> On Fri, Sep 15, 2017 at 4:01 PM, Stas Malyshev <smalys...@wikimedia.org>
> wrote:
>
>> Hi!
>>
>> > We should hear from Joseph, Dan, Marcel, and Aaron H on this I think,
>> but
>> > from the little I know:
>> >
>> > Most analytical computations (for things like reverts, as you say) don’t
>> > have easy access to content, so computing SHAs on the fly is pretty
>> hard.
>> > MediaWiki history reconstruction relies on the SHA to figure out what
>> > revisions revert other revisions, as there is no reliable way to know if
>> > something is a revert other than by comparing SHAs.
>>
>> As a random idea - would it be possible to calculate the hashes when
>> data is transitioned from SQL to Hadoop storage? I imagine that would
>> slow down the transition, but not sure if it'd be substantial or not. If
>> we're using the hash just to compare revisions, we could also use
>> different hash (maybe non-crypto hash?) which may be faster.
>>
>> --
>> Stas Malyshev
>> smalys...@wikimedia.org
>>
>
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Can we drop revision hashes (rev_sha1)?

2017-09-15 Thread Andrew Otto
> As a random idea - would it be possible to calculate the hashes when data
is transitioned from SQL to Hadoop storage?

We take monthly snapshots of the entire history, so every month we’d have
to pull the content of every revision ever made :o


On Fri, Sep 15, 2017 at 4:01 PM, Stas Malyshev 
wrote:

> Hi!
>
> > We should hear from Joseph, Dan, Marcel, and Aaron H on this I think, but
> > from the little I know:
> >
> > Most analytical computations (for things like reverts, as you say) don’t
> > have easy access to content, so computing SHAs on the fly is pretty hard.
> > MediaWiki history reconstruction relies on the SHA to figure out what
> > revisions revert other revisions, as there is no reliable way to know if
> > something is a revert other than by comparing SHAs.
>
> As a random idea - would it be possible to calculate the hashes when
> data is transitioned from SQL to Hadoop storage? I imagine that would
> slow down the transition, but not sure if it'd be substantial or not. If
> we're using the hash just to compare revisions, we could also use
> different hash (maybe non-crypto hash?) which may be faster.
>
> --
> Stas Malyshev
> smalys...@wikimedia.org
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Can we drop revision hashes (rev_sha1)?

2017-09-15 Thread Andrew Otto
We should hear from Joseph, Dan, Marcel, and Aaron H on this I think, but
from the little I know:

Most analytical computations (for things like reverts, as you say) don’t
have easy access to content, so computing SHAs on the fly is pretty hard.
MediaWiki history reconstruction relies on the SHA to figure out what
revisions revert other revisions, as there is no reliable way to know if
something is a revert other than by comparing SHAs.

See
https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/Mediawiki_history
(particularly the *revert* fields).
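
To illustrate the idea: identity-revert detection is essentially "a revision
whose hash matches that of an earlier revision of the same page".  A rough
Python sketch (not the actual reconstruction code, just the pattern):

    def find_reverts(revisions):
        """revisions: chronologically ordered (rev_id, sha1) pairs for one page."""
        seen = {}       # sha1 -> first rev_id that produced this content
        reverts = []    # (reverting rev_id, rev_id whose content it restored)
        for rev_id, sha1 in revisions:
            if sha1 in seen:
                reverts.append((rev_id, seen[sha1]))
            else:
                seen[sha1] = rev_id
        return reverts

    # find_reverts([(1, 'aaa'), (2, 'bbb'), (3, 'aaa')]) == [(3, 1)]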



On Fri, Sep 15, 2017 at 1:49 PM, Erik Zachte  wrote:

> Computing the hashes on the fly for the offline analysis doesn’t work for
> Wikistats 1.0, as it only parses the stub dumps, without article content,
> just metadata.
> Parsing the full archive dumps is quite expensive, time-wise.
>
> This may change with Wikistats 2.0, which has a totally different process
> flow. That I can't tell.
>
> Erik Zachte
>
> -Original Message-
> From: Wikitech-l [mailto:wikitech-l-boun...@lists.wikimedia.org] On
> Behalf Of Daniel Kinzler
> Sent: Friday, September 15, 2017 12:52
> To: Wikimedia developers 
> Subject: [Wikitech-l] Can we drop revision hashes (rev_sha1)?
>
> Hi all!
>
> I'm working on the database schema for Multi-Content-Revisions (MCR) <
> https://www.mediawiki.org/wiki/Multi-Content_Revisions/Database_Schema>
> and I'd like to get rid of the rev_sha1 field:
>
> Maintaining revision hashes (the rev_sha1 field) is expensive, and becomes
> more expensive with MCR. With multiple content objects per revision, we
> need to track the hash for each slot, and then re-calculate the sha1 for
> each revision.
>
> That's expensive especially in terms of bytes-per-database-row, which
> impacts query performance.
>
> So, what do we need the rev_sha1 field for? As far as I know, nothing in
> core uses it, and I'm not aware of any extension using it either. It seems
> to be used primarily in offline analysis for detecting (manual) reverts by
> looking for revisions with the same hash.
>
> Is that reason enough for dragging all the hashes around the database with
> every revision update? Or can we just compute the hashes on the fly for the
> offline analysis? Computing hashes is slow since the content needs to be
> loaded first, but it would only have to be done for pairs of revisions of
> the same page with the same size, which should be a pretty good
> optimization.
>
> Also, I believe Roan is currently looking for a better mechanism for
> tracking all kinds of reverts directly.
>
> So, can we drop rev_sha1?
>
> --
> Daniel Kinzler
> Principal Platform Engineer
>
> Wikimedia Deutschland
> Gesellschaft zur Förderung Freien Wissens e.V.
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Fwd: EventStreams launch and RCStream deprecation

2017-02-27 Thread Andrew Otto
> I monitor irc.wikimedia.org and it has proven itself to be highly reliable
for many years in production, which is a lot more than can be said for any
of these proposed alternatives.

Great!  But, as is, it is impossible for us to restart or migrate
irc.wikimedia.org to new servers without also disrupting client
connections.  RCStream also has this problem.  EventStreams does not.

We have talked about reworking the irc.wikimedia.org backend so that it is
easier to maintain, but as of now there is no official internal project to
do this.  Either way, I think it is safe to say that irc.wikimedia.org will
remain online for the foreseeable future.

> is there any working example code, preferably written in Python
+1 to what Alex said:
https://wikitech.wikimedia.org/wiki/EventStreams#Python

Also, if you want to fix or add more client examples up there, please feel
free to do so! :)
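
For anyone who wants a starting point before looking at the wiki, here is a
minimal Python sketch along the same lines, using the third-party sseclient
package (assuming its SSEClient iterator interface; any SSE/EventSource
client should work similarly):

    import json
    from sseclient import SSEClient  # pip install sseclient

    url = 'https://stream.wikimedia.org/v2/stream/recentchange'

    for event in SSEClient(url):
        # Skip keep-alive messages that carry no data.
        if event.event == 'message' and event.data:
            change = json.loads(event.data)
            print(change.get('wiki'), change.get('title'))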


> Do you really have a date for decommission of a working service already
Yes, as you note, MW devs (and developers in general) have a history of
creating new services to deprecate old ones.  Sometimes this is done for
trivial reasons, other times not (but I’m sure no developer ever thinks
their own reasons are trivial ;) ).  We also have a history of creating new
services and not ever actually shutting old ones off, which creates
maintenance headaches for the ops team.  To help mitigate those headaches,
we agreed to commit to a shut off date for RCStream before launching
EventStreams.  I think the date is far enough in the future that folks
should have time to find any bugs, and for us to work out kinks on our
side.  Of course, if EventStreams isn't working when the July shut off
date rolls around, we won’t just blindly turn off RCStream.


-Ao




On Sat, Feb 25, 2017 at 2:31 PM, MZMcBride <z...@mzmcbride.com> wrote:

> Congratulations on the launch of EventStreams.
>
> Andrew Otto wrote:
> >I did say deprecated!  Okay okay, we may never be able to fully deprecate
> >irc.wikimedia.org.  It’s used by too many (probably sentient by now) bots
> >out there.
>
> I monitor irc.wikimedia.org and it has proven itself to be highly reliable
> for many years in production, which is a lot more than can be said for any
> of these proposed alternatives. I'm glad to hear that irc.wikimedia.org
> will be left alone. If people want to be nasty and call irc.wikimedia.org
> deprecated, I suppose that's fine, as long as it remains the functional
> and dependable (and completely quirky) API it continues to serve as.
>
> MZMcBride
>
>
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Fwd: EventStreams launch and RCStream deprecation

2017-02-23 Thread Andrew Otto
Hi everyone!

Wikimedia is releasing a new service: EventStreams
<https://wikitech.wikimedia.org/wiki/EventStreams>[1].  This service allows
us to publish arbitrary streams of JSON event data to the public.  (Pt:
we’re looking for cool new uses
<https://www.mediawiki.org/wiki/EventStreams/Blog_-_Call_For_Entries>[2] to
put on an upcoming blog post.)


Initially, the only stream available will be good ol’ RecentChanges
<https://www.mediawiki.org/wiki/Manual:RCFeed>.  This event stream overlaps
functionality already provided by irc.wikimedia.org and RCStream
<https://wikitech.wikimedia.org/wiki/RCStream>.  However, this new service
has advantages over these (now deprecated) services.


   1. We can expose more than just RecentChanges.

   2. Events are delivered over streaming HTTP (chunked transfer) instead of
      IRC or socket.io.  This requires less client side code and fewer
      special routing cases on the server side.

   3. Streams can be resumed from the past.  By using EventSource, a
      disconnected client will automatically resume the stream from where it
      left off, as long as it resumes within one week.  In the future, we
      would like to allow users to specify historical timestamps from which
      they would like to begin consuming, if this proves safe and tractable.


I did say deprecated!  Okay okay, we may never be able to fully deprecate
irc.wikimedia.org.  It’s used by too many (probably sentient by now) bots
out there.  We do plan to obsolete RCStream, and to turn it off in a
reasonable amount of time.  The deadline is July 7th, 2017.  All
services that rely on RCStream should migrate to the HTTP based
EventStreams service by this date.  We are committed to assisting you in
this transition, so let us know how we can help.

Unfortunately, unlike RCStream, EventStreams does not have server side
event filtering (e.g. by wiki) quite yet.  How and if this should be done
is still under discussion <https://phabricator.wikimedia.org/T152731>.

The RecentChanges data you are used to remains the same, and is available
at https://stream.wikimedia.org/v2/stream/recentchange. However, we may
have something different for you, if you find it useful. We have been
internally producing new Mediawiki specific events
<https://github.com/wikimedia/mediawiki-event-schemas/tree/master/jsonschema/mediawiki>
[3] for a while now, and could expose these via EventStreams as well.

Take a look at these events, and tell us what you think.  Would you find
them useful?  How would you like to subscribe to them?  Individually as
separate streams, or would you like to be able to compose multiple event
types into a single stream via an API?  These things are all possible.

I asked for a lot of feedback in the above paragraphs.  Let’s try and
centralize this discussion over on the mediawiki.org EventStreams talk page
<https://www.mediawiki.org/wiki/Talk:EventStreams>[4].   In summary, the
questions are:


   -  What RCStream clients do you maintain, and how can we help you migrate
      to EventStreams? <https://www.mediawiki.org/wiki/Topic:Tkjkee2j684hkwc9>

   -  Is server side filtering, by wiki or arbitrary event field, useful to
      you? <https://www.mediawiki.org/wiki/Topic:Tkjkabtyakpm967t>

   -  Would you like to consume streams other than RecentChanges?
      <https://www.mediawiki.org/wiki/Topic:Tkjk4ezxb4u01a61>  (Currently
      available events are described in the event-schemas repository
      <https://github.com/wikimedia/mediawiki-event-schemas/tree/master/jsonschema/mediawiki>.)



Thanks!
- Andrew Otto

[1] https://wikitech.wikimedia.org/wiki/EventStreams
[2] https://www.mediawiki.org/wiki/EventStreams/Blog_-_Call_For_Entries
[3]
https://github.com/wikimedia/mediawiki-event-schemas/tree/master/jsonschema/mediawiki
[4] https://www.mediawiki.org/wiki/Talk:EventStreams
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Public Event Streams (AKA RCStream replacement) question

2016-10-20 Thread Andrew Otto
f days, most notably big template updates / transclusions. This
> graph~[5] plots Change Propagation's delay in processing the events for
> each defined rule. The "backlog per rule" metric measures the delay between
> event production and event consumption. Here, event production refers to
> the time stamp MediaWiki observed the event, while event consumption refers
> to the time that Change Propagation dequeues it from Kafka and starts
> executing it.
>
>
> > How does recovery/re-sync work after
> > disconnect/downtime?
> >
>
> Because relying on EventBus and, specifically, Change Propagation, means
> consuming events via push HTTP requests, the receiving entity does not have
> to worry about this in this context (public event streams are different
> matter, though). EventBus handles offsets internally, so even if Change
> Propagation stops working for some time or cannot connect to Kafka, it will
> resume processing events form where it left off once the pipeline is
> accessible again. If, on the other hand, the service receiving the HTTP
> requests is down or unreachable, Change Propagation has a built-in retry
> mechanism that is triggered to resend requests whenever an erroneous
> response is received from the service.
>
> I hope this helps. Would be happy to talk more about this specific topic
> some more.
>
> Cheers,
> Marko
>
>
> >
> > I have not read the entire conversation, so the answers might already be
> > there -
> > my appologies if they are, just point me there.
> >
> > Anyway, if anyone has a good solution for sending wiki-events to a large
> > number
> > of subscribers, yes, please let us (WMDE/Wikidata) know about it!
> >
> > Am 26.09.2016 um 22:07 schrieb Gergo Tisza:
> > > On Mon, Sep 26, 2016 at 5:57 AM, Andrew Otto <o...@wikimedia.org>
> wrote:
> > >
> > >>  A public resumable stream of Wikimedia events would allow folks
> > >> outside of WMF networks to build realtime stream processing tooling on
> > top
> > >> of our data.  Folks with their own Spark or Flink or Storm clusters
> (in
> > >> Amazon or labs or wherever) could consume this and perform complex
> > stream
> > >> processing (e.g. machine learning algorithms (like ORES), windowed
> > trending
> > >> aggregations, etc.).
> > >>
> > >
> > > I recall WMDE trying something similar a year ago (via PubSubHubbub)
> and
> > > getting vetoed by ops. If they are not aware yet, might be worth
> > contacting
> > > them and asking if the new streaming service would cover their use
> cases
> > > (it was about Wikidata change invalidation on third-party wikis, I
> > think).
> > > ___
> > > Wikitech-l mailing list
> > > Wikitech-l@lists.wikimedia.org
> > > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
> > >
> >
> >
> > --
> > Daniel Kinzler
> > Senior Software Developer
> >
> > Wikimedia Deutschland
> > Gesellschaft zur Förderung Freien Wissens e.V.
> >
> > ___
> > Wikitech-l mailing list
> > Wikitech-l@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
> >
>
>
>
> [1] https://www.mediawiki.org/wiki/EventBus
> [2]
> https://github.com/wikimedia/mediawiki-event-schemas/tree/master/jsonschema
> [3] https://www.mediawiki.org/wiki/Change_propagation
> [4]
> https://github.com/wikimedia/mediawiki-services-change-propagation-deploy/blob/ea8cdf85e700b74918a3e59ac6058a1a952b3e60/scap/templates/config.yaml.j2#L556
> [5]
> https://grafana.wikimedia.org/dashboard/db/eventbus?panelId=10
>
> --
> Marko Obrovac, PhD
> Senior Services Engineer
> Wikimedia Foundation
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Public Event Streams (AKA RCStream replacement) question

2016-10-20 Thread Andrew Otto
s/Fabricator/Phabricator/ (gmail auto correct GRR)

On Thu, Oct 20, 2016 at 4:00 PM, Andrew Otto <o...@wikimedia.org> wrote:

> Thanks for the feedback everyone!
>
> Due to the simplicity of the HTTP stream model, we are moving forward with
> that, instead of websockets/socket.io.  We hope to have an initial
> version of this serving existent EventBus events this quarter.  Next we
> will focus on more features (filtering), and also work towards deprecating
> both RCStream and RCFeed.
>
> You can follow progress of this effort on Fabricator: https://phabricator.wikimedia.org/T130651
>
>
>
> On Thu, Sep 29, 2016 at 10:29 AM, Marko Obrovac <mobro...@wikimedia.org>
> wrote:
>
>> Hello,
>>
>> Regarding Wikidata, it is important to make the distinction here between
>> the WMF internal use and public-facing facilities. The underlying
>> sub-system that the public event streams will be relying on is called
>> EventBus~[1], which is (currently) comprised of:
>>
>> (i) The producer HTTP proxy service. It allows (internal) users to produce
>> events using a REST HTTP interface. It also validates events against the
>> currently-supported set of JSON event schemas~[2].
>> (ii) The Kafka cluster, which is in charge of queuing the produced events
>> and delivering them to consumer clients. The event streams are separated
>> into topics, e.g. a revision-create topic, a page-move topic, etc.
>> (iii) The Change Propagation service~[3]. It is the main Kafka consumer at
>> this point. In its most basic form, it executes HTTP requests triggered by
>> user-defined rules for certain topics. The aim of the service is to able
>> to
>> update dependant entities starting from a resource/event. One example is
>> recreating the needed data for a page when it is edited. When a user edits
>> a page, ChangeProp receives an event in the revision-create topic and
>> sends
>> a no-cache request to RESTBase to render it. After RB has completed the
>> request, another request is sent to the mobile content service to do the
>> same, because the output of the mobile content service for a given page
>> relies on the latest RB/Parsoid HTML.
>>
>> Currently, the biggest producer of events is MediaWiki itself. The aim of
>> this e-mail thread is to add a forth component to the system - the public
>> event stream consumption. However, for the Wikidata case, we think the
>> Change Propagation service should be used (i.e. we need to keep it
>> internal). If you recall, Daniel, we did kind of start talking about
>> putting WD updates onto EventBus in Esino Lario.
>>
>> In-lined the responses to your questions.
>>
>> On 27 September 2016 at 14:50, Daniel Kinzler <
>> daniel.kinz...@wikimedia.de>
>> wrote:
>>
>> > Hey Gergo, thanks for the heads up!
>> >
>> > The big questions here is: how does it scale? Sending events to 100
>> > clients may
>> > work, but does it work for 100 thousand?
>> >
>>
>> Yes, it does. Albeit, not instantly. We limit the concurrency of execution
>> to mitigate huge spikes and overloading the system. For example, Change
>> Propagation handles template transclusions: when a template is edited, all
>> of the pages it is transcluded in need to re-rendered, i.e. their HTMLs
>> have to be recreated. For important templates, that might mean
>> re-rendering
>> millions of pages. The queue is populated with the relevant pages and the
>> backlog is "slowly" processed. "Slowly" here refers to the fact that at
>> most X pages are re-rendered at the same time, where X is governed by the
>> concurrency factor. In the concrete example of important templates, it
>> usually takes a couple of days to go through the backlog of re-renders.
>>
>>
>> >
>> > And then there's several more important details to sort out: What's the
>> > granularity of subscription - a wiki? A page? Where does filtering by
>> > namespace
>> > etc happen?
>>
>>
>> As Andrew noted, the basic granularity is the topic, i.e. the type/schema
>> of the events that are to be received. Roughly, that means that a consumer
>> can obtain either all page edits, or page renames (for all WMF wikis)
>> without performing any kind of filtering. Change Propagation, however,
>> allows one to filter events out based on any of the fields contained in
>> the
>> events themselves, which means you are able to receive only events for a
>> specific wiki, a specific page or namespace. For example, Change
>> Propagation already handles

Re: [Wikitech-l] Public Event Streams (AKA RCStream replacement) question

2016-09-29 Thread Andrew Otto
> The big questions here is: how does it scale?
This new service is stateless and is backed by Kafka.  So, theoretically at
least, it should be horizontally scalable. (Add more Kafka brokers, add
more service workers.)


> And then there’s several more important details to sort out: What's the
> granularity of subscription?
A topic, which is generically defined, and does not need to be tied to
anything MediaWiki specific.  If you are interested in recentchanges
events, the granularity will be the same as RCStream.

(Well ok, technically the granularity is topic-partition.  But for streams
with low enough volume, topics will only have a single partition, so in
practice the granularity is topic.)


> Where does filtering by namespace etc happen?
Filtering is not yet totally hammered out.  We aren’t sure what kind of
server side filtering we want to actually support in production.  Ideally
we’d get real fancy and allow complex filtering, but there are likely
performance and security concerns here.  Even so, filtering will be
configured by the client, and at the least you will be able to do glob
filtering on any number of keys, and maybe an array of possible values.
E.g. if you wanted to filter recentchanges events for plwiki and namespace
== 0, filters might look like:
{
   “database”: “plwiki”,
   “page_namespace”: 0
}
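
To make those semantics concrete, here is a rough sketch of that kind of
matching done client side in Python (globs on string values, a list meaning
any of those values); the eventual server side implementation may well
differ:

    from fnmatch import fnmatch

    def matches(event, filters):
        """True if every filter key matches the corresponding event field."""
        for key, wanted in filters.items():
            value = event.get(key)
            candidates = wanted if isinstance(wanted, list) else [wanted]
            ok = False
            for want in candidates:
                if isinstance(want, str) and isinstance(value, str):
                    ok = ok or fnmatch(value, want)   # glob comparison
                else:
                    ok = ok or value == want          # exact comparison
            if not ok:
                return False
        return True

    # matches({'database': 'plwiki', 'page_namespace': 0},
    #         {'database': 'plwiki', 'page_namespace': 0})  -> True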


> How big is the latency?
For MediaWiki origin streams, in normal operation, probably around a few
seconds.  This highly depends on how many Kafka clusters we have to go
through before the event gets to the one from which this service is
backed.  This isn’t productionized yet, so we aren’t totally sure which
Kafka cluster these events will be served from.


> How does recovery/re-sync work after disconnect/downtime?
Events will be given to the client with their offsets in the stream.
During connection, a client can configure the offset that it wants to start
consuming at.  This is kind of like seeking to a particular location in a
file, but instead of a byte offset, you are starting at a certain event
offset in the stream.  In the future (when Kafka supports it), we will
support timestamp based subscription as well.  E.g. ‘subscribe to
recentchanges events starting at time T.’  This will only work as long as
event at offset N or time T still exist in Kafka.  Kafka is usually used as
a rolling buffer from which old events are removed.  We will at least keep
events for 7 days, but at this time I don’t see a technical reason we
couldn’t keep events for much longer.
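
From the client side, the resume pattern is just: remember the offset of the
last event you handled, and hand it back when you reconnect.  A purely
illustrative Python sketch, where subscribe() and handle() are hypothetical
placeholders rather than a real API:

    import os

    CHECKPOINT = 'recentchanges.offset'   # local state file, name made up

    def load_offset(default=None):
        # None would mean "start from the latest available event".
        if os.path.exists(CHECKPOINT):
            with open(CHECKPOINT) as f:
                return int(f.read().strip())
        return default

    def save_offset(offset):
        with open(CHECKPOINT, 'w') as f:
            f.write(str(offset))

    # for event, offset in subscribe('recentchanges', offset=load_offset()):
    #     handle(event)          # both subscribe() and handle() are hypothetical
    #     save_offset(offset)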


> Anyway, if anyone has a good solution for sending wiki-events to a large
number of subscribers yes, please let us (WMDE/Wikidata) know about it!
The first use case is not something like this.  The upcoming production
deployment will likely not be large enough to support many thousands of
connections.  BUT!  There is no technical reason we couldn’t.  If all goes
well, and WMF can be convinced to buy enough hardware, this may be
possible! :)







On Tue, Sep 27, 2016 at 3:50 PM, Daniel Kinzler <daniel.kinz...@wikimedia.de
> wrote:

> Hey Gergo, thanks for the heads up!
>
> The big questions here is: how does it scale? Sending events to 100
> clients may
> work, but does it work for 100 thousand?
>
> And then there's several more important details to sort out: What's the
> granularity of subscription - a wiki? A page? Where does filtering by
> namespace
> etc happen? How big is the latency? How does recovery/re-sync work after
> disconnect/downtime?
>
> I have not read the entire conversation, so the answers might already be
> there -
> my appologies if they are, just point me there.
>
> Anyway, if anyone has a good solution for sending wiki-events to a large
> number
> of subscribers, yes, please let us (WMDE/Wikidata) know about it!
>
> Am 26.09.2016 um 22:07 schrieb Gergo Tisza:
> > On Mon, Sep 26, 2016 at 5:57 AM, Andrew Otto <o...@wikimedia.org> wrote:
> >
> >>  A public resumable stream of Wikimedia events would allow folks
> >> outside of WMF networks to build realtime stream processing tooling on
> top
> >> of our data.  Folks with their own Spark or Flink or Storm clusters (in
> >> Amazon or labs or wherever) could consume this and perform complex
> stream
> >> processing (e.g. machine learning algorithms (like ORES), windowed
> trending
> >> aggregations, etc.).
> >>
> >
> > I recall WMDE trying something similar a year ago (via PubSubHubbub) and
> > getting vetoed by ops. If they are not aware yet, might be worth
> contacting
> > them and asking if the new streaming service would cover their use cases
> > (it was about Wikidata change invalidation on third-party wikis, I
> think).
> > ___
> > Wikitech-l mailing lis

Re: [Wikitech-l] Public Event Streams (AKA RCStream replacement) question

2016-09-26 Thread Andrew Otto
 standard implementations, either
> pure websockets or http server sent events, and let shim layers that
> provide other features like socket.io be implemented in other proxy
> servers.
>
>
>
> On Sun, Sep 25, 2016 at 4:02 PM, Merlijn van Deen (valhallasw) <
> valhall...@arctus.nl> wrote:
>
> > Hi Andrew,
> >
> > On 23 September 2016 at 23:15, Andrew Otto <o...@wikimedia.org> wrote:
> >
> > > We’ve been busy working on building a replacement for RCStream.  This
> new
> > > service would expose recentchanges as a stream as usual, but also other
> > > types of event streams that we can make public.
> > >
> >
> > First of all, why does it need to be a replacement, rather than something
> > that builds on existing infrastructure? Re-using the existing
> > infrastructure provides a much more convenient path for consumers to
> > upgrade.
> >
> >
> > > But we’re having a bit of an existential crisis!  We had originally
> > chosen
> > > to implement this using an up to date socket.io server, as RCStream
> also
> > > uses socket.io.  We’re mostly finished with this, but now we are
> taking
> > a
> > > step back and wondering if socket.io/websockets are the best
> technology
> > to
> > > use to expose stream data these days.
> > >
> >
> > For what it's worth, I'm on the fence about socket.io. My biggest
> argument
> > for socket.io is the fact that rcstream already uses it, but my
> experience
> > with implementing the pywikibot consumer for rcstream is that the Python
> > libraries are lacking, especially when it comes to stuff like
> reconnecting.
> > In addition, debugging issues requires knowledge of both socket.io and
> the
> > underlying websockets layer, which are both very different from regular
> > http.
> >
> > From the task description, I understand that the goal is to allow easy
> > resumption by passing information on the last received message. You
> > could consider not implementing streaming /at all/, and just ask clients
> to
> > poll an http endpoint, which is much easier to implement client-side than
> > anything streaming (especially when it comes to handling disconnects).
> >
> > So: My preference would be extending the existing rcstream framework, but
> > if that's not possible, my preference would be with not streaming at all.
> >
> > Merlijn
> > ___
> > Wikitech-l mailing list
> > Wikitech-l@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
> >
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Public Event Streams (AKA RCStream replacement) question

2016-09-24 Thread Andrew Otto
So, since most of the dev work for a socket.io implementation is already
done, you can see what the protocol would look like here:
https://github.com/wikimedia/kasocki#socketio-client-set-up

Kasocki is just a library, the actual WMF deployment and documentation
would be more specific about MediaWiki type events, but the interface would
be the same.  (Likely there would be client libraries to abstract the
actual socket.io interaction.)

For HTTP, instead of an RPC style protocol where you configure the stream
you want via several socket.emit calls, you’d construct the URI that
specifies the event streams (and partitions and offsets if necessary) and
filters you want, and then request it.  Perhaps something like
‘http://.../stream/mediawiki.revision-create?database=plwiki;rev_len:gt100’
(I totally just made this URL up, no idea if it would work this way.)
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Public Event Streams (AKA RCStream replacement) question

2016-09-23 Thread Andrew Otto
Hi all,

We’ve been busy working on building a replacement for RCStream.  This new
service would expose recentchanges as a stream as usual, but also other
types of event streams that we can make public.

But we’re having a bit of an existential crisis!  We had originally chosen
to implement this using an up to date socket.io server, as RCStream also
uses socket.io.  We’re mostly finished with this, but now we are taking a
step back and wondering if socket.io/websockets are the best technology to
use to expose stream data these days.

The alternative is to just use ‘streaming’ HTTP chunked transfer encoding.
That is, the client makes a HTTP request for a stream, and the server
declares that it will be sending back data indefinitely in the response
body.  Clients just read (and parse) events out of the HTTP response body.
There is some event tooling built on top of this (namely SSE /
EventSource), but the basic idea is a never ending streamed HTTP response
body.
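
To make the HTTP option concrete, consuming such a stream could look roughly
like the following Python sketch, assuming one JSON event per line in a never
ending chunked response (the URL is a made-up placeholder):

    import json
    import requests  # pip install requests

    url = 'https://stream.example.org/v2/stream/recentchange'  # placeholder

    resp = requests.get(url, stream=True)
    for line in resp.iter_lines():
        if not line:
            continue  # skip keep-alive blank lines
        event = json.loads(line)
        print(event.get('wiki'), event.get('title'))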

So, I’m reaching out to to gather some input to help inform a decision.
What will be easier for you users of RCStream in the future?  Would you
prefer to keep using socket.io (newer version), or would you prefer to work
directly with HTTP?  There seem to be good clients for socket.io and for
SSE/EventSource in many languages.

https://phabricator.wikimedia.org/T130651 has more context, but don’t worry
about reading it; it is getting a little long.  Feel free to chime in there
or on this thread.

Thanks!
-Andrew Otto
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Recent changes, notifications & pageprops

2016-09-23 Thread Andrew Otto
You can seek back on EventBus events, but not permanently (by default, only
up to 1 week).  If you want to respond to changes in an event stream, you
should consume the full event stream realtime and react to the events as
they come in.  A proper Stream Processing system (like Flink or Spark
Streaming) could help with this, but we don’t have that right now.  But, I
think for your use case, you don’t need a big stream processing system, as
this stream will be relatively small, and you don’t need fancy features
like time based windowing.  You just need to update something based on an
event, right?

The change-propagation service that the Services team is building can help
you with this.  It allows you to consume events, and specify matching rules
and actions to take based on those rules.

https://www.mediawiki.org/wiki/Change_propagation
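
For completeness, if you did want to consume and react to the stream yourself
rather than go through Change Propagation, the loop is small.  A rough Python
sketch with the kafka-python package (the topic name and broker address are
made up for illustration):

    import json
    from kafka import KafkaConsumer  # pip install kafka-python

    consumer = KafkaConsumer(
        'mediawiki.page-properties-change',          # hypothetical topic name
        bootstrap_servers='kafka.example.org:9092',  # placeholder broker
        value_deserializer=lambda raw: json.loads(raw.decode('utf-8')),
    )

    for message in consumer:
        event = message.value
        # React however you need to, e.g. refresh whatever depends on the page.
        print(event.get('page_title'))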



On Fri, Sep 23, 2016 at 2:55 PM, Stas Malyshev 
wrote:

> Hi!
>
> > Could we emit a page/properties-change event to EventBus when page props
> > are updated?  Similar to how we emit an event for revision visibility
> > changes:
>
> This, however, still is missing a part because, as I understand,
> EventBus is not seekable. I.e., if I have data up-to-date to timepoint
> T, and I am now at timepoint N, I can scan recent changes list from T to
> N and know if certain item X has changed or not. However, since recent
> changes list has no entries for page props, and events on EventBus past
> N are lost to me, I have no idea if page props for X changed between T
> and N. To know that, I need permanent seekable record of changes. Or
> some flag that says when it was last updated, at least.
>
> Unless of course I'm missing the part where you can seek back on
> EventBus events, then please point me to the API that allows to do so.
>
> --
> Stas Malyshev
> smalys...@wikimedia.org
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Recent changes, notifications & pageprops

2016-09-23 Thread Andrew Otto
Could we emit a page/properties-change event to EventBus when page props
are updated?  Similar to how we emit an event for revision visibility
changes:

https://github.com/wikimedia/mediawiki-event-schemas/blob/master/jsonschema/mediawiki/revision/visibility-change/1.yaml

These events would be available to you as a stream from Kafka, or (soon) as
a publicly consumable stream.
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Analytics Kafka Upgrade next week

2016-05-05 Thread Andrew Otto
Hiya,

We’ll be upgrading the Analytics Kafka cluster from 0.8.2 to 0.9.0.1 next
week.  This is scheduled to start at Wednesday May 11th at 13:00 UTC (9:00
EST, 6:00 PST).

If all goes well*, this should be a rolling upgrade with no downtime.

Just a heads up, thanks!
-Andrew & Luca


*everybody knock on wood now
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Unique Devices data available on API

2016-04-19 Thread Andrew Otto
Ha, perhaps Nuria’s quote should read:

> it does not include any cookie by which your brows*ING* history can be
tracked [3].

s/browser/browsing/

On Tue, Apr 19, 2016 at 4:38 PM, bawolff  wrote:

> > it does not include any
> > cookie by which your browser history can be tracked [3].
>
> Umm, it involves a cookie, which tracks whether you have previously
> browsed the site. While I applaud the analytics team in using such a
> privacy friendly method, I'm not sure its an entirely truthful
> statement to say that it does not involve a cookie in which your
> browser history can be tracked. The date you've visited previously
> sounds like a part of your browser history to me.
>
> To be clear, this is not meant to be a criticism. I think the approach
> that is being taken is really great.
>
> --
> -bawolff
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [Ops] Migrating Node.JS services' deployment to Scap3

2016-04-14 Thread Andrew Otto
> I think that’s the only python based service deployed via scap3
EventLogging is python and deployed with scap3.
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [Ops] Migrating Node.JS services' deployment to Scap3

2016-04-14 Thread Andrew Otto
Oo also!  Mukunda and Tyler and I did some work over the last couple of
weeks to make bootstrapping new repos on deployment servers easier and
decoupled from trebuchet.  You don’t need to do this now if your repo is
already cloned on tin, but you might want to anyway.  Scap deployed repos
should be added to the list of sources in hieradata/common/scap/server.yaml
(in prod)[1].  For deploy-service repos, you don’t need to add any new
keyholder_agents, but adding an entry in sources will ensure that your repo
is cloned on a newly provisioned deploy server.  To do so, add:

  /deploy: ~

to the list of scap::server::sources there.  (The tilde just makes
scap::source use default params.)  See scap::source docs for more info[2].

I’ve updated Marko’s Services/Scap_Migration page with this step.

-Ao


[1] In labs deployment-prep, use hieradata/labs/deployment-prep/common.yaml
[2]
https://github.com/wikimedia/operations-puppet/blob/production/modules/scap/manifests/source.pp#L1
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] CirrusSearch outage Feb 28 ~19:30 UTC

2014-02-28 Thread Andrew Otto
 * We're going to figure out why we only got half the settings.  This is
 complicated because we can't let puppet restart Elasticsearch because
 Elasticsearch restarts must be done one node at a time.
 
Ah, I think I see it in elasticsearch/init.pp.  If you don’t want to subscribe 
the service to its config files, you should at the very least require them, so 
that the config files are put in place before the service is started by puppet 
during the first install.

e.g.  
https://github.com/wikimedia/puppet-kafka/blob/master/manifests/server.pp#L207



On Feb 28, 2014, at 5:11 PM, Nikolas Everett never...@wikimedia.org wrote:

 CirrusSearch flaked out Feb 28 around 19:30 UTC and I brought it back from
 the dead around 21:25 UTC.  During the time it was flaking out searches
 that used it (mediawiki.org, wikidata.org, ca.wikipedia.org, and everything
 in Italian) took a long, long time or failed immediately with a message
 about this being a temporary problem we're working on fixing.
 
 Events:
 We added four new Elasticserach servers on Rack D (yay) around 18:45 UTC
 The Elasticsearch cluster started serving simple requests very slowly
 around 19:30 UTC
 I was alerted to a search issue on IRC at 20:45 UTC
 I fixed the offending Elasticsearch servers around 21:25 UTC
 Query times recovered shortly after that
 
 Explanation:
 We very carefully installed the same version of Elasticsearch and Java as
 we use on the other machines then used puppet to configure the
 Elasticsearch machines to join the cluster.  It looks like they only picked
 up half the configuration provided by puppet
 (/etc/elasticsearch/elasticsearch.yml but not
 /etc/defaults/elasticsearch).  Unfortunately for us that is the bad half to
 miss because /etc/default/elasticsearch contains the JVM heap settings.
 
 The servers came online with the default amount of heap which worked fine
 until Elasticsearch migrated a sufficiently large index to them.  At that
 point the heap filled up and Java does what it does in that case and spun
 forever trying to free garbage.  It pretty much pegged one CPU and rendered
 the entire application unresponsive.  Unfortunately (again) pegging one CPU
 isn't that weird for Elasticsearch.  It'll do that when it is merging.  The
 application normally stays responsive because the rest of the JVM keeps
 moving along.  That doesn't happen when heap is full.
 
 Knocking out one of those machines caused tons of searches to block,
 presumably waiting for those machine to respond.  I'll have to dig around
 to see if I can find the timeout but we're obviously using the default
 which in our case is way way way to long.  We then filled the pool queue
 and started rejecting requests to search altogether.
 
 When I found the problem all I had to do was kill -9 the Elasticsearch
 servers and restart them.  -9 is required because JVMs don't catch the
 regular signal if they are too busy garbage collecting.
 
 What we're doing to prevent it from happening again:
 * We're going to monitor the slow query log and have icinga start
 complaining if it grows very quickly.  We normally get a couple of slow
 queries per day so this shouldn't be too noisy.  We're going to also have
 to monitor error counts, especially once we get more timeouts.  (
 https://bugzilla.wikimedia.org/show_bug.cgi?id=62077)
 * We're going to sprinkle more timeouts all over the place.  Certainly in
 Cirrus while waiting on Elasticsearch and figure out how to tell
 Elasticsearch what the shard timeouts should be as well.(
 https://bugzilla.wikimedia.org/show_bug.cgi?id=62079)
 * We're going to figure out why we only got half the settings.  This is
 complicated because we can't let puppet restart Elasticsearch because
 Elasticsearch restarts must be done one node at a time.
 
 Nik
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Need help from Wikimedia github admins to transfer bingle repo

2014-02-21 Thread Andrew Otto
Hm, someone correct me if I’m wrong, but if you want to have Bingle hosted on 
the wikimedia Github, you should create a repository for it in Gerrit, and then 
add a special replication rule in puppet in role/gerrit.pp to get the 
repository in the Github URL that you want.




On Feb 21, 2014, at 11:45 AM, Arthur Richards aricha...@wikimedia.org wrote:

 I'd like to transfer the github repo for Bingle to the organization
 (currently it's associated with my personal github account). According to
 https://help.github.com/articles/how-to-transfer-a-repository that means
 I'll need administrative rights in order to do so. Can someone help me out
 with this?
 
 -- 
 Arthur Richards
 Software Engineer, Mobile
 [[User:Awjrichards]]
 IRC: awjr
 +1-415-839-6885 x6687
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [Analytics] [WikimediaMobile] Mobile stats

2013-09-20 Thread Andrew Otto
Oh awesome!  Glad y'all found it!


On Sep 19, 2013, at 5:01 PM, Adam Baso ab...@wikimedia.org wrote:

 +Analytics
 
 
 On Thu, Sep 19, 2013 at 1:57 PM, Adam Baso ab...@wikimedia.org wrote:
 A run on yesterday's valid Wikipedia Zero hits showed that user agents NOT 
 supporting HTML (i.e., only supporting WAP) is only 0.098 - 0.108 *percent*.
 
 Assuming a bunch of complaints don't come in (e.g., I'm getting tag soup!, 
 as Max might say), I think we could make a reasonable case to stop supporting 
 WAP through the formal channels (blog, mailing list(s), etc.).
 
 -Adam
 
 
 On Tue, Sep 17, 2013 at 1:11 PM, Arthur Richards aricha...@wikimedia.org 
 wrote:
 That's awesome - thanks Max and Adam; it's great to see the last vestiges of 
 X-Device finally disappear!
 
 
 On Tue, Sep 17, 2013 at 1:07 PM, Max Semenik maxsem.w...@gmail.com wrote:
 After looking at Varnish VCL with Adam, we discovered a bug in regex 
 resulting in many phones being detected as WAP when they shouldn't be. Since 
 the older change[1] simplifying detection had also fixed this bug, Brandon 
 Black deployed it and since today the usage share of WAP should seriously 
 drop. We will be monitoring the situation and revisit the issue of WAP 
 popularity once we have enough data.
 
 [1] https://gerrit.wikimedia.org/r/83919
 
 On Tue, Sep 10, 2013 at 4:39 PM, Adam Baso ab...@wikimedia.org wrote:
 Thanks. 7-9% of responses on Wikipedia Zero being WAP is pretty substantial.
 
 
 On Tue, Sep 10, 2013 at 2:01 PM, Andrew Otto o...@wikimedia.org wrote:
  These
  zero.tsv.log*
  files to which I refer seem to be, basically Varnish log lines that
  correspond to Wikipedia Zero-targeted traffic.
 Yup!  Correct.  zero.tsv.log* files are captured unsampled and based on the 
 presence of a zero= tag in the X-Analytics header:
 
 http://git.wikimedia.org/blob/operations%2Fpuppet.git/37ffb0ccc1cd7d3f5612df8779e9a3bdb69066b2/templates%2Fudp2log%2Ffilters.oxygen.erb#L10
 
  Do I understand correctly that field as Content-Type?
 Yup again!  The varnishncsa format string that is currently being beamed at 
 udp2log is here:
 
 http://git.wikimedia.org/blob/operations%2Fpuppet.git/37ffb0ccc1cd7d3f5612df8779e9a3bdb69066b2/modules%2Fvarnish%2Ffiles%2Fvarnishncsa.default
 
 
 -- 
 Best regards,
 Max Semenik ([[User:MaxSem]])
 
 ___
 Mobile-l mailing list
 mobil...@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/mobile-l
 
 
 
 
 -- 
 Arthur Richards
 Software Engineer, Mobile
 [[User:Awjrichards]]
 IRC: awjr
 +1-415-839-6885 x6687
 
 
 ___
 Analytics mailing list
 analyt...@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/analytics

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [WikimediaMobile] [Analytics] Mobile stats

2013-09-10 Thread Andrew Otto
 These
 zero.tsv.log*
 files to which I refer seem to be, basically Varnish log lines that
 correspond to Wikipedia Zero-targeted traffic.
Yup!  Correct.  zero.tsv.log* files are captured unsampled and based on the 
presence of a zero= tag in the X-Analytics header:  

http://git.wikimedia.org/blob/operations%2Fpuppet.git/37ffb0ccc1cd7d3f5612df8779e9a3bdb69066b2/templates%2Fudp2log%2Ffilters.oxygen.erb#L10

 Do I understand correctly that field as Content-Type?
Yup again!  The varnishncsa format string that is currently being beamed at 
udp2log is here:

http://git.wikimedia.org/blob/operations%2Fpuppet.git/37ffb0ccc1cd7d3f5612df8779e9a3bdb69066b2/modules%2Fvarnish%2Ffiles%2Fvarnishncsa.default
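
For what it's worth, the kind of tally Adam is doing only takes a few lines.
A rough Python sketch, assuming tab-separated lines with Content-Type in
field #11 (index 10) as discussed:

    import sys

    total = 0
    wap = 0
    for line in sys.stdin:   # e.g. zcat zero.tsv.log-20130907.gz | python wap_share.py
        fields = line.rstrip('\n').split('\t')
        if len(fields) <= 10:
            continue         # skip malformed lines
        total += 1
        if fields[10].startswith('text/vnd.wap.wml'):
            wap += 1

    if total:
        print('WAP share: %.3f%%' % (100.0 * wap / total))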





On Sep 10, 2013, at 4:25 PM, Adam Baso ab...@wikimedia.org wrote:

 Somewhere in between, I think.
 
 Wikipedia Zero's main extension, ZeroRatedMobileAccess, relies upon the
 mobile web's main extension, MobileFrontend. Wikipedia Zero access is
 served across [lang.].zero.wikipedia.org and [lang.].m.wikipedia.org.
 
 As I understand, the general Varnish logs capture both the Wikipedia
 Zero-based and the non-Wikipedia Zero-based mobile web access. These
 zero.tsv.log*
 files to which I refer seem to be, basically Varnish log lines that
 correspond to Wikipedia Zero-targeted traffic.
 
 Wikipedia Zero for the mobile web will in all likelihood have a higher rate
 of WAP device usage and WAP content served when compared to the general
 Wikipedia for the mobile web stats. It's likely that, to at least some
 extent, that higher WAP usage in participating Wikipedia Zero markets,
 would be washed out by the relatively higher adoption of smartphones in
 wealthier markets.
 
 Please do let me know in case of a need for further clarification!
 
 -Adam
 
 
 On Tue, Sep 10, 2013 at 4:04 AM, Gerard Meijssen
 gerard.meijs...@gmail.comwrote:
 
 Hoi,
 Is the Wikipedia-Zero traffic information part of the mobile statistics or
 is it something completely separate thing?
 Thanks,
 GerardM
 
 
 On 10 September 2013 03:26, Adam Baso ab...@wikimedia.org wrote:
 
 Wikipedia Zero traffic (IP address and MCC/MNC matching as expected)
 shows
 in one day of requests (zero.tsv.log-20130907) roughly 7-9% of page
 responses having a Content-Type response of text/vnd.wap.wml, presuming
 field #11 (or index 10 if you're indexing from 0) in zero.tsv.log-date
 is
 the Content-Type. Do I understand correctly that field as Content-Type?
 
 Thanks.
 -Adam
 
 
 On Thu, Sep 5, 2013 at 9:27 AM, Arthur Richards aricha...@wikimedia.org
 wrote:
 
 Would adding the accept header to the x-analytics header be worthwhile
 for
 this?
 On Sep 5, 2013 4:16 AM, Erik Zachte ezac...@wikimedia.org wrote:
 
 For a breakdown per country, the higher the sampling rate the better,
 as
 the data will become reliable even for smaller countries with a not so
 great adoption rate of Wikipedia.
 
 -Original Message-
 From: analytics-boun...@lists.wikimedia.org [mailto:
 analytics-boun...@lists.wikimedia.org] On Behalf Of Max Semenik
 Sent: Thursday, September 05, 2013 12:28 PM
 To: Diederik van Liere
 Cc: A mailing list for the Analytics Team at WMF and everybody who has
 an
 interest in Wikipedia and analytics.; mobile-l; Wikimedia developers
 Subject: Re: [Analytics] [WikimediaMobile] Mobile stats
 
 On 05.09.2013, 4:04 Diederik wrote:
 
 Heya,
 I would suggest to at least run it for a 7 day period so you capture
 at least the weekly time-trends, increasing the sample size should
 also be recommendable. We can help setup a udp-filter for this
 purpose
 as long as the data can be extracted from the user-agent string.
 
 Unfortunately, accept is no less important here.
 So, to enumerate our requirements as a result of this thread:
 * Sampling rate the same as wikistats (1/1000).
 * No less than a week worth of data.
 * User-agent:
 * Accept:
 * Country from GeoIP to determine the share of developing countries.
 * Wiki to determine if some wikis are more dependant on WAP than other
  ones.
 
 Anything else?
 
 --
 Best regards,
  Max Semenik ([[User:MaxSem]])
 
 
 ___
 Analytics mailing list
 analyt...@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/analytics
 
 
 ___
 Mobile-l mailing list
 mobil...@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/mobile-l
 
 
 ___
 Mobile-l mailing list
 mobil...@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/mobile-l
 
 
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l
 
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l
 
 ___
 Wikitech-l mailing list
 

Re: [Wikitech-l] [Analytics] Page view stats failure

2013-07-24 Thread Andrew Otto
Yesterday morning I received an alert that the gadolinium udp2log process was 
experiencing packet loss.  In addition to being the webstats-collector host 
(which generates the pagecounts files), gadolinium is a socat relay.  It is 
responsible for feeding about 5 total udp2log instances all of the webrequest 
log traffic.

Upon investigating the packet loss issue on gadolinium, I noticed that the 
socat relay process itself was dropping packets if the udp2log process was also 
up.  I believe this is due to the fact that if both socat and udp2log is 
running, the NIC must process twice the amount of data than if only one is 
running.  I went into emergency mode to move as much of the udp2log filters to 
other existent udp2log boxes.  Opsen and I set up a new box (erbium) so that we 
could still have a box on which to run some of the gadolinium udp2log filters 
(including the webstatscollector one).

Fundraising gets their webrequest data from gadolinium, so I had spent much of 
the day working with them.  It turned out that this wasn't so much of an 
emergency for them, since they had a scheduled downtime during this time anyway.

Erbium was almost fully ready yesterday evening.  When I was about to finish 
setting up erbium, other opsen had started a restructuring of the production 
puppetmaster setup, which caused puppet to not work for a short period.  I was 
crunched for time to finish this, but couldn't until the puppetmaster was back 
up.  I had urgent personal business to take care of (had to put an application 
in on an apartment before someone else did), so I ran out for the evening 
leaving things in this state.  I was thinking mostly of Fundraising, and they 
didn't seem worried, and I forgot that webstatscollector was an issue too.

Erbium is online as of a few minutes ago and the webstatscollector processes 
should be trucking along, so pageview data should be fine starting now.  The 
webstatscollector processes are not currently monitored.  I plan to add process 
monitoring for both of these, as well as UDP dropped packet statistics for both 
the socat relay process and the webstatscollector process.







On Jul 24, 2013, at 1:37 AM, Jeremy Baron jer...@tuxmachine.com wrote:

 On Jul 24, 2013 12:43 AM, Ikuya Yamada ik...@sfc.keio.ac.jp wrote:
  It seems that the page view statistics data does not contain the
  actual data for the last few hours.
 
  http://dumps.wikimedia.org/other/pagecounts-raw/2013/2013-07/
 
  Are there any failures on the server-side?
 
 Just looking at file sizes I can see 15, 16, and 20-05(the current hour) UTC 
 all look smaller than normal. (yes, something's broken)
 
 -Jeremy
 
 ___
 Analytics mailing list
 analyt...@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/analytics

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Github/Gerrit mirroring

2013-03-14 Thread Andrew Otto
 I've been hosting my puppet-cdh4 (Hadoop) repository on Github for a while 
 now.  I am planning on moving this into Gerrit.
 Why do you want to move it to gerrit then?

Security reasons.  All puppet repos have to be hosted by WMF and reviewed by 
ops before we can use them in production.


On Mar 14, 2013, at 4:11 PM, Juliusz Gonera jgon...@wikimedia.org wrote:

 On 03/08/2013 08:55 AM, Andrew Otto wrote:
 I've been hosting my puppet-cdh4 (Hadoop) repository on Github for a while 
 now.  I am planning on moving this into Gerrit.
 
 I've been getting pretty high quality pull requests for the last month or so 
 from a couple of different users. (Including CentOS support, supporting 
 MapReduce v1 as well as YARN, etc.)
 
   https://github.com/wikimedia/puppet-cdh4/issues?page=1state=closed
 
 I'm happy to host this in Gerrit, but I suspect that contribution to this 
 project will drop once I do. :/
 
 Why do you want to move it to gerrit then?
 
 -- 
 Juliusz
 
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Github/Gerrit mirroring

2013-03-08 Thread Andrew Otto
I've been hosting my puppet-cdh4 (Hadoop) repository on Github for a while now. 
 I am planning on moving this into Gerrit.

I've been getting pretty high quality pull requests for the last month or so 
from a couple of different users. (Including CentOS support, supporting 
MapReduce v1 as well as YARN, etc.) 

  https://github.com/wikimedia/puppet-cdh4/issues?page=1state=closed

I'm happy to host this in Gerrit, but I suspect that contribution to this 
project will drop once I do. :/







On Mar 8, 2013, at 11:47 AM, Quim Gil q...@wikimedia.org wrote:

 On 03/08/2013 08:31 AM, Dan Andreescu wrote:
 ... Then, if a developer is not willing to learn
 Gerrit, its code is probably not worth the effort of us integrating
 github/gerrit.  That will just add some more poor quality code to your
 review queues.
 
 imho GitHub has the potential to get us a first patch from many contributors 
 that won't arrive through gerrit.wikimedia.org first. It's just a lot simpler 
 for GitHub users. Some of those patches will be good, some not so much, but 
 that is probably also the case for first time contributors in Gerrit.
 
 When a developer submits a second and a third pull request via GitHub then we 
 can politely invite her to check http://www.mediawiki.org/wiki/Gerrit and 
 join our actual development process.
 
 -- 
 Quim Gil
 Technical Contributor Coordinator @ Wikimedia Foundation
 http://www.mediawiki.org/wiki/User:Qgil
 
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [Analytics] RFC: Tab as field delimiter in logging format of cache servers

2013-01-31 Thread Andrew Otto
Hi all!

Just an FYI here that this has been done, yay!   Varnish, Nginx, and Squid 
frontends are now all logging with tab as the field delimiter.

For those who would notice, for the time being, we have started outputting logs 
to new filenames with .tab. in the name, so as to differentiate the format.  We 
will most likely change the file names back to their original names in a month 
or so.

Thanks all!
-Andrew Otto


On Jan 28, 2013, at 11:33 AM, Matthew Flaschen mflasc...@wikimedia.org wrote:

 On 01/27/2013 08:07 AM, Erik Zachte wrote:
 The code to change existing tabs into some less obnoxious character is dead
 trivial, hardly any overhead. At worst one field will then be affected, not
 the whole record, which makes it easier to spot and debug the anomaly when
 it happens. 
 
 Scanning an input record for tabs and raising a counter is also very
 efficient. Sending one alert hourly based on this counter should make us
 aware soon enough when this issue needs follow-up, yet without causing
 bottle necks.
 
 Doing both of those would be pretty robust.  However, if that isn't
 workable, a simple option is just to strip tab characters before
 Varnish/Squid/etc. writes the line.
 
 That means downstream code doesn't have to do anything special, and it
 shouldn't affect many actual requests.
 
 Matt Flaschen
 
 ___
 Analytics mailing list
 analyt...@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/analytics


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [Analytics] RFC: Tab as field delimiter in logging format of cache servers

2013-01-31 Thread Andrew Otto
Ah, no I mean change the future ones back to their original names.  We'd leave 
the ones that are being generated as '.tab.' now as they are.  We could see 
these filenames in the archives.

But!  If everybody loves '.tab.',  forever, that's fine with me too!




On Jan 31, 2013, at 6:51 PM, Diederik van Liere dvanli...@wikimedia.org wrote:

 Yes let's not change the filenames
 D
 
 Sent from my iPhone
 
 On 2013-01-31, at 18:45, Matthew Walker mwal...@wikimedia.org wrote:
 
 We will most likely change the file names back to their original names in a 
 month or so
 
 Please don't. It'll serve as a visible marker for the future for when we go 
 back and look at the files and do a WTF.
 
 ~Matt Walker
 ___
 Analytics mailing list
 analyt...@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/analytics

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [Analytics] RFC: Tab as field delimiter in logging format of cache servers

2013-01-31 Thread Andrew Otto
.tsv doh!

That I like!  That woulda been so much cleaner and obvious and I wouldn't mind 
leaving it forever.

Hokay!



On Jan 31, 2013, at 7:25 PM, Yuri Astrakhan yuriastrak...@gmail.com wrote:

 .tsv - tab separated values?
 
 
 On Thu, Jan 31, 2013 at 7:23 PM, Chad innocentkil...@gmail.com wrote:
 
 On Thu, Jan 31, 2013 at 7:12 PM, Andrew Otto o...@wikimedia.org wrote:
 Ah, no I mean change the future ones back to their original names.  We'd
 leave the ones that are being generated as '.tab.' now as they are.  We
 could see these filenames in the archives.
 
 But!  If everybody loves '.tab.',  forever, that's fine with me too!
 
 
 I really don't like .tab.
 
 -Chad
 
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l
 
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Mediawiki + Vagrant

2012-10-16 Thread Andrew Otto
  If you don't want to use Puppet or Chef, you can just configure an instance 
 by hand (by SSHing into it, usually) and regenerate a Vagrant box from the 
 result.

Actually, even if you do want to use Puppet, a VM can be useful.  I have a 
local VM (not Vagrant) set up that I use to test new puppet manifests.  I'm 
using our operations/puppet production branch directly.  As is, it wasn't 
completely straightforward to set this up with the way our puppet repository is 
now, but it is possible.  Labs is good for final testing of manifests, but it 
is hard to use it to test changes as you make them.
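
For anyone curious what that looks like in practice, a rough sketch (the clone 
URL is approximate, and the module path and manifest are hypothetical; as noted 
above, the real operations/puppet layout needs extra wiring):

  git clone https://gerrit.wikimedia.org/r/operations/puppet
  cd puppet
  # dry-run a manifest inside the VM without changing anything on the system
  sudo puppet apply --noop --modulepath=./modules manifests/site.pp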

So uh, yeahh!  This could be good for (some) ops people too.



On Oct 16, 2012, at 6:24 AM, Ori Livneh o...@wikimedia.org wrote:

 
 
 On Monday, October 15, 2012 at 10:28 PM, Andrew Bogott wrote:
 Does a single-instance install provide a sufficient test platform for 
 80% of the likely patches, or does all of the interesting stuff require 
 a full-blown cluster?
 
 I don't want to guess percentages, but a significant amount of development 
 work -- especially front-end -- can be adequately tested on a VM. 
 If single-system servers are truly useful in most cases, then a 
 prepackaged VM image seems straightforward and handy. Presuming that 
 the devs are willing/able to run a few git commands before they start 
 coding, we could potentially leave puppet and Vagrant out of the 
 equation and just build a one-off image by hand and include strict 
 instructions to fetch and rebase immediately after opening. It looks 
 like that's roughly what Mozilla is doing at the moment.
 
 Vagrant is a means for doing precisely that. If you don't want to use Puppet 
 or Chef, you can just configure an instance by hand (by SSHing into it, 
 usually) and regenerate a Vagrant box from the result. Vagrant can set up 
 shared folders between the host and guest VM, so what we might want to do is 
 simply tell Vagrant to mount its project directory on the host machine as 
 /srv/mediawiki (or whatever) in the guest VM, and have apache serve that. 
 That makes it very easy to track head in git: you keep a clone of the 
 repository up-to-date on your local disk, and leave it for the VM to serve 
 it. That would spare people the trouble of having to set up a LAMP stack. 
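 
 (A small sketch of that from the host side; the box name and clone URL are 
 hypothetical, and by default Vagrant exposes the project directory as /vagrant 
 inside the guest:)
 
   git clone https://gerrit.wikimedia.org/r/mediawiki/core mediawiki
   cd mediawiki
   vagrant init precise64         # assumes a base box named "precise64" has been added
   vagrant up
   vagrant ssh -c 'ls /vagrant'   # the host checkout is visible inside the VM; point apache at it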
 
 Is anyone interested in taking this on? I've done it before and found it 
 useful, so I'd be happy to provide some assistance. Otherwise I'll work on it 
 myself when I get the chance.
 
 
 --
 Ori Livneh
 o...@wikimedia.org
 
 
 
 
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Github replication

2012-10-03 Thread Andrew Otto
Awesome!  I have a repo I'd love to try this with right now.  I'll find you on 
IRC…


On Oct 3, 2012, at 12:27 PM, Chad innocentkil...@gmail.com wrote:

 Hi everyone,
 
 Just letting everyone know: mediawiki/core is now replicating from
 gerrit to github.
 
 https://github.com/mediawiki/core
 
 Next step: extensions!
 
 -Chad
 
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Appreciation thread

2012-08-24 Thread Andrew Otto
Many many many thanks Rob H, Peter Y, Leslie C, Ben H, Ryan L, Faidon, Daniel 
Z, Mark B, Chris J, and everyone else on the ops team that has put up with my 
IRC poking and prodding thus far.  You guys are a huge help to the analytics 
team.  Thanks for guiding me through and teaching me the systems, and for 
feedback for my puppet stuff. :)


On Aug 23, 2012, at 9:29 PM, Roan Kattouw roan.katt...@gmail.com wrote:

 On Thu, Aug 23, 2012 at 2:30 AM, Niklas Laxström
 niklas.laxst...@gmail.com wrote:
 * Sumana for this idea.
 +1
 
 Also:
 * Inez for writing code I intended to write, exactly the way I
 intended to write it, while I was busy with something else yesterday
 * Timo (Krinkle) for announcing he's back from vacation via the gerrit-wm bot
 * The rest of the VE team for being generally awesome
 * Aaron, Ariel, Ben and Faidon (and anyone else that's working on this
 that I'm forgetting) for their relentless work on the Swift migration
 this week
 
 Roan
 
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Appreciation thread

2012-08-24 Thread Andrew Otto
Oh and thanks to Jeremy B too!  He's been super helpful at directing my 
questions to the proper know-it-all.


On Aug 24, 2012, at 9:58 AM, Andrew Otto o...@wikimedia.org wrote:

 Many many many thanks Rob H, Peter Y, Leslie C, Ben H, Ryan L, Faidon, Daniel 
 Z, Mark B, Chris J, and everyone else on the ops team that has put up with my 
 IRC poking and prodding thus far.  You guys are a huge help to the analytics 
 team.  Thanks for guiding me through and teaching me the systems, and for 
 feedback for my puppet stuff. :)
 
 
 On Aug 23, 2012, at 9:29 PM, Roan Kattouw roan.katt...@gmail.com wrote:
 
 On Thu, Aug 23, 2012 at 2:30 AM, Niklas Laxström
 niklas.laxst...@gmail.com wrote:
 * Sumana for this idea.
 +1
 
 Also:
 * Inez for writing code I intended to write, exactly the way I
 intended to write it, while I was busy with something else yesterday
 * Timo (Krinkle) for announcing he's back from vacation via the gerrit-wm bot
 * The rest of the VE team for being generally awesome
 * Aaron, Ariel, Ben and Faidon (and anyone else that's working on this
 that I'm forgetting) for their relentless work on the Swift migration
 this week
 
 Roan
 
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l
 


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Personal sandbox space in Gerrit

2012-07-02 Thread Andrew Otto
Ah that is useful!  Thanks Chad!

On Jul 2, 2012, at 1:35 PM, Chad wrote:

 Hi everyone,
 
 I've just come across (and enabled) a feature in Gerrit that I think
 many will find useful. I'm calling them personal sandboxes. The
 basic premise is that each user can have a personal branch space
 that they have push rights to that don't require admin intervention.
 
 The branches are named in the format sandbox/$username/* so I
 could make a sandbox called sandbox/demon/weekend-hacking
 and push that to gerrit without requiring review or anyone to make
 the branch first. Quick example:
 
 $ cd mediawiki/core
 $ git checkout -b sandbox/demon/foo-bar
 [ hack away ]
 $ git push --set-upstream origin sandbox/demon/foo-bar
 
 (The --set-upstream is only necessary the first time you push)
 
 This isn't designed to replace long-lived branches where you are
 collaborating with others, but to simply give you a space where you
 can push some work when you want to stash it (and it should be
 viewable by Gitweb, if you want others to see it).
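 
 (As a usage sketch, a teammate could pull such a sandbox branch down to look 
 at it, reusing the example branch name above:)
 
  $ git fetch origin
  $ git checkout -b foo-bar-review origin/sandbox/demon/foo-bar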
 
 Happy hacking!
 
 -Chad
 
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Extensions queue for Git conversion

2012-05-21 Thread Andrew Otto
Chad, I'm not sure if this matters for what you are doing, but I recently moved 
this
  http://svn.mediawiki.org/viewvc/mediawiki/trunk/udplog/
to
  https://gerrit.wikimedia.org/r/gitweb?p=analytics/udplog.git;a=summary

Is there a list or page somewhere that keeps track of what has already been 
migrated to git?

-Andrew Otto


On May 21, 2012, at 2:01 PM, Chad wrote:

 On Mon, May 21, 2012 at 10:50 AM, Stephan Gambke s7ep...@gmail.com wrote:
 Hi,
 
 the next batch of extensions to be transferred to Git is overdue for
 more than a week now. Any idea, when it will actually happen?
 (http://www.mediawiki.org/wiki/Git/Conversion/Extensions_queue)
 
 
 I apologize for letting those original dates slip. I've got a bunch
 of projects looking to make the jump to Git, that page included.
 
 Looking to get this done this week.
 
 -Chad
 
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Extensions queue for Git conversion

2012-05-21 Thread Andrew Otto
Full history, but apparently I didn't do some fancy things that Chad usually 
does.  Chad is going to redo this conversion this week.
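
(For the record, a hedged sketch of one common way a full-history conversion like 
this is done; SVN_URL stands for the repository root behind the viewvc link in the 
earlier mail, and the authors file is hypothetical:)

  git svn clone "$SVN_URL/trunk/udplog" --no-metadata -A authors.txt udplog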


On May 21, 2012, at 4:10 PM, K. Peachey wrote:

 On Tue, May 22, 2012 at 4:17 AM, Andrew Otto o...@wikimedia.org wrote:
 Chad, I'm not sure if this matters for what you are doing, but I recently 
 moved this
  http://svn.mediawiki.org/viewvc/mediawiki/trunk/udplog/
 to
  https://gerrit.wikimedia.org/r/gitweb?p=analytics/udplog.git;a=summary
 
 Is there a list or page somewhere that keeps track of what has already been 
 migrated to git?
 
 -Andrew Otto
 
 Perhaps you could put a .OBSOLETE file in the folder pointing to git.
 Also, is that a cut-and-paste move or did you do the full history?
 
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Guidelines for db schema changes

2012-04-27 Thread Andrew Otto
Here's the migrations library I wrote.  :)
https://github.com/ottomata/cs_migrations


-Andrew Otto


On Apr 26, 2012, at 11:30 AM, Andrew Otto wrote:

 I once wrote a pretty decent schema migration tool that fits most if not all 
 of these requirements.  It was built for the Kohana PHP framework, but a lot 
 of it is pretty independent of that.  If someone ends up working on this I'd 
 love to help and maybe share some code and ideas.  
 
 -Andrew Otto
 
 http://ottomata.org
 http://www.flickr.com/photos/OttomatonA
 http://www.couchsurfing.org/people/otto
 
 
 On Apr 25, 2012, at 12:58 PM, Asher Feldman wrote:
 
 I am generally in favor of all of this and in the meeting that preceded
 Rob's email, proposed that we develop a new schema migration tool for
 mediawiki along similar lines. Such a beast would have to work in all
 deployment cases without modifications (stock single wiki installs and at
 wmf with many wikis across multiple masters with tiered replication), be
 idempotent when run across many databases, track version and state per
 migration, and include up/down steps in every migration.
 
 There are opensource php migration tools modeled along those used by the
 popular ruby and python frameworks. I deployed
 https://github.com/davejkiger/mysql-php-migrations at kiva.org a couple
 years ago where it worked well and is still in use.  Nothing will meet our
 needs off the shelf though.  A good project could at best be forked into
 mediawiki with modifications if the license allows it, or more likely serve
 as a model for our own development.
 
 On Tue, Apr 24, 2012 at 11:27 PM, Faidon Liambotis 
 fai...@wikimedia.orgwrote:
 
 
 In other systems I've worked before, such problems have been solved by
 each schema-breaking version providing schema *and data* migrations for
 both forward *and backward* steps.
 
 
 This means that the upgrade transition mechanism knew how to add or
 remove columns or tables *and* how to fill them with data (say by
 concatenating two columns of the old schema). The same program would
 also take care to do the exact opposite steps in a the migration's
 backward method, in case a rollback was needed.
 
 
 Down migrations aid development; I find them most useful as documentation
 of prior state, making a migration readable as a diff.  They generally
 aren't useful in production environments at scale though, which developers
 removed from the workings of production need to be aware of.  Even with
 transparent execution of migrations, the time it takes to apply changes
 will nearly always be far outside of the acceptable bounds of an emergency
 response necessitating a code rollback.  So except in obvious cases such as
 adding new tables, care is needed to keep forward migration backwards
 compatible with code as much as possible.
 
 The migrations themselves can be kept in the source tree, perhaps even
 versioned and with the schema version kept in the database, so that both
 us and external users can at any time forward their database to any
 later version, automagically.
 
 
 Yep. That we have to pull in migrations from both core and many extensions
 (many projects, one migration system) while also running different sets of
 extensions across different wikis intermingling on the same database
 servers adds some complexity but we should get there.
 
 -Asher
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l
 


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Guidelines for db schema changes

2012-04-26 Thread Andrew Otto
I once wrote a pretty decent schema migration tool that fits most if not all of 
these requirements.  It was built for the Kohana PHP framework, but a lot of it 
is pretty independent of that.  If someone ends up working on this I'd love to 
help and maybe share some code and ideas.  
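
(Roughly, the core idea such tools share, as a hedged sketch; the file layout, 
table and database names below are all hypothetical:)

  # migrations/0001_create_foo.up.sql, migrations/0001_create_foo.down.sql, ...
  current=$(mysql -N -e "SELECT COALESCE(MAX(version), 0) FROM schema_migrations" wikidb)
  for f in migrations/*.up.sql; do
      v=$(basename "$f"); v=${v%%_*}        # e.g. 0001
      if [ "$v" -gt "$current" ]; then      # apply anything newer than the recorded version
          mysql wikidb < "$f" &&
          mysql -e "INSERT INTO schema_migrations (version) VALUES ($v)" wikidb
      fi
  done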

-Andrew Otto

http://ottomata.org
http://www.flickr.com/photos/OttomatonA
http://www.couchsurfing.org/people/otto


On Apr 25, 2012, at 12:58 PM, Asher Feldman wrote:

 I am generally in favor of all of this and in the meeting that preceded
 Rob's email, proposed that we develop a new schema migration tool for
 mediawiki along similar lines. Such a beast would have to work in all
 deployment cases without modifications (stock single wiki installs and at
 wmf with many wikis across multiple masters with tiered replication), be
 idempotent when run across many databases, track version and state per
 migration, and include up/down steps in every migration.
 
 There are opensource php migration tools modeled along those used by the
 popular ruby and python frameworks. I deployed
 https://github.com/davejkiger/mysql-php-migrations at kiva.org a couple
 years ago where it worked well and is still in use.  Nothing will meet our
 needs off the shelf though.  A good project could at best be forked into
 mediawiki with modifications if the license allows it, or more likely serve
 as a model for our own development.
 
 On Tue, Apr 24, 2012 at 11:27 PM, Faidon Liambotis 
 fai...@wikimedia.orgwrote:
 
 
 In other systems I've worked before, such problems have been solved by
 each schema-breaking version providing schema *and data* migrations for
 both forward *and backward* steps.
 
 
 This means that the upgrade transition mechanism knew how to add or
 remove columns or tables *and* how to fill them with data (say by
 concatenating two columns of the old schema). The same program would
 also take care to do the exact opposite steps in a the migration's
 backward method, in case a rollback was needed.
 
 
 Down migrations aid development; I find them most useful as documentation
 of prior state, making a migration readable as a diff.  They generally
 aren't useful in production environments at scale though, which developers
 removed from the workings of production need to be aware of.  Even with
 transparent execution of migrations, the time it takes to apply changes
 will nearly always be far outside of the acceptable bounds of an emergency
 response necessitating a code rollback.  So except in obvious cases such as
 adding new tables, care is needed to keep forward migration backwards
 compatible with code as much as possible.
 
 The migrations themselves can be kept in the source tree, perhaps even
 versioned and with the schema version kept in the database, so that both
 us and external users can at any time forward their database to any
 later version, automagically.
 
 
 Yep. That we have to pull in migrations from both core and many extensions
 (many projects, one migration system) while also running different sets of
 extensions across different wikis intermingling on the same database
 servers adds some complexity but we should get there.
 
 -Asher
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Git + Gerrit is a toughy

2012-02-27 Thread Andrew Otto
 In conclusion, it is best to use a local branch for any bug / feature you 
 might be working on actually. Branches are cheap in git, they are just a 
 pointer.  Once you are happy with your small branch, squash the commits and 
 submit the end result. Less changes, less spam.
We've had a few discussions around the office about this, but I'll comment here 
too.

This works as long as you are not trying to collaborate with other people.  
There are 5 of us trying to use our analytics repository for development work.  
We are doing fake-scrum, and every day we talk about what has been worked on.  
We are using git/gerrit to share this work.  Committing often to only local 
branches doesn't help us for collaboration.  

Roan and Ryan have helped us work out a workflow that will work for us 
decently.  We can push directly to the main repository when we need to, and use 
gerrit for all other times.  Depending on the changes, we'll end up reviewing 
code when it is first pushed to a remote branch, OR later when it is merged 
into master.  That way we only have to review code at a single point, rather 
than multiple.  This requires that everyone our in our team is communicating 
and responsibly pushing for review.  Our team is small so that's no problem for 
us.  But this will be a bigger problem for MediaWiki fo sho.  

Thanks Antoine!

- otto 


http://ottomata.com
http://www.flickr.com/photos/OttomatonA
http://www.couchsurfing.org/people/otto



On Feb 23, 2012, at 1:43 PM, Antoine Musso wrote:

 On 21/02/12 18:44, Andrew Otto wrote:
 [~/Projects/wm/analytics/reportcard] (master)[29c6b47]$ git-review
 You have more than one commit that you are about to submit.
 The outstanding commits are:
 
 29c6b47 (HEAD, master) observation.py - comments
 14a771a test commit for git branch push
 73dd606 Buncha mini changes + hackiness to parse a few things.  This really 
 needs more work
 2d37c13 pipeline/user_agent.py - adding comment that this file should not be 
 used
 5892eb8 Adding loader.py - first hacky loader, just so we can get some data 
 into mysql to work with.
 e3fb30b Renaming the concept of variables to 'traits'.  Allowing trait_sets 
 to be specified so that we don't record HUGE amounts of data.
 d0de74b base.py - adding schema in comments.  Got lots of work to do to make 
 this prettier
 328e55d Trying my darndest to clean things up here!  I've cloned a new repo, 
 and am checking in my non-committed (an non-approved?) changes into this new 
 branch.  Hopefully gerrit will be happier with me.
 
 This smells of me doing something really wrong.
 
 It seems to be a git-review safeguard to prevent someone from sending 
 several commits at once. Each of them would make Gerrit generate a separate change.  
 To those wondering why one would have multiple commits, three use cases come 
 to mind.
 
 I will give you the solution at the end of this post.
 
 
 1) high frequency tradin^H^H^H^H^H^H committing
 
 Some people, me for example, do local commits very often, then squash them 
 before submitting the final patch.  Git squashing means regrouping multiple 
 commits into just one.  Imagine I have made 4 commits locally, possibly using 
 my mother tongue (french), in such case, using git-review will have me 
 submitting a commit list like:
 
  abcde1 oh mygod
  d30909 variable fun
  f39004 je ne sais plus ce que c'est
  439090 before lunch
 
 f39004 is French for 'I don't remember what this is', which does not 
 describe the commit change (hint: using French will be perfectly valid once 
 we start migrating from English).
 
 Anyway, git-review would generate 4 changes out of the 4 commits above.  Not 
 that helpful is it?
 
 Instead I would have wanted to regroup them and write a nice commit 
 description. For example:
 
 30490 (bug 1234) fix issue in feature foobar
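 
 (For illustration, a minimal sketch of that squash-then-submit step; the commit 
 count is hypothetical:)
 
   git rebase -i HEAD~4    # mark the last three commits as "squash", keep the first as "pick"
   git-review              # submit a single squashed change to Gerrit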
 
 
 2) newbie spamming gerrit
 
 This happen when you first play with Gerrit.
 
 In the subversion world, whenever you submit a new patch (svn commit) it is going 
 to be written down in the central repository. You will not be able to change 
 it, hence any subsequent submissions based on it are guaranteed that it is not 
 going to change.
 
 In the git world, as I understand it, each commit has a parent commit. The 
 reference is a sha1 based on the content of the commit. Whenever you change a 
 commit, every child and grand-child will have its sha1 recomputed.
 
 Enter the Gerrit world: when we send a commit to a review queue, there is no 
 guarantee that commit will end up in the reference repo. It might be amended 
 or simply rejected. So your list of commits will be recomputed and all children / 
 grand-children will need to be resubmitted.
 
 Guess what? That will update all of those Gerrit changes, causing mass email 
 spam / jenkins rebuilds, etc.
 
 
 3) mixing features
 
 You could well be mixing two different changes.  Maybe you have made a commit 
 to fix bug 1234 and two days later a fix for bug 6789. Those should really be 
 two different changes

Re: [Wikitech-l] Git + Gerrit is a toughy

2012-02-21 Thread Andrew Otto
 IMO you should ask Ryan to set up direct push access for
 your working branches
Cool, will do.  Does Ryan read this list?

I'm not sure how MediaWiki should work, but maybe Gerrit should be set up like 
this by default?  Either with a master + a production branch, or even just a 
master.  Somewhere where only one branch needs review, and any other branches 
can be created and pushed at will, and review isn't needed until a merge to the 
master or production branch happens.

 Why did you merge master into your branch, rather than merging your
 branch into master? That doesn't make much sense to me.
Hmm, maybe I did this wrong then.  Is this something I should never do with git 
at all, or just with this Gerrit workflow?  Isn't merging from master into my 
branches part of a regular workflow?  Shouldn't I be merging in the code from 
master all the time as I work?


Thanks Roan!  We'll get this ironed out fo sho.

-otto



On Feb 18, 2012, at 2:31 AM, Roan Kattouw wrote:

 On Sat, Feb 18, 2012 at 2:47 AM, Andrew Otto o...@wikimedia.org wrote:
 2. Do I need to rebase every time I push for review?
 
 I don't quite understand what is going on here.  I've installed git-review 
 and am using this to push to git.  It does a rebase by default.  I'm not 
 sure if I should be turning that off or not.  Rebases seem like a bad idea 
 unless you really need to do them. I think git-review is doing a rebase by 
 default so it can squash all of your local commits into one big review 
 commit before pushing. Yuck!  This would surely mean fewer commits to review 
 in Gerrit, but it destroys the real history.  It is making git work more 
 like subversion, where you just work locally until everything is good and 
 then have one big commit.  I should be able to commit often and be able to 
 share my commits with other developers before having everything reviewed.
 
 Yes, you need to rebase before you push. The rebase does not exist to
 squash multiple commits into one, but to ensure that your commit can
 be merged cleanly. This fits the gated trunk model, but it looks like
 you don't necessarily want to gate your working branch at all, just
 your master. IMO you should ask Ryan to set up direct push access for
 your working branches, so you can just git push into them directly,
 bypassing review. You can then merge your branch into master, and
 submit that merge commit for review.
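 
 (Spelling that flow out as a minimal sketch; the branch name is hypothetical and 
 it assumes direct push rights have been granted on the working branch:)
 
   git push origin my_branch   # direct push to the working branch, bypassing review
   git checkout master
   git merge my_branch
   git commit --amend          # save unchanged so the commit-msg hook adds a Change-Id to the merge
   git-review                  # submit the merge commit itself for review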
 
 
 3. How does Gerrit handle merges?  Do all merge commits need to be 
 re-approved?
 
 Yes.
 
 4. What should I do in the following situation?
 
 I have a branch I recently made from master.  I've made some changes and 
 pushed them to gerrit.  My changes have been approved.  Now I want to sync 
 master into my branch.  I do
 
  git merge master
 
 Why did you merge master into your branch, rather than merging your
 branch into master? That doesn't make much sense to me.
 
 Then resolve any conflicts and commit.  How should I push these changes?  
 The commits that make up the merge have already been approved in gerrit on 
 the master branch.  Do I need to push for review using git-review?  They've 
 already been approved, so I would think not.  But gerrit will currently not 
 allow me to push without using git-review (is that because the commits need 
 a Change-Id?).
 
 Yes, you need to submit the merge commit for review. If some commits
 don't have a Change-Id, git-review can't submit them, but I don't see
 how that could be the case. You said the commits were already approved
 in gerrit, *and* they don't have a Change-Id? Those things can't both
 be true.
 
 Since gerrit doesn't let me do a regular git push to push my master merge to 
 the remote branch I am tracking, I do git-review.
 Perhaps you should ask for regular pushes to be allowed if you're not
 using the review workflow for that branch, see also above.
 
  This does rebase by default, so for some reason I am stuck having to 
 resolve every single commit that was made to master in order to get the 
 merge to push.  This takes quite a while, but I did it, and once the 
 interactive rebase was finished I was able to git-review to push the merge 
 from master.
 
 Great.  Now that my branch is in sync with master again, I want to merge 
 it into master.
 
  git checkout master
  git merge my_branch
 
 All good.  Then what?  Since I can't do just 'git push', I try git-review 
 again.  The same thing happens.  I have to run through the whole interactive 
 rebase routine and resolve each of my commits from my_branch manually.  I do 
 that, then run 'git-review' again.  Now I get this error message:
 
 remote: Hint: A potential Change-Id was found, but it was not in the footer 
 of the commit message.
 To ssh://o...@gerrit.wikimedia.org:29418/analytics/reportcard.git
 ! [remote rejected] HEAD -> refs/for/master/master (missing Change-Id in 
 commit message)
 error: failed to push some refs to 
 'ssh://o...@gerrit.wikimedia.org:29418/analytics/reportcard.git'
 
 Each of the commits I merged

Re: [Wikitech-l] Git + Gerrit is a toughy

2012-02-21 Thread Andrew Otto
 There is a bug in git that causes merge commits to not automatically
 get Change-IDs. After generating a merge commit, you need to run git
 commit --amend, then save without changing anything. That makes sure 
 the commit-msg hook is run and the Change-ID is appended.
Yeah, I tried this, but no luck :(   waah.  I'm googling and trying other 
things to commit, but I'm still a little lost.
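
(One thing worth double-checking here: the Change-Id footer is added by Gerrit's 
commit-msg hook, so --amend only helps if that hook is installed in the clone.  A 
sketch, with USERNAME as a placeholder:)

  scp -p -P 29418 USERNAME@gerrit.wikimedia.org:hooks/commit-msg .git/hooks/
  chmod +x .git/hooks/commit-msg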

Also, any idea why git-review would say this every time I try to commit?  

[~/Projects/wm/analytics/reportcard] (master)[29c6b47]$ git-review
You have more than one commit that you are about to submit.
The outstanding commits are:

29c6b47 (HEAD, master) observation.py - comments
14a771a test commit for git branch push
73dd606 Buncha mini changes + hackiness to parse a few things.  This really 
needs more work
2d37c13 pipeline/user_agent.py - adding comment that this file should not be 
used
5892eb8 Adding loader.py - first hacky loader, just so we can get some data 
into mysql to work with.
e3fb30b Renaming the concept of variables to 'traits'.  Allowing trait_sets to 
be specified so that we don't record HUGE amounts of data.
d0de74b base.py - adding schema in comments.  Got lots of work to do to make 
this prettier
328e55d Trying my darndest to clean things up here!  I've cloned a new repo, 
and am checking in my non-committed (an non-approved?) changes into this new 
branch.  Hopefully gerrit will be happier with me.


This smells of me doing something really wrong.

Thanks!
-Ao


On Feb 18, 2012, at 2:31 AM, Roan Kattouw wrote:

 On Sat, Feb 18, 2012 at 2:47 AM, Andrew Otto o...@wikimedia.org wrote:
 2. Do I need to rebase every time I push for review?
 
 I don't quite understand what is going on here.  I've installed git-review 
 and am using this to push to git.  It does a rebase by default.  I'm not 
 sure if I should be turning that off or not.  Rebases seem like a bad idea 
 unless you really need to do them. I think git-review is doing a rebase by 
 default so it can squash all of your local commits into one big review 
 commit before pushing. Yuck!  This would surely mean fewer commits to review 
 in Gerrit, but it destroys the real history.  It is making git work more 
 like subversion, where you just work locally until everything is good and 
 then have one big commit.  I should be able to commit often and be able to 
 share my commits with other developers before having everything reviewed.
 
 Yes, you need to rebase before you push. The rebase does not exist to
 squash multiple commits into one, but to ensure that your commit can
 be merged cleanly. This fits the gated trunk model, but it looks like
 you don't necessarily want to gate your working branch at all, just
 your master. IMO you should ask Ryan to set up direct push access for
 your working branches, so you can just git push into them directly,
 bypassing review. You can then merge your branch into master, and
 submit that merge commit for review.
 
 
 3. How does Gerrit handle merges?  Do all merge commits need to be 
 re-approved?
 
 Yes.
 
 4. What should I do in the following situation?
 
 I have a branch I recently made from master.  I've made some changes and 
 pushed them to gerrit.  My changes have been approved.  Now I want to sync 
 master into my branch.  I do
 
  git merge master
 
 Why did you merge master into your branch, rather than merging your
 branch into master? That doesn't make much sense to me.
 
 Then resolve any conflicts and commit.  How should I push these changes?  
 The commits that make up the merge have already been approved in gerrit on 
 the master branch.  Do I need to push for review using git-review?  They've 
 already been approved, so I would think not.  But gerrit will currently not 
 allow me to push without using git-review (is that because the commits need 
 a Change-Id?).
 
 Yes, you need to submit the merge commit for review. If some commits
 don't have a Change-Id, git-review can't submit them, but I don't see
 how that could be the case. You said the commits were already approved
 in gerrit, *and* they don't have a Change-Id? Those things can't both
 be true.
 
 Since gerrit doesn't let me do a regular git push to push my master merge to 
 the remote branch I am tracking, I do git-review.
 Perhaps you should ask for regular pushes to be allowed if you're not
 using the review workflow for that branch, see also above.
 
  This does rebase by default, so for some reason I am stuck having to 
 resolve every single commit that was made to master in order to get the 
 merge to push.  This takes quite a while, but I did it, and once the 
 interactive rebase was finished I was able to git-review to push the merge 
 from master.
 
 Great.  Now that my branch is in sync with master again, I want to merge 
 it into master.
 
  git checkout master
  git merge my_branch
 
 All good.  Then what?  Since I can't do just 'git push', I try git-review 
 again.  The same thing happens.  I have to run through

[Wikitech-l] Git + Gerrit is a toughy

2012-02-17 Thread Andrew Otto
Hi all!

And here's another hi: Hi!  This is my first post to this list, so here is a 
quick intro in case you missed the other ones.  I'm Andrew Otto, an engineer 
on the new Analytics team.  I'm working with David Schoonover (new hire as 
well), Fabian Kaelin, and Diederik van Liere.  Right now we're working on some 
prototypes for a WikiMedia report card.  

I think we are the first team that is doing active work in git using Gerrit, 
and Robla asked me to reach out here to describe our experiences and ask for 
help.  We're struggling right now to be productive using Gerrit (I spent 3 
hours today just trying to merge a branch), but it could be due to our lack of 
experience with it.  There have been a couple of emails bouncing around to Ryan 
Lane and Roan, but it might be more productive if I made this conversation more 
visible here.  I'll start with some questions.

1. Will Gerrit allow us to create branches without using the web GUI, and 
without having to be a Gerrit admin for a project?

One of the points of using git is to be able to create branches at will.  We're 
finding this very difficult right now, not only because creating a branch requires GUI 
admin access, but also because of other reasons explained below.

2. Do I need to rebase every time I push for review?  

I don't quite understand what is going on here.  I've installed git-review and 
am using this to push to git.  It does a rebase by default.  I'm not sure if I 
should be turning that off or not.  Rebases seem like a bad idea unless you 
really need to do them. I think git-review is doing a rebase by default so it 
can squash all of your local commits into one big review commit before pushing. 
Yuck!  This would surely mean fewer commits to review in Gerrit, but it 
destroys the real history.  It is making git work more like subversion, where 
you just work locally until everything is good and then have one big commit.  I 
should be able to commit often and be able to share my commits with other 
developers before having everything reviewed.  
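
(Side note: git-review can also be told to skip its automatic rebase; flag support 
may vary by git-review version, so treat this as a sketch:)

  git-review -R    # -R / --no-rebase: submit without rebasing onto the target branch first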

3. How does Gerrit handle merges?  Do all merge commits need to be re-approved?

4. What should I do in the following situation?

I have a branch I recently made from master.  I've made some changes and pushed 
them to gerrit.  My changes have been approved.  Now I want to sync master into 
my branch.  I do

  git merge master

Then resolve any conflicts and commit.  How should I push these changes?  The 
commits that make up the merge have already been approved in gerrit on the 
master branch.  Do I need to push for review using git-review?  They've already 
been approved, so I would think not.  But gerrit will currently not allow me to 
push without using git-review (is that because the commits need a Change-Id?).

Since gerrit doesn't let me do a regular git push to push my master merge to 
the remote branch I am tracking, I do git-review.  This does rebase by default, 
so for some reason I am stuck having to resolve every single commit that was 
made to master in order to get the merge to push.  This takes quite a while, 
but I did it, and once the interactive rebase was finished I was able to 
git-review to push the merge from master.

Great.  Now that my branch is in sync with master again, I want to merge it 
into master. 

  git checkout master
  git merge my_branch

All good.  Then what?  Since I can't do just 'git push', I try git-review 
again.  The same thing happens.  I have to run through the whole interactive 
rebase routine and resolve each of my commits from my_branch manually.  I do 
that, then run 'git-review' again.  Now I get this error message:

remote: Hint: A potential Change-Id was found, but it was not in the footer of 
the commit message.
To ssh://o...@gerrit.wikimedia.org:29418/analytics/reportcard.git
 ! [remote rejected] HEAD -> refs/for/master/master (missing Change-Id in 
commit message)
error: failed to push some refs to 
'ssh://o...@gerrit.wikimedia.org:29418/analytics/reportcard.git'

Each of the commits I merged from my_branch comes with its own Change-Id in 
its commit message.  But these commits are now merge commits (I think?), so 
they have information about the merge and any conflicts in the commit message 
below the original Change-Id.  I think this is confusing Gerrit, because it 
doesn't see the Change-Id in the footer.

Now I'm stuck, I'm really not sure how to push anymore.  I want to get Diederik 
some of my changes, but I can't push them to master.


Thanks for the help everybody!  It sounds like we in Analytics are the 
git+gerrit workflow Guinea pigs, eh?  We're happy to fill this role, but SCMs 
are supposed to streamline and improve work flow, and right now Gerrit is being 
a big ol' nasty nancy.  Help us iron this out so we can keep working!

- otto


http://ottomata.com
http://www.flickr.com/photos/OttomatonA
http://www.couchsurfing.org/people/otto


___
Wikitech-l mailing list
Wikitech-l