Re: [Cloud] [Cloud-announce] New Wikireplicas available, timeline update, and Quarry migration

2021-04-05 Thread Nicholas Skaggs
Hello Roy, depending on what you want to experiment with, https://paws.wmcloud.org/ might be a good choice. You are correct about Cloud VPS instances. If your work becomes its own project, you are more than welcome to request a new project on Cloud VPS.
On Thu, Apr 1, 2021 at 10:26 AM Roy

Re: [Cloud] [Cloud-announce] New Wikireplicas available, timeline update, and Quarry migration

2021-04-01 Thread Roy Smith
I'd like to continue exploring this, but I'm not quite sure of the appropriate way forward. I gather doing work like this on the Toolforge bastion hosts is frowned upon, so I guess what I should be doing is spinning up a VPS instance on https://horizon.wikimedia.org/?

Re: [Cloud] [Cloud-announce] New Wikireplicas available, timeline update, and Quarry migration

2021-03-31 Thread Roy Smith
Thanks for looking into this. I tried this again a little later, and it ran fine. Odd that the amount of memory used depends on the number of rows. I would expect it would stream results to stdout as they came in, but apparently not. Even weirder that the 100M example runs OOM in 10s, while

Re: [Cloud] [Cloud-announce] New Wikireplicas available, timeline update, and Quarry migration

2021-03-31 Thread Brooke Storm
> On Mar 31, 2021, at 5:18 PM, Roy Smith wrote:
>
> I'm just playing around on tools-sgebastion-08. I can dump the first 1 million image names in about half a minute:
>
>> tools.spi-tools-dev:xw-join$ time mysql --defaults-file=$HOME/replica.my.cnf -h

Re: [Cloud] [Cloud-announce] New Wikireplicas available, timeline update, and Quarry migration

2021-03-31 Thread Roy Smith
I'm just playing around on tools-sgebastion-08. I can dump the first 1 million image names in about half a minute:
> tools.spi-tools-dev:xw-join$ time mysql --defaults-file=$HOME/replica.my.cnf -h commonswiki.web.db.svc.wikimedia.cloud commonswiki_p -N -e 'select img_name from image limit

Re: [Cloud] [Cloud-announce] New Wikireplicas available, timeline update, and Quarry migration

2021-03-31 Thread Daniel Schwen
> I run FastilyBot on a Raspberry Pi, and needless to say it would be grossly impractical for me to perform a "join" in the bot's code.

Why not run it on WMF Cloud? In-code joins will very likely work there, and Cloud is supported. You are effectively asking to also support a second way

Re: [Cloud] [Cloud-announce] New Wikireplicas available, timeline update, and Quarry migration

2021-03-31 Thread Brooke Storm
> On Mar 31, 2021, at 2:20 PM, Roy Smith wrote:
>
> Is it feasible to do a log analysis of the database servers to find out what tools are (were?) using cross-wiki joins? At least that would ensure that all the tool owners could be contacted directly to make sure they know this is

Re: [Cloud] [Cloud-announce] New Wikireplicas available, timeline update, and Quarry migration

2021-03-31 Thread Roy Smith
Is it feasible to do a log analysis of the database servers to find out what tools are (were?) using cross-wiki joins? At least that would ensure that all the tool owners could be contacted directly to make sure they know this is happening.
> On Mar 31, 2021, at 3:46 PM, Joaquin Oltra

Re: [Cloud] [Cloud-announce] New Wikireplicas available, timeline update, and Quarry migration

2021-03-31 Thread Joaquin Oltra Hernandez
Hi Fastily, we are aware of the use case for matching commons pages/images/sha1s between commons/big wikis and other wikis, as it has come up many times. I'm cataloging all the comments and examples that have come up in the last 5 months in order to provide categorized input to the parent task

Re: [Cloud] [Cloud-announce] New Wikireplicas available, timeline update, and Quarry migration

2021-03-31 Thread Kimmo Virtanen
Hi, > This is painful. I think you raised some really good points about cross-joins with Central Auth and Commons as those are *designed* to be cross-referenced from other wikis. But ultimately, if there's no reasonable way to do it in the software (MariaDB) we have available, implementing

Re: [Cloud] [Cloud-announce] New Wikireplicas available, timeline update, and Quarry migration

2021-03-31 Thread Huji Lee
I am not being critical of people (namely, the amazing Cloud team) here. I am being critical of decisions. That could even involve much higher level decisions e.g. should WMF have spent more money and hired more resources for this? It could very well be that I am uninformed, and these decisions

Re: [Cloud] [Cloud-announce] New Wikireplicas available, timeline update, and Quarry migration

2021-03-31 Thread Aaron Halfaker
> over a path of effort for the Clouds team

It seems to me that the Cloud team is putting in all of the effort they can. I'm not sure where they would find more time and energy to implement a better solution. I imagine any better solution wouldn't be a matter of a few extra hours, but rather

Re: [Cloud] [Cloud-announce] New Wikireplicas available, timeline update, and Quarry migration

2021-03-31 Thread Huji Lee
I said it before, and I say it again: *some* databases should be available for cross-wiki JOIN everywhere. This would at least include commons_p and centralauth_p but perhaps also enwiki_p and meta_p. I know that we discussed it before and better long-term solutions can be imagined (such as a data

Re: [Cloud] [Cloud-announce] New Wikireplicas available, timeline update, and Quarry migration

2021-03-31 Thread Fastily
A little late to the party, I just learned about this change today. I maintain a number of bot tasks and database reports on enwp that rely on

Re: [Cloud] [Cloud-announce] New Wikireplicas available, timeline update, and Quarry migration

2021-03-15 Thread Dan Andreescu
> [4] was made to figure out common use cases and possibilities to enable them again.
> ...
> [4] https://phabricator.wikimedia.org/T215858

I just want to highlight this ^ thing Joaquin said and mention that our team (Data Engineering) is also participating in brainstorming ways to bring

Re: [Cloud] [Cloud-announce] New Wikireplicas available, timeline update, and Quarry migration

2021-03-15 Thread Daniel Schwen
This might be a trivial suggestion (for me it was a game changer): Segment your large queries based on an indexed column. By that I mean, add an additional WHERE clause to process only a small subset of the entire DB (e.g. page_id >= 13 AND page_id < 14) and then loop in your application
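
The segmentation idea above can be sketched roughly as follows. This is only an illustration, not the code from the original message: the helper names `id_ranges` and `segmented_queries` are made up for this example, the `page` table with `page_id` and `page_title` columns is assumed to be the replica table being scanned, and the caller is assumed to hand each (SQL, parameters) pair to whatever database driver it already uses.

```python
from typing import Iterator, Tuple


def id_ranges(start: int, stop: int, step: int) -> Iterator[Tuple[int, int]]:
    """Yield half-open (lo, hi) ranges that together cover [start, stop)."""
    if step <= 0:
        raise ValueError("step must be positive")
    lo = start
    while lo < stop:
        hi = min(lo + step, stop)
        yield lo, hi
        lo = hi


def segmented_queries(start: int, stop: int, step: int) -> Iterator[Tuple[str, Tuple[int, int]]]:
    """Yield one parameterized query per segment of the indexed column.

    Hypothetical helper: the table and column names are assumptions.
    """
    sql = (
        "SELECT page_id, page_title FROM page "
        "WHERE page_id >= %s AND page_id < %s"
    )
    for lo, hi in id_ranges(start, stop, step):
        yield sql, (lo, hi)


if __name__ == "__main__":
    # Each pair would be passed to cursor.execute(sql, params) in a real tool.
    for sql, params in segmented_queries(0, 250_000, 100_000):
        print(params)
```

Each segment touches only a bounded slice of the table, so no single query has to scan or return the whole thing; the segment size is a tuning choice left to the application.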

Re: [Cloud] [Cloud-announce] New Wikireplicas available, timeline update, and Quarry migration

2021-03-15 Thread Joaquin Oltra Hernandez
Hi, These changes are not arbitrary; they are a necessity. They are happening because the clusters are out of capacity: they keep having problems with replication lag and crashes (e.g. [1]), and restoring servers takes days, during which the rest of the cluster remains under increased load. Additionally

Re: [Cloud] [Cloud-announce] New Wikireplicas available, timeline update, and Quarry migration

2021-03-13 Thread Daniel Schwen
This is the next step after disallowing user databases on replicas. It broke some of my tools but I recently rewrote them to move joining logic into my application. I also replicate small amounts of data (e.g. page titles for a subset of pages) into my user db for joins. I found it quite
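
Moving the join into the application, as described above, might look something like the following. It is a minimal sketch under stated assumptions, not the author's actual code: `hash_join` is a hypothetical helper, and the two inputs stand in for rows fetched separately from two different replica databases (for example, file names from one wiki and from Commons).

```python
from typing import Dict, Hashable, Iterable, Iterator, List, Tuple


def hash_join(
    left: Iterable[Tuple[Hashable, object]],
    right: Iterable[Tuple[Hashable, object]],
) -> Iterator[Tuple[Hashable, object, object]]:
    """Inner-join two iterables of (key, value) rows on their key.

    The right side is loaded into a dict, so it should be the smaller
    result set; the left side is streamed row by row.
    """
    index: Dict[Hashable, List[object]] = {}
    for key, value in right:
        index.setdefault(key, []).append(value)
    for key, left_value in left:
        for right_value in index.get(key, ()):
            yield key, left_value, right_value


if __name__ == "__main__":
    # Made-up sample rows standing in for two separate query results.
    local_files = [("A.jpg", 101), ("B.jpg", 102)]
    commons_files = [("B.jpg", "sha1-b"), ("C.jpg", "sha1-c")]
    for row in hash_join(local_files, commons_files):
        print(row)
```

Only the smaller side needs to fit in memory, which is why combining this with segmented queries on the larger side tends to work well.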

Re: [Cloud] [Cloud-announce] New Wikireplicas available, timeline update, and Quarry migration

2021-03-13 Thread Yetkin Sakal via Cloud
I completely agree with Maarten. It would be a step backward to stop supporting cross-database joins on wiki replicas. This is a breaking change and should not be applied unless a feasible solution to the problem is found.
On Saturday, March 13, 2021, 8:17:39 PM GMT+3, Maarten Dammers

[Cloud] [Cloud-announce] New Wikireplicas available, timeline update, and Quarry migration

2021-03-12 Thread Joaquin Oltra Hernandez
TLDR:
- Instead of `*.db.svc.eqiad.wmflabs` use `*.db.svc.wikimedia.cloud` to use the new replicas
- Quarry will migrate on March 23 to use the new cluster
- In about a month (April 15) the old cluster will start being retired. See https://wikitech.wikimedia.org/wiki/News/Wiki_Replicas_2020_Redesign#Timeline
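
For tools that keep the replica hostname in configuration, the suffix change in the first point could be applied with something like the sketch below. The function name `migrate_host` and the sample hostnames are illustrative assumptions; the sketch only swaps the domain suffix and does not cover any other change a tool may need for the new cluster.

```python
OLD_SUFFIX = ".db.svc.eqiad.wmflabs"
NEW_SUFFIX = ".db.svc.wikimedia.cloud"


def migrate_host(host: str) -> str:
    """Return the hostname with the old replica suffix swapped for the new one.

    Hostnames that do not end in the old suffix are returned unchanged.
    """
    if host.endswith(OLD_SUFFIX):
        return host[: -len(OLD_SUFFIX)] + NEW_SUFFIX
    return host


if __name__ == "__main__":
    # Sample hostname is an assumption used only for illustration.
    print(migrate_host("commonswiki.web.db.svc.eqiad.wmflabs"))
```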