Re: [Analytics] [Research-Internal] Tutorials on disk space usage for notebook/stat boxes

2020-02-18 Thread Neil Shah-Quinn
Thank you very much, Luca! To make this nice documentation easier to discover, I moved it to Analytics/Systems/Clients along with the other information on the clients from Analytics/Data access. On Tue, 18 Feb 2020 at 17:11, Isaac

Re: [Analytics] [Wikimedia Research Showcase] February 19, 2020: The Humans and Bots of Wikipedia and Wikidata

2020-02-18 Thread Janna Layton
Just a reminder that this Research Showcase will be happening tomorrow. On Thu, Feb 13, 2020 at 1:32 PM Janna Layton wrote: > Hi all, > > The next Research Showcase will be live-streamed on Wednesday, February > 19, at 9:30 AM PST/17:30 UTC. We’ll have presentations from Jeffrey V. > Nickerson

Re: [Analytics] [Research-Internal] Tutorials on disk space usage for notebook/stat boxes

2020-02-18 Thread Isaac Johnson
Thanks for pulling together these directions Luca! I did a little clean-up and will try to remember to do so more routinely. Adding to what Diego said, I also started using stat1007 because it has the most access to resources (dumps, Hadoop, MariaDB), and then my virtual environments, config

Re: [Analytics] [Research-Internal] Tutorials on disk space usage for notebook/stat boxes

2020-02-18 Thread Andrew Otto
I added a 'GPU?' column too. :) THANKS LUCA! On Tue, Feb 18, 2020 at 11:51 AM Luca Toscano wrote: > Hey Diego, > > added a section at the end of the page with the info requested, let me > know if anything is missing :) > > Luca > > On Tue, 18 Feb 2020 at 17:37, Diego Saez-Trumper <

Re: [Analytics] [Research-Internal] Tutorials on disk space usage for notebook/stat boxes

2020-02-18 Thread Luca Toscano
Hey Diego, added a section at the end of the page with the info requested, let me know if anything is missing :) Luca On Tue, 18 Feb 2020 at 17:37, Diego Saez-Trumper < di...@wikimedia.org> wrote: > Thanks for this Luca. > > I tend to use stat1007 because I know that machine

Re: [Analytics] [Research-Internal] Tutorials on disk space usage for notebook/stat boxes

2020-02-18 Thread Diego Saez-Trumper
Thanks for this, Luca. I tend to use stat1007 because I know that machine has a lot of RAM/CPU and HDFS access. For the other statsX machines, I'm not sure which of them have which resources (I know at least one of them doesn't have HDFS access). There is a table where I can look at a summary of resources per

Re: [Analytics] [Wiki-research-l] Announcement - Mediawiki History Dumps

2020-02-18 Thread Joseph Allemandou
Hi Giovanni, The pagelinks table is great for temporal snapshots: you know about links between pages at the time of the query. Parsing the wikitext is needed to provide an historical view of the links :) Cheers Joseph On Tue, Feb 18, 2020 at 12:22 AM Giovanni Luca Ciampaglia wrote: > Thank you

Re: [Analytics] [Research-Internal] Tutorials on disk space usage for notebook/stat boxes

2020-02-18 Thread Marcel Ruiz Forns
Looks great, Luca! Handy commands... On Tue, Feb 18, 2020 at 8:53 AM Luca Toscano wrote: > Hi everybody! > > I created the following doc: > https://wikitech.wikimedia.org/wiki/Analytics/Tutorials/Analytics_Client_Nodes > > It contains two FAQs: > - How do I ensure that there is enough space on