On Wed, Feb 28, 2018 at 5:07 PM, Jaime Crespo <jcre...@wikimedia.org> wrote: > On Wed, Feb 28, 2018 at 5:26 PM, Brad Jorsch (Anomie) <bjor...@wikimedia.org >> wrote: > >> On Wed, Feb 28, 2018 at 8:47 AM, Jaime Crespo <jcre...@wikimedia.org> >> wrote: >> >> > Very recently I have been experimenting with recursive Common Table >> > Expressions [2], which are or will be available on the latest versions of >> > MySQL and MariaDB. >> > >> >> Do the other databases MediaWiki tries to support have that feature? >> > > Actually, MySQL/MariaDB is the *last* database to conform to the sql:1999 > WITH standard: https://modern-sql.com/feature/with#compatibility Even > sqlite suported a limited set of those! > > The good news is that, probably because it arrived last, it got a pretty > feature-full implementation: > https://twitter.com/MarkusWinand/status/852862475699707904 > > >> >> > With a single query on can obtain all titles directly or indirectly in a >> > category: >> > >> > WITH RECURSIVE cte (cl_from, cl_type) AS >> > ( >> > SELECT cl_from, cl_type FROM categorylinks WHERE cl_to = >> > 'Database_management_systems' -- starting category >> > UNION >> > SELECT categorylinks.cl_from, categorylinks.cl_type FROM cte JOIN >> page >> > ON >> > cl_from = page_id JOIN categorylinks ON page_title = cl_to WHERE >> > cte.cl_type >> > = 'subcat' -- subcat addition on each iteration >> > ) >> > SELECT page_title FROM cte JOIN page ON cl_from = page_id WHERE >> > page_namespace = 0 ORDER BY page_title; -- printing only articles in the >> > end >> > , ordered by title >> > >> >> Does that work efficiently on huge categories, or does it wind up fetching >> millions of rows and filesorting? > > > Needs more testing-- that query worked for me well enough on my laptop to > expose it directly on a webrequest (<0.1 s), but I only imported the > categorylinks and page tables, so I was working with memory. Obviously, the > more complex the query, and the more results it returns, the less likely it > is to be able to be exposed to, e.g. a public API. But at least there are > configurable limits on recursivitiy and max execution time. > > Honestly, given it is a new feature, I don't expect mediawiki --which at > the moment has to support 5.5- to embrace it any time soon. However, I > wanted to ask if it was interesting enough to setup some test hosts for > mediawiki to "play" with it --e.g. evaluate performance--, and maybe (?) > some upgraded mariadb/mysql servers for WMF labsdb (for long-running > analytics or gadgets that generates reports). >
I certainly think labsdb users would welcome this feature. Traversing category trees (Or sometimes even the graph of page links) is certainly a common thing that people want to do. -- Brian _______________________________________________ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l