Re: [Wikitech-l] Recursive Common Table expressions @ Wikimedia [was Fwd: [Wikimedia-l] What's making you happy this week? (Week of 18 February 2018)]

2018-02-28 Thread Stas Malyshev
Hi!

> None of these features are present on the minimum required versions of
> Mediawiki, or the latest version available on WMF servers-- but I wonder if
> people- Mediawiki hackers and Tools creators- would be interested on doing
> those?

It would be interesting to see how this can work in deepcategory
searches - we now have a keyword for it (driven by SPARQL for now) and
what would happen if it is ported to SQL. If we get it on labs db
replicas, we could set up mediawiki so that we could test how good is it
on real data. Thanks for posting about it!

-- 
Stas Malyshev
smalys...@wikimedia.org

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Recursive Common Table expressions @ Wikimedia [was Fwd: [Wikimedia-l] What's making you happy this week? (Week of 18 February 2018)]

2018-02-28 Thread bawolff
On Wed, Feb 28, 2018 at 5:07 PM, Jaime Crespo  wrote:
> On Wed, Feb 28, 2018 at 5:26 PM, Brad Jorsch (Anomie) > wrote:
>
>> On Wed, Feb 28, 2018 at 8:47 AM, Jaime Crespo 
>> wrote:
>>
>> > Very recently I have been experimenting with recursive Common Table
>> > Expressions [2], which are or will be available on the latest versions of
>> > MySQL and MariaDB.
>> >
>>
>> Do the other databases MediaWiki tries to support have that feature?
>>
>
> Actually, MySQL/MariaDB is the *last* database to conform to the sql:1999
> WITH standard: https://modern-sql.com/feature/with#compatibility Even
> sqlite suported a limited set of those!
>
> The good news is that, probably because it arrived last, it got a pretty
> feature-full implementation:
> https://twitter.com/MarkusWinand/status/852862475699707904
>
>
>>
>> > With a single query on can obtain all titles directly or indirectly in a
>> > category:
>> >
>> > WITH RECURSIVE cte (cl_from, cl_type) AS
>> > (
>> > SELECT cl_from, cl_type FROM categorylinks WHERE cl_to =
>> > 'Database_management_systems' -- starting category
>> > UNION
>> > SELECT categorylinks.cl_from, categorylinks.cl_type FROM cte JOIN
>> page
>> > ON
>> > cl_from = page_id JOIN categorylinks ON page_title = cl_to WHERE
>> > cte.cl_type
>> > = 'subcat' -- subcat addition on each iteration
>> > )
>> > SELECT page_title FROM cte JOIN page ON cl_from = page_id WHERE
>> > page_namespace = 0 ORDER BY page_title; -- printing only articles in the
>> > end
>> > , ordered by title
>> >
>>
>> Does that work efficiently on huge categories, or does it wind up fetching
>> millions of rows and filesorting?
>
>
> Needs more testing-- that query worked for me well enough on my laptop to
> expose it directly on a webrequest (<0.1 s), but I only imported the
> categorylinks and page tables, so I was working with memory. Obviously, the
> more complex the query, and the more results it returns, the less likely it
> is to be able to be exposed to, e.g. a public API. But at least there are
> configurable limits on recursivitiy and max execution time.
>
> Honestly, given it is a new feature, I don't expect mediawiki --which at
> the moment has to support 5.5- to embrace it any time soon. However, I
> wanted to ask if it was interesting enough to setup some test hosts for
> mediawiki to "play" with it --e.g. evaluate performance--, and maybe (?)
> some upgraded mariadb/mysql servers for WMF labsdb (for long-running
> analytics or gadgets that generates reports).
>

I certainly think labsdb users would welcome this feature. Traversing
category trees (Or sometimes even the graph of page links) is
certainly a common thing that people want to do.

--
Brian

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Recursive Common Table expressions @ Wikimedia [was Fwd: [Wikimedia-l] What's making you happy this week? (Week of 18 February 2018)]

2018-02-28 Thread Jaime Crespo
On Wed, Feb 28, 2018 at 5:26 PM, Brad Jorsch (Anomie)  wrote:

> On Wed, Feb 28, 2018 at 8:47 AM, Jaime Crespo 
> wrote:
>
> > Very recently I have been experimenting with recursive Common Table
> > Expressions [2], which are or will be available on the latest versions of
> > MySQL and MariaDB.
> >
>
> Do the other databases MediaWiki tries to support have that feature?
>

Actually, MySQL/MariaDB is the *last* database to conform to the sql:1999
WITH standard: https://modern-sql.com/feature/with#compatibility Even
sqlite suported a limited set of those!

The good news is that, probably because it arrived last, it got a pretty
feature-full implementation:
https://twitter.com/MarkusWinand/status/852862475699707904


>
> > With a single query on can obtain all titles directly or indirectly in a
> > category:
> >
> > WITH RECURSIVE cte (cl_from, cl_type) AS
> > (
> > SELECT cl_from, cl_type FROM categorylinks WHERE cl_to =
> > 'Database_management_systems' -- starting category
> > UNION
> > SELECT categorylinks.cl_from, categorylinks.cl_type FROM cte JOIN
> page
> > ON
> > cl_from = page_id JOIN categorylinks ON page_title = cl_to WHERE
> > cte.cl_type
> > = 'subcat' -- subcat addition on each iteration
> > )
> > SELECT page_title FROM cte JOIN page ON cl_from = page_id WHERE
> > page_namespace = 0 ORDER BY page_title; -- printing only articles in the
> > end
> > , ordered by title
> >
>
> Does that work efficiently on huge categories, or does it wind up fetching
> millions of rows and filesorting?


Needs more testing-- that query worked for me well enough on my laptop to
expose it directly on a webrequest (<0.1 s), but I only imported the
categorylinks and page tables, so I was working with memory. Obviously, the
more complex the query, and the more results it returns, the less likely it
is to be able to be exposed to, e.g. a public API. But at least there are
configurable limits on recursivitiy and max execution time.

Honestly, given it is a new feature, I don't expect mediawiki --which at
the moment has to support 5.5- to embrace it any time soon. However, I
wanted to ask if it was interesting enough to setup some test hosts for
mediawiki to "play" with it --e.g. evaluate performance--, and maybe (?)
some upgraded mariadb/mysql servers for WMF labsdb (for long-running
analytics or gadgets that generates reports).

-- 
Jaime Crespo

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Recursive Common Table expressions @ Wikimedia [was Fwd: [Wikimedia-l] What's making you happy this week? (Week of 18 February 2018)]

2018-02-28 Thread Brad Jorsch (Anomie)
On Wed, Feb 28, 2018 at 8:47 AM, Jaime Crespo  wrote:

> Very recently I have been experimenting with recursive Common Table
> Expressions [2], which are or will be available on the latest versions of
> MySQL and MariaDB.
>

Do the other databases MediaWiki tries to support have that feature?


> With a single query on can obtain all titles directly or indirectly in a
> category:
>
> WITH RECURSIVE cte (cl_from, cl_type) AS
> (
> SELECT cl_from, cl_type FROM categorylinks WHERE cl_to =
> 'Database_management_systems' -- starting category
> UNION
> SELECT categorylinks.cl_from, categorylinks.cl_type FROM cte JOIN page
> ON
> cl_from = page_id JOIN categorylinks ON page_title = cl_to WHERE
> cte.cl_type
> = 'subcat' -- subcat addition on each iteration
> )
> SELECT page_title FROM cte JOIN page ON cl_from = page_id WHERE
> page_namespace = 0 ORDER BY page_title; -- printing only articles in the
> end
> , ordered by title
>

Does that work efficiently on huge categories, or does it wind up fetching
millions of rows and filesorting?

-- 
Brad Jorsch (Anomie)
Senior Software Engineer
Wikimedia Foundation
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Recursive Common Table expressions @ Wikimedia [was Fwd: [Wikimedia-l] What's making you happy this week? (Week of 18 February 2018)]

2018-02-28 Thread Jaime Crespo
I was checking sixdegreesofwikipedia.com [0] and I saw that it implements
an application-driven breath-first search [1], like many other gadgets for
Wikipedia.

Very recently I have been experimenting with recursive Common Table
Expressions [2], which are or will be available on the latest versions of
MySQL and MariaDB.

With a single query on can obtain all titles directly or indirectly in a
category:

WITH RECURSIVE cte (cl_from, cl_type) AS
(
SELECT cl_from, cl_type FROM categorylinks WHERE cl_to =
'Database_management_systems' -- starting category
UNION
SELECT categorylinks.cl_from, categorylinks.cl_type FROM cte JOIN page ON
cl_from = page_id JOIN categorylinks ON page_title = cl_to WHERE cte.cl_type
= 'subcat' -- subcat addition on each iteration
)
SELECT page_title FROM cte JOIN page ON cl_from = page_id WHERE
page_namespace = 0 ORDER BY page_title; -- printing only articles in the end
, ordered by title

(it is more complex than needed because table denormalization, other
examples would be much simpler)

Thanks to CTEs, we can traverse hierarchies, without the need of external
tools, in a single SQL query and much more efficiently-- it doesn't need an
external application.

None of these features are present on the minimum required versions of
Mediawiki, or the latest version available on WMF servers-- but I wonder if
people- Mediawiki hackers and Tools creators- would be interested on doing
those?

[0] 
[1] <
https://github.com/jwngr/sdow/blob/master/sdow/breadth_first_search.py#L36>
[2] <
https://dbahire.com/mysql-8-0-new-features-in-real-life-applications-roles-and-recursive-ctes/
>


-- Forwarded message --
From: mathieu stumpf guntz 
Date: Tue, Feb 27, 2018 at 11:17 AM
Subject: Re: [Wikitech-l] [Wikimedia-l] What's making you happy this week?
(Week of 18 February 2018)
To: Wikimedia Mailing List , Pine W <
wiki.p...@gmail.com>, "wikitech-l@lists.wikimedia.org" <
wikitech-l@lists.wikimedia.org>


What's making me happy this week is joining the "Telegrafo" discussion for
ELISo  and I also just found
Six Degrees of Wikipedia .


Le 18/02/2018 à 23:12, Pine W a écrit :

> What's making me happy this week is Isarra's persistence in working on the
> Timeless skin. Timeless is based on Winter. [0] [1]
>
> For anyone who would like to try Timeless, it's available in Preferences
> under Appearance / Skin.
>
> What's making you happy this week?
>
> Pine
> ( https://meta.wikimedia.org/wiki/User:Pine )
>
> [0] https://www.mediawiki.org/wiki/Skin:Timeless
> [1] https://www.mediawiki.org/wiki/Winter
> ___
> Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wik
> i/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l
> New messages to: wikimedi...@lists.wikimedia.org
> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
> 
>

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l



-- 
Jaime Crespo

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l