Hi David,

Nice work! your query is far quicker than mine- even without the
reverseEdges index. I think you are right about the problem of potentially
double counting leaves. There weren't any multi-parent nodes in my test
data so I didn't notice this mistake.

Could you please explain why your query is so much faster?

On 2 January 2018 at 17:50, David Raymond <david.raym...@tomtom.com> wrote:

> I think you need a union there instead of a union all. Otherwise you're
> double (or more) counting leaves where there is more than 1 path to get to
> the leaf.
>
> I don't have a large dataset to test it on, but how about something like:
>
> create table nodes
> (
>   id integer primary key,
>   description text
> );
>
> create table edges
> (
>   parent int not null references nodes,
>   child int not null references nodes,
>   primary key (parent, child),
>   check (parent != child)
> ) without rowid;
> create index reverseEdges on edges (child, parent);
>
> create view leafCounts as with recursive
> leaves (id) as (
>   select nodes.id
>   from nodes left outer join edges
>   on nodes.id = edges.parent
>   where edges.parent is null
> ),
> paths (parent, child) as (
>   select parent, child from edges
>   union
>   select paths.parent, edges.child
>   from paths inner join edges
>   on paths.child = edges.parent
> )
> select parent, count(*) as leafCount
> from paths
> where child in leaves
> group by parent;
>
>
> -----Original Message-----
> From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org]
> On Behalf Of Shane Dev
> Sent: Monday, January 01, 2018 11:14 AM
> To: SQLite mailing list
> Subject: [sqlite] Efficient query to count number of leaves in a DAG.
>
> Hi,
>
> I want to the count the number of leaves (descendants without children) for
> each node in a DAG
>
> DAG definition -
>
> CREATE TABLE nodes(id integer primary key, description text);
> CREATE TABLE edges(parent not null references nodes, child not null
> references nodes, primary key(parent, child));
>
> My query -
>
> CREATE VIEW v_count_leaves as with recursive r(id, top) as (
> select id, id from nodes
> union all
> select t.id, top from nodes as t, edges as e, r where e.parent=r.id and
> t.id
> =e.child)
> select top, count(*) from r where top<>id and id not in (select parent from
> edges where parent=id) group by top;
>
> It seems to work but is complex to understand and debug despite my aim to
> keep it simple as possible, but more importantly - it is very slow when
> there are more than a few thousand nodes and edges.
>
> It there a more efficient (and ideally simpler) way?
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to