Hi Peter,

By "schema changes Clemens suggested" I assume you mean replacing the
constraint -

....not exists (select * from edges where child=nodes.id);

with

...where id not in (select child from edges);

For this leaf count query, I need to constrain the result set to exclude
nodes which are parents -

sqlite> CREATE VIEW v_count_leaves as with recursive r(id, top) as (select
id, id from nodes union all select t.id, top from nodes as t, edges as e, r
where e.parent=r.id and t.id=e.child) select top, count(*) from r where
top<>id and id not in (select parent from edges) group by top;
sqlite> .timer on
sqlite> select * from v_count_leaves where top=4465;
4465    1
Run Time: real 75.678 user 75.671875 sys 0.000000

sqlite> select count(*) from nodes;
10000
sqlite> select count(*) from edges;
9986

It is still very slow or did you mean something else?


On 1 January 2018 at 18:04, petern <peter.nichvolo...@gmail.com> wrote:

> Shane.  I sent you a query to work with the crippled schema and index you
> proposed for TABLE edges.
> Clemens then explicitly suggested you correct the schema to have use of
> automatic covering index.
>
> >CREATE TABLE edges(parent not null references nodes, child not null
> >references nodes, primary key(parent, child));
>
> Try your leaf counter again - after making the schema changes Clemens
> suggested.
>
> Peter
>
>
> On Mon, Jan 1, 2018 at 8:13 AM, Shane Dev <devshan...@gmail.com> wrote:
>
> > Hi,
> >
> > I want to the count the number of leaves (descendants without children)
> for
> > each node in a DAG
> >
> > DAG definition -
> >
> > CREATE TABLE nodes(id integer primary key, description text);
> > CREATE TABLE edges(parent not null references nodes, child not null
> > references nodes, primary key(parent, child));
> >
> > My query -
> >
> > CREATE VIEW v_count_leaves as with recursive r(id, top) as (
> > select id, id from nodes
> > union all
> > select t.id, top from nodes as t, edges as e, r where e.parent=r.id and
> > t.id
> > =e.child)
> > select top, count(*) from r where top<>id and id not in (select parent
> from
> > edges where parent=id) group by top;
> >
> > It seems to work but is complex to understand and debug despite my aim to
> > keep it simple as possible, but more importantly - it is very slow when
> > there are more than a few thousand nodes and edges.
> >
> > It there a more efficient (and ideally simpler) way?
> > _______________________________________________
> > sqlite-users mailing list
> > sqlite-users@mailinglists.sqlite.org
> > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> >
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to