Re: [sqlite] Search query alternatives.

Michael Falconer Wed, 15 Oct 2014 17:40:14 -0700

Thanks Stephen,

good point, I was just after some general results, I do take your point
about caching etc. However it is logical to apply UNION ALL if appropriate
in preference to UNION which in this context is a bit lazy. I'm a bit
annoyed I didn't pick up on it myself, but thankful to Igor for reminding
me of the difference in the two statements.



On 16 October 2014 11:14, Stephen Chrzanowski <pontia...@gmail.com> wrote:

> Careful with the timing.  You may be looking at OS memory caching the
> result set instead of pulling from the drive.  For best bets, either re-run
> both queries several times, ditch the longest and shortest times, then take
> the mean or average times and do the comparison that way.
>
> On Wed, Oct 15, 2014 at 7:54 PM, Michael Falconer <
> michael.j.falco...@gmail.com> wrote:
>
> > Igor,
> >
> > Nice one. A quick test using 10 lookup trees and the same search
> criteria:
> > *Showing rows 0 - 29 (30 total, Query took 0.4838 sec)*
> >
> > Now with UNION ALL replacing UNION:
> > *Showing rows 0 - 29 (30 total, Query took 0.2050 sec)*
> >
> > You weren't kidding about cheaper were you? LOL
> >
> >
> > On 16 October 2014 10:30, Michael Falconer <michael.j.falco...@gmail.com
> >
> > wrote:
> >
> > > Yes, I agree with the general sentiment. It is not exactly the design I
> > > would have chosen either, but it is what exists. Design change is
> > occurring
> > > but for the moment we are stuck with the current schema.
> > >
> > > If you cannot change the schemata to be more suitable, then your
> > >> demonstrated queries are very near as good as it gets. I can think of
> > other
> > >> ways to achieve the same, but nothing that would be more efficient to
> a
> > >> tangible level. Maybe someone else here can come up with something a
> bit
> > >> more tight.
> > >>
> > > You are preaching to the converted I'm afraid, but I appreciate the
> > points
> > > you made. The quoted paragraph probably answers most of my questions
> > > really, except for this from Igor:
> > >
> > > Unless you do expect duplicates and need to eliminate them, use UNION
> ALL
> > >> - it's much cheaper (this is assuming you insist on keeping multiple
> > >> tables).
> > >>
> > > Cheaper because it drops the operation to remove the duplicate records
> > > from the result set? I'm assuming. I'll give that a try.
> > >
> > > On 16 October 2014 10:05, RSmith <rsm...@rsweb.co.za> wrote:
> > >
> > >>
> > >> On 2014/10/16 00:05, Michael Falconer wrote:
> > >>
> > >>> Hi all,
> > >>>
> > >>> first off I must start with an apology. I know I'm sort of doing the
> > >>> wrong//...
> > >>>
> > >>
> > >> No need to apologise, this flies quite close to the central theme.
> > >> Whether you are using SQLite or any other SQL RDBMS, this is horrible
> DB
> > >> design and it is so for precisely the reasons you are asking the
> > question.
> > >> Tables should not be dynamic: The simple rule of thumb being - if you
> > >> cannot tell (or at least accurately predict) before-hand exactly how
> > many
> > >> tables will be in the final system, then you are doing it wrong.  Why
> > not
> > >> have one set of tables and in stead of preceding each of them with a
> > >> <prefix>, simply add a column that can host the <prefix> as a simple
> > value,
> > >> which will immediately make your life very much easier and get the SQL
> > >> engine to do the work you are now trying to compensate for in a
> > lop-sided
> > >> manual kind of way.
> > >>
> > >> I can only imagine the amount of code you invested into your system to
> > >> track and deal with these dynamic sets of tables, so I know changing
> it
> > >> will seem like a rather large undertaking, but really it will mostly
> > >> involve removing loads of code to end up with a small set of simple
> > queries
> > >> that does all the work for you.
> > >>
> > >> The only reason I can imagine this sort of breakdown useful is if your
> > >> tree tables are all really really huge, like Gigabytes, and so there
> > might
> > >> be some efficiency to be gained from splitting it up, but I doubt this
> > is
> > >> the case.
> > >>
> > >> Just imagine the ease of those search queries you demonstrated... one
> > >> query, no UNIONs, a single set of joins and an extra where-clause
> > check...
> > >> Add to that increased efficiency at finding results (run one query in
> > stead
> > >> of 50).  Easy-mode. As a proof-of-concept, just make an SQLite DB, one
> > set
> > >> of tables as per your list, add column(s) for prefixing as needed,
> > import
> > >> all the tables from some large set of them (specifying additionally
> the
> > >> column for the prefix) and ten run search queries on them. I am
> willing
> > to
> > >> bet on the fact it will be much quicker in addition to being much
> > simpler.
> > >>
> > >> If you cannot change the schemata to be more suitable, then your
> > >> demonstrated queries are very near as good as it gets. I can think of
> > other
> > >> ways to achieve the same, but nothing that would be more efficient to
> a
> > >> tangible level. Maybe someone else here can come up with something a
> bit
> > >> more tight.
> > >>
> > >>
> > >>  thing here as this question is NOT related to sqlite. It is a general
> > SQL
> > >>> question but I ask it here because I have great respect for the
> answers
> > >>> and
> > >>> discussions I have seen on this forum over many years. I rarely post
> > >>> myself
> > >>> as there are always several contributors who beat me to the answer
> and
> > >>> often their response is far better than mine would have been. I'm
> not a
> > >>> code leper, I don't need the actual SQL just the method really,
> though
> > a
> > >>> short code example would be well received for illustration.
> > >>>
> > >>> It's about a search performed on multiple tables. However the
> structure
> > >>> of
> > >>> this database is somewhat unorthodox. It contains genealogical data
> and
> > >>> this is clustered into trees. Each tree has 8 tables, there can be as
> > >>> many
> > >>> as 100 trees but most installations of the associated application
> > >>> software
> > >>> contain between 5 - 50 trees. These 8 tables contain a family trees
> > data
> > >>> and are named:
> > >>> <prefix>addresses
> > >>> <prefix>connections
> > >>> <prefix>events
> > >>> <prefix>family
> > >>> <prefix>person
> > >>> <prefix>repositories
> > >>> <prefix>sources
> > >>> <prefix>texts
> > >>>
> > >>> The <prefix> changes and is unique for each tree.
> > >>>
> > >>> There are extensive search options offered to users as well as simple
> > >>> name
> > >>> searching and it is one of the best features of the app. It works
> > pretty
> > >>> well, so it ain't broke in any sense, we just wonder if there is a
> > better
> > >>> way to perform this search in SQL. Is there a general technique which
> > is
> > >>> superior either in speed, efficiency or load bearing contexts?
> > >>>
> > >>> I am sure you can see one of the pitfalls here is the exponential
> > growth
> > >>> of
> > >>> such a search query as both total number of trees and indeed user
> > search
> > >>> criteria increase. For each criteria component, and there are quite a
> > >>> few,
> > >>> the appropriate tables must be queried and results joined.
> > >>>
> > >>> Searches return records of individuals meeting the entered search
> > >>> criteria
> > >>> so the query focuses on the person table as it's anchor, performs
> > >>> searches
> > >>> on required other tables in the tree and joins the results to the
> > person
> > >>> data. The results from each tree search are then UNION'ed to provide
> > the
> > >>> result set. Here is a contrived example of the SQL query code which
> > >>> should
> > >>> make things clearer.
> > >>>
> > >>> The user enters simple search criteria - any person with a last name
> > >>> containing 'mac' and a first name containing the character 'a'. This
> is
> > >>> the
> > >>> resulting query (generated by php code), which searches a small 4
> > family
> > >>> tree installation.
> > >>>
> > >>> (SELECT humo1_person.*, event_kind, event_event, address_place,
> > >>> address_zip
> > >>>      FROM humo1_person
> > >>>      LEFT JOIN humo1_events ON pers_gedcomnumber=event_person_id
> > >>>      LEFT JOIN humo1_addresses ON pers_gedcomnumber=address_person_id
> > >>>          WHERE pers_lastname LIKE '%mac%'
> > >>>            AND (pers_firstname LIKE '%a%'
> > >>>                OR (event_kind='name' AND event_event LIKE '%a%')
> > >>>            )
> > >>>      GROUP BY pers_gedcomnumber
> > >>> )
> > >>> UNION
> > >>>      (SELECT humo2_person.*, event_kind, event_event, address_place,
> > >>> address_zip
> > >>>          FROM humo2_person LEFT JOIN humo2_events ON
> > >>> pers_gedcomnumber=event_person_id
> > >>>          LEFT JOIN humo2_addresses ON pers_gedcomnumber=address_
> > >>> person_id
> > >>>              WHERE pers_lastname LIKE '%mac%'
> > >>>                AND (pers_firstname LIKE '%a%'
> > >>>                    OR (event_kind='name' AND event_event LIKE '%a%')
> )
> > >>>          GROUP BY pers_gedcomnumber
> > >>>      )
> > >>> UNION
> > >>>      (SELECT humo3_person.*, event_kind, event_event, address_place,
> > >>> address_zip
> > >>>          FROM humo3_person LEFT JOIN humo3_events ON
> > >>> pers_gedcomnumber=event_person_id
> > >>>          LEFT JOIN humo3_addresses ON pers_gedcomnumber=address_
> > >>> person_id
> > >>>              WHERE pers_lastname LIKE '%mac%'
> > >>>                AND (pers_firstname LIKE '%a%'
> > >>>                    OR (event_kind='name' AND event_event LIKE '%a%')
> > >>>                )
> > >>>          GROUP BY pers_gedcomnumber
> > >>>      )
> > >>> UNION
> > >>>      (SELECT humo4_person.*, event_kind, event_event, address_place,
> > >>> address_zip
> > >>>          FROM humo4_person
> > >>>          LEFT JOIN humo4_events ON pers_gedcomnumber=event_person_id
> > >>>          LEFT JOIN humo4_addresses ON pers_gedcomnumber=address_
> > >>> person_id
> > >>>              WHERE pers_lastname LIKE '%mac%'
> > >>>                AND (pers_firstname LIKE '%a%'
> > >>>                    OR (event_kind='name' AND event_event LIKE '%a%')
> > >>>                )
> > >>>          GROUP BY pers_gedcomnumber
> > >>>      )
> > >>>      ORDER BY pers_lastname ASC , pers_firstname ASC LIMIT 0,30
> > >>>
> > >>> Any thoughts? Suggestions? Missiles? Good approach, bad approach, or
> > >>> completely off the grid? I do use sqlite quite a bit, but not on this
> > >>> particular project.
> > >>>
> > >>>
> > >>>
> > >> _______________________________________________
> > >> sqlite-users mailing list
> > >> sqlite-users@sqlite.org
> > >> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> > >>
> > >
> > >
> > >
> > > --
> > > Regards,
> > >      Michael.j.Falconer.
> > >
> >
> >
> >
> > --
> > Regards,
> >      Michael.j.Falconer.
> > _______________________________________________
> > sqlite-users mailing list
> > sqlite-users@sqlite.org
> > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> >
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
Regards,
     Michael.j.Falconer.
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Search query alternatives.

Reply via email to