On Mon, Feb 20, 2012 at 05:01:17PM +0000, Simon Slavin scratched on the wall: > > On 20 Feb 2012, at 4:11pm, Puneet Kishor <punk.k...@gmail.com> wrote: > > > is there are way to force the query to return only the "first" match > > so the resulting query has only as many rows as the left table? > > You have two problems: > > A) You are JOINing two columns which contain some null values. null is > interpreted as 'I don't know', so the joins tend to match many > different rows.
No, that's not how it works. If you have a join condition like (A.col == B.col), and either A.col or B.col (or both) is NULL, then the result of the comparison itself is NULL ("Are these values equal?" "I don't know."). JOIN operation will only match rows where the join condition is TRUE. NULL is not "good enough" to match rows. Think about NATURAL JOINs where the two columns are reduced to only one column. This reduction can only take place if the values are exactly the same. > B) You are using a LEFT JOIN which is explicitly defined to include > many matches, not just one match. Instead you might try NATURAL JOIN > or CROSS JOIN to see what difference they make. That's not the issue. Or rather, it is, but that isn't going to show us anything we don't already know. NATURAL JOIN should return the exact same results as example #5: 329743 (assuming "col" is the only column name shared between tables), except it will merge the two "col" columns into a single column. A CROSS JOIN has no join condition and is going to return 2,066,471,848 rows. That tells us nothing. > And you have a meta-problem too: the dataset you're experimenting with > is so big you can't see why it's doing what it's doing. Databases are designed to work with sets. If you want to debug the issue, you need to think in sets as well. Use the database to answer your questions, rather than just scanning the results and doing it in your head. The statement... SELECT COUNT(*) as ct, col FROM a GROUP BY 2 HAVING ct > 1 ...will tell you which rows have duplicate columns. You can run this on either table. If you want to have unique values in the JOIN, the best approach is to put a unique index across each "col" column, so that the columns have unique values. Failing that, replace the table reference in example #6 with something like "(SELECT * FROM a GROUP BY col) as a". That's going to give you random rows (there is no concept of a "first" row), but it will get you the row count you're looking for. You might need to do it with both tables, depending on which table(s) has the duplicate "col" values. -j -- Jay A. Kreibich < J A Y @ K R E I B I.C H > "Intelligence is like underwear: it is important that you have it, but showing it to the wrong people has the tendency to make them feel uncomfortable." -- Angela Johnson _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users