19 nov 2015, 11:20 Ilja Heckmann: > I have a dataset of events with timestamps, and want to extract a > subset of them so that there is at least, say, an hour between items > in the result. It would be trivial to solve this in an imperative > language, e.g. the solution in Python would be: > > result = [table[0]] > for row in table[1:]: > if row['timestamp'] >= result[-1]['timestamp'] + > timedelta(hours=1): > result.append(row) > > But I would like to have a solution in pure SQL. The closest I came > to a solution was this code, with integers instead of datetimes, and > a minimum distance of 3: > > CREATE TABLE test (x INT) > INSERT INTO test VALUES(1) > INSERT INTO test VALUES(3) > INSERT INTO test VALUES(5) > INSERT INTO test VALUES(6) > INSERT INTO test VALUES(7) > INSERT INTO test VALUES(8) > > WITH RECURSIVE > breaks(t) AS ( > SELECT 1 > UNION > SELECT min(x) FROM test, breaks WHERE x > t + 2 AND x < 10 > ) > SELECT t FROM breaks > > The result I would expect would be 1,5,8. The above query worked > with the sqlite3 executable that ships with OSX (I think, its > version is around 3.8.11), but when a tried it on a different > machine with 3.9.2, there was an error having to do with aggregate > function (min) in the recursive subquery.
An aggregate function can still be used in a sub-query for a column value. For the example with integers: WITH RECURSIVE breaks(t) AS ( SELECT 1 UNION SELECT (SELECT min(x) FROM test WHERE x > t + 2 AND x < 10) FROM breaks ) SELECT t FROM breaks ; > The solution I thought would work was > > WITH RECURSIVE > breaks(t) AS ( > SELECT 1 > UNION > SELECT x FROM test, breaks WHERE x > t + 2 LIMIT 1 > ) > SELECT t FROM breaks > > with the reasoning being that the recursive subquery would only > return the first value that satisfies the condition of being greater > than the last one plus 2, but "LIMIT 1" gets applied the the whole > statement instead of just the subquery, limiting the output to just > one item. > > A solution can be approximated by grouping items into bins and then > selecting one value from each bin, but this will lead to some wrong > results, e.g. in the example above, if we build bins by the division > result of 3, and then take the smallest value from each bin, we'll > have: > > [1,3], [5,6], [7,8] -> 1,5,7 > > with 5 and 7 being too close to each other. > > Can this be done? I don't know and have no experience with dates in SQLite.