I have a dataset of events with timestamps, and want to extract a subset 
of them so that there is at least, say, an hour between items in the 
result. It would be trivial to solve this in an imperative language, 
e.g. the solution in Python would be:

result = [table[0]]
for row in table[1:]:
     if row['timestamp'] >= result[-1]['timestamp'] + timedelta(hours=1):
         result.append(row)

But I would like to have a solution in pure SQL. The closest I came to a 
solution was this code, with integers instead of datetimes, and a 
minimum distance of 3:

CREATE TABLE test (x INT)
INSERT INTO test VALUES(1)
INSERT INTO test VALUES(3)
INSERT INTO test VALUES(5)
INSERT INTO test VALUES(6)
INSERT INTO test VALUES(7)
INSERT INTO test VALUES(8)

WITH RECURSIVE
   breaks(t) AS (
     SELECT 1
     UNION
     SELECT min(x) FROM test, breaks WHERE x > t + 2 AND x < 10
   )
SELECT t FROM breaks

The result I would expect would be 1,5,8. The above query worked with 
the sqlite3 executable that ships with OSX (I think, its version is 
around 3.8.11), but when a tried it on a different machine with 3.9.2, 
there was an error having to do with aggregate function (min) in the 
recursive subquery.

The solution I thought would work was

WITH RECURSIVE
   breaks(t) AS (
     SELECT 1
     UNION
     SELECT x FROM test, breaks WHERE x > t + 2 LIMIT 1
   )
SELECT t FROM breaks

with the reasoning being that the recursive subquery would only return 
the first value that satisfies the condition of being greater than the 
last one plus 2, but "LIMIT 1" gets applied the the whole statement 
instead of just the subquery, limiting the output to just one item.

A solution can be approximated by grouping items into bins and then 
selecting one value from each bin, but this will lead to some wrong 
results, e.g. in the example above, if we build bins by the division 
result of 3, and then take the smallest value from each bin, we'll have:

[1,3], [5,6], [7,8] -> 1,5,7

with 5 and 7 being too close to each other.

Can this be done?

Reply via email to