19 nov 2015, 11:20 Ilja Heckmann:
> I have a dataset of events with timestamps, and want to extract a
> subset of them so that there is at least, say, an hour between items
> in the result. It would be trivial to solve this in an imperative
> language, e.g. the solution in Python would be:
>
> result = [table[0]]
> for row in table[1:]:
> if row['timestamp'] >= result[-1]['timestamp'] +
> timedelta(hours=1):
> result.append(row)
>
> But I would like to have a solution in pure SQL. The closest I came
> to a solution was this code, with integers instead of datetimes, and
> a minimum distance of 3:
>
> CREATE TABLE test (x INT)
> INSERT INTO test VALUES(1)
> INSERT INTO test VALUES(3)
> INSERT INTO test VALUES(5)
> INSERT INTO test VALUES(6)
> INSERT INTO test VALUES(7)
> INSERT INTO test VALUES(8)
>
> WITH RECURSIVE
> breaks(t) AS (
> SELECT 1
> UNION
> SELECT min(x) FROM test, breaks WHERE x > t + 2 AND x < 10
> )
> SELECT t FROM breaks
>
> The result I would expect would be 1,5,8. The above query worked
> with the sqlite3 executable that ships with OSX (I think, its
> version is around 3.8.11), but when a tried it on a different
> machine with 3.9.2, there was an error having to do with aggregate
> function (min) in the recursive subquery.
An aggregate function can still be used in a sub-query for a column
value. For the example with integers:
WITH RECURSIVE
breaks(t) AS (
SELECT 1
UNION
SELECT (SELECT min(x) FROM test WHERE x > t + 2 AND x < 10)
FROM breaks
)
SELECT t FROM breaks
;
> The solution I thought would work was
>
> WITH RECURSIVE
> breaks(t) AS (
> SELECT 1
> UNION
> SELECT x FROM test, breaks WHERE x > t + 2 LIMIT 1
> )
> SELECT t FROM breaks
>
> with the reasoning being that the recursive subquery would only
> return the first value that satisfies the condition of being greater
> than the last one plus 2, but "LIMIT 1" gets applied the the whole
> statement instead of just the subquery, limiting the output to just
> one item.
>
> A solution can be approximated by grouping items into bins and then
> selecting one value from each bin, but this will lead to some wrong
> results, e.g. in the example above, if we build bins by the division
> result of 3, and then take the smallest value from each bin, we'll
> have:
>
> [1,3], [5,6], [7,8] -> 1,5,7
>
> with 5 and 7 being too close to each other.
>
> Can this be done?
I don't know and have no experience with dates in SQLite.