19 nov 2015, 11:20 Ilja Heckmann:

> I have a dataset of events with timestamps, and want to extract a  
> subset of them so that there is at least, say, an hour between items  
> in the result. It would be trivial to solve this in an imperative  
> language, e.g. the solution in Python would be:
>
> result = [table[0]]
> for row in table[1:]:
>    if row['timestamp'] >= result[-1]['timestamp'] +  
> timedelta(hours=1):
>        result.append(row)
>
> But I would like to have a solution in pure SQL. The closest I came  
> to a solution was this code, with integers instead of datetimes, and  
> a minimum distance of 3:
>
> CREATE TABLE test (x INT)
> INSERT INTO test VALUES(1)
> INSERT INTO test VALUES(3)
> INSERT INTO test VALUES(5)
> INSERT INTO test VALUES(6)
> INSERT INTO test VALUES(7)
> INSERT INTO test VALUES(8)
>
> WITH RECURSIVE
>  breaks(t) AS (
>    SELECT 1
>    UNION
>    SELECT min(x) FROM test, breaks WHERE x > t + 2 AND x < 10
>  )
> SELECT t FROM breaks
>
> The result I would expect would be 1,5,8. The above query worked  
> with the sqlite3 executable that ships with OSX (I think, its  
> version is around 3.8.11), but when a tried it on a different  
> machine with 3.9.2, there was an error having to do with aggregate  
> function (min) in the recursive subquery.

An aggregate function  can still be used in a sub-query for a column  
value.  For the example with integers:

WITH RECURSIVE
  breaks(t) AS (
    SELECT 1
    UNION
    SELECT (SELECT min(x) FROM test WHERE x > t + 2 AND x < 10)
    FROM   breaks
  )
SELECT t FROM breaks
;


> The solution I thought would work was
>
> WITH RECURSIVE
>  breaks(t) AS (
>    SELECT 1
>    UNION
>    SELECT x FROM test, breaks WHERE x > t + 2 LIMIT 1
>  )
> SELECT t FROM breaks
>
> with the reasoning being that the recursive subquery would only  
> return the first value that satisfies the condition of being greater  
> than the last one plus 2, but "LIMIT 1" gets applied the the whole  
> statement instead of just the subquery, limiting the output to just  
> one item.
>
> A solution can be approximated by grouping items into bins and then  
> selecting one value from each bin, but this will lead to some wrong  
> results, e.g. in the example above, if we build bins by the division  
> result of 3, and then take the smallest value from each bin, we'll  
> have:
>
> [1,3], [5,6], [7,8] -> 1,5,7
>
> with 5 and 7 being too close to each other.
>
> Can this be done?
I don't know and have no experience with dates in SQLite.

Reply via email to