Grouping or Window

1. Window functions plus subqueries

Count the steps to form groups, similar to Evan's idea, with modifications and fixes:

SELECT id_type
     , min(date) AS begin
     , max(date) AS end
     , count(*)  AS row_ct  -- optional addition
FROM  (
   SELECT date, id_type, count(step OR NULL) OVER (ORDER BY date) AS grp
   FROM  (
      SELECT date, id_type
           , lag(id_type, 1, id_type) OVER (ORDER BY date) <> id_type AS step
      FROM   tmp
      ) sub1
   ) sub2
GROUP  BY id_type, grp
ORDER  BY min(date);

This assumes that the involved columns are defined NOT NULL. Else you need to do more.

Also assuming date to be defined UNIQUE, else you need to add a tiebreaker to the ORDER BY clauses to get deterministic results. Like: ORDER BY date, id.
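If id_type can be NULL, one way to "do more" is a null-safe comparison in the step detection. A minimal sketch (not from the query above, just an assumption of how you might adapt it); IS DISTINCT FROM treats two NULLs as equal, so a run of NULLs forms its own group:

    SELECT date, id_type
         , lag(id_type) OVER (ORDER BY date) IS DISTINCT FROM id_type AS step
    FROM   tmp;

Note that the first row then gets step = TRUE (no previous row, and NULL is distinct from any value), which merely shifts all group numbers by one and does not change the grouping.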

Detailed explanation (answer to very similar question):

  • Select longest continuous sequence

Note in particular:

  • In related cases, lag() with 3 parameters can be essential to cover the corner case of the first (or last) row elegantly. (The 3rd parameter is used as the default if there is no previous (next) row.)

    lag(id_type, 1, id_type) OVER ()
    

    Since we are only interested in an actual change of id_type (TRUE), it does not matter in this particular case. Neither NULL nor FALSE counts as a step.

  • count(step OR NULL) OVER (ORDER BY date) is the shortest syntax that also works in Postgres 9.3 or older. count() only counts non-null values ...

    In modern Postgres, the cleaner, equivalent syntax would be:

    count(step) FILTER (WHERE step) OVER (ORDER BY date)
    

    Details:

    • For absolute performance, is SUM faster or COUNT?

2. Subtract two window functions, one subquery

Similar to Erik's idea with modifications:

SELECT min(date) AS begin
     , max(date) AS end
     , id_type
FROM  (
   SELECT date, id_type
        , row_number() OVER (ORDER BY date)
        - row_number() OVER (PARTITION BY id_type ORDER BY date) AS grp
   FROM   tmp
   ) sub
GROUP  BY id_type, grp
ORDER  BY min(date);

If date is defined UNIQUE, as I mentioned above (you never clarified), dense_rank() would be pointless, since the result is the same as for row_number(), and the latter is substantially cheaper.

If date is not defined UNIQUE (and we don't know that the only duplicates are on (date, id_type)), all of these queries are pointless, since the result is arbitrary.

Also, a subquery is typically cheaper than a CTE in Postgres. Only use CTEs when you need them.

Related answers with more explanation:

  • GROUP BY and aggregate sequential numeric values
  • Group by repeating attribute
  • GROUP BY uninterrupted sequence of logs for same location

In related cases where we already have a running number in the table, we can make do with a single window function:

  • Rank based on sequence of dates
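For instance, with a gapless running number in the table (the column name seq below is an assumption for illustration), the overall row_number() can be dropped and the group is derived with a single window function:

    SELECT min(date) AS begin
         , max(date) AS end
         , id_type
    FROM  (
       SELECT date, id_type
            , seq - row_number() OVER (PARTITION BY id_type ORDER BY seq) AS grp
       FROM   tmp
       ) sub
    GROUP  BY id_type, grp
    ORDER  BY min(date);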

3. Top performance with plpgsql function

Since this question has become unexpectedly popular, I'll add another solution to demonstrate top performance.

SQL has many sophisticated tools to create solutions with short and elegant syntax. But a declarative language has its limits for more complex requirements that involve procedural elements.

A server-side procedural function is faster for this than anything posted so far because it only needs a single sequential scan over the table and a single sort operation. If a fitting index is available, even just a single index-only scan.

CREATE OR REPLACE FUNCTION f_tmp_groups()
  RETURNS TABLE (id_type int, grp_begin timestamp, grp_end timestamp) AS
$func$
DECLARE
   _row  tmp;                       -- use table type for row variable
BEGIN
   FOR _row IN
      TABLE tmp ORDER BY date       -- add more columns to make order deterministic
   LOOP
      CASE _row.id_type = id_type 
      WHEN TRUE THEN                -- same group continues
         grp_end := _row.date;      -- remember last date so far
      WHEN FALSE THEN               -- next group starts
         RETURN NEXT;               -- return result for last group
         id_type   := _row.id_type;
         grp_begin := _row.date;
         grp_end   := _row.date;
      ELSE                          -- NULL for 1st row
         id_type   := _row.id_type; -- remember row data for starters
         grp_begin := _row.date;
         grp_end   := _row.date;
      END CASE;
   END LOOP;

   RETURN NEXT;                     -- return last result row      
END
$func$ LANGUAGE plpgsql;

Call:

SELECT * FROM f_tmp_groups();

Test with:

EXPLAIN (ANALYZE, TIMING OFF)  -- to focus on total performance
SELECT * FROM  f_tmp_groups();

You could make the function generic with polymorphic types and pass table type and column names. Details:

  • Refactor a PL/pgSQL function to return the output of various SELECT queries

If you don't want to or cannot persist a function for this, it would even pay to create a temporary function on the fly. Costs a few ms.

  • How to create a temporary function in PostgreSQL?
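One way to do that (a sketch under the assumption you keep the function above unchanged): create the function in the session's temporary schema pg_temp, so it is dropped automatically at session end. Note that temporary functions must be called schema-qualified:

    CREATE FUNCTION pg_temp.f_tmp_groups()
      RETURNS TABLE (id_type int, grp_begin timestamp, grp_end timestamp) AS
    $func$
    BEGIN
       -- ... same body as above ...
    END
    $func$ LANGUAGE plpgsql;

    SELECT * FROM pg_temp.f_tmp_groups();  -- schema-qualified call required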

dbfiddle for Postgres 9.6, comparing performance for all three. Building on Jack's test case, modified.

dbfiddle for Postgres 8.4, where performance differences are even bigger.


You can do this as a simple subtraction of two ROW_NUMBER() operations. (If your dates are not unique, but are still unique per id_type, you can use DENSE_RANK() instead, though it will be a more expensive query.)

WITH IdTypes AS (
   SELECT
      date,
      id_type,
      Row_Number() OVER (ORDER BY date)
         - Row_Number() OVER (PARTITION BY id_type ORDER BY date)
         AS Seq
   FROM
      tmp
)
SELECT
   Min(date) AS begin,
   Max(date) AS end,
   id_type
FROM IdTypes
GROUP BY id_type, Seq
ORDER BY begin
;

See this work at DB Fiddle (or see the DENSE_RANK version)

Result:

begin                  end                    id_type
---------------------  ---------------------  -------
2017-01-10 07:19:21    2017-01-10 07:19:25    3
2017-01-10 07:19:26    2017-01-10 07:19:26    5
2017-01-10 07:19:27.1  2017-01-10 07:19:27.1  3
2017-01-10 07:19:28    2017-01-10 07:19:29    5
2017-01-10 07:19:30.1  2017-01-10 07:19:30.1  3
2017-01-10 07:19:31    2017-01-10 07:19:31    5
2017-01-10 07:19:32    2017-01-10 07:19:32    3
2017-01-10 07:19:33.1  2017-01-10 07:19:37.1  5

Logically, you can think of this as a simple DENSE_RANK() with a PREORDER BY: you want the DENSE_RANK of all the items that are ranked together, ordered by date. You just have to deal with the pesky problem that at each change in the date, DENSE_RANK will increment. You do that by using the expression shown above. Imagine you had this syntax: DENSE_RANK() OVER (PREORDER BY date, ORDER BY id_type), where the PREORDER is excluded from the ranking calculation and only the ORDER BY is counted.

Note that it's important to GROUP BY both the generated Seq column and the id_type column. Seq is NOT unique by itself; there can be overlaps, so you must also group by id_type.
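To see why, consider a tiny made-up example (the values and column names here are hypothetical, chosen only to show the collision):

    SELECT ts, val
         , row_number() OVER (ORDER BY ts)
         - row_number() OVER (PARTITION BY val ORDER BY ts) AS seq
    FROM  (VALUES (1, 'A'), (2, 'B'), (3, 'A')) t(ts, val);
    -- (1,'A') gets seq 0, but both (2,'B') and (3,'A') get seq 1:
    -- grouping by seq alone would merge two unrelated groups.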

For further reading on this topic:

  • Detect changes between row values—read the See It For Yourself section.
  • Or this simpler explanation

That first link gives you some code you can use if you wanted the begin or end date to be the same as the previous or next period's end/begin date (so there are no gaps). Plus other versions that could assist you in your query. Though they have to be translated from SQL Server syntax...


On Postgres 8.4 you can use a recursive CTE.

How it works

The recursive CTE adds a level (lv) for each change of id_type, selecting dates one by one in descending order.

       date           | id_type | lv
--------------------------------------
2017-01-10 07:19:21.0      3       8
2017-01-10 07:19:22.0      3       8
2017-01-10 07:19:23.1      3       8
2017-01-10 07:19:24.1      3       8
2017-01-10 07:19:25.0      3       8
2017-01-10 07:19:26.0      5       7
2017-01-10 07:19:27.1      3       6
2017-01-10 07:19:28.0      5       5
2017-01-10 07:19:29.0      5       5
2017-01-10 07:19:30.1      3       4
2017-01-10 07:19:31.0      5       3
2017-01-10 07:19:32.0      3       2
2017-01-10 07:19:33.1      5       1
2017-01-10 07:19:35.0      5       1
2017-01-10 07:19:36.1      5       1
2017-01-10 07:19:37.1      5       1

Then MIN(date) and MAX(date), grouped by lv and id_type, yield the desired result.

with RECURSIVE rdates as 
(
    (select   date, id_type, 1 lv 
     from     yourTable
     order by date desc
     limit 1
    )
    union
    (select    d.date, d.id_type,
               case when r.id_type = d.id_type 
                    then r.lv 
                    else r.lv + 1 
               end lv    
    from       yourTable d
    inner join rdates r
    on         d.date < r.date
    order by   date desc
    limit      1)
)
select   min(date) StartDate,
         max(date) EndDate,
         id_type
from     rdates
group by lv, id_type
;

+---------------------+---------------------+---------+
| startdate           |       enddate       | id_type |
+---------------------+---------------------+---------+
| 10.01.2017 07:19:21 | 10.01.2017 07:19:25 |    3    |
| 10.01.2017 07:19:26 | 10.01.2017 07:19:26 |    5    |
| 10.01.2017 07:19:27 | 10.01.2017 07:19:27 |    3    |
| 10.01.2017 07:19:28 | 10.01.2017 07:19:29 |    5    |
| 10.01.2017 07:19:30 | 10.01.2017 07:19:30 |    3    |
| 10.01.2017 07:19:31 | 10.01.2017 07:19:31 |    5    |
| 10.01.2017 07:19:32 | 10.01.2017 07:19:32 |    3    |
| 10.01.2017 07:19:33 | 10.01.2017 07:19:37 |    5    |
+---------------------+---------------------+---------+

Check it: http://rextester.com/WCOYFP6623