Cycle detection with recursive subquery factoring

AFAIK:

  • MySQL doesn't support recursive CTE's
  • SQL Sever does not support cycle detection in recursive CTE's

From documentation on CONNECT_BY_ISCYCLE:

The CONNECT_BY_ISCYCLE pseudocolumn returns 1 if the current row has a child which is also its ancestor

and that on CYCLE:

A row is considered to form a cycle if one of its ancestor rows has the same values for the cycle columns.

In your example, row 2 does have a child which is also its ancestor, but its id has not been returned yet.

In other words, CONNECT_BY_ISCYCLE checks the children (which are yet to be returned), while CYCLE checks the current row (which is already returned).

CONNECT BY is row based, while recursive CTE's are set-based.

Note that Oracle's documentation on CYCLE mentions an "ancestor row". However, generally speaking, there is no concept of an "ancestor row" in a recursive CTE. It's a set based operation which can yield results completely out of the tree. Generally speaking, the anchor part and the recursive part can even use the different tables.

Since recursive CTE's are usually used to build hierarchy trees, Oracle decided to add a cycle check. But due the set-based way the recursive CTE's operate, it's generally impossible to tell will the next step generate a cycle or not, because without a clear definition of the "ancestor row" cycle condition cannot be defined either.

To perform the "next" step, the whole "current" set needs to be available, but to generate each row of the current set (which includes the cycle column) we just need to have the results of the "next" operation.

It's not a problem if the current set always consists of a single row (like in CONNECT BY), but it is a problem if the recursive operation defined on a set as a whole.

Didn't look into Oracle 11 yet, but SQL Server implements recursive CTE's by just hiding a CONNECT BY behind them, which requires placing numerous restrictions (all of which effectively forbid all set-based operations).

PostgreSQL's implementation, on the other hand, is truly set-based: you can do any operation with the anchor part in the recursive part. It does not have any means to detect cycles, though, because cycles are not defined in the first place.

As was mentioned before, MySQL does not implement CTE's at all (it does not implement HASH JOIN's or MERGE JOINs as well, only the nested loops, so don't be surprised much).

Ironically, I received a letter today on this very subject, which I will cover in my blog.

Update:

Recursive CTE's in SQL Server are no more than CONNECT BY in disguise. See this article in my blog for shocking details:

  • SQL Server: are the recursive CTE’s really set-based?

PostgreSQL supports WITH-style hierarchical queries, but doesn't have any automatic cycle detection. This means that you need to write your own and the number of rows returned depends on the way you specify join conditions in the recursive part of the query.

Both examples use an array if IDs (called all_ids) to detect loops:

WITH recursive tr (id, parent_id, all_ids, cycle) AS (
    SELECT id, parent_id, ARRAY[id], false
    FROM t
    WHERE id = 1
    UNION ALL
    SELECT t.id, t.parent_id, all_ids || t.id, t.id = ANY(all_ids)
    FROM t
    JOIN tr ON t.parent_id = tr.id AND NOT cycle)
SELECT id, parent_id, cycle
FROM tr;

 id | parent_id | cycle
----+-----------+-------
  1 |         2 | f
  2 |         1 | f
  1 |         2 | t


WITH recursive tr (id, parent_id, all_ids, cycle) AS (
    SELECT id, parent_id, ARRAY[id], false
    FROM t
    WHERE id = 1
    UNION ALL
    SELECT t.id, t.parent_id, all_ids || t.id, (EXISTS(SELECT 1 FROM t AS x WHERE x.id = t.parent_id))
    FROM t
    JOIN tr ON t.parent_id = tr.id
    WHERE NOT t.id = ANY(all_ids))
SELECT id, parent_id, cycle
FROM tr;

 id | parent_id | cycle
----+-----------+-------
  1 |         2 | f
  2 |         1 | t