Recursive JPA query?

I know this question is old, but as it was linked in a different question, I wanted to give an update on this, as Blaze-Persistence offers support for working with recursive CTEs on top of the JPA model.

Blaze-Persistence is a query builder on top of JPA which supports many of the advanced DBMS features on top of the JPA model. To model CTEs or recursive CTEs, which is what you need here, you first need to introduce a CTE entity that models the result type of the CTE.

@CTE
@Entity
public class GroupCTE {
  @Id Integer id;
}

A query that fetches a hierarchy of groups could look like the following:

List<Group> groups = criteriaBuilderFactory.create(entityManager, Group.class)
  .withRecursive(GroupCTE.class)
    .from(Group.class, "g1")
    .bind("id").select("g1.id")
    .where("g1.parent").isNull()
  .unionAll()
    .from(Group.class, "g2")
    .innerJoinOn(GroupCTE.class, "cte")
      .on("cte.id").eqExpression("g2.parent.id")
    .end()
    .bind("id").select("g2.id")
  .end()
  .from(Group.class, "g")
  .fetch("groups")
  .where("g.id").in()
    .from(GroupCTE.class, "c")
    .select("c.id")
  .end()
  .getResultList();

This renders to SQL looking like the following:

WITH RECURSIVE GroupCTE(id) AS (
    SELECT g1.id
    FROM Group g1
    WHERE g1.parent_group_id IS NULL
  UNION ALL
    SELECT g2.id
    FROM Group g2
    INNER JOIN GroupCTE cte ON g2.parent_group_id = cte.id
)
SELECT *
FROM Group g
LEFT JOIN Group gsub ON gsub.parent_group_id = g.id
WHERE g.id IN (
  SELECT c.id
  FROM GroupCTE c
)

You can find out more about recursive CTEs in the documentation: https://persistence.blazebit.com/documentation/core/manual/en_US/index.html#recursive-ctes


I had a problem like this, querying menu nodes from a single table. The way I found was this: suppose we have a class named Node; create a unidirectional one-to-many association like this:

    @OneToMany(fetch = FetchType.EAGER)
    @JoinColumn(name = "parent_id", referencedColumnName = "id")
    private List<Node> subNodeList;

Also add a field to the entity, for example boolean isRoot, to mark whether the node is a root menu item. Then, by querying for nodes whose isRoot is true, we get just the top nodes and, because of FetchType.EAGER, we also get the sub-nodes in the list. This causes multiple queries, but for small things like menus it will be OK.


Using the simple Adjacency Model, where each row contains a reference to its parent (another row in the same table), doesn't cooperate well with JPA. This is because JPA doesn't support generating queries that use the Oracle CONNECT BY clause or the SQL standard WITH statement. Without either of those two clauses it's not really possible to make the Adjacency Model useful.

However, there are a couple of other approaches to modelling this problem. The first is the Materialised Path Model, where the full path to the node is flattened into a single column. The table definition is extended like so:

CREATE TABLE node (id INTEGER,
                   path VARCHAR, 
                   parent_id INTEGER REFERENCES node(id));

Inserting a tree of nodes looks something like:

INSERT INTO node VALUES (1, '1', NULL);  -- Root Node
INSERT INTO node VALUES (2, '1.2', 1);   -- 1st Child of '1'
INSERT INTO node VALUES (3, '1.3', 1);   -- 2nd Child of '1'
INSERT INTO node VALUES (4, '1.3.4', 3); -- Child of '3'

So to get Node '1' and all of its descendants the query is:

SELECT * FROM node WHERE id = 1 OR path LIKE '1.%';

To map this to JPA just make the 'path' column an attribute of your persistent object. You will however have to do the book-keeping to keep the 'path' field up to date. JPA/Hibernate won't do this for you. E.g. if you move a node to a different parent you will have to update the parent reference, determine the new path value from the new parent object, and rewrite the paths of the node's descendants, since they all start with the old path.
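The book-keeping described above can be sketched in plain Java. This is only an illustration under assumptions: PathNode, moveTo and refreshPath are hypothetical names, the class is a stand-in for the persistent object (no JPA annotations), and the paths use the same '.'-separated ids as the INSERT statements above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the persistent node; in a real mapping the
// fields below would be JPA attributes backed by parent_id and path.
class PathNode {
    final int id;
    PathNode parent;
    String path;
    final List<PathNode> children = new ArrayList<>();

    PathNode(int id, PathNode parent) {
        this.id = id;
        this.parent = parent;
        if (parent != null) {
            parent.children.add(this);
        }
        refreshPath();
    }

    // Re-parents this node and rewrites the path of the whole subtree.
    void moveTo(PathNode newParent) {
        if (newParent != null && newParent.isSelfOrDescendantOf(this)) {
            throw new IllegalArgumentException("cannot move a node below itself");
        }
        if (parent != null) {
            parent.children.remove(this);
        }
        parent = newParent;
        if (newParent != null) {
            newParent.children.add(this);
        }
        refreshPath();
    }

    // Same test as "id = ? OR path LIKE '<path>.%'" in the SQL query.
    boolean isSelfOrDescendantOf(PathNode other) {
        return this == other || path.startsWith(other.path + ".");
    }

    // Derives the path from the parent, then pushes the change down.
    private void refreshPath() {
        path = (parent == null) ? String.valueOf(id) : parent.path + "." + id;
        for (PathNode child : children) {
            child.refreshPath();
        }
    }

    public static void main(String[] args) {
        PathNode n1 = new PathNode(1, null);
        PathNode n2 = new PathNode(2, n1);
        PathNode n3 = new PathNode(3, n1);
        PathNode n4 = new PathNode(4, n3);
        n3.moveTo(n2); // node 3 (and its child 4) now hang below node 2
        System.out.println(n3.path + " " + n4.path);
    }
}
```

With a real entity, every node whose path was rewritten would also have to be persisted, so moving a large subtree means updating many rows.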

The other approach is called the Nested Set Model, which is a bit more complex. It is probably best described by its originator (rather than repeated verbatim by me).

There is a third approach called the Nested Interval Model; however, it relies heavily on stored procedures.

A much more complete treatment of this problem is given in chapter 7 of The Art of SQL.


The best answer in this post seems like a massive work-around hack to me. I've already had to deal with data models where brilliant engineers decided it would be a good idea to encode tree hierarchies in DB fields as text such as "Europe|Uk|Shop1|John", with massive volumes of data in these tables. Not surprisingly, the performance of queries of the form MyHackedTreeField LIKE 'parentHierarchy%' was a killer. Addressing this type of problem ultimately required creating an in-memory cache of the tree hierarchies, among many other things...

If you need to run a recursive query and your data volume is not massive, make your life simple: load the DB fields you need to run your plan, and code your recursion in Java. Don't do it in the DB unless you have a good reason to.
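As a rough sketch of what "code your recursion in Java" could look like: load the flat (id, parent id) pairs with one ordinary query, index them by parent, and walk the index recursively. The names TreeInMemory, Row, childrenByParent and collectSubtree are made up for this example, and the rows are hard-coded where a real application would load them through JPA.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TreeInMemory {
    // Hypothetical flat row, e.g. the result of selecting id and parent id.
    static final class Row {
        final int id;
        final Integer parentId; // null for a root

        Row(int id, Integer parentId) {
            this.id = id;
            this.parentId = parentId;
        }
    }

    // Builds a parent id -> child ids index in one pass over the rows.
    static Map<Integer, List<Integer>> childrenByParent(List<Row> rows) {
        Map<Integer, List<Integer>> index = new HashMap<>();
        for (Row row : rows) {
            if (row.parentId != null) {
                index.computeIfAbsent(row.parentId, k -> new ArrayList<>()).add(row.id);
            }
        }
        return index;
    }

    // Depth-first, pre-order walk: the node itself, then each subtree.
    static void collectSubtree(int id, Map<Integer, List<Integer>> index, List<Integer> out) {
        out.add(id);
        for (int child : index.getOrDefault(id, Collections.<Integer>emptyList())) {
            collectSubtree(child, index, out);
        }
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row(1, null), new Row(2, 1), new Row(3, 1), new Row(4, 3));
        List<Integer> subtree = new ArrayList<>();
        collectSubtree(3, childrenByParent(rows), subtree);
        System.out.println(subtree);
    }
}
```

The recursion assumes the data really is a tree; with cyclic parent references it would never terminate, and very deep hierarchies may call for an explicit stack instead.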

And even if the volume of data you have is massive, you can most likely subdivide your problem into independent recursive tree batches and process those one at a time without needing to load all the data at once.