Hierarchical queries return data in a tree-like structure. The query walks a tree made up of parents and children, with each non-root node linked to a parent node. A common example is a query over Employees who are linked to each other through the Manager reference.

I recently faced a situation where I wanted to retrieve a number of nodes from a tree-like data structure, based on certain criteria, and present the resulting nodes again in a tree-like fashion. However, if a child node was selected and its parent or other ancestors were not, I could not set up the tree structure in which to present the selected nodes. So I had to make sure that, along with the specific nodes, all their ancestor nodes were returned as well in order to restore the tree.

In this article, I will demonstrate several ways of creating such a query, one that returns selected nodes and their ancestors, given a specific hierarchical relation between records. The solutions I show make use of Oracle 9i features: the SYS_CONNECT_BY_PATH function and the combination of hierarchical queries and joins in a single query, which was not allowed prior to 9i.

What do we want to achieve?

We start from the classical EMP table. This table has three columns of interest: EMPNO (primary key), ENAME (display label) and MGR (self referencing foreign key). We use the connection between MGR and EMPNO to build the hierarchy. The simplest query to retrieve the EMP “tree” is this one:

select  lpad('+',3*(level-1))||ename||'('||empno||')' Employee
from    emp
connect
by      prior empno = mgr
start
with    mgr is null
/

Note how we added the lpad function to create indentation for the non-root nodes in our tree. We use the pseudo-column LEVEL that indicates the level of nesting (or the number of ancestors) for any node in the tree. The result looks like this:

EMPLOYEE
----------------------------------
KING(7839)
  +JONES(7566)
     +SCOTT(7788)
        +ADAMS(7876)
     +FORD(7902)
        +SMITH(7369)
           +TURNER(7844)
              +MARTIN(7654)
  +BLAKE(7698)
     +ALLEN(7499)
     +WARD(7521)
     +JAMES(7900)
  +CLARK(7782)
     +MILLER(7934)

14 rows selected.
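
The walk behind this query can be sketched outside the database. The Python below is only an illustration under assumptions: EMP is a hypothetical in-memory copy of the article's rows, and the dictionary order stands in for the sibling order, which Oracle does not guarantee unless ORDER SIBLINGS BY is used.

```python
# Hypothetical in-memory copy of the article's EMP rows: empno -> (ename, mgr).
EMP = {
    7839: ("KING", None),
    7566: ("JONES", 7839),
    7788: ("SCOTT", 7566),
    7876: ("ADAMS", 7788),
    7902: ("FORD", 7566),
    7369: ("SMITH", 7902),
    7844: ("TURNER", 7369),
    7654: ("MARTIN", 7844),
    7698: ("BLAKE", 7839),
    7499: ("ALLEN", 7698),
    7521: ("WARD", 7698),
    7900: ("JAMES", 7698),
    7782: ("CLARK", 7839),
    7934: ("MILLER", 7782),
}


def walk(emp, parent=None, level=1):
    """Yield (level, empno) depth-first: roots have mgr None (START WITH),
    children are rows whose mgr equals the prior empno (CONNECT BY)."""
    for empno, (_, mgr) in emp.items():
        if mgr == parent:
            yield level, empno
            yield from walk(emp, empno, level + 1)


def render(emp):
    """Build the indented labels the way lpad('+', 3*(level-1)) does."""
    lines = []
    for level, empno in walk(emp):
        width = 3 * (level - 1)
        # Oracle's lpad with length 0 yields NULL, so the root gets no '+'.
        prefix = "+".rjust(width) if width > 0 else ""
        lines.append(f"{prefix}{emp[empno][0]}({empno})")
    return lines


if __name__ == "__main__":
    print("\n".join(render(EMP)))
```

The level counter plays the role of the LEVEL pseudo-column: it is 1 for the root and grows by one with every step down the MGR reference.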

This query could be executed as far back as Oracle 7 (or perhaps even earlier). In Oracle 9i, two new features were introduced for hierarchical queries: the ability to join with other tables inside a hierarchical query and the function SYS_CONNECT_BY_PATH. The latter returns, for any node in the tree, the concatenation of a certain expression for all nodes on the path from the root node down through all ancestors to the current node, with each value preceded by a separator. For example:

select  lpad('+',3*(level-1))||ename||' ('||dname||') '||sys_connect_by_path(job,'/') Employee
from    emp
,       dept
where   emp.deptno = dept.deptno
connect
by      prior empno = mgr
start
with    mgr is null

Here we request the JOB value for each node and all its ancestors. The Job values are separated by the ‘/’ sign:

EMPLOYEE
-------------------------------------------------------------------------------------
KING (ACCOUNTING)/PRESIDENT
  +JONES (RESEARCH)/PRESIDENT/ANALYST
     +SCOTT (RESEARCH)/PRESIDENT/ANALYST/ANALYST
        +ADAMS (RESEARCH)/PRESIDENT/ANALYST/ANALYST/CLERK
     +FORD (RESEARCH)/PRESIDENT/ANALYST/ANALYST
        +SMITH (RESEARCH)/PRESIDENT/ANALYST/ANALYST/CLERK
           +TURNER (SALES)/PRESIDENT/ANALYST/ANALYST/CLERK/SALESMAN
              +MARTIN (SALES)/PRESIDENT/ANALYST/ANALYST/CLERK/SALESMAN/ACCOUNTNT
  +BLAKE (SALES)/PRESIDENT/MANAGER
     +ALLEN (SALES)/PRESIDENT/MANAGER/SALESMAN
     +WARD (SALES)/PRESIDENT/MANAGER/SALESMAN
     +JAMES (SALES)/PRESIDENT/MANAGER/CLERK
  +CLARK (ACCOUNTING)/PRESIDENT/MANAGER
     +MILLER (ACCOUNTING)/PRESIDENT/MANAGER/CLERK
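
What SYS_CONNECT_BY_PATH produces for one node can be sketched as follows. This is a Python illustration, not Oracle's implementation; the EMP dictionary and the helper name connect_by_path are assumptions made for the example, and only four of the article's rows are included.

```python
# Hypothetical subset of the article's rows: empno -> (job, mgr).
EMP = {
    7839: ("PRESIDENT", None),
    7566: ("ANALYST", 7839),
    7902: ("ANALYST", 7566),
    7369: ("CLERK", 7902),
}


def connect_by_path(emp, empno, sep="/"):
    """Collect the job of the node and of each ancestor, then emit them
    root first, each value preceded by the separator."""
    parts = []
    current = empno
    while current is not None:
        job, mgr = emp[current]
        parts.append(job)
        current = mgr  # step up to the manager
    return "".join(sep + part for part in reversed(parts))


if __name__ == "__main__":
    print(connect_by_path(EMP, 7369))
```

For SMITH (7369) this yields the same string as the SMITH line in the listing above: the path starts at the root and ends at the node itself.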

Now, returning to the job at hand: I want to be able to select from the tree-like structure all nodes that satisfy certain conditions, for example the requirement that the SAL is greater than 3000 or that the ENAME contains an ‘I’. However, I want to present the result as a tree, with all manager-ancestors for each selected employee. Simply adding a where clause such as where ename like '%I%' will not do the trick here:

select  lpad('+',3*(level-1))||ename||'('||empno||')' Employee
from    emp
where   ename like '%I%'
connect
by      prior empno = mgr
start
with    mgr is null
/

The result is clearly not what we intended:

EMPLOYEE
----------------------------
KING(7839)
        +SMITH(7369)
              +MARTIN(7654)
     +MILLER(7934)

We lack the managers of SMITH, MARTIN and MILLER – because obviously their names do not include an ‘I’. So what to do instead?

We need a way to include not only the nodes that directly satisfy the search condition, but also their ancestors. Here we can make use of the SYS_CONNECT_BY_PATH function. We can for example ask each selected node for its SYS_CONNECT_BY_PATH for the EMPNO column. This gives us not only the Employees that satisfy the criteria, but also a list of the EMPNO values for their ancestors in the tree:

select  ename
,       empno
,       sys_connect_by_path(empno,'.')||'.' scbp
from    emp
connect
by      prior empno = mgr
start
with    mgr is null
/

ENAME           EMPNO SCBP
---------- ---------- ------------------------------
KING             7839 .7839.
JONES            7566 .7839.7566.
SCOTT            7788 .7839.7566.7788.
ADAMS            7876 .7839.7566.7788.7876.
FORD             7902 .7839.7566.7902.
SMITH            7369 .7839.7566.7902.7369.
TURNER           7844 .7839.7566.7902.7369.7844.
MARTIN           7654 .7839.7566.7902.7369.7844.7654.
BLAKE            7698 .7839.7698.
ALLEN            7499 .7839.7698.7499.
WARD             7521 .7839.7698.7521.
JAMES            7900 .7839.7698.7900.
CLARK            7782 .7839.7782.
MILLER           7934 .7839.7782.7934.

We can make use of this approach in the following way:

with tree as
( select  ename
  ,       empno
  ,       sys_connect_by_path(empno,'.')||'.' scbp
  from    emp
  connect
  by      prior empno = mgr
  start
  with    mgr is null
)
select distinct
       emp.empno
,      emp.ename
,      emp.mgr
from   tree
,      emp
where  instr(tree.scbp,'.'||emp.empno||'.') > 0 -- select any employee whose empno is part of the path from the selected tree-nodes all the way to the top
and    tree.ename like '%I%' -- from the entire tree, only select those nodes that satisfy the search requirements
/

First we build the tree, with all the nodes, in the subquery ‘tree’ defined in the WITH clause. Then we select from the tree only the nodes that satisfy the search condition. Last we join these selected tree-nodes with table EMP using the condition instr(tree.scbp,'.'||emp.empno||'.') > 0. Because SCBP starts and ends with a '.', wrapping the EMPNO in dots on both sides ensures that only a complete key matches, never part of a longer number. The condition specifies that if the SYS_CONNECT_BY_PATH on EMPNO for one of the selected tree-nodes includes the primary key EMPNO of a record in EMP, that record should be included, as it is either a directly selected tree node (with an I in the ENAME) or one of the ancestors of such a node. The result:

     EMPNO ENAME             MGR
---------- ---------- ----------
      7369 SMITH            7902
      7566 JONES            7839
      7654 MARTIN           7844
      7782 CLARK            7839
      7839 KING
      7844 TURNER           7369
      7902 FORD             7566
      7934 MILLER           7782

8 rows selected.
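
The join on the path string can be mimicked in a few lines of Python. This is a sketch under assumptions: TREE is a hand-copied version of the ENAME/EMPNO/SCBP listing shown earlier, and the helper names are made up for the illustration. In this sketch the key is wrapped in the separator on both sides so that only whole keys match, which is a choice made here for the example.

```python
# (ename, empno, scbp) rows, copied from the path listing in the article.
TREE = [
    ("KING", 7839, ".7839."),
    ("JONES", 7566, ".7839.7566."),
    ("SCOTT", 7788, ".7839.7566.7788."),
    ("ADAMS", 7876, ".7839.7566.7788.7876."),
    ("FORD", 7902, ".7839.7566.7902."),
    ("SMITH", 7369, ".7839.7566.7902.7369."),
    ("TURNER", 7844, ".7839.7566.7902.7369.7844."),
    ("MARTIN", 7654, ".7839.7566.7902.7369.7844.7654."),
    ("BLAKE", 7698, ".7839.7698."),
    ("ALLEN", 7499, ".7839.7698.7499."),
    ("WARD", 7521, ".7839.7698.7521."),
    ("JAMES", 7900, ".7839.7698.7900."),
    ("CLARK", 7782, ".7839.7782."),
    ("MILLER", 7934, ".7839.7782.7934."),
]


def path_contains(scbp, empno, sep="."):
    """True when empno occurs in the path as a complete, delimited key."""
    return f"{sep}{empno}{sep}" in scbp


def selected_with_ancestors(tree, wanted):
    """Join every selected tree row with every empno found on its path;
    the set removes duplicates, like SELECT DISTINCT."""
    all_empnos = [empno for _, empno, _ in tree]
    return sorted({
        empno
        for ename, _, scbp in tree if wanted(ename)
        for empno in all_empnos if path_contains(scbp, empno)
    })


if __name__ == "__main__":
    print(selected_with_ancestors(TREE, lambda ename: "I" in ename))
```

Without the leading separator a shorter key such as 566 would be found inside '.7839.7566.'; with it, the test stays correct even when keys differ in length.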

Now we would like to present this search result in a tree-structure. Using this record set, which we know to include all ancestor nodes from the selected nodes to the root, it should be simple to create the tree again:

with tree as
( select  ename
  ,       empno
  ,       sys_connect_by_path(empno,'.')||'.' scbp
  from    emp
  connect
  by      prior empno = mgr
  start
  with    mgr is null
)
, selected_tree_nodes as
( select distinct
         emp.empno
  ,      emp.ename
  ,      emp.mgr
  from   tree
  ,      emp
  where  instr(tree.scbp,'.'||emp.empno||'.') > 0 -- select any employee whose empno is part of the path from the selected tree-nodes all the way to the top
  and    tree.ename like '%I%' -- from the entire tree, only select those nodes that satisfy the search requirements
)
select   lpad(ename,level*3+10)||' ('||empno||')' emp_node -- finally build a tree from the subset of nodes that were returned
from     selected_tree_nodes
connect
by       prior empno = mgr
start
with     mgr is null
/

The resulting tree looks like this:

EMP_NODE
-----------------------------------
         KING (7839)
           JONES (7566)
               FORD (7902)
                 SMITH (7369)
                   TURNER (7844)
                      MARTIN (7654)
           CLARK (7782)
             MILLER (7934)

8 rows selected.

Alternative approach: Bottom Up Tree

Building the entire tree of employees, starting from all root nodes and traversing down through all nodes, including nodes and entire sub-branches that do not qualify, is perhaps somewhat overdoing it. Could there not be a more direct approach? What if we start by selecting all nodes that qualify and then build the tree upwards from these nodes? If we need only a few nodes from just a few branches of the tree, would this not be much cheaper? Well, it probably would be. Let’s see how to do this.

select e1.*
,      e2.empno marker
from   emp e1
       left outer join
       ( select empno
         from   emp
         where  ename like '%I%'
       ) e2 -- find all empnos of employees that satisfy the search requirement
       on (e1.empno = e2.empno)
/

     EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM     DEPTNO     MARKER
---------- ---------- --------- ---------- --------- ---------- ---------- ---------- ----------
      7369 SMITH      CLERK           7902 17-DEC-80        800                    20       7369
      7654 MARTIN     ACCOUNTNT       7844 28-SEP-81       1250       1400         30       7654
      7839 KING       PRESIDENT            17-NOV-81       5000                    10       7839
      7934 MILLER     CLERK           7782 23-JAN-82       1300                    10       7934
      7844 TURNER     SALESMAN        7369 08-SEP-81       1500          0         30
      7782 CLARK      MANAGER         7839 09-JUN-81       2450                    10
      7521 WARD       SALESMAN        7698 22-FEB-81       1250        500         30
      7788 SCOTT      ANALYST         7566 09-DEC-82       3000                    20
      7698 BLAKE      MANAGER         7839 01-MAY-81       2850                    30
      7566 JONES      ANALYST         7839 02-APR-81       2975                    20
      7499 ALLEN      SALESMAN        7698 20-FEB-81       1600        300         30
      7902 FORD       ANALYST         7566 03-DEC-81       3000                    20
      7876 ADAMS      CLERK           7788 12-JAN-83       1100                    20
      7900 JAMES      CLERK           7698 03-DEC-81        950                    30

Here we selected all EMP records, and for each record we have determined whether or not it is one of the selected nodes; this is indicated through the MARKER column. Next we are going to use this set to build a tree, starting from all the nodes that have a value for their marker:

with emps as -- all employees with a marker column for those that satisfy the search requirement
( select e1.*
  ,      e2.empno marker
  from   emp e1
         left outer join
         ( select empno
           from   emp
           where  ename like '%I%'
         ) e2 -- find all empnos of employees that satisfy the search requirement
         on (e1.empno = e2.empno)
)
select distinct
       ename
,      empno
from   emps
-- note that the connect by condition is exactly the reverse of the one we used earlier;
-- it reflects the fact that we build the tree from the bottom upwards,
-- linking records to the MGR reference of the prior node
connect
by     prior mgr = empno
start
with   marker is not null
/

The result looks familiar, as it should:

ENAME           EMPNO
--------------------
CLARK            7782
FORD             7902
JONES            7566
KING             7839
MARTIN           7654
MILLER           7934
SMITH            7369
TURNER           7844

8 rows selected.
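
The bottom-up idea can be sketched in Python as well. This is an illustration under assumptions: EMP is a hypothetical in-memory copy of the article's rows and with_ancestors is a made-up helper name. Unlike the query, which walks every path fully and relies on DISTINCT to drop duplicates, the sketch stops climbing as soon as it meets a node it has already kept.

```python
# Hypothetical in-memory copy of the article's EMP rows: empno -> (ename, mgr).
EMP = {
    7839: ("KING", None),
    7566: ("JONES", 7839),
    7788: ("SCOTT", 7566),
    7876: ("ADAMS", 7788),
    7902: ("FORD", 7566),
    7369: ("SMITH", 7902),
    7844: ("TURNER", 7369),
    7654: ("MARTIN", 7844),
    7698: ("BLAKE", 7839),
    7499: ("ALLEN", 7698),
    7521: ("WARD", 7698),
    7900: ("JAMES", 7698),
    7782: ("CLARK", 7839),
    7934: ("MILLER", 7782),
}


def with_ancestors(emp, predicate):
    """Start at every node that satisfies the predicate (the marked rows)
    and follow the mgr reference upwards, collecting each node once."""
    keep = set()
    for empno, (ename, _) in emp.items():
        if not predicate(ename):
            continue
        current = empno
        # A node already kept has had all its ancestors kept too.
        while current is not None and current not in keep:
            keep.add(current)
            current = emp[current][1]
    return keep


if __name__ == "__main__":
    kept = with_ancestors(EMP, lambda ename: "I" in ename)
    for name in sorted(EMP[empno][0] for empno in kept):
        print(name)
```

Only the branches that lead from a selected node to the root are visited, which is the reason this approach can be cheaper when few nodes qualify.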

A slightly more compact alternative for this query is the following – it can be used whenever the condition to select the nodes is relatively simple and does not require a subquery:

with emps as -- all employees with a marker column for those that satisfy the search requirement
( select e1.*
  ,      case when ename like '%I%' then 'X' end marker
  from   emp e1
)
select distinct
       ename
,      empno
from   emps
connect
by     prior mgr = empno
start
with   marker is not null
/

Finally, building a tree from the query result, showing all employees whose names contain an 'I' with all their managerial burden, is done like this:

with emps as -- all employees with a marker column for those that satisfy the search requirement
( select e1.*
  ,      case when ename like '%I%' then 'X' end marker
  from   emp e1
)
, tree_nodes as
( select distinct
         ename
  ,      empno
  ,      mgr
  from   emps
  connect
  by     prior mgr = empno
  start
  with   marker is not null
)
select  lpad(ename, level * 4)
from    tree_nodes
connect by prior empno = mgr
start with mgr is null
/

The result, again, is familiar:

LPAD(ENAME,LEVEL*4)
------------------------------
KING
   CLARK
      MILLER
   JONES
        FORD
           SMITH
              TURNER
                  MARTIN

8 rows selected.
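
The final step, turning the reduced node set back into an indented tree, can be sketched like this. The Python is an illustration only: NODES is a hand-written copy of the eight selected rows, listed in ENAME order as an assumption so that siblings come out in the same order as above, and lpad is a made-up helper that imitates Oracle's LPAD with a space as padding.

```python
# (ename, empno, mgr) for the eight selected rows; the ordering is an assumption.
NODES = [
    ("CLARK", 7782, 7839),
    ("FORD", 7902, 7566),
    ("JONES", 7566, 7839),
    ("KING", 7839, None),
    ("MARTIN", 7654, 7844),
    ("MILLER", 7934, 7782),
    ("SMITH", 7369, 7902),
    ("TURNER", 7844, 7369),
]


def lpad(text, width):
    """Right-align text in width characters; like Oracle's LPAD it also
    truncates when the text is longer than the requested width."""
    return text.rjust(width)[:width]


def render(nodes, parent=None, level=1):
    """Depth-first walk from the root (mgr is None) down the mgr reference."""
    lines = []
    for ename, empno, mgr in nodes:
        if mgr == parent:
            lines.append(lpad(ename, level * 4))
            lines.extend(render(nodes, empno, level + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(render(NODES)))
```

Because LPAD right-aligns the name within level*4 characters, names of different lengths on the same level start in different columns, which explains the slightly ragged indentation in the output above.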

Note: if for some reason the WITH clause cannot be used (Oracle ADF Business Components, for example, does not seem to like it), you can rewrite the above queries using plain inline views:

select distinct
       ename
,      empno
from   ( select e1.*
         ,      e2.empno marker
         from   emp e1
                left outer join
                ( select empno
                  from   emp
                  where  ename like '%I%'
                ) e2 -- find all empnos of employees that satisfy the search requirement
                on (e1.empno = e2.empno)
       )
connect
by     prior mgr = empno
start
with   marker is not null

and the last one:

select  lpad(ename, level * 4)
from    ( select distinct
                 ename
          ,      empno
          ,      mgr
          from   ( select e1.*
                   ,      e2.empno marker
                   from   emp e1
                          left outer join
                          ( select empno
                            from   emp
                            where  ename like '%I%'
                          ) e2 -- find all empnos of employees that satisfy the search requirement
                          on (e1.empno = e2.empno)
                 )
          connect
          by     prior mgr = empno
          start
          with   marker is not null
        )
connect by prior empno = mgr
start with mgr is null