Cross join on a numbers table to get line vertices, is there a better way?

I know a bit about Oracle performance and pretty much nothing about custom data types, but I'll try to give you a plan to improve performance.

1) Verify that you cannot get an explain plan.

It's possible to get explain plans even if you don't have sophisticated database software. What happens if you execute SET AUTOTRACE ON EXPLAIN?

You could also try DBMS_XPLAN. First, save the plan by putting a few extra keywords in front of your query:

-- your full query goes after FOR
EXPLAIN PLAN FOR SELECT ... ;

Then execute this:

SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY());

It's possible that neither of those will work and you truly cannot get an explain plan. I just wanted to verify that because with an explain plan it'll be much easier for the community to help you.

2) Consider requirements.

You said that 20 seconds isn't good enough. Have you or someone else defined exactly what is good enough? Is there any room for negotiation? Does your query need to be exactly one SELECT query? Could you populate a global temporary table in one step and select the results you wanted in the next? Could you create a stored procedure that returns a result set and call that?

3) Establish a lower bound for the time required to complete the query.

I suggest running a simple query that "cheats" to estimate how fast a well-optimized query could possibly be. For example, how long does this query, which gets only the first vertex of each road, take?

SELECT
    ROWNUM
    ,ROAD_ID
    ,VERTEX_INDEX
    ,SDE.ST_X(ST_POINT) AS X
    ,SDE.ST_Y(ST_POINT) AS Y
FROM
(
    -- Only the first vertex of each road: one ST_PointN call per road
    SELECT
          a.ROAD_ID
          ,1 AS VERTEX_INDEX
          ,SDE.ST_PointN(a.SHAPE, 1) AS ST_POINT
    FROM  ENG.ROAD a
)
ORDER BY ROAD_ID, VERTEX_INDEX;

I suspect that will give you about 4000 rows. If you multiply that query's response time by 17.5/4 (roughly 17,500 total rows divided by roughly 4,000 first-vertex rows), that should give you a good lower bound for the total execution time.

If that lower bound is longer than the requirement you established in step 2, then you either need to get creative with your data model (calculating results ahead of time and storing them in tables) or renegotiate the required response time.
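The step 3 arithmetic is plain proportional scaling. Here is a minimal Python sketch of it, assuming the runtime grows linearly with the number of vertices returned (the function names are hypothetical and not part of any Oracle or SDE API). The example inputs are the figures reported in the follow-up further down: 3.75 s for 3,805 first-vertex rows, 16,495 vertices in total, and a 5-second target.

```python
def lower_bound_seconds(first_vertex_seconds: float,
                        total_vertices: int,
                        first_vertex_rows: int) -> float:
    """Scale the first-vertex-only timing up to the full vertex count.

    Assumes the cost is dominated by per-vertex work, so it grows linearly
    with the number of rows returned.
    """
    if first_vertex_rows <= 0:
        raise ValueError("first_vertex_rows must be positive")
    return first_vertex_seconds * total_vertices / first_vertex_rows


def must_precompute(lower_bound: float, target_seconds: float) -> bool:
    """True when even the 'cheating' query cannot meet the target, so results
    have to be calculated ahead of time (or the target renegotiated)."""
    return lower_bound > target_seconds


if __name__ == "__main__":
    bound = lower_bound_seconds(3.75, 16_495, 3_805)
    print(f"lower bound: {bound:.2f} s")  # lower bound: 16.26 s
    print("pre-compute needed:", must_precompute(bound, target_seconds=5.0))  # True
```

With those inputs the bound is about 16.26 s, well above 5 s, which is exactly the "pre-compute or renegotiate" situation described above.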

4) Benchmark to figure out which functions are contributing the most to your execution time.

You were on the right track with Update #1, but you need to control for the amount of work being done. For example, can you write a group of relatively simple queries that each execute one function exactly 10,000 times? How do the response times compare?
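A harness for that kind of controlled comparison only needs to hold the call count fixed. A minimal Python sketch, where `benchmark` and the two stand-in functions are hypothetical: `cheap_x` plays the role of a scalar accessor like ST_X, and `build_point` plays the role of something that constructs a new object on every call, like ST_PointN. Real timings would of course come from running the SQL against the database; this only shows the shape of a fair test.

```python
import time
from typing import Callable, Dict, List, Tuple


def benchmark(funcs: Dict[str, Callable[[], object]], calls: int = 10_000) -> Dict[str, float]:
    """Call every function exactly `calls` times; return total elapsed seconds per name.

    Fixing the call count means the timings compare per-call cost rather than
    how many rows each test query happened to touch.
    """
    results: Dict[str, float] = {}
    for name, fn in funcs.items():
        start = time.perf_counter()
        for _ in range(calls):
            fn()
        results[name] = time.perf_counter() - start
    return results


# Hypothetical stand-in data, not SDE geometry.
VERTICES: List[Tuple[float, float]] = [(float(i), 2.0 * i) for i in range(50)]


def cheap_x() -> float:
    """Like ST_X: read one number that already exists."""
    return VERTICES[0][0]


def build_point() -> dict:
    """Like ST_PointN: allocate and populate a new object on every call."""
    x, y = VERTICES[0]
    return {"type": "ST_POINT", "x": x, "y": y, "srid": 0}


if __name__ == "__main__":
    timings = benchmark({"cheap_x": cheap_x, "build_point": build_point})
    for name, secs in sorted(timings.items(), key=lambda kv: kv[1]):
        print(f"{name:12s} {secs:.4f} s for 10,000 calls")
```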

5) Go to work.

Depending on the requirements established in step 2 and what you found in step 4, try any trick that you can think of to reduce the query runtime. Are you able to pre-compute results and store them? If the problem relates to the number of times the functions are executed, then the undocumented MATERIALIZE hint may be helpful. It forces Oracle to store the CTE's results in a hidden temporary table behind the scenes. I do not know if it is compatible with the special data types that you are using.

For example, maybe something like this performs better? Apologies if it does not compile but I have no way to test.

WITH ROAD_CTE (ROAD_ID, VERTEX_INDEX, SHAPE) AS
(
    SELECT /*+ MATERIALIZE */
      a.ROAD_ID
    , b.NUMBERS VERTEX_INDEX
    , a.SHAPE
    FROM ENG.ROAD a
    CROSS JOIN ENG.NUMBERS b
    WHERE b.NUMBERS <= SDE.ST_NUMPOINTS(a.SHAPE)
)
, CTE_WITH_ST_POINT (ROAD_ID, VERTEX_INDEX, ST_POINT) AS
(
    SELECT /*+ MATERIALIZE */
      rcte.ROAD_ID
    , rcte.VERTEX_INDEX
    , SDE.ST_PointN(rcte.SHAPE, rcte.VERTEX_INDEX) ST_POINT
    FROM ROAD_CTE rcte
)
SELECT 
      ROAD_ID
    , VERTEX_INDEX
    , SDE.ST_X(ST_POINT) AS X
    , SDE.ST_Y(ST_POINT) AS Y
FROM CTE_WITH_ST_POINT
ORDER BY ROAD_ID, VERTEX_INDEX;
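The idea behind splitting the query into two CTEs is that ST_PointN should run once per vertex, with the same result feeding both ST_X and ST_Y. Writing SDE.ST_X(SDE.ST_PointN(...)) and SDE.ST_Y(SDE.ST_PointN(...)) as two separate expressions may call it twice per row unless the optimizer reuses the result. The Python sketch below only counts calls to a hypothetical `point_n` stand-in to make that difference concrete; it does not model Oracle's optimizer, and MATERIALIZE is only a hint.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# Hypothetical geometry store: road_id -> ordered list of vertices.
ROADS: Dict[int, List[Point]] = {
    1: [(0.0, 0.0), (10.0, 0.0), (10.0, 5.0)],
    2: [(2.0, 3.0), (4.0, 7.0)],
}

calls = {"point_n": 0}


def point_n(road_id: int, n: int) -> Point:
    """Stand-in for an expensive ST_PointN: returns the 1-based n-th vertex."""
    calls["point_n"] += 1
    return ROADS[road_id][n - 1]


def vertices_inline() -> List[Tuple[int, int, float, float]]:
    """X and Y each call point_n, like ST_X(ST_PointN(..)), ST_Y(ST_PointN(..))."""
    return [(rid, n, point_n(rid, n)[0], point_n(rid, n)[1])
            for rid, pts in ROADS.items() for n in range(1, len(pts) + 1)]


def vertices_reused() -> List[Tuple[int, int, float, float]]:
    """point_n runs once per vertex and its result feeds both X and Y,
    which is what the (ideally materialized) CTE is aiming for."""
    rows = []
    for rid, pts in ROADS.items():
        for n in range(1, len(pts) + 1):
            x, y = point_n(rid, n)
            rows.append((rid, n, x, y))
    return rows


if __name__ == "__main__":
    calls["point_n"] = 0
    inline_rows = vertices_inline()
    inline_calls = calls["point_n"]
    calls["point_n"] = 0
    reused_rows = vertices_reused()
    print(inline_calls, calls["point_n"], inline_rows == reused_rows)  # 10 5 True
```

With the five sample vertices, the inline version makes 10 calls and the reuse version 5, for identical output.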

If you're still stuck after all of this I suspect that it'll at least give you additional information that you can edit into the question. Good luck!


I tried using CONNECT BY (and DUAL) to see if it would be quicker, but it isn't (it's about the same).

SELECT  ROAD_ID
        ,T.VERTEX_INDEX
        ,SDE.ST_X(SDE.ST_PointN(SHAPE, T.VERTEX_INDEX)) AS X
        ,SDE.ST_Y(SDE.ST_PointN(SHAPE, T.VERTEX_INDEX)) AS Y
FROM    ENG.ROADS 
        CROSS JOIN
            (
            SELECT LEVEL AS VERTEX_INDEX 
            FROM DUAL CONNECT BY LEVEL <= 
                (
                SELECT MAX(SDE.ST_NUMPOINTS(SHAPE)) 
                FROM ENG.ROADS 
                )
            ) T
WHERE    T.VERTEX_INDEX <= SDE.ST_NUMPOINTS(SHAPE)
--removed to do explain plan: ORDER BY ROAD_ID, VERTEX_INDEX

-------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name                 | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                      |   200 | 54800 |    36   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                  |                      |   200 | 54800 |    36   (0)| 00:00:01 |
|   2 |   VIEW                         |                      |     1 |    13 |     2   (0)| 00:00:01 |
|*  3 |    CONNECT BY WITHOUT FILTERING|                      |       |       |            |          |
|   4 |     FAST DUAL                  |                      |     1 |       |     2   (0)| 00:00:01 |
|   5 |     SORT AGGREGATE             |                      |     1 |   261 |            |          |
|   6 |      TABLE ACCESS FULL         | ROAD                 |  3997 |  1018K|    34   (0)| 00:00:01 |
|*  7 |   TABLE ACCESS FULL            | ROAD                 |   200 | 52200 |    34   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter(LEVEL<= (SELECT MAX("SDE"."ST_NUMPOINTS"("SHAPE")) FROM
              "ENG"."ROAD" "ROAD"))
   7 - filter("T"."VERTEX_INDEX"<="SDE"."ST_NUMPOINTS"("ROAD"."SHAPE"))

I got the idea from this post: How to calculate ranges in Oracle?


Results and response to Joe Obbish’s answer:

Note: From here on, 'the query' refers to the query in Update #2, not the query in the original question.

1) Verify that you cannot get an explain plan.

I am unable to execute set autotrace on explain. I get this error: ORA-00922: missing or invalid option (#922)

But I am able to execute DBMS_XPLAN. I had assumed that I would be unable to do this. Fortunately, I was wrong. I am now running explain plans.

2) Consider requirements.

Does your query need to be exactly one SELECT query?

I think the query does need to be exactly one query. The software that I'm using is very limited, and does not allow for multiple select statements.

Have you defined exactly what your requirements are?

  • The query will be used to update vertex coordinates after edits have been made to line geometry. This would typically happen to a single line at a time, or perhaps tens of lines, but not likely thousands. In this scenario, the present performance of the query will be adequate.
  • The query will also be used to construct new line geometry for all 3,805 lines (this is related to the subject of dynamic segmentation/linear referencing). This will happen on-the-fly in a view, so performance is absolutely crucial. The query would likely need to execute in less than 5 seconds.

3) Establish a lower bound for the time required to complete the query.

The first-vertex query executes in 3.75 seconds (returns 3805 rows, as expected).

3.75 sec * (16,495 total vertices / 3,805 lines) ≈ 16.26 sec

The result: the lower bound for the total execution time is longer than the requirement I established in step 2 (5 seconds). Since the required response time is non-negotiable, I think the solution is to '...get creative with my data model by calculating results ahead of time and storing them in a table'. In other words, make a materialized view.

Additionally, the lower bound of about 16.26 seconds matches the total execution time of the query in Update #2 (16 secs). That strongly suggests the query is already about as fast as it can be, given the functions and data that I have to work with.

4) Benchmark to figure out which functions are contributing the most to your execution time.

I've created two tables (both contain 10,000 rows): ROADS_BM and ROADS_STARTPOINT_BM. I've run simple queries on the tables using each of the functions that are involved. Here are the results:

+--------------+-----------+------------------+---------------------------------------------------------------------------+
| FUNCTION     | TIME(sec) | RETURN TYPE      | QUERY                                                                     |
+--------------+-----------+------------------+---------------------------------------------------------------------------+
| ST_X         | < 0.5     | Double precision | SELECT ROAD_ID FROM (                                                     |
|              |           | (Number)         | SELECT ROAD_ID, SDE.ST_X(SHAPE) AS X FROM ENG.ROADS_STARTPOINT_BM         |
|              |           |                  | ) WHERE X IS NOT NULL ORDER BY ROAD_ID                                    |
+--------------+-----------+------------------+---------------------------------------------------------------------------+
| ST_Y         | < 0.5     | Double precision | SELECT ROAD_ID FROM (                                                     |
|              |           | (Number)         | SELECT ROAD_ID, SDE.ST_Y(SHAPE) AS Y FROM ENG.ROADS_STARTPOINT_BM         |
|              |           |                  | ) WHERE Y IS NOT NULL ORDER BY ROAD_ID                                    |
+--------------+-----------+------------------+---------------------------------------------------------------------------+
| ST_NumPoints | < 0.5     | Integer          | SELECT ROAD_ID FROM (                                                     |
|              |           |                  | SELECT ROAD_ID, SDE.ST_NumPoints(SHAPE) AS NUM_POINTS FROM ENG.ROADS_BM   |
|              |           |                  | ) WHERE NUM_POINTS IS NOT NULL ORDER BY ROAD_ID                           |
+--------------+-----------+------------------+---------------------------------------------------------------------------+
| ST_PointN*   | **9.5**   | ST_POINT         | SELECT ROAD_ID FROM (                                                     |
|              |           | (ST_GEOMETRY     | SELECT ROAD_ID, SDE.ST_PointN(SHAPE,1) AS ST_POINT FROM ENG.ROADS_BM      |
|              |           | subclass)        | ) WHERE ST_POINT IS NOT NULL ORDER BY ROAD_ID                             |
+--------------+-----------+------------------+---------------------------------------------------------------------------+

Function documentation: ST_X, ST_Y, ST_NumPoints, ST_PointN

The result? ST_PointN is the problem. Its 9.5-second response time is abysmal compared to the other functions. I suppose this makes a bit of sense, though: ST_PointN returns an ST_POINT geometry data type, which has got to be fairly complex compared to the simple numbers returned by the other functions.

Note: ST_PointN is tricky. Its return type is ST_POINT, which my software doesn't know how to handle in a result set: ORA-24359: OCIDefineObject not invoked for a Object type or Reference.

To get around this, I put it in an inline query so that the column is not returned to the result set. But when I do only that, the query doesn't actually evaluate the column, which defeats the purpose of the test. So I check whether it is null in the outer query: WHERE ST_POINT IS NOT NULL ORDER BY ROAD_ID. This ensures that the ST_PointN function is actually executed, without returning its result to the result set.

And of course, I want to do an apples-to-apples test, so I do the same sort of inline query for the other functions too (even though it's not technically necessary).

5) Go to work.

Based on steps 2, 3 & 4, here are my findings:

  • The problem is the ST_PointN function. It is slow, and I don't think there's much I can do about that, other than attempting to completely reprogram/recreate the function in the hope that I could do better than the specialists who made it. Not exactly practical.
  • In order to achieve the performance that I require, I'll need to pre-compute the query in a table or materialized view.
  • As far as '...tricks that you can think of to reduce the query runtime' go, I might be able to eliminate some of the vertices in the longer lines. This would allow me to remove a few rows from the NUMBERS table (which presently has 30 rows) and speed up the join slightly, although any gain in performance would be minimal. I should also review all table indexes, even though my performance issues are unrelated to indexes/joins.
  • Based on the testing, I don't think the problem '...relates to the number of times the functions are executed'.
  • The CTE query that was provided in #5 compiled just fine (I'm impressed that Joe was able to pull this off). Surprisingly though, the execution time was 30 seconds, which isn't an improvement. I guess ST_PointN is to blame for that too. The CTE query wasn't a waste though; I learned a lot just by using it.

6) Conclusion.

I'm satisfied that I've optimized the query as much as possible. I'll set up the pre-calculation, and move on to the next thing. A big thanks to Joe Obbish; I have learned a ton from the steps he provided.