Jonathan Lewis

Just another Oracle weblog

use_nl redux

Fri, 2021-10-15 08:58

A question has just appeared on a note I wrote in 2012 about the incorrect use of the use_nl() hint in some sys-recursive SQL, linking forward to an explanation I wrote in 2017 of the use_nl() hint – particularly the interpretation of the form use_nl(a,b), which does not mean “use a nested loop from table A to table B”.

The question essentially turns into – “does Oracle pick the join order before it looks at the hints?”

I’m going to look at one of the queries (based on the 2017 table creation code) that was supplied in the question and explain how Oracle gets to the plan it uses in my (21.3) system; here’s the query, followed by the plan:

select
        /*+ use_nl(b) */
        a.v1, b.v1, c.v1, d.v1
from
        a, b, c, d
where
        d.n100 = 0
and     a.n100 = d.id
and     b.n100 = a.n2
and     c.id   = a.id
/


| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      | 20000 |  1347K|   105   (5)| 00:00:01 |
|*  1 |  HASH JOIN           |      | 20000 |  1347K|   105   (5)| 00:00:01 |
|   2 |   TABLE ACCESS FULL  | C    | 10000 |   146K|    26   (4)| 00:00:01 |
|*  3 |   HASH JOIN          |      | 20000 |  1054K|    78   (4)| 00:00:01 |
|*  4 |    TABLE ACCESS FULL | D    |   100 |  1800 |    26   (4)| 00:00:01 |
|*  5 |    HASH JOIN         |      | 20000 |   703K|    52   (4)| 00:00:01 |
|   6 |     TABLE ACCESS FULL| B    | 10000 |   136K|    26   (4)| 00:00:01 |
|   7 |     TABLE ACCESS FULL| A    | 10000 |   214K|    26   (4)| 00:00:01 |
-----------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      SWAP_JOIN_INPUTS(@"SEL$1" "C"@"SEL$1")
      SWAP_JOIN_INPUTS(@"SEL$1" "D"@"SEL$1")
      USE_HASH(@"SEL$1" "C"@"SEL$1")
      USE_HASH(@"SEL$1" "D"@"SEL$1")
      USE_HASH(@"SEL$1" "A"@"SEL$1")
      LEADING(@"SEL$1" "B"@"SEL$1" "A"@"SEL$1" "D"@"SEL$1" "C"@"SEL$1")
      FULL(@"SEL$1" "C"@"SEL$1")
      FULL(@"SEL$1" "D"@"SEL$1")
      FULL(@"SEL$1" "A"@"SEL$1")
      FULL(@"SEL$1" "B"@"SEL$1")
      OUTLINE_LEAF(@"SEL$1")
      ALL_ROWS
      DB_VERSION('21.1.0')
      OPTIMIZER_FEATURES_ENABLE('21.1.0')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("C"."ID"="A"."ID")
   3 - access("A"."N100"="D"."ID")
   4 - filter("D"."N100"=0)
   5 - access("B"."N100"="A"."N2")

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1 (U - Unused (1))
---------------------------------------------------------------------------
   6 -  SEL$1 / "B"@"SEL$1"
         U -  use_nl(b)

Note
-----
   - this is an adaptive plan

Points to note:

  • The Hint Report says the final plan did not use the use_nl(b) hint.
  • Whatever you may think the join order is by looking at the body of the plan, the leading() hint in the Outline Data tells us that the join order was (B A D C) – and that explains why the use_nl(b) hint could not be used, because B was never “the next table in the join order”.
  • The “visible” order of activity displayed in the plan is C D B A, but that’s because we swap_join_inputs(D) to put it above the (B,A) join, then swap_join_inputs(C) to put that above D.
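
The effect of the two swaps can be sketched in a few lines of Python. This is only an illustration, assuming that every join is a hash join and that the first child of a hash join in the plan is the build input; build_plan and visible_order are hypothetical helper names, not anything supplied by Oracle.

```python
# Sketch: derive the "visible" plan order from the leading() order plus
# swap_join_inputs(). Assumption: all joins are hash joins and the first
# child of each join is its build input.

def build_plan(leading, swapped):
    """Return a nested (build, probe) tuple for a chain of hash joins."""
    tree = leading[0]
    for table in leading[1:]:
        if table in swapped:
            # swap_join_inputs: the newly joined table becomes the build input
            tree = (table, tree)
        else:
            # default: the result so far is the build input, the new table probes it
            tree = (tree, table)
    return tree

def visible_order(tree):
    """List the tables in the order they appear reading the plan top to bottom."""
    if isinstance(tree, str):
        return [tree]
    build, probe = tree
    return visible_order(build) + visible_order(probe)

plan = build_plan(["B", "A", "D", "C"], swapped={"D", "C"})
print(plan)                 # ('C', ('D', ('B', 'A')))
print(visible_order(plan))  # ['C', 'D', 'B', 'A']
```

With no swaps the same function returns the tables in the leading() order, which is why the Outline Data, rather than the body of the plan, is the thing to read when you want the join order.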

So did Oracle completely pre-empt any plans that allowed B to be “the next table”, thus avoiding the hint, or did it consider some plans where B wasn’t the first table in the join order, and would it have used a nested loop into B if one of those plans had had a low enough cost?

The only way to answer these questions is to look at the CBO (10053) trace file; and for very simple queries it’s often enough to pick out a few lines as a starting point – in my case using egrep:

egrep -e "^Join order" -e"Best so far" or21_ora_15956.trc

Join order[1]:  D[D]#0  A[A]#1  B[B]#2  C[C]#3
Best so far:  Table#: 0  cost: 25.752439  card: 100.000000  bytes: 1800.000000
Join order[2]:  D[D]#0  A[A]#1  C[C]#3  B[B]#2
Best so far:  Table#: 0  cost: 25.752439  card: 100.000000  bytes: 1800.000000
Join order[3]:  D[D]#0  B[B]#2  A[A]#1  C[C]#3
Best so far:  Table#: 0  cost: 25.752439  card: 100.000000  bytes: 1800.000000
Join order[4]:  D[D]#0  B[B]#2  C[C]#3  A[A]#1
Join order aborted2: cost > best plan cost
Join order[5]:  D[D]#0  C[C]#3  A[A]#1  B[B]#2
Join order aborted2: cost > best plan cost
Join order[6]:  D[D]#0  C[C]#3  B[B]#2  A[A]#1
Join order aborted2: cost > best plan cost

Join order[7]:  A[A]#1  D[D]#0  B[B]#2  C[C]#3
Join order aborted2: cost > best plan cost
Join order[8]:  A[A]#1  D[D]#0  C[C]#3  B[B]#2
Join order aborted2: cost > best plan cost
Join order[9]:  A[A]#1  B[B]#2  D[D]#0  C[C]#3
Join order aborted2: cost > best plan cost
Join order[10]:  A[A]#1  C[C]#3  D[D]#0  B[B]#2
Join order aborted2: cost > best plan cost
Join order[11]:  A[A]#1  C[C]#3  B[B]#2  D[D]#0
Join order aborted2: cost > best plan cost

Join order[12]:  B[B]#2  D[D]#0  A[A]#1  C[C]#3
Join order aborted2: cost > best plan cost
Join order[13]:  B[B]#2  A[A]#1  D[D]#0  C[C]#3
Best so far:  Table#: 2  cost: 25.692039  card: 10000.000000  bytes: 140000.000000
Join order[14]:  B[B]#2  A[A]#1  C[C]#3  D[D]#0
Join order aborted2: cost > best plan cost
Join order[15]:  B[B]#2  C[C]#3  D[D]#0  A[A]#1
Join order aborted2: cost > best plan cost

Join order[16]:  C[C]#3  D[D]#0  A[A]#1  B[B]#2
Join order aborted2: cost > best plan cost
Join order[17]:  C[C]#3  A[A]#1  D[D]#0  B[B]#2
Join order aborted2: cost > best plan cost
Join order[18]:  C[C]#3  A[A]#1  B[B]#2  D[D]#0
Join order aborted2: cost > best plan cost
Join order[19]:  C[C]#3  B[B]#2  D[D]#0  A[A]#1
Join order aborted2: cost > best plan cost

Oracle has considered 19 possible join orders (out of a maximum of 24 = 4!). In theory we should see 6 plans starting with each of the 4 tables. In fact we see that the optimizer’s first choice started with table D, producing 6 join orders, then switched to starting with table A, producing only 5 join orders.

The “missing” order is (A, B, C, D), which should have appeared between join orders 9 and 10. If we check the trace file in more detail we’ll see that the optimizer aborted after calculating the join from A to B because the cost had already exceeded the “Best so far” by then, so it didn’t carry on to calculate the cost of going on to D. Clearly, then, there was no point in considering any other order that started with (A, B), hence the absence of (A, B, C, D).
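
If you want to check which permutations never appeared, here’s a small Python sketch that compares the 19 join orders reported in the trace extract with the full set of 4! permutations. The list of orders is copied by hand from the egrep output; nothing here reads the trace file itself.

```python
from itertools import permutations

# Join orders 1 to 19, copied from the "Join order[n]" lines of the trace extract
considered = [tuple(line.split()) for line in """
D A B C
D A C B
D B A C
D B C A
D C A B
D C B A
A D B C
A D C B
A B D C
A C D B
A C B D
B D A C
B A D C
B A C D
B C D A
C D A B
C A D B
C A B D
C B D A
""".strip().splitlines()]

all_orders = list(permutations("DABC"))          # every possible order of the 4 tables
seen = set(considered)
missing = [order for order in all_orders if order not in seen]

print(len(all_orders), len(considered), len(missing))   # 24 19 5
for order in missing:
    print(" ".join(order))
```

The five orders it lists are (A B C D), (B D C A), (B C A D), (C D B A) and (C B A D). Only the first of these is explained above; the extract doesn’t show why the other four were skipped.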

The join orders where the optimizer didn’t abort are 1, 2, 3 and 13. The “Best so far” line that I have reported (for ease of searching and reporting) is misleading: it’s only the cost of the first table in the join order. This is what the 4 non-aborted summaries look like:

egrep -A+3 -e"Best so far" or21_ora_15956.trc

Best so far:  Table#: 0  cost: 25.752439  card: 100.000000  bytes: 1800.000000
              Table#: 1  cost: 51.767478  card: 10000.000000  bytes: 400000.000000
              Table#: 2  cost: 30137.036118  card: 20000.000000  bytes: 1080000.000000
              Table#: 3  cost: 30163.548157  card: 20000.000000  bytes: 1380000.000000
--
Best so far:  Table#: 0  cost: 25.752439  card: 100.000000  bytes: 1800.000000
              Table#: 1  cost: 51.767478  card: 10000.000000  bytes: 400000.000000
              Table#: 3  cost: 78.079517  card: 10000.000000  bytes: 550000.000000
              Table#: 2  cost: 30163.348157  card: 20000.000000  bytes: 1380000.000000
--
Best so far:  Table#: 0  cost: 25.752439  card: 100.000000  bytes: 1800.000000
              Table#: 2  cost: 2483.956340  card: 1000000.000000  bytes: 32000000.000000
              Table#: 1  cost: 2530.068379  card: 20000.000000  bytes: 1080000.000000
              Table#: 3  cost: 2556.580418  card: 20000.000000  bytes: 1380000.000000
--
Best so far:  Table#: 2  cost: 25.692039  card: 10000.000000  bytes: 140000.000000
              Table#: 1  cost: 52.204078  card: 20000.000000  bytes: 720000.000000
              Table#: 0  cost: 78.479517  card: 20000.000000  bytes: 1080000.000000
              Table#: 3  cost: 104.991556  card: 20000.000000  bytes: 1380000.000000

As you can see, when we start with (B A) the estimated cost drops dramatically.
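
To make the comparison easier to read, here’s a Python sketch that turns each “Best so far” block into a join order and a final cost. It assumes the Table# numbering shown in the “Join order” lines (0 = D, 1 = A, 2 = B, 3 = C) and takes the last line of each block as the cost of the complete join order; summarise is a hypothetical helper name.

```python
import re

# Table# to alias mapping, taken from the "Join order" lines (D#0, A#1, B#2, C#3)
TABLES = {0: "D", 1: "A", 2: "B", 3: "C"}

trace = """\
Best so far:  Table#: 0  cost: 25.752439  card: 100.000000  bytes: 1800.000000
              Table#: 1  cost: 51.767478  card: 10000.000000  bytes: 400000.000000
              Table#: 2  cost: 30137.036118  card: 20000.000000  bytes: 1080000.000000
              Table#: 3  cost: 30163.548157  card: 20000.000000  bytes: 1380000.000000
--
Best so far:  Table#: 0  cost: 25.752439  card: 100.000000  bytes: 1800.000000
              Table#: 1  cost: 51.767478  card: 10000.000000  bytes: 400000.000000
              Table#: 3  cost: 78.079517  card: 10000.000000  bytes: 550000.000000
              Table#: 2  cost: 30163.348157  card: 20000.000000  bytes: 1380000.000000
--
Best so far:  Table#: 0  cost: 25.752439  card: 100.000000  bytes: 1800.000000
              Table#: 2  cost: 2483.956340  card: 1000000.000000  bytes: 32000000.000000
              Table#: 1  cost: 2530.068379  card: 20000.000000  bytes: 1080000.000000
              Table#: 3  cost: 2556.580418  card: 20000.000000  bytes: 1380000.000000
--
Best so far:  Table#: 2  cost: 25.692039  card: 10000.000000  bytes: 140000.000000
              Table#: 1  cost: 52.204078  card: 20000.000000  bytes: 720000.000000
              Table#: 0  cost: 78.479517  card: 20000.000000  bytes: 1080000.000000
              Table#: 3  cost: 104.991556  card: 20000.000000  bytes: 1380000.000000
"""

STEP = re.compile(r"Table#:\s*(\d+)\s+cost:\s*([\d.]+)")

def summarise(text):
    """Return [(join_order, final_cost)] for each 'Best so far' block."""
    results = []
    for block in text.split("--\n"):
        steps = [(TABLES[int(num)], float(cost)) for num, cost in STEP.findall(block)]
        if steps:
            order = " ".join(name for name, _ in steps)
            results.append((order, steps[-1][1]))   # last step = cost of the full order
    return results

for order, cost in summarise(trace):
    print(f"{order}: {cost:.2f}")
```

The four lines it prints show the cost falling from roughly 30,163 for the two orders starting (D A), to 2,556.58 for (D B A C), to 104.99 for (B A D C), which matches the cost of 105 reported in the final plan.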

Now that we’ve seen that Oracle looks at many (though not a completely exhaustive set of) plans on the way to the one it picks, the thing we need to do (in theory) is check that for every single calculation where B is “the next table”, Oracle obeys our hint. Each time the optimizer joins “the next” table it usually considers the cost of a Nested Loop, a Sort Merge, and a Hash Join in that order; if the optimizer is obeying the hint it will only consider the nested loop join. Here’s a suitable call to egrep with the first four join orders:

egrep -e "^Join order" -e "^Now joining" -e"^NL Join" -e"^SM Join" -e"^HA Join" or21_ora_15956.trc

Join order[1]:  D[D]#0  A[A]#1  B[B]#2  C[C]#3
Now joining: A[A]#1
NL Join
SM Join
SM Join (with index on outer)
HA Join
Now joining: B[B]#2
NL Join
Now joining: C[C]#3
NL Join
SM Join
HA Join

Join order[2]:  D[D]#0  A[A]#1  C[C]#3  B[B]#2
Now joining: C[C]#3
NL Join
SM Join
HA Join
Now joining: B[B]#2
NL Join

Join order[3]:  D[D]#0  B[B]#2  A[A]#1  C[C]#3
Now joining: B[B]#2
NL Join
Now joining: A[A]#1
NL Join
SM Join
HA Join
Now joining: C[C]#3
NL Join
SM Join
HA Join

Join order[4]:  D[D]#0  B[B]#2  C[C]#3  A[A]#1
Now joining: C[C]#3
NL Join
Join order aborted2: cost > best plan cost


As you can see, the only join considered when “Now joining” B is a nested loop join; for all other tables the three possible joins (and sometimes two variants of the Sort Merge join) are evaluated.
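
For a longer trace the same check is tedious to do by eye, so here’s a Python sketch that collects the join methods costed for each table whenever it is “the next table”. The input is the egrep output for the first four join orders; methods_by_table is a hypothetical helper name, and the sketch only looks at the first seven characters of each line, so the “with index on outer” variant is counted as an SM Join.

```python
from collections import defaultdict

trace = """\
Join order[1]:  D[D]#0  A[A]#1  B[B]#2  C[C]#3
Now joining: A[A]#1
NL Join
SM Join
SM Join (with index on outer)
HA Join
Now joining: B[B]#2
NL Join
Now joining: C[C]#3
NL Join
SM Join
HA Join
Join order[2]:  D[D]#0  A[A]#1  C[C]#3  B[B]#2
Now joining: C[C]#3
NL Join
SM Join
HA Join
Now joining: B[B]#2
NL Join
Join order[3]:  D[D]#0  B[B]#2  A[A]#1  C[C]#3
Now joining: B[B]#2
NL Join
Now joining: A[A]#1
NL Join
SM Join
HA Join
Now joining: C[C]#3
NL Join
SM Join
HA Join
Join order[4]:  D[D]#0  B[B]#2  C[C]#3  A[A]#1
Now joining: C[C]#3
NL Join
Join order aborted2: cost > best plan cost
"""

def methods_by_table(text):
    """Map each table alias to the set of join methods costed when joining it."""
    seen = defaultdict(set)
    current = None
    for line in text.splitlines():
        if line.startswith("Now joining:"):
            current = line.split()[2].split("[")[0]     # "B[B]#2" -> "B"
        elif line.startswith("Join order"):
            current = None                              # new (or aborted) join order
        elif current and line[:7] in ("NL Join", "SM Join", "HA Join"):
            seen[current].add(line[:7])
    return dict(seen)

for table, methods in methods_by_table(trace).items():
    print(table, sorted(methods))
```

For this extract B is the only table whose set contains nothing but “NL Join”; A and C both show all three methods.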

You may also notice another of the clever strategies the optimizer uses to minimise its workload. On the second join order the optimizer goes straight to “Now joining C” because it has remembered the result of joining A from the previous join order.

This is only a very simple example and analysis, but I hope it’s given you some idea of how the optimizer works, and how clever it tries to be about minimising the work; and how it can obey a hint while still producing an execution plan that appears to have ignored the hint.

Adaptive Study

Mon, 2021-10-11 05:57

This is a little case study of adaptive optimisation in Oracle 19c with a surprising side-effect showing up when the optimizer gave the execution engine the option to “do the right thing” and the execution engine took it – except the “right thing” turned out to be a wrong thing.

We started with a request to the Oracle-L list server asking about the difference between the operations “table access by rowid” and “table access by rowid batched”, why changing the parameter “optimizer_adaptive_reporting_only” should make a plan switch from one to the other, and how much of a performance impact this would have. The question arose because this was the only change that showed up in a plan that went from fast (enough) to very slow when the parameter was changed from true to false.

The batching (or not) of the table access really shouldn’t make much difference; the batch option tends to appear if there’s a “blocking” operation (such as a hash join) further up the execution plan, but the mechanism by which a rowsource is produced and passed up the tree is only likely to be affected very slightly. So there had to be something else going on.

Fortunately the OP had the SQL Monitor reports available from a fast / non-batched / “reporting only = true” run and a slow / batched / “reporting only = false” run. I’ve shown both below:

Fast plan (reporting only = true):

Global Information
------------------------------
 Status              :  DONE (ALL ROWS)         
 Instance ID         :  2                       
 Session             :  XXXXX (510:5394) 
 SQL ID              :  791qwn38bq6gv           
 SQL Execution ID    :  33554432                
 Execution Started   :  10/07/2021 11:46:56     
 First Refresh Time  :  10/07/2021 11:46:56     
 Last Refresh Time   :  10/07/2021 11:51:36     
 Duration            :  280s                    
 Module/Action       :  SQL*Plus/-              
 Service             :  XXXXX.XXXXX.com 
 Program             :  sqlplus.exe             
 Fetch Calls         :  370                     

Global Stats
===========================================================================
| Elapsed |   Cpu   |    IO    | Cluster  | Fetch | Buffer | Read | Read  |
| Time(s) | Time(s) | Waits(s) | Waits(s) | Calls |  Gets  | Reqs | Bytes |
===========================================================================
|     252 |     170 |       71 |       11 |   370 |    39M | 251K |   2GB |
===========================================================================

SQL Plan Monitoring Details (Plan Hash Value=250668601)
===============================================================================================================================================================================
| Id |                      Operation                       |             Name              |  Rows   | Cost  |   Time    | Start  | Execs |   Rows   | Read  | Read  |  Mem  |
|    |                                                      |                               | (Estim) |       | Active(s) | Active |       | (Actual) | Reqs  | Bytes | (Max) |
===============================================================================================================================================================================
|  0 | SELECT STATEMENT                                     |                               |         |       |       279 |     +2 |     1 |       2M |       |       |     . |
|  1 |   FILTER                                             |                               |         |       |       279 |     +2 |     1 |       2M |       |       |     . |
|  2 |    NESTED LOOPS OUTER                                |                               |       1 |    3M |       279 |     +2 |     1 |       2M |       |       |     . |
|  3 |     NESTED LOOPS OUTER                               |                               |       1 |    3M |       279 |     +2 |     1 |       2M |       |       |     . |
|  4 |      HASH JOIN OUTER                                 |                               |       1 |    3M |       279 |     +2 |     1 |       2M |       |       |     . |
|  5 |       NESTED LOOPS OUTER                             |                               |       1 |    3M |       279 |     +2 |     1 |       2M |       |       |     . |
|  6 |        STATISTICS COLLECTOR                          |                               |         |       |       279 |     +2 |     1 |       2M |       |       |     . |
|  7 |         NESTED LOOPS OUTER                           |                               |       1 |    3M |       279 |     +2 |     1 |       2M |       |       |     . |
|  8 |          HASH JOIN OUTER                             |                               |       1 |    3M |       279 |     +2 |     1 |       2M |       |       |     . |
|  9 |           NESTED LOOPS OUTER                         |                               |       1 |    3M |       279 |     +2 |     1 |       2M |       |       |     . |
| 10 |            STATISTICS COLLECTOR                      |                               |         |       |       279 |     +2 |     1 |       2M |       |       |     . |
| 11 |             NESTED LOOPS OUTER                       |                               |       1 |    3M |       279 |     +2 |     1 |       2M |       |       |     . |
| 12 |              NESTED LOOPS OUTER                      |                               |       1 |    3M |       279 |     +2 |     1 |       2M |       |       |     . |
| 13 |               NESTED LOOPS                           |                               |    272K |    2M |       279 |     +2 |     1 |       2M |       |       |     . |
| 14 |                NESTED LOOPS OUTER                    |                               |    272K |    2M |       279 |     +2 |     1 |       2M |       |       |     . |
| 15 |                 NESTED LOOPS                         |                               |    272K |    2M |       279 |     +2 |     1 |       2M |       |       |     . |
| 16 |                  NESTED LOOPS OUTER                  |                               |    272K |    1M |       279 |     +2 |     1 |       2M |       |       |     . |
| 17 |                   NESTED LOOPS                       |                               |    272K |    1M |       279 |     +2 |     1 |       2M |       |       |     . |
| 18 |                    FILTER                            |                               |         |       |       279 |     +2 |     1 |       2M |       |       |     . |
| 19 |                     NESTED LOOPS OUTER               |                               |    272K |  598K |       279 |     +2 |     1 |       2M |       |       |     . |
| 20 |                      VIEW                            | index$_join$_006              |    276K | 48299 |       279 |     +2 |     1 |       2M |       |       |     . |
| 21 |                       HASH JOIN                      |                               |         |       |       279 |     +2 |     1 |       2M |       |       | 132MB |
| 22 |                        HASH JOIN                     |                               |         |       |         2 |     +1 |     1 |       2M |       |       | 124MB |
| 23 |                         INDEX STORAGE FAST FULL SCAN | TET_IX2                       |    276K |  8505 |         1 |     +2 |     1 |       2M |       |       |     . |
| 24 |                         INDEX STORAGE FAST FULL SCAN | TET_IX4                       |    276K | 13077 |         1 |     +2 |     1 |       2M |       |       |     . |
| 25 |                        INDEX STORAGE FAST FULL SCAN  | TET_PK                        |    276K | 11889 |       279 |     +2 |     1 |       2M |   149 |  62MB |     . |
| 26 |                      TABLE ACCESS BY INDEX ROWID     | TT                            |       1 |     2 |       279 |     +2 |    2M |       2M |  2347 |  18MB |     . |
| 27 |                       INDEX UNIQUE SCAN              | TT_PK                         |       1 |     1 |       279 |     +2 |    2M |       2M |    11 | 90112 |     . |
| 28 |                    TABLE ACCESS BY INDEX ROWID       | TM                            |       1 |     2 |       279 |     +2 |    2M |       2M | 12476 |  97MB |     . |
| 29 |                     INDEX UNIQUE SCAN                | TM_PK                         |       1 |     1 |       279 |     +2 |    2M |       2M |  1683 |  13MB |     . |
| 30 |                   TABLE ACCESS BY INDEX ROWID        | TU                            |       1 |     1 |       257 |    +21 |    2M |    17764 |   137 |   1MB |     . |
| 31 |                    INDEX UNIQUE SCAN                 | TU_PK                         |       1 |       |       257 |    +21 |    2M |    17764 |     1 |  8192 |     . |
| 32 |                  TABLE ACCESS BY INDEX ROWID         | TEP                           |       1 |     2 |       279 |     +2 |    2M |       2M |  155K |   1GB |     . |
| 33 |                   INDEX UNIQUE SCAN                  | TEP_PK                        |       1 |     1 |       279 |     +2 |    2M |       2M |  1729 |  14MB |     . |
| 34 |                 TABLE ACCESS BY INDEX ROWID          | TLIM                          |       1 |     1 |       279 |     +2 |    2M |       2M |       |       |     . |
| 35 |                  INDEX UNIQUE SCAN                   | TLIM_PK                       |       1 |       |       279 |     +2 |    2M |       2M |       |       |     . |
| 36 |                TABLE ACCESS BY INDEX ROWID           | TLPSE                         |       1 |     1 |       279 |     +2 |    2M |       2M |       |       |     . |
| 37 |                 INDEX UNIQUE SCAN                    | TLPSE_PK                      |       1 |       |       279 |     +2 |    2M |       2M |       |       |     . |
| 38 |               INDEX RANGE SCAN                       | TCX_IX2                       |       1 |     2 |       279 |     +2 |    2M |       2M |  8870 |  69MB |     . |
| 39 |              TABLE ACCESS BY INDEX ROWID             | TC                            |       1 |     2 |       279 |     +2 |    2M |       2M | 14648 | 114MB |     . |
| 40 |               INDEX UNIQUE SCAN                      | TC_PK                         |       1 |     1 |       279 |     +2 |    2M |       2M |   157 |   1MB |     . |
| 41 |            INDEX RANGE SCAN                          | TCX_PK                        |       1 |     2 |       279 |     +2 |    2M |       2M |       |       |     . |
| 42 |           INDEX RANGE SCAN                           | TCX_PK                        |       1 |     2 |           |        |       |          |       |       |     . |
| 43 |          TABLE ACCESS BY INDEX ROWID                 | TC                            |       1 |     2 |       279 |     +2 |    2M |       2M | 16037 | 125MB |     . |
| 44 |           INDEX UNIQUE SCAN                          | TC_PK                         |       1 |     1 |       279 |     +2 |    2M |       2M |   224 |   2MB |     . |
| 45 |        TABLE ACCESS BY INDEX ROWID                   | TP                            |       1 |     3 |       279 |     +2 |    2M |       2M |       |       |     . |
| 46 |         INDEX RANGE SCAN                             | TP_PK                         |      15 |     1 |       279 |     +2 |    2M |      28M |       |       |     . |
| 47 |       TABLE ACCESS BY INDEX ROWID                    | TP                            |       1 |     3 |           |        |       |          |       |       |     . |
| 48 |        INDEX RANGE SCAN                              | TP_PK                         |      15 |     1 |           |        |       |          |       |       |     . |
| 49 |      TABLE ACCESS STORAGE FULL FIRST ROWS            | TLIET                         |       1 |     3 |       279 |     +2 |    2M |       2M |       |       |     . |
| 50 |     VIEW PUSHED PREDICATE                            | TEB_VW                        |       1 |    57 |       256 |    +24 |    2M |     1459 |       |       |     . |
| 51 |      NESTED LOOPS OUTER                              |                               |       1 |    57 |       272 |     +8 |    2M |     1459 |       |       |     . |
| 52 |       NESTED LOOPS                                   |                               |       1 |    55 |       256 |    +24 |    2M |     1459 |       |       |     . |
| 53 |        NESTED LOOPS                                  |                               |       1 |    53 |       256 |    +24 |    2M |     1459 |       |       |     . |
| 54 |         NESTED LOOPS                                 |                               |       1 |    51 |       272 |     +9 |    2M |     1459 |       |       |     . |
| 55 |          NESTED LOOPS                                |                               |       5 |    41 |       279 |     +2 |    2M |     6965 |       |       |     . |
| 56 |           NESTED LOOPS                               |                               |       1 |     7 |       279 |     +2 |    2M |     770K |       |       |     . |
| 57 |            NESTED LOOPS                              |                               |       1 |     4 |       279 |     +2 |    2M |     770K |       |       |     . |
| 58 |             NESTED LOOPS                             |                               |       1 |     3 |       279 |     +2 |    2M |     770K |       |       |     . |
| 59 |              TABLE ACCESS BY INDEX ROWID             | TEP                           |       1 |     3 |       279 |     +2 |    2M |     770K |       |       |     . |
| 60 |               INDEX UNIQUE SCAN                      | TEP_PK                        |       1 |     2 |       279 |     +2 |    2M |       2M |       |       |     . |
| 61 |              INDEX RANGE SCAN                        | TLP_IX1                       |       1 |       |       279 |     +2 |  770K |     770K |       |       |     . |
| 62 |             VIEW                                     |                               |       1 |     1 |       279 |     +2 |  770K |     770K |       |       |     . |
| 63 |              SORT AGGREGATE                          |                               |       1 |       |       279 |     +2 |  770K |     770K |       |       |     . |
| 64 |               TABLE ACCESS BY INDEX ROWID            | TPR                           |       1 |     1 |       279 |     +2 |  770K |     770K |       |       |     . |
| 65 |                INDEX UNIQUE SCAN                     | TPR_PK                        |       1 |       |       279 |     +2 |  770K |     770K |       |       |     . |
| 66 |            TABLE ACCESS BY INDEX ROWID               | TET                           |       1 |     3 |       279 |     +2 |  770K |     770K | 28892 | 226MB |     . |
| 67 |             INDEX RANGE SCAN                         | TET_Ix1                       |       1 |     2 |       279 |     +2 |  770K |     899K |  6957 |  54MB |     . |
| 68 |           TABLE ACCESS BY INDEX ROWID                | TWE                           |       5 |    34 |       272 |     +9 |  770K |     6965 |   890 |   7MB |     . |
| 69 |            INDEX RANGE SCAN                          | TWE_IDX1                      |      35 |     2 |       272 |     +9 |  770K |     6965 |    22 | 176KB |     . |
| 70 |          TABLE ACCESS BY INDEX ROWID                 | TT                            |       1 |     2 |       272 |     +9 |  6965 |     1459 |       |       |     . |
| 71 |           INDEX UNIQUE SCAN                          | TT_PK                         |       1 |     1 |       272 |     +9 |  6965 |     6965 |       |       |     . |
| 72 |         INDEX RANGE SCAN                             | TCX_IX2                       |       1 |     2 |       256 |    +24 |  1459 |     1459 |   932 |   7MB |     . |
| 73 |        TABLE ACCESS BY INDEX ROWID                   | TC                            |       1 |     2 |       256 |    +24 |  1459 |     1459 |       |       |     . |
| 74 |         INDEX UNIQUE SCAN                            | TC_PK                         |       1 |     1 |       256 |    +24 |  1459 |     1459 |       |       |     . |
| 75 |       TABLE ACCESS BY INDEX ROWID                    | TLS                           |       1 |     2 |       256 |    +24 |  1459 |     1451 |       |       |     . |
| 76 |        INDEX SKIP SCAN                               | TLS_PK                        |       1 |     1 |       256 |    +24 |  1459 |     1451 |       |       |     . |
| 77 |    SORT AGGREGATE                                    |                               |       1 |       |       279 |     +2 |    2M |       2M |       |       |     . |
| 78 |     FIRST ROW                                        |                               |       1 |     3 |       279 |     +2 |    2M |       2M |       |       |     . |
| 79 |      INDEX RANGE SCAN (MIN/MAX)                      | TCX_IX2                       |       1 |     3 |       279 |     +2 |    2M |       2M |       |       |     . |
===============================================================================================================================================================================

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 3 (U - Unused (1))
---------------------------------------------------------------------------
    0 -  STATEMENT
         U -  first_rows / hint overridden by another in parent query block
           -  first_rows
 
  56 -  SEL$5
           -  no_merge
 
Note
-----
   - this is an adaptive plan

Slow plan (reporting only = false, runtime adapted):

Global Information
------------------------------
 Status              :  DONE (ALL ROWS)          
 Instance ID         :  2                        
 Session             :  XXXXX (509:27860) 
 SQL ID              :  8t19y7v5j9ztg            
 SQL Execution ID    :  33554432                 
 Execution Started   :  10/07/2021 07:56:09      
 First Refresh Time  :  10/07/2021 07:56:09      
 Last Refresh Time   :  10/07/2021 08:07:17      
 Duration            :  668s                     
 Module/Action       :  SQL*Plus/-               
 Service             :  XXXXX.XXXXX.com  
 Program             :  sqlplus.exe              
 Fetch Calls         :  370                      

Global Stats
==========================================================================================================================
| Elapsed |   Cpu   |    IO    | Concurrency | Cluster  | Fetch | Buffer | Read | Read  | Write | Write |    Offload     |
| Time(s) | Time(s) | Waits(s) |  Waits(s)   | Waits(s) | Calls |  Gets  | Reqs | Bytes | Reqs  | Bytes | Returned Bytes |
==========================================================================================================================
|     705 |     280 |      270 |        0.00 |      155 |   370 |    40M | 984K |  11GB |  6422 |   3GB |            6GB |
==========================================================================================================================

SQL Plan Monitoring Details (Plan Hash Value=3015036808)
========================================================================================================================================================================================================
| Id |                      Operation                       |             Name              |  Rows   | Cost  |   Time    | Start  | Execs |   Rows   | Read  | Read  | Write | Write |  Mem  | Temp  |
|    |                                                      |                               | (Estim) |       | Active(s) | Active |       | (Actual) | Reqs  | Bytes | Reqs  | Bytes | (Max) | (Max) |
========================================================================================================================================================================================================
|  0 | SELECT STATEMENT                                     |                               |         |       |       512 |   +157 |     1 |       2M |       |       |       |       |     . |     . |
|  1 |   FILTER                                             |                               |         |       |       512 |   +157 |     1 |       2M |       |       |       |       |     . |     . |
|  2 |    NESTED LOOPS OUTER                                |                               |       1 |    3M |       512 |   +157 |     1 |       2M |       |       |       |       |     . |     . |
|  3 |     NESTED LOOPS OUTER                               |                               |       1 |    3M |       512 |   +157 |     1 |       2M |       |       |       |       |     . |     . |
|  4 |      HASH JOIN OUTER                                 |                               |       1 |    3M |       538 |   +131 |     1 |       2M |  3387 |   2GB |  3387 |   2GB | 450MB |   2GB |
|  5 |       NESTED LOOPS OUTER                             |                               |       1 |    3M |        27 |   +131 |     1 |       2M |       |       |       |       |     . |     . |
|  6 |        STATISTICS COLLECTOR                          |                               |         |       |        27 |   +131 |     1 |       2M |       |       |       |       |     . |     . |
|  7 |         NESTED LOOPS OUTER                           |                               |       1 |    3M |        27 |   +131 |     1 |       2M |       |       |       |       |     . |     . |
|  8 |          HASH JOIN OUTER                             |                               |       1 |    3M |       155 |     +3 |     1 |       2M |  3035 |   1GB |  3035 |   1GB | 309MB |   1GB |
|  9 |           NESTED LOOPS OUTER                         |                               |       1 |    3M |       129 |     +3 |     1 |       2M |       |       |       |       |     . |     . |
| 10 |            STATISTICS COLLECTOR                      |                               |         |       |       129 |     +3 |     1 |       2M |       |       |       |       |     . |     . |
| 11 |             NESTED LOOPS OUTER                       |                               |       1 |    3M |       129 |     +3 |     1 |       2M |       |       |       |       |     . |     . |
| 12 |              NESTED LOOPS OUTER                      |                               |       1 |    3M |       129 |     +3 |     1 |       2M |       |       |       |       |     . |     . |
| 13 |               NESTED LOOPS                           |                               |    272K |    2M |       129 |     +3 |     1 |       2M |       |       |       |       |     . |     . |
| 14 |                NESTED LOOPS OUTER                    |                               |    272K |    2M |       129 |     +3 |     1 |       2M |       |       |       |       |     . |     . |
| 15 |                 NESTED LOOPS                         |                               |    272K |    2M |       129 |     +3 |     1 |       2M |       |       |       |       |     . |     . |
| 16 |                  NESTED LOOPS OUTER                  |                               |    272K |    1M |       129 |     +3 |     1 |       2M |       |       |       |       |     . |     . |
| 17 |                   NESTED LOOPS                       |                               |    272K |    1M |       129 |     +3 |     1 |       2M |       |       |       |       |     . |     . |
| 18 |                    FILTER                            |                               |         |       |       129 |     +3 |     1 |       2M |       |       |       |       |     . |     . |
| 19 |                     NESTED LOOPS OUTER               |                               |    272K |  598K |       129 |     +3 |     1 |       2M |       |       |       |       |     . |     . |
| 20 |                      VIEW                            | index$_join$_006              |    276K | 48299 |       129 |     +3 |     1 |       2M |       |       |       |       |     . |     . |
| 21 |                       HASH JOIN                      |                               |         |       |       129 |     +3 |     1 |       2M |       |       |       |       | 132MB |     . |
| 22 |                        HASH JOIN                     |                               |         |       |         3 |     +1 |     1 |       2M |       |       |       |       | 124MB |     . |
| 23 |                         INDEX STORAGE FAST FULL SCAN | TET_IX2                       |    276K |  8505 |         1 |     +1 |     1 |       2M |   129 |  54MB |       |       |     . |     . |
| 24 |                         INDEX STORAGE FAST FULL SCAN | TET_IX4                       |    276K | 13077 |         3 |     +1 |     1 |       2M |   167 |  81MB |       |       |     . |     . |
| 25 |                        INDEX STORAGE FAST FULL SCAN  | TET_PK                        |    276K | 11889 |       129 |     +3 |     1 |       2M |   198 |  61MB |       |       |     . |     . |
| 26 |                      TABLE ACCESS BY INDEX ROWID     | TT                            |       1 |     2 |       129 |     +3 |    2M |       2M |  1488 |  12MB |       |       |     . |     . |
| 27 |                       INDEX UNIQUE SCAN              | TT_PK                         |       1 |     1 |       129 |     +3 |    2M |       2M |     7 | 57344 |       |       |     . |     . |
| 28 |                    TABLE ACCESS BY INDEX ROWID       | TM                            |       1 |     2 |       129 |     +3 |    2M |       2M |  9875 |  77MB |       |       |     . |     . |
| 29 |                     INDEX UNIQUE SCAN                | TM_PK                         |       1 |     1 |       129 |     +3 |    2M |       2M |  1235 |  10MB |       |       |     . |     . |
| 30 |                   TABLE ACCESS BY INDEX ROWID        | TU                            |       1 |     1 |       119 |    +11 |    2M |    17764 |       |       |       |       |     . |     . |
| 31 |                    INDEX UNIQUE SCAN                 | TU_PK                         |       1 |       |       119 |    +11 |    2M |    17764 |       |       |       |       |     . |     . |
| 32 |                  TABLE ACCESS BY INDEX ROWID         | TEP                           |       1 |     2 |       129 |     +3 |    2M |       2M |  140K |   1GB |       |       |     . |     . |
| 33 |                   INDEX UNIQUE SCAN                  | TEP_PK                        |       1 |     1 |       129 |     +3 |    2M |       2M |  1478 |  12MB |       |       |     . |     . |
| 34 |                 TABLE ACCESS BY INDEX ROWID          | TLIM                          |       1 |     1 |       129 |     +3 |    2M |       2M |       |       |       |       |     . |     . |
| 35 |                  INDEX UNIQUE SCAN                   | TLIM_PK                       |       1 |       |       129 |     +3 |    2M |       2M |       |       |       |       |     . |     . |
| 36 |                TABLE ACCESS BY INDEX ROWID           | TLPSE                         |       1 |     1 |       129 |     +3 |    2M |       2M |       |       |       |       |     . |     . |
| 37 |                 INDEX UNIQUE SCAN                    | TLPSE_PK                      |       1 |       |       129 |     +3 |    2M |       2M |       |       |       |       |     . |     . |
| 38 |               INDEX RANGE SCAN                       | TCX_IX2                       |       1 |     2 |       129 |     +3 |    2M |       2M |  4642 |  36MB |       |       |     . |     . |
| 39 |              TABLE ACCESS BY INDEX ROWID             | TC                            |       1 |     2 |       129 |     +3 |    2M |       2M | 22307 | 174MB |       |       |     . |     . |
| 40 |               INDEX UNIQUE SCAN                      | TC_PK                         |       1 |     1 |       129 |     +3 |    2M |       2M |   546 |   4MB |       |       |     . |     . |
| 41 |            INDEX RANGE SCAN                          | TCX_PK                        |       1 |     2 |           |        |       |          |       |       |       |       |     . |     . |
| 42 |           INDEX RANGE SCAN                           | TCX_PK                        |       1 |     2 |         1 |   +131 |     1 |     976K |       |       |       |       |     . |     . |
| 43 |          TABLE ACCESS BY INDEX ROWID                 | TC                            |       1 |     2 |        27 |   +131 |    2M |       2M | 21549 | 168MB |       |       |     . |     . |
| 44 |           INDEX UNIQUE SCAN                          | TC_PK                         |       1 |     1 |        27 |   +131 |    2M |       2M |   959 |   7MB |       |       |     . |     . |
| 45 |        TABLE ACCESS BY INDEX ROWID BATCHED           | TP                            |       1 |     3 |           |        |       |          |       |       |       |       |     . |     . |
| 46 |         INDEX RANGE SCAN                             | TP_PK                         |      15 |     1 |           |        |       |          |       |       |       |       |     . |     . |
| 47 |       TABLE ACCESS BY INDEX ROWID BATCHED            | TP                            |       1 |     3 |        36 |   +157 |     1 |       15 |       |       |       |       |     . |     . |
| 48 |        INDEX RANGE SCAN                              | TP_PK                         |      15 |     1 |        36 |   +157 |     1 |       15 |       |       |       |       |     . |     . |
| 49 |      TABLE ACCESS STORAGE FULL FIRST ROWS            | TLIET                         |       1 |     3 |       512 |   +157 |    2M |       2M |       |       |       |       |     . |     . |
| 50 |     VIEW PUSHED PREDICATE                            | TEB_VW                        |       1 |    57 |       506 |   +163 |    2M |     1459 |       |       |       |       |     . |     . |
| 51 |      NESTED LOOPS OUTER                              |                               |       1 |    57 |       506 |   +163 |    2M |     1459 |       |       |       |       |     . |     . |
| 52 |       NESTED LOOPS                                   |                               |       1 |    55 |       506 |   +163 |    2M |     1459 |       |       |       |       |     . |     . |
| 53 |        NESTED LOOPS                                  |                               |       1 |    53 |       506 |   +163 |    2M |     1459 |       |       |       |       |     . |     . |
| 54 |         NESTED LOOPS                                 |                               |       1 |    51 |       506 |   +163 |    2M |     1459 |       |       |       |       |     . |     . |
| 55 |          NESTED LOOPS                                |                               |       5 |    41 |       510 |   +159 |    2M |     6965 |       |       |       |       |     . |     . |
| 56 |           NESTED LOOPS                               |                               |       1 |     7 |       510 |   +159 |    2M |     770K |       |       |       |       |     . |     . |
| 57 |            NESTED LOOPS                              |                               |       1 |     4 |       510 |   +159 |    2M |     770K |       |       |       |       |     . |     . |
| 58 |             NESTED LOOPS                             |                               |       1 |     3 |       510 |   +159 |    2M |     770K |       |       |       |       |     . |     . |
| 59 |              TABLE ACCESS BY INDEX ROWID             | TEP                           |       1 |     3 |       512 |   +157 |    2M |     770K |  661K |   5GB |       |       |     . |     . |
| 60 |               INDEX UNIQUE SCAN                      | TEP_PK                        |       1 |     2 |       512 |   +157 |    2M |       2M |  2934 |  23MB |       |       |     . |     . |
| 61 |              INDEX RANGE SCAN                        | TLP_IX1                       |       1 |       |       510 |   +159 |  770K |     770K |       |       |       |       |     . |     . |
| 62 |             VIEW                                     |                               |       1 |     1 |       510 |   +159 |  770K |     770K |       |       |       |       |     . |     . |
| 63 |              SORT AGGREGATE                          |                               |       1 |       |       510 |   +159 |  770K |     770K |       |       |       |       |     . |     . |
| 64 |               TABLE ACCESS BY INDEX ROWID            | TPR                           |       1 |     1 |       510 |   +159 |  770K |     770K |       |       |       |       |     . |     . |
| 65 |                INDEX UNIQUE SCAN                     | TPR_PK                        |       1 |       |       510 |   +159 |  770K |     770K |       |       |       |       |     . |     . |
| 66 |            TABLE ACCESS BY INDEX ROWID BATCHED       | TET                           |       1 |     3 |       511 |   +158 |  770K |     770K | 79759 | 623MB |       |       |     . |     . |
| 67 |             INDEX RANGE SCAN                         | TET_Ix1                       |       1 |     2 |       510 |   +159 |  770K |     899K | 15834 | 124MB |       |       |     . |     . |
| 68 |           TABLE ACCESS BY INDEX ROWID BATCHED        | TWE                           |       5 |    34 |       506 |   +163 |  770K |     6965 |  2080 |  16MB |       |       |     . |     . |
| 69 |            INDEX RANGE SCAN                          | TWE_IDX1                      |      35 |     2 |       506 |   +163 |  770K |     6965 |   118 | 944KB |       |       |     . |     . |
| 70 |          TABLE ACCESS BY INDEX ROWID                 | TT                            |       1 |     2 |       506 |   +163 |  6965 |     1459 |   208 |   2MB |       |       |     . |     . |
| 71 |           INDEX UNIQUE SCAN                          | TT_PK                         |       1 |     1 |       506 |   +163 |  6965 |     6965 |       |       |       |       |     . |     . |
| 72 |         INDEX RANGE SCAN                             | TCX_IX2                       |       1 |     2 |       506 |   +163 |  1459 |     1459 |  1388 |  11MB |       |       |     . |     . |
| 73 |        TABLE ACCESS BY INDEX ROWID                   | TC                            |       1 |     2 |       506 |   +163 |  1459 |     1459 |   936 |   7MB |       |       |     . |     . |
| 74 |         INDEX UNIQUE SCAN                            | TC_PK                         |       1 |     1 |       506 |   +163 |  1459 |     1459 |    75 | 600KB |       |       |     . |     . |
| 75 |       TABLE ACCESS BY INDEX ROWID BATCHED            | TLS                           |       1 |     2 |       506 |   +163 |  1459 |     1451 |     1 |  8192 |       |       |     . |     . |
| 76 |        INDEX SKIP SCAN                               | TLS_PK                        |       1 |     1 |       506 |   +163 |  1459 |     1451 |     1 |  8192 |       |       |     . |     . |
| 77 |    SORT AGGREGATE                                    |                               |       1 |       |       512 |   +157 |    2M |       2M |       |       |       |       |     . |     . |
| 78 |     FIRST ROW                                        |                               |       1 |     3 |       512 |   +157 |    2M |       2M |       |       |       |       |     . |     . |
| 79 |      INDEX RANGE SCAN (MIN/MAX)                      | TCX_IX2                       |       1 |     3 |       512 |   +157 |    2M |       2M |  9356 |  73MB |       |       |     . |     . |
=======================================================================================================================================================================================================

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 3 (U - Unused (1))
---------------------------------------------------------------------------
   0 -  STATEMENT
         U -  first_rows / hint overridden by another in parent query block
           -  first_rows
 
  56 -  SEL$5
           -  no_merge
 
Note
-----
   - this is an adaptive plan

If you want to pull these plans into separate windows and compare (nothing but) the Operations and Names line by line you’ll find that the only differences appear at operations 45, 47, 66, 68, and 75:

| 45 |        TABLE ACCESS BY INDEX ROWID                   | TP                            |
| 47 |       TABLE ACCESS BY INDEX ROWID                    | TP                            |
| 66 |            TABLE ACCESS BY INDEX ROWID               | TET                           |
| 68 |           TABLE ACCESS BY INDEX ROWID                | TWE                           |
| 75 |       TABLE ACCESS BY INDEX ROWID                    | TLS                           |

| 45 |        TABLE ACCESS BY INDEX ROWID BATCHED           | TP                            |
| 47 |       TABLE ACCESS BY INDEX ROWID BATCHED            | TP                            |
| 66 |            TABLE ACCESS BY INDEX ROWID BATCHED       | TET                           |
| 68 |           TABLE ACCESS BY INDEX ROWID BATCHED        | TWE                           |
| 75 |       TABLE ACCESS BY INDEX ROWID BATCHED            | TLS                           |

So what could possibly make one plan so much slower than the other?

There are all sorts of bits and pieces in these plans that you need to be able to spot “in passing” if you want to become fluent at understanding execution plans. It’s something that takes a lot of practice but there’s one general tip (or warning, perhaps) that I can offer.

If you start out by looking for one particular thing you’ll miss lots of important clues; on a first pass through the plan just try to notice anything that looks a little interesting or informative, then go back for a more detailed examination on a second pass through the plan.

I won’t go through the entire pattern of thought that went through my mind as I started looking at these plans, but here are a couple of flags I raised:

  • Adaptive plans in SQL Monitor – we’re likely to see some “statistics collector” operations and that’s the “obvious” source of the anomaly, but reading plans that include their adaptive bits can be a mind-bending experience.
  • Global Stats of the slow one says 984K read requests (compared to 251K for the fast plan) – that might explain the difference in timing – keep an eye out for where the big numbers appear. (NB Don’t, at this point, go looking for them as that may lead you into missing the real issue.)
  • The slow plan shows the top operation with a Start Active of +157 while the fast plan has a Start Active of +2: that’s pretty consistent with a comment the user made (but I hadn’t mentioned) about the response time from the user’s perspective; and it tells us that there’s a blocking operation somewhere in the slow plan. That’s nice because that’s what we might expect from seeing an adaptive plan switching from a nested loop to a hash join. (So we already think we’re a clever bunny – too bad it wasn’t quite the right answer.)
  • There are two places in the plan which report Statistics Collector, with one Nested Loop Outer then a Hash Join Outer immediately above them as the two candidate consumers for the rowsource supplied by the collector’s child operation. (Note: in some cases you will see two Nested Loop operations above a Statistics Collector before you get to the Hash Join, but that’s just because Oracle may implement a nested loop join in two steps, first to the index, second to the table)

With the warning about not digging “too deep too soon” in mind, here are a few potentially significant observations. In the following notes I’ll simply refer to the plans as the “fast” plan and the “slow” plan.

  • The slow plan shows a large hash join with spill to disc at operation 4, and that’s one of the options triggered by the statistics collector at operation 6. It’s a little confusing that the nested loop join following it also reports execution statistics since only one of the two lines would have been doing anything, but we’ll assume for now that that’s a reporting error and postpone making a fuss about it. The same pattern appears at operations 8, 9 and 10.
  • A cross-check to the fast plan shows that the hash joins don’t do any work at the corresponding operations (although it also displays the oddity of an operation that, we assume, didn’t actually run reporting a long Time Active).
  • The slow plan has an oddity at operation 41 – it’s an index range scan of a primary key (TCX_PK) that doesn’t report any execution statistics. Cross-checking to the fast plan we see that it’s operation 42 (which is also an index range scan, and using the same index!) that doesn’t report any execution statistics. We note that the fast plan “Starts” its range scan 2 million times, while the slow plan starts just once, starting at time +131 and having an active time of 1 second. [side-note: I am a little suspicious of that number – it looks to me as if it ought to be reporting 27 seconds]
  • Keep going – because just a bit further down we see that the slow plan has no stats for operations 45 and 46 (index range scan of TP_PK with table access of TP) while the fast plan has no stats for operations 47 and 48 (also an index range scan of TP_PK with table access to TP). Again we see the same pattern that the slow plan executes the operation just once while the fast plan executes its operations 2M times.
  • Keep going – the previous two observations are interesting and probably need further investigation, but they might not be critical. The very next line (operation 49) in both plans shows us a “TABLE ACCESS STORAGE FULL FIRST ROWS” that executes 2 million times – that’s got to hurt, surely, but let’s not get side-tracked just yet.
  • Operation 50 is a “VIEW PUSHED PREDICATE” – that almost certainly means it’s the second child of a nested loop join with a join predicate pushed into a non-mergeable view (and the view name is TEB_VW so it’s not a view created by an internal transformation) and the operation has, like so many other lines in the plan, started 2 million times.
  • Looking at the rest of the plan, there are no more statistics collectors and the plans have an exact match on operations. Unfortunately we don’t have a Predicate Information section, so we can’t tell whether matching operations were really doing the same thing, e.g. was an index range scan in one plan using more columns to probe the index than the corresponding index range scan in the other plan. However we can check times:
  • The View Pushed Predicate in the fast plan starts at time +24 [another slightly suspicious time given all the +2 starts in the vicinity] and is active for 256 seconds, while in the slow plan it starts at time +163 and is active for 506 seconds. So it looks as if a lot of the excess run time of the query is spent in this part of the plan – for no logical reason that we can see.
  • Again we take a quick note, and move on. The final observation is that the last three lines of the plan look like the plan for a subquery block (executed very efficiently) of the “find the most recent / highest / lowest” type, and a quick check to the top of the plan shows that its parent is a FILTER operation, corroborating our first guess.

Reviewing the first pass we can see that we lose a lot of “startup” time to the two hash joins where the build table in each case has to be completed before any further tables can be joined. This is in the order of 160 seconds, which is consistent with the OP’s observations, and it’s happening because adaptive plans are activated, triggering a change from nested loop joins to hash joins.

More significant, from the perspective of time, is that the nested loop join into the View Pushed Predicate is active for twice as long in the slow plan as it is in the fast plan – so that’s a place to look a little more closely, revealing that operation 59 is probably the reason for the difference: 661 thousand read requests in the slow plan but none in the fast plan.

Unfortunately we don’t have any Activity Stats (i.e. active session history data) in the report, but since the access to the table is reported as unique access by unique index in both cases we can be fairly sure that the difference isn’t due to a difference in the Predicate Information (that isn’t in the report).

Clearly we need to stop the adaptive plan from being taken to avoid the start-up delay (e.g. by adding a /*+ no_adaptive_plan */ hint to the query), but that still leaves two puzzles:

  1. why are the rows estimates so bad (and at least part of the reason for that is that it turned out that the query was being optimized with optimizer_mode = first_rows – that’s the legacy first_rows, not a cost-based first_rows_N);
  2. how could the same sub-plan result in far more physical reads in one case compared to the other when the critical operation is a unique index access.
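As a minimal sketch of that first step, the adaptive switch could be blocked either for one statement or for the whole session. The original query wasn’t supplied, so the table and column names here (big_t, small_t, small_id) are hypothetical placeholders, and the session parameter shown is the one that applies from 12.2 onwards.

```sql
-- Statement level: ask the optimizer not to produce an adaptive plan.
-- big_t, small_t and their columns are hypothetical stand-ins.
select
        /*+ no_adaptive_plan */
        b.id, s.v1
from
        big_t   b,
        small_t s
where
        s.id(+) = b.small_id
;

-- Session level (12.2 and later): disable adaptive plans for every
-- statement optimized by this session from now on.
alter session set optimizer_adaptive_plans = false;
```

The hint is the less intrusive choice when only one query is misbehaving; the session setting is easier to apply when the SQL text can’t be changed.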

The answer to the second question could be in an observation I first published 14 years ago – and it could indicate a generic threat to adaptive optimisation.

If you have an execution plan which, stripped to a minimum, looks like this:

Join Operation
        Table_X
        Table_Y

The order in which the result appears is likely to change depending on the join mechanism that Oracle chooses, viz Hash Join, Merge Join or Nested Loop Join.

Under “non-adaptive” conditions if you have a join that’s border-line between a hash join and a nested loop join it frequently means that the optimizer will flip-flop between two plans like the following (leading to the classic question: “nothing changed, so why did the plan change?”):

Hash Join
        Table_X
        Table_Y

Nested Loop Join
        Table_Y
        Table_X

Note that the order in which the tables appear is reversed.

As it says in another article of mine: “all joins are nested loop joins, just with different startup costs”. In both the plans above Oracle picks a row from Table_Y and looks for a match in Table_X, so the order of the result set is dictated by Table_Y regardless of whether the join is a hash join or a nested loop join. However, if Oracle has decided to use an adaptive plan and starts with the nested loop (Y -> X) and decides to switch to a hash join it doesn’t swap the join order as the join mechanism is selected, so a result set whose order would have been dictated by Table_Y turns into the same result set (we hope) but in an order dictated by Table_X.
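The three shapes described above can be sketched with hints, assuming two hypothetical tables table_x and table_y that join on a column called id (none of these names come from the original query). This is only an illustration of the join orders involved, not a reproduction of the adaptive mechanism itself.

```sql
-- 1. Hash join with Table_X as the build table and Table_Y as the
--    probe table: rows emerge in the order Table_Y is scanned.
select  /*+ leading(x y) use_hash(y) no_swap_join_inputs(y) */
        x.id, y.v1
from    table_x x, table_y y
where   y.id = x.id
;

-- 2. Nested loop driven by Table_Y: rows again emerge in Table_Y order.
select  /*+ leading(y x) use_nl(x) */
        x.id, y.v1
from    table_x x, table_y y
where   y.id = x.id
;

-- 3. The shape an adaptive switch from (2) would produce: the join
--    order (Y, X) is kept but the method becomes a hash join, so
--    Table_Y is now the build table, Table_X is the probe table,
--    and rows emerge in the order Table_X is scanned.
select  /*+ leading(y x) use_hash(x) no_swap_join_inputs(x) */
        x.id, y.v1
from    table_x x, table_y y
where   y.id = x.id
;
```

Plans (1) and (2) are the usual non-adaptive alternatives; plan (3) is the one that changes the order of the rowsource passed up to any later join.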

Consequences:

If you’re using very big tables and Oracle produces an adaptive nested loop join early in the plan, this may result in a later nested loop being lucky and inducing lots of “self-caching” because its driving rowsource is in a nice order. If the plan adapts to a hash join the driving data set may appear in a completely different order that makes the later nested loop jump randomly around a very large table, inducing a lot of “self-flushing” as one table block is discarded from the buffer cache to make space for another. (I published an article several years ago about how a similar – though far from identical – type of anomaly could happen with Exadata and compression: an unlucky order of data access causing a massive extra workload.)

Conclusion and further thoughts

In this note I’ve tried to present my thoughts as I’ve read through an execution plan trying to understand what it’s doing and why it’s showing the performance characteristics it does.

In this case the interpretation was made harder because the plan was an adaptive plan – and there doesn’t appear to be an option in the procedure in dbms_sql_monitor to choose between hiding and revealing the adaptive parts [ed: this statement may apply only to the text option – see comment #1 for a counter-example using the “Active HTML” option]; moreover there was no Activity (ASH) information supplied and we didn’t have the Predicate Information.
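By way of contrast, dbms_xplan does offer that choice when the cursor is still in memory: by default it shows only the final plan, and the “adaptive” format option adds the discarded alternatives. A minimal sketch follows; the substitution variable m_sql_id is a hypothetical placeholder for the sql_id of the statement of interest.

```sql
-- Needed in SQL*Plus so that "the last statement" isn't a call
-- to dbms_output.
set serveroutput off

-- Plan for the last statement run by this session, including the
-- adaptive parts; lines that ended up inactive are flagged with "-".
select * from table(dbms_xplan.display_cursor(format => '+adaptive'));

-- The same for a specific cursor; m_sql_id is a hypothetical placeholder.
select * from table(dbms_xplan.display_cursor('&m_sql_id', null, '+adaptive'));
```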

The performance “issue” was that when adaptive plans were allowed (as opposed to reported only) we could see that two nested loops changed to hash joins. It was fairly clear that this explained the huge delay before results started to appear, but didn’t explain why the query took so much longer to complete.

We have a hypothesis that the extra run time of the query was due to “bad luck” because we can see very clearly that a nested loop into a non-mergeable view with pushed predicate reports a huge number of single block read requests; and we know that changing a join from a nested loop to a hash join without changing the order of the child operations will change the order in which the join’s rowsource is generated.

In this case the query was executing under the legacy first_rows optimizer mode, and it’s possible that if first_rows_N had been used the optimizer would have behaved differently, especially since we have a query that is returning 2M rows and we only want the first few rows.

Next Steps

The obvious “next step” in this investigation is to check whether first_rows_N co-operates nicely with adaptive optimisation. After all, the only significant thing that adaptive optimisation does to (serial) execution plans is set an inflexion point to dictate when a nested loop should change to a hash join – and a hash join is a blocking operation which is rarely a good thing for a first_rows_N plan.

So, does first_rows_N disable this adaptive plan analysis, does it move the inflexion point significantly, or does the optimizer simply forget that hash joins are less desirable in first_rows_N optimisation? And if you’re running a system in first_rows_N mode should you disable adaptive plans by default, and only enable them for special cases?
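A possible starting point for such a test, sketched under the assumption that the statement under test carries a recognisable comment (the tag first_rows_adaptive_test is hypothetical), would be to optimize it in a first_rows_N session and then check whether the resulting cursor is adaptive.

```sql
-- Valid values of N for the parameter are 1, 10, 100 and 1000.
alter session set optimizer_mode = first_rows_10;

-- ... run the statement under test here, including the comment
--     /* first_rows_adaptive_test */ somewhere in its text ...

-- is_resolved_adaptive_plan is null for a non-adaptive plan,
-- 'N' while the plan is still unresolved and 'Y' once it is final.
select
        sql_id, child_number, optimizer_mode, is_resolved_adaptive_plan
from
        v$sql
where
        sql_text like '%first_rows_adaptive_test%'
and     sql_text not like '%v$sql%'
;
```

A null in the last column would suggest the optimizer didn’t produce an adaptive plan for this mode; anything else would call for a closer look at the plan itself.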

I also have an urge to test a couple of ideas about why the two timing anomalies I mentioned have appeared, but it’s already taken me several hours to write notes (including a few replies to the list server) about the 30 minutes I’ve spent looking at an execution plan, so that’s just another couple of items on my to-do list.

Hints and Costs

Thu, 2021-10-07 06:06

This note is one I drafted three years ago, based on a question from the Oracle-L list server. It doesn’t directly address that question because at the time I was unable to create a data set that reproduced the problem, but it did highlight a detail that’s worth mentioning, so I’ve finally got around to completing it (and testing on a couple of newer versions of Oracle).

I’ll start with a model that was supposed to demonstrate the problem behind the question:


rem
rem     Script:         122_or_expand.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Aug 2018
rem     Purpose:        
rem
rem     Last tested
rem             21.3.0.0
rem             19.11.0.0
rem             12.2.0.1
rem

create table t1
segment creation immediate
pctfree 80 pctused 20
nologging
as
select
        *
from
        all_objects
where
        rownum <= 50000
;

alter table t1 add constraint t1_pk
        primary key(object_id)
        using index pctfree 80
;

variable b1 number
variable b2 number
variable b3 number
variable b4 number
variable b5 number
variable b6 number

exec :b1 := 100
exec :b2 := 120
exec :b3 := 1100
exec :b4 := 1220
exec :b5 := 3100
exec :b6 := 3320

set serveroutput off

select
        object_name
from 
        t1
where
        object_id between :b1 and :b2
or      object_id between :b3 and :b4
or      object_id between :b5 and :b6
;

select * from table(dbms_xplan.display_cursor(null,null,'outline'));


The critical feature of the query is the list of disjuncts (ORs) which all specify a range for object_id. The problem was that the query used a plan with an index full scan when there were no statistics on the table (or its indexes), but switched to a plan  that used index range scans when statistics were gathered – and the performance of the plan with the full scan was unacceptable.  (Clearly the “proper” solution is to have some suitable statistics in place – but sometimes such things are out of the control of the people who have to solve the problems.)

The /*+ index() */ and (undocumented) /*+ index_rs_asc() */ hints had no effect on the plan. The reason why the /*+ index() */ hint made no difference is that an index full scan is one of the ways in which the /*+ index() */ hint can be obeyed – the hint doesn’t instruct the optimizer to pick an index range scan. The hint /*+ index_rs_asc() */ specifically tells the optimizer to pick an index Range Scan ASCending if the hint has been specified correctly and the choice is available and legal. So why was the optimizer not doing as it was told? Without seeing the execution plan or CBO trace file from a live example I can’t guarantee that the following hypothesis is correct, but I think it’s in the right ball park.
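For reference, this is roughly how the two hints would be written against the model table, using a single range to keep the example short. The exact hint text the OP used wasn’t given, so treat these as illustrative forms only.

```sql
-- index(): "use this index"; a range scan or a full scan of t1_pk
-- would both count as obeying the hint.
select  /*+ index(t1 t1_pk) */
        object_name
from    t1
where   object_id between :b1 and :b2
;

-- index_rs_asc(): specifically an ascending range scan of t1_pk.
select  /*+ index_rs_asc(t1 t1_pk) */
        object_name
from    t1
where   object_id between :b1 and :b2
;

-- The same hint identifying the index by its column list
-- rather than by its name.
select  /*+ index_rs_asc(t1 (object_id)) */
        object_name
from    t1
where   object_id between :b1 and :b2
;
```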

I think the optimizer was probably using the (new to 12c) cost-based “OR expansion” transformation, which basically transformed the query into a UNION ALL of several index range scans – and that’s why its outline would show /*+ index_rs_asc() */ hints. The hint would only become valid after the transformation had taken place, so if Oracle didn’t consider (or considered and discarded) the transformation when there were no stats in place then the hint would have to be “Unused” (as the new 19c hint-report would say).

When I tried to model the problem the optimizer kept doing nice things with my data, so I wasn’t able to demonstrate the OP’s problem. However in one of my attempts to get a silly plan I did something silly – that can happen by accident if your client code isn’t careful! I’ll tell you what that was in a moment – first, a couple of plans.

As it stands, with the data and bind variables as shown, the optimizer used “b-tree / bitmap conversion” to produce an execution plan that did three separate index range scans, converted the rowids to bits, OR-ed the bit-strings, then converted back to rowids before accessing the table:

---------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |       |       |    84 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1    |   291 | 12804 |    84   (5)| 00:00:01 |
|   2 |   BITMAP CONVERSION TO ROWIDS       |       |       |       |            |          |
|   3 |    BITMAP OR                        |       |       |       |            |          |
|   4 |     BITMAP CONVERSION FROM ROWIDS   |       |       |       |            |          |
|   5 |      SORT ORDER BY                  |       |       |       |            |          |
|*  6 |       INDEX RANGE SCAN              | T1_PK |       |       |     2   (0)| 00:00:01 |
|   7 |     BITMAP CONVERSION FROM ROWIDS   |       |       |       |            |          |
|   8 |      SORT ORDER BY                  |       |       |       |            |          |
|*  9 |       INDEX RANGE SCAN              | T1_PK |       |       |     2   (0)| 00:00:01 |
|  10 |     BITMAP CONVERSION FROM ROWIDS   |       |       |       |            |          |
|  11 |      SORT ORDER BY                  |       |       |       |            |          |
|* 12 |       INDEX RANGE SCAN              | T1_PK |       |       |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

So the first thing I had to do was disable this feature, which I did by adding the hint /*+ opt_param('_b_tree_bitmap_plans','false') */ to the query. This adjustment left Oracle doing the OR-expansion that I didn’t want to see:


----------------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |                 |       |       |   297 (100)|          |
|   1 |  VIEW                                  | VW_ORE_BA8ECEFB |   288 | 19008 |   297   (1)| 00:00:01 |
|   2 |   UNION-ALL                            |                 |       |       |            |          |
|*  3 |    FILTER                              |                 |       |       |            |          |
|   4 |     TABLE ACCESS BY INDEX ROWID BATCHED| T1              |    18 |   792 |    20   (0)| 00:00:01 |
|*  5 |      INDEX RANGE SCAN                  | T1_PK           |    18 |       |     2   (0)| 00:00:01 |
|*  6 |    FILTER                              |                 |       |       |            |          |
|   7 |     TABLE ACCESS BY INDEX ROWID BATCHED| T1              |    97 |  4268 |   100   (0)| 00:00:01 |
|*  8 |      INDEX RANGE SCAN                  | T1_PK           |    97 |       |     2   (0)| 00:00:01 |
|*  9 |    FILTER                              |                 |       |       |            |          |
|  10 |     TABLE ACCESS BY INDEX ROWID BATCHED| T1              |   173 |  7612 |   177   (1)| 00:00:01 |
|* 11 |      INDEX RANGE SCAN                  | T1_PK           |   173 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------

You’ll notice that the three range scans have different row estimates and costs – that’s the effect of bind variable peeking and my careful choice of bind variables to define different sized ranges. Take note, by the way, of the three filter predicates flagged at operations 3, 6, and 9. These are the “conditional plan” filters that say things like: “don’t run the sub-plan if the runtime value of :b5 is greater than :b6”.

Since I didn’t want to see OR-expansion just yet I then added the hint /*+ no_or_expand(@sel$1) */ to the query and that gave me a plan with a tablescan:

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |   617 (100)|          |
|*  1 |  TABLE ACCESS FULL| T1   |   291 | 12804 |   617   (4)| 00:00:01 |
--------------------------------------------------------------------------

This was a shame because I really wanted to see the optimizer produce an index full scan at this point – so I decided to add an “unnamed index” hint to the growing list of hints – specifically: /*+ index(@sel$1 t1@sel$1) */

---------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |       |       |   405 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1    |   291 | 12804 |   405   (2)| 00:00:01 |
|*  2 |   INDEX FULL SCAN                   | T1_PK |   291 |       |   112   (7)| 00:00:01 |
---------------------------------------------------------------------------------------------
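
For reference, here is a reconstruction of the statement at this stage, with the three hints described so far collected into a single comment. The full hinted text isn't shown above, so the layout (and the choice to put all three hints in one comment block) is an assumption; the bind variables are the ones declared at the start of the test.

```sql
-- Reconstruction, not copied from the original test script:
-- the three hints described above applied to the original query.
select
        /*+
                opt_param('_b_tree_bitmap_plans','false')
                no_or_expand(@sel$1)
                index(@sel$1 t1@sel$1)
        */
        object_name
from
        t1
where
        object_id between :b1 and :b2
or      object_id between :b3 and :b4
or      object_id between :b5 and :b6
;

select * from table(dbms_xplan.display_cursor(null,null,'outline'));
```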

This, of course, is where things started to get a little interesting – the index full scan costs less than the tablescan but didn’t appear until hinted. But after a moment’s thought you can dismiss this one (possibly correctly) as an example of the optimizer being cautious about the cost of access paths that are dictated by bind variables or unpeekable inputs. (But these bind variables were peekable – so maybe there’s more to it than that – I was still trying to get to a point where my model would behave more like the OP’s, so I didn’t follow up on this detail: maybe in a couple of years time … ).

One last tweak – and that will bring me to the main point of this note. In my original code I was using three ranges dictated by 3 pairs of bind variables, for example [:b5, :b6]. What would happen if I made :b5 greater than :b6, say I swapped their values?

The original btree/bitmap plan didn’t change, but where I had simply blocked btree/bitmap plans and seen OR-expansion as a result the plan changed to a full tablescan (with the cost you saw above of 617). So I tried again, adding the hint /*+ or_expand(@sel$1) */ to see why; and this is the plan I got:

----------------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |                 |       |       |   735 (100)|          |
|   1 |  VIEW                                  | VW_ORE_BA8ECEFB |   116 |  7656 |   735   (3)| 00:00:01 |
|   2 |   UNION-ALL                            |                 |       |       |            |          |
|*  3 |    FILTER                              |                 |       |       |            |          |
|   4 |     TABLE ACCESS BY INDEX ROWID BATCHED| T1              |    18 |   792 |    20   (0)| 00:00:01 |
|*  5 |      INDEX RANGE SCAN                  | T1_PK           |    18 |       |     2   (0)| 00:00:01 |
|*  6 |    FILTER                              |                 |       |       |            |          |
|   7 |     TABLE ACCESS BY INDEX ROWID BATCHED| T1              |    97 |  4268 |   100   (0)| 00:00:01 |
|*  8 |      INDEX RANGE SCAN                  | T1_PK           |    97 |       |     2   (0)| 00:00:01 |
|*  9 |    FILTER                              |                 |       |       |            |          |
|* 10 |     TABLE ACCESS FULL                  | T1              |     1 |    44 |   615   (4)| 00:00:01 |
----------------------------------------------------------------------------------------------------------

I still get the same three branches in the expansion, but look what’s happened to the sub-plan for the third pair of bind variables. The optimizer still has the FILTER at operation 9 – and that will evaluate to FALSE for the currently peeked values; but the optimizer has decided that it should use a tablescan for this part of the query if it ever gets a pair of bind variables in the right order; and the cost of the tablescan has echoed up the plan to make the total cost of the plan 735, which is (for obvious reasons) higher than the cost of running the whole query as a single tablescan.

The same anomaly appears in 19.11.0.0 and 21.3.0.0. On the plus side, it’s possible that if you have code like this the optimizer will be using the btree/bitmap conversion anyway.

tl;dr

As a generic point: if you’re using bind variables in client code to define ranges, it’s worth ensuring that the values are supplied in the right order; otherwise one day the answer to the question “nothing changed, why is the query running so slowly?” will be “someone got in first with the bound values the wrong way round”.
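
One way client code might protect itself is to normalise each pair of binds before the query runs. The following is a minimal sketch using the SQL*Plus bind variables from the test above. Whether swapping the values is the right response (rather than rejecting the call, or treating the range as empty, which is what the unmodified query does) is an assumption that depends on the application.

```sql
-- Hypothetical guard: make sure :b5 <= :b6 before the query is parsed and executed.
-- The same pattern would be repeated for (:b1, :b2) and (:b3, :b4).
declare
        v_tmp   number;
begin
        if :b5 > :b6 then
                v_tmp := :b5;
                :b5   := :b6;
                :b6   := v_tmp;
        end if;
end;
/
```

Note that this changes the result of the query for a reversed pair: "between 3320 and 3100" returns no rows, while the swapped range does return rows.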

DB Optimizer

Thu, 2021-10-07 02:56

Optimizer Tip

Mon, 2021-09-20 03:04

This is a note I drafted in March 2016, starting with a comment that it had been about that time the previous year that I had written:

I’ve just responded to the call for items for the “IOUG Quick Tips” booklet for 2015 – so it’s probably about time to post the quick tip that I put into the 2014 issue. It’s probably nothing new to most readers of the blog, but sometimes an old thing presented in a new way offers fresh insights or better comprehension.

I keep finding ancient drafts like this (there are still more than 730 drafts on my blog at present – which means one per day for the next 2 years!) and if they still seem relevant – even if they are a little behind the times – I’ve taken to dusting them down and publishing.

With the passing of time, though, new information becomes available, algorithms change, and (occasionally) I discover I’ve made a significant error in my inferences. In this case  there are a couple of important additions that I’ve added to the end of the note.

Optimizer Tips (IOUG Quick Tips 2015)

There are two very common reasons why the optimizer picks a bad execution plan. The first is that its estimate of the required data volume is bad, the second is that it has a misleading impression of how scattered that data is.

The first issue is often due to problems with the selectivity of complex predicates, the second to unsuitable values for the clustering_factor of potentially useful indexes. Recent [ed: i.e. pre-2015] versions of the Oracle software have given us features that try to address both these issues, and I’m going to comment on them in the following note.

As always, any change can have side effects – introducing a new feature might have no effect on 99% of what we do, a beneficial effect on 99% of the remainder, and a hideous effect on the 1% of 1% that’s left, so I will be commenting on both the pros and cons of both features.

Column Group Stats

The optimizer assumes that the data in two different columns of a single table are independent – for example the registration number on your car (probably) has nothing to do with the account number of your bank account. So when we execute queries like:

     colX = 'abcd'
and  colY = 'wxyz'

the optimizer’s calculations will be something like:

“one row in 5,000 is likely to have colX = ‘abcd’ and one row in 2,000 is likely to have colY = ‘wxyz’, so the combination will probably appear in roughly one row in ten million”.

On the other hand we often find tables that do things like storing post codes (zipcodes) in one column and city names in another, and there’s a strong correlation between post codes and city – for example the district code (first part of the post code) “OX1” will be in the city of Oxford (Oxfordshire, UK). So if we query a table of addresses for rows where:

     district_code = 'OX1'
and  city          = 'Oxford'

there’s a degree of redundancy, but the optimizer will multiply the total number of distinct district codes in the UK by the total number of distinct city names in the UK as it tries to work out the number of addresses that match the combined predicate and will come up with a result that is far too small.

In cases like this we can define “column group” statistics about combinations of columns that we query together, using the function dbms_stats.create_extended_stats(). This function will create a type of virtual column for a table and report the system-generated name back to us, and we will be able to see that name in the view user_tab_cols, and the definition in the view user_stat_extensions. If we define a column group in this way we then need to gather stats on it, which we can do in one of two ways, either by using the generated name or by using the expression that created it.


SQL> create table addresses (district_code varchar2(8), city varchar2(40));

Table created.

SQL> execute dbms_output.put_line( - 
>        dbms_stats.create_extended_stats( - 
>            user,'addresses','(district_code, city)'))

SYS_STU12RZM_07240SN3V2667EQLW

PL/SQL procedure successfully completed.

begin
        dbms_stats.gather_table_stats(
                user, 'addresses',
                method_opt => 'for columns SYS_STU12RZM_07240SN3V2667EQLW size 1'
        );
        dbms_stats.gather_table_stats(
                user, 'addresses',
                method_opt => 'for columns (district_code, city) size 1'
        );
end;
/

I’ve included both options in the anonymous pl/sql block, but you only need one of them. In fact if you use the second one without calling create_extended_stats() first Oracle will create the column group implicitly, but you won’t know what it’s called until you query user_stat_extensions.
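
To find the system-generated name after the event, queries along the following lines should work. This is a sketch against the addresses table from the example; the selected columns are the ones I would expect to find in the standard user_stat_extensions and user_tab_cols views.

```sql
-- Column group definitions (and their generated names) for the table
select
        extension_name, extension, creator, droppable
from
        user_stat_extensions
where
        table_name = 'ADDRESSES'
;

-- The column group also appears as a hidden virtual column
select
        column_name, hidden_column, virtual_column, num_distinct
from
        user_tab_cols
where
        table_name = 'ADDRESSES'
order by
        internal_column_id
;
```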

I’ve limited the stats collection to basic stats with the “size 1” option. You can collect a histogram on a column group but since the optimizer can only use a column group with equality predicates you should only create a histogram in the special cases where you know that you’re going to get a frequency histogram or “Top-N” histogram.

You can also define extended stats on expressions (e.g. trunc(delivery_date) - trunc(collection_date)) rather than column groups, but since you’re only allowed 20 column groups per table [see update 1] it would be better to use virtual columns for expressions since you can have as many virtual columns as you like on a table provided the total column count stays below the limit of 1,000 columns per table.

Warnings
  • Column group statistics are only used for equality expressions. [see also update 2]
  • Column group statistics will not be used if you’ve created a histogram on any of the underlying columns unless there’s also a histogram on the column group itself.
  • Column group statistics will not be used if you query any of the underlying columns with an “out of range” value. This, perhaps, is the biggest instability threat with column groups. As time passes and new data appears you may find people querying the new data. If you haven’t kept the column stats up to date then plans can change dramatically as the optimizer switches from using column group stats to multiplying the selectivities of underlying columns.
  • The final warning arrives with 12c. If you have all the adaptive optimizer options enabled the optimizer will keep a look out for tables that it thinks could do with column group stats, and automatically creates them the next time you gather stats on the table. In principle this shouldn’t be a problem – the optimizer should only do this when it has seen that column stats should improve performance – but you might want to monitor your system for the arrival of new automatic columns.

Preference: table_cached_blocks

Even when the cardinality estimates are correct we may find that we get an inefficient execution plan because the optimizer doesn’t want to use an index that we think would be a really good choice. A common reason for this failure is that the clustering_factor on the index is unrealistically large.

The clustering_factor of an index is a measure of how randomly you will jump around the table as you do an index range scan through the index – and the algorithm Oracle uses to calculate this number has a serious flaw in it: it can’t tell the difference between a little bit of localised jumping and constant random leaps across the entire width of the table.

To calculate the clustering_factor Oracle basically walks the index in order using the rowid at the end of each index entry to check which table block it has to visit, and every time it has to visit a “new” table block it increments a counter. The trouble with this approach is that, by default, it doesn’t remember its recent history so, for example, it can’t tell the difference in quality between the following two sequences of table block visits:

Block 612, block 87, block 154, block 3,  block 1386, block 834, block 237
Block 98,  block 99, block 98,  block 99, block 98,   block 99,  block 98

In both cases Oracle would say that it had visited seven different blocks and the data was badly scattered. This has always been a problem, but it became much more of a problem when Oracle introduced ASSM (automatic segment space management). The point of ASSM is to ensure that concurrent inserts from different sessions tend to use different table blocks, the aim being to reduce contention due to buffer busy waits. As we’ve just seen, though, the clustering_factor doesn’t differentiate between “a little bit of scatter” and “a totally random disaster area”.

Oracle finally addressed this problem by introducing a “table preference” which allows you to tell it to “remember history” when calculating the clustering_factor. So, for example, a call like this:

execute dbms_stats.set_table_prefs(user,'t1','table_cached_blocks',16)

would tell Oracle that the next time you collect statistics on any indexes on table t1 the code to calculate the clustering_factor should remember the last 16 table blocks it had “visited” and not increment the counter if the “next” block was already in that list.

If you look at the two samples above, this means the counter for the first list of blocks would reach 7 while the counter for the second list would only reach 2. Suddenly the optimizer will be able to tell the difference between data that is “locally” scattered and data that really is randomly scattered. You and the optimizer may finally agree on what constitutes a good index.
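
To see the effect on a real index, something like the following could be run before and after setting the preference. It is a sketch only: T1_PK is a hypothetical index name used for illustration, and the comparison relies on the usual rule of thumb that a clustering_factor close to the number of table blocks indicates well-clustered data, while one close to the number of rows indicates badly scattered data.

```sql
-- Check the current setting of the preference for the table
select dbms_stats.get_prefs('TABLE_CACHED_BLOCKS', user, 'T1') cached_blocks from dual;

-- Re-gather index stats so that the clustering_factor is recalculated
-- (T1_PK is a hypothetical index name)
execute dbms_stats.gather_index_stats(user, 'T1_PK')

-- Compare the clustering_factor with the blocks and rows in the table
select
        i.index_name, i.clustering_factor, t.blocks, t.num_rows
from
        user_indexes    i,
        user_tables     t
where
        i.table_name = 'T1'
and     t.table_name = i.table_name
;
```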

It’s hard to say whether there’s a proper “default” value for this preference. If you’re using ASSM (and there can’t be many people left who aren’t) then the obvious choice for the parameter would be 16 since ASSM tends to format 16 consecutive blocks at a time when a segment needs to make more space available for users [but see Update 3]. However, if you know that the real level of insert concurrency on a table is higher than 16 then you might be better off setting the value to match the known level of concurrency.

Are there any special risks to setting this preference to a value like 16? I don’t think so; it’s going to result in plans changing, of course, but indexes which should have a large clustering_factor should still end up with a large clustering_factor after setting the preference and gathering statistics; the indexes that ought to have a low clustering_factor are the ones most likely to change, and change in the right direction.

Footnote: “Danger, Will Robinson”.

I’ve highlighted two features that are incredibly useful as tools to give the optimizer better information about your data and allow it to get better execution plans with less manual intervention. The usual warning applies, though: “if you want to get there, you don’t want to start from here”. When you manipulate the information the optimizer is using it will give you some new plans; better information will normally result in better plans but it is almost inevitable that some of your current queries are running efficiently “by accident” (possibly because of bugs) and the new code paths will result in some plans changing for the worse.

Clearly it is necessary to do some thorough testing but fortunately both features are incremental and any changes can be backed out very quickly and easily. We can change the “table_cached_blocks” one table at a time (or even, with a little manual intervention, one index at a time) and watch the effects; we can add column groups one at a time and watch for side effects. All it takes to back out of a change is a call to gather index stats, or a call to drop extended stats. It’s never nice to live through change – especially change that can have a dramatic impact – but if we find after going to production that we missed a problem with our testing we can reverse the changes very quickly.
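
As a sketch of what backing out might look like, using the addresses and t1 examples from earlier (T1_PK is a hypothetical index name, and deleting the table preference, so that the table falls back to the global setting, is one assumption about what "backing out" should mean):

```sql
-- Back out a column group (the extension text must match the definition)
execute dbms_stats.drop_extended_stats(user, 'addresses', '(district_code, city)')

-- Back out the table preference, then re-gather the index stats
-- so that the clustering_factor is recalculated without the history
execute dbms_stats.delete_table_prefs(user, 't1', 'table_cached_blocks')
execute dbms_stats.gather_index_stats(user, 't1_pk')
```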

Updates

Update 1 – 20 sets of extended stats. In fact the limit is the larger of 20 and ceiling(column count/10), and the way the arithmetic is applied is a little odd so there are ways to hack around the limit.

Update 2 – Column groups and equality. It’s worth a special mention that the predicate colX is null is not an equality predicate, and column group stats will not apply; but there can be unexpected side effects even for cases where you don’t use this “is null” predicate.

Update 3 – table_cached_blocks = 16. This suggestion doesn’t allow for systems running RAC.

Quiz Night

Sun, 2021-09-05 12:35

This little observation came from running a couple of tests while looking at a problem on OTN – hence the odd bit of PL/SQL.

declare
        cursor c1
        is
        select * from t1;

        rec c1%rowtype;
        ch char(1);

begin
        open c1;
        loop
                fetch c1 into rec;
                exit when c1%notfound;
                select null into ch from t1 where id = rec.id;
                select null into ch from t1 where id = rec.id;
                select null into ch from t1 where id = rec.id;
        end loop;
        close c1;
end;
/

There is a simple B-tree index on t1(id), and t1 was populated with the values from 1 to 10,000 in order. A query against user_tables (immediately after gathering stats) reported 20 blocks and 10,000 rows. This is NOT an attempt to trick anyone, it’s simply demonstrating a surprising result.

The trace file shows a tablescan for the driving cursor and an index(-only) path for the three identical queries inside the loop.

Question: if you look at the session activity stats (v$mystat) from running this anonymous PL/SQL block what value (ballpark figure) would you expect to see for the following statistics:

table scans (short tables)
table scan rows gotten
table scan blocks gotten

If it’s any help: when I counted rows per block I had 15 blocks with 657 rows and one block with 145 rows. (The 20 blocks reported in the table stats is the number of formatted blocks below the highwater mark, but some may be empty and some may be space management (bitmap) blocks).
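
If you want to try this yourself, a query like the following reports the three statistics for the current session. It is a minimal sketch: it assumes you have select privileges on v$mystat and v$statname, and since the values are cumulative for the session you need to run it before and after the PL/SQL block and take the difference.

```sql
select
        sn.name, ms.value
from
        v$mystat        ms,
        v$statname      sn
where
        sn.statistic# = ms.statistic#
and     sn.name in (
                'table scans (short tables)',
                'table scan rows gotten',
                'table scan blocks gotten'
        )
order by
        sn.name
;
```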

Ordered hint

Fri, 2021-09-03 12:49

It’s been such a long time since Oracle deprecated the /*+ ordered */ hint that I can’t remember when it happened. The hint you should be using is the /*+ leading(…) */ hint which initially – maybe some time in 9i – would only allow you to specify the first table that the optimizer should use when examining join orders, but which soon changed to allow you to specify a complete join order.

I’ve written a few notes about the need to get rid of any /*+ ordered */ hints in production SQL, because it can produce a join order you’re not expecting. I’ve just found an extreme case of this running a quick test on 19.11.0.0 then 21.3.0.0.

I’m not going to bother with the data setup for the query, but it’s a simple parent/child query that exhibits a surprising pattern. Here’s the query:

select
        /*+
                no_adaptive_plan
                ordered
                use_nl(ch)
        */
        par.n1,
        par.small_vc,
        sum(ch.n1)
from
        parent par,
        child ch
where
        par.n1 <= 20
and     ch.id_par = par.id
group by
        par.n1,
        par.small_vc
;

And here’s the plan, pulled from memory with a call to dbms_xplan.display_cursor() with the ordered hint in place. I’ve included the outline information, hint report and (since this is from 21c) the query block registry:

-----------------------------------------------------------------------------------
| Id  | Operation              | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |          |       |       |    32 (100)|          |
|   1 |  HASH GROUP BY         |          |    20 |   780 |    32   (7)| 00:00:01 |
|*  2 |   HASH JOIN            |          |    20 |   780 |    31   (4)| 00:00:01 |
|   3 |    JOIN FILTER CREATE  | :BF0000  |    20 |   440 |     8   (0)| 00:00:01 |
|   4 |     VIEW               | VW_GBF_6 |    20 |   440 |     8   (0)| 00:00:01 |
|*  5 |      TABLE ACCESS FULL | PARENT   |    20 |   380 |     8   (0)| 00:00:01 |
|   6 |    VIEW                | VW_GBC_5 |  1000 | 17000 |    23   (5)| 00:00:01 |
|   7 |     HASH GROUP BY      |          |  1000 |  8000 |    23   (5)| 00:00:01 |
|   8 |      JOIN FILTER USE   | :BF0000  |  4000 | 32000 |    22   (0)| 00:00:01 |
|*  9 |       TABLE ACCESS FULL| CHILD    |  4000 | 32000 |    22   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('21.1.0')
      DB_VERSION('21.1.0')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$D2EA58F1")
      ELIM_GROUPBY(@"SEL$FFB6458A")
      OUTLINE_LEAF(@"SEL$FE9D3122")
      OUTLINE_LEAF(@"SEL$E2E47E3A")
      PLACE_GROUP_BY(@"SEL$1" ( "PAR"@"SEL$1" ) ( "CH"@"SEL$1" ) 5)
      OUTLINE(@"SEL$FFB6458A")
      ELIM_GROUPBY(@"SEL$1D9E464A")
      OUTLINE(@"SEL$E26B953F")
      OUTLINE(@"SEL$1")
      OUTLINE(@"SEL$1D9E464A")
      OUTLINE(@"SEL$E132E821")
      NO_ACCESS(@"SEL$E2E47E3A" "VW_GBF_6"@"SEL$E132E821")
      NO_ACCESS(@"SEL$E2E47E3A" "VW_GBC_5"@"SEL$E26B953F")
      LEADING(@"SEL$E2E47E3A" "VW_GBF_6"@"SEL$E132E821"
              "VW_GBC_5"@"SEL$E26B953F")
      USE_HASH(@"SEL$E2E47E3A" "VW_GBC_5"@"SEL$E26B953F")
      PX_JOIN_FILTER(@"SEL$E2E47E3A" "VW_GBC_5"@"SEL$E26B953F")
      USE_HASH_AGGREGATION(@"SEL$E2E47E3A" GROUP_BY)
      FULL(@"SEL$D2EA58F1" "PAR"@"SEL$1")
      FULL(@"SEL$FE9D3122" "CH"@"SEL$1")
      USE_HASH_AGGREGATION(@"SEL$FE9D3122" GROUP_BY)
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ITEM_1"="ITEM_1")
   5 - filter("PAR"."N1"<=20)
   9 - filter(SYS_OP_BLOOM_FILTER(:BF0000,"CH"."ID_PAR"))

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 3 (U - Unused (1))
---------------------------------------------------------------------------

   0 -  STATEMENT
           -  no_adaptive_plan

   1 -  SEL$E2E47E3A
           -  ordered

   9 -  SEL$FE9D3122 / "CH"@"SEL$1"
         U -  use_nl(ch)

Query Block Registry:
---------------------

  SEL$1 (PARSER)
    SEL$E26B953F (QUERY BLOCK TABLES CHANGED SEL$1)
      SEL$E132E821 (QUERY BLOCK TABLES CHANGED SEL$E26B953F)
        SEL$1D9E464A (SPLIT/MERGE QUERY BLOCKS SEL$E132E821)
          SEL$FFB6458A (ELIMINATION OF GROUP BY SEL$1D9E464A)
            SEL$D2EA58F1 (ELIMINATION OF GROUP BY SEL$FFB6458A) [FINAL]
      SEL$FE9D3122 (SPLIT/MERGE QUERY BLOCKS SEL$E26B953F) [FINAL]
    SEL$E2E47E3A (PLACE GROUP BY SEL$1) [FINAL]

The optimizer seems to have got rather carried away with how clever it can be; so here’s the result of switching from /*+ ordered */ to using /*+ leading(par ch) */ – I won’t bother with all the extras since it’s a very simple plan:

----------------------------------------------------------------------------------------
| Id  | Operation                     | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |        |       |       |   109 (100)|          |
|   1 |  HASH GROUP BY                |        |    80 |  2160 |   109   (1)| 00:00:01 |
|   2 |   NESTED LOOPS                |        |    80 |  2160 |   108   (0)| 00:00:01 |
|   3 |    NESTED LOOPS               |        |    80 |  2160 |   108   (0)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL         | PARENT |    20 |   380 |     8   (0)| 00:00:01 |
|*  5 |     INDEX RANGE SCAN          | CHI_PK |     4 |       |     1   (0)| 00:00:01 |
|   6 |    TABLE ACCESS BY INDEX ROWID| CHILD  |     4 |    32 |     5   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - filter("PAR"."N1"<=20)
   5 - access("CH"."ID_PAR"="PAR"."ID")
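
For reference, the only change between the two tests is the swap of one hint, so the statement behind this second plan would look like this (a reconstruction from the query shown earlier, not a separate copy from the test script):

```sql
select
        /*+
                no_adaptive_plan
                leading(par ch)
                use_nl(ch)
        */
        par.n1,
        par.small_vc,
        sum(ch.n1)
from
        parent par,
        child ch
where
        par.n1 <= 20
and     ch.id_par = par.id
group by
        par.n1,
        par.small_vc
;
```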


tl;dr

You should not be using the /*+ ordered */ hint in any recent version of Oracle.

qbregistry 2

Wed, 2021-08-25 07:45

Following a question (very similar to one I had been asking myself) that appeared on twitter in response to my original posting on the new qbregistry format option in calls to dbms_xplan, I’ve drafted a note of how I interpret the execution plan so that I can more clearly see how my visualisation of the transformation maps (or fails to map) to the Query Block Registry.

This is still a work in progress as I am still working on naming the query blocks that appear with the names that I think would have come from the CBO trace file.

Original Query (hiding the unnest and no_semijoin hints)

select  
        /* sel$1 */
        * 
from    t1 
where   t1.owner = 'OUTLN' 
and     object_name in (
                select  /* sel$2 */
                        distinct t2.object_name 
                from   t2 
                where  t2.object_type = 'TABLE'
        )
;

Transformation 1: unnest the subquery

This produces two new query blocks, the query block that “is” the unnested subquery and the query block that joins t1 to the unnested subquery vw_nso_1.

select  
        /* SEL$5DA710D3 */
        t1.* 
from
        t1,
        (
        select  /* SEL$683B0107 */
                distinct
                t2.object_name 
        from    t2 
        where   t2.object_type = 'TABLE'
        )       vw_nso_1
where
        t1.owner = 'OUTLN' 
and     vw_nso_1.object_name = t1.object_name



Transformation 2: view merge (join then aggregate)

I think this produces three new query blocks: the block that “is” the merged view, a block that selects (projects) from the merged view, and the query block that the main query now becomes.

We will pretend that t1 has only 4 columns, owner, object_name, object_type, object_id.

select
        /* SEL$B186933D */
        vm_nwvw_2.owner,
        vm_nwvw_2.object_name,
        vm_nwvw_2.object_type,
        vm_nwvw_2.object_id
from    (
        select  /* SEL$2F1334C4 */
                *
        from    (
                select  /* SEL$88A77D12 */
                        distinct
                        t1.rowid,
                        t1.owner,
                        t1.object_name,
                        t1.object_type,
                        t1.object_id
                from
                        t1,
                        t2
                where
                        t2.object_type = 'TABLE'
                and     t1.owner = 'OUTLN'
                and     t1.object_name = t2.object_name
                ) 
        ) vm_nwvw_2
;

Transformation 3: aggregate into partial join

I realised only as I was writing this note that I had completely forgotten that the plan reported a semi join even though the subquery had been hinted with a no_semijoin hint, and that the reported semi join was actually a partial join.

However, the query block registry is identical with or without a partial join (controlled by a no_partial_join hint), so there doesn’t seem to be a transformation corresponding to the strategy. Maybe the apparently redundant extra layer allows the variation to appear if required.

It’s Difficult

A problem I have with the query block registry is deciding what it’s telling us – and maybe the trace file and the execution plan are not trying to tell us exactly the same thing. I think, anyway, that there is a problem built into the requirement that is inherently difficult to address (in the general case).

Until this final sentence is replaced with a completion notice, this is an incomplete and possibly misleading note.

qbregistry

Tue, 2021-08-24 05:54

If you look at the “Outline Information” from an execution plan it shows you a list of hints that will (in theory, at least) recreate the execution plan and it’s this information that gets stored as the “injection” part of an SQL Plan Baseline. Unfortunately the hints won’t necessarily allow you to infer what transformations the optimizer has used to get to the final execution plan.

If you’re prepared to generate the CBO trace file you could examine the Query Block Registry that appears near the end of the trace file to get some clues – here’s an example from 19.11.0.0 for a simple query involving a single table plus an IN subquery:

Query Block Registry:
SEL$2 0x6d47cde8 (PARSER)
  SEL$5DA710D3 0x6d480e60 (SUBQUERY UNNEST SEL$1; SEL$2;)
    SEL$2F1334C4 0x6d480e60 (SPLIT/MERGE QUERY BLOCKS SEL$5DA710D3) [FINAL]
  SEL$683B0107 0x6d47cde8 (SUBQ INTO VIEW FOR COMPLEX UNNEST SEL$2)
    SEL$B186933D 0x6d48e3a8 (VIEW MERGE SEL$88A77D12; SEL$683B0107; SEL$5DA710D3) [FINAL]
    SEL$88A77D12 0x6d48e3a8 (PROJECTION VIEW FOR CVM SEL$683B0107)
      SEL$B186933D 0x6d48e3a8 (VIEW MERGE SEL$88A77D12; SEL$683B0107; SEL$5DA710D3) [FINAL]
SEL$1 0x6d480e60 (PARSER)
  SEL$5DA710D3 0x6d480e60 (SUBQUERY UNNEST SEL$1; SEL$2;)

I’m not going to say anything about interpreting this extract because I want to highlight a recent feature of the dbms_xplan package (brought to my attention by Franck Pachot some time ago). One of the format options for displaying execution plans will report the query block registry. Here’s the output from display_cursor(format=>'qbregistry') in 21.3.0.0 for the query that produced the CBO trace extract above:

Query Block Registry:
---------------------
  SEL$1 (PARSER)
    SEL$5DA710D3 (SUBQUERY UNNEST SEL$1 ; SEL$2)
      SEL$2F1334C4 (SPLIT/MERGE QUERY BLOCKS SEL$5DA710D3) [FINAL]
  SEL$2 (PARSER)
    SEL$683B0107 (SUBQ INTO VIEW FOR COMPLEX UNNEST SEL$2)
      SEL$88A77D12 (PROJECTION VIEW FOR CVM SEL$683B0107)
        SEL$B186933D (VIEW MERGE SEL$88A77D12 ; SEL$683B0107) [FINAL]

Two things to notice here: first, the output has reduced the 9 lines to 7 lines (which can only be helpful); secondly, the redundant memory addresses which appear in the trace file don’t get copied into the report.
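For anyone who wants to repeat the test, a call along the following lines should produce the registry report on 21c. This is a minimal sketch: it assumes the statement of interest is the last one executed in the session (with serveroutput off), and the second form simply reuses the sql_id from the script later in this note, which will be different on your system.

```sql
rem     Sketch only: assumes 21c, and that the target statement was
rem     the last one executed by this session.

set serveroutput off

select * from table(dbms_xplan.display_cursor(format => 'qbregistry'));

rem     Alternative: name the cursor explicitly (this sql_id is the one
rem     used in the script below; substitute your own).

select * from table(dbms_xplan.display_cursor('232sya6twg7sq', null, 'qbregistry'));
```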

I’m still not going to say anything about interpreting the output because I want to show you the display_cursor() output for the same query when executed in 19.11.0.0. It looks like this:

Query Block Registry:
---------------------

  <q o="13"><n><![CDATA[SEL$88A77D12]]></n><p><![CDATA[SEL$683B0107]]></p><
        f><h><t><![CDATA[T1]]></t><s><![CDATA[SEL$1]]></s></h><h><t><![CDATA[VW_N
        SO_1]]></t><s><![CDATA[SEL$5DA710D3]]></s></h></f></q>
  <q o="12"><n><![CDATA[SEL$683B0107]]></n><p><![CDATA[SEL$2]]></p><f><h><t
        ><![CDATA[T2]]></t><s><![CDATA[SEL$2]]></s></h></f></q>
  <q o="2"><n><![CDATA[SEL$1]]></n><f><h><t><![CDATA[T1]]></t><s><![CDATA[S
        EL$1]]></s></h></f></q>
  <q o="2"><n><![CDATA[SEL$2]]></n><f><h><t><![CDATA[T2]]></t><s><![CDATA[S
        EL$2]]></s></h></f></q>
  <q o="18" f="y" h="y"><n><![CDATA[SEL$B186933D]]></n><p><![CDATA[SEL$88A7
        7D12]]></p><i><o><t>VW</t><v><![CDATA[SEL$683B0107]]></v></o></i><f><h><t
        ><![CDATA[T1]]></t><s><![CDATA[SEL$1]]></s></h><h><t><![CDATA[T2]]></t><s
        ><![CDATA[SEL$2]]></s></h></f></q>
  <q o="19" h="y"><n><![CDATA[SEL$5DA710D3]]></n><p><![CDATA[SEL$1]]></p><i
        ><o><t>SQ</t><v><![CDATA[SEL$2]]></v></o></i><f><h><t><![CDATA[T1]]></t><
        s><![CDATA[SEL$1]]></s></h><h><t><![CDATA[VW_NSO_1]]></t><s><![CDATA[SEL$
        5DA710D3]]></s></h></f></q>
  <q o="15" f="y"><n><![CDATA[SEL$2F1334C4]]></n><p><![CDATA[SEL$5DA710D3]]
        ></p><f><h><t><![CDATA[VM_NWVW_2]]></t><s><![CDATA[SEL$2F1334C4]]></s></h
        ></f></q>

Yes, it’s naked XML (extracted from the v$sql_plan.other_xml column for operation 1).

I had been living in hope that someone else would write a messy bit of SQL to translate this into something readable – but the last time I searched the Internet for “other_xml qbregistry” I got the magical result of a Googlewhack (i.e. only one hit), which was in Russian, and largely a short description of all the options for the format command.

Since I’ve just installed 21.3 on a VM I decided to bite the bullet but I’ve taken the short-cut to writing the code. I’ve run a trace on a call to dbms_xplan.display_cursor() and extracted the critical query from the resulting trace file. Then I spent 30 minutes making it readable, hacking it to make it almost workable on 19c, then finding out why it can’t work without a little extra effort. Here’s the resulting hack:

rem
rem     Script:         qbregistry_query.sql
rem     Author:         Oracle Corp / Jonathan Lewis
rem     Dated:          Aug 2021
rem
rem     Last tested 
rem             19.11.0.0
rem

define m_sql_id='232sya6twg7sq'
define m_origin = 2

with 
xml as (
        select  other_xml
        from    V$sql_plan 
        where   sql_id = '&m_sql_id' 
        and     id = 1
        and     other_xml is not null
),
allqbs as ( 
        select 
                extractvalue(d.column_value, '/q/n') qbname, 
                extractvalue(d.column_value, '/q/@f') final, 
                extractvalue(d.column_value, '/q/p') prev, 
                extractvalue(d.column_value, '/q/@o') origin 
        from 
                table(xmlsequence(extract(xmltype ((select other_xml from xml)), '/other_xml/qb_registry/q'))) d 
), 
inpqbs as ( 
        select 
                xml.qbname qbname, 
                listagg(xml.depqbs, ',') within group (
                        order by xml.depqbs) depqbs 
        from 
                xmltable('/other_xml/qb_registry/q/i/o' passing xmltype((select other_xml from xml)) 
                        columns depqbs varchar2(256) path 'v', 
                        qbname varchar2(256) path './../../n'
                ) xml 
        where     xml.depqbs in ( select qbname from allqbs) 
        group by xml.qbname
), 
recqb   (src, origin, dest, final, lvl, inpobjs) as ( 
        select 
                qbname src, origin origin, null dest, final final, 1 lvl, null inpobjs 
        from 
                allqbs
        where 
--              origin = &m_origin
                origin in (2,3)
        union all 
        select 
                a.qbname src, a.origin origin, a.prev dest, a.final final, lvl+1, 
                (select depqbs from inpqbs i where i.qbname = a.qbname) inpobjs 
        from
                allqbs a, 
                recqb r 
        where a.prev = r.src
)
search depth first by src asc set ordseq, 
finalans as ( 
        select 
                src,
/* 
                (
                select 
                        name 
                from    v$query_block_origin 
                where 
                        origin_id=origin
                )       origin, 
*/
                origin,
                dest, final, lvl, inpobjs 
        from recqb order by ordseq
) 
select
        /*+ opt_param('parallel_execution_enabled', 'false') */ 
        g.qbreg 
from (
        select 
                rpad(' ', 2*(lvl-1)) || 
                src || ' (' || origin || 
                        case when length(dest)>0 
                                then ' ' || dest 
                                else '' 
                        end || 
                        case when length(inpobjs)>0 
                                then ' ; ' || inpobjs 
                                else ' ' 
                        end ||
                        ')' || 
                        case when final='y' 
                                then ' [final]' 
                                else '' 
                        end 
                qbreg 
        from 
                finalans
        ) g
/

In lines 10 and 11 I’ve defined a couple of substitution variables that appear further on in the script. One is the SQL_ID for the query you’re interested in, the other is a fixed (probably) symbolic constant used by Oracle.

Lines 14-20 are a “with” subquery that I’ve prepended to Oracle’s internal code to create a single row, single column table holding the other_xml value of the query of interest. You’ll notice that I’ve been fairly casual about this bit since I haven’t catered for the fact that a single sql_id may have several child cursors and might even be obsolete.
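If the casual handling of child cursors matters for your case, one possible tightening is to pick a specific child. The sketch below is not part of the script above: the substitution variable m_child_no is a name I've introduced for illustration, and the first query is just a way of seeing which children exist before choosing one.

```sql
rem     Sketch only: m_child_no is a hypothetical extra variable,
rem     not part of the original script.

define m_sql_id = '232sya6twg7sq'
define m_child_no = 0

rem     List the child cursors currently available for the sql_id

select  child_number, plan_hash_value
from    v$sql
where   sql_id = '&m_sql_id'
order by
        child_number
;

rem     Restrict the "xml" subquery to a single child cursor

with
xml as (
        select  other_xml
        from    v$sql_plan
        where   sql_id       = '&m_sql_id'
        and     child_number = &m_child_no
        and     id           = 1
        and     other_xml is not null
)
select  count(*) rows_found
from    xml
;
```

If rows_found is zero the cursor has probably been aged out of the library cache (or the child number is wrong), and the main query would return no rows.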

Lines 28 and 36 are where I’ve used my “with” subquery to supply the other_xml value that would have appeared as a bind variable (:B1) in the trace file.

Line 49 (commented out for the reason described in footnote 1) uses the m_origin variable to identify a row in the dynamic performance view v$query_block_origin (highlighted at line 68) and that’s where we have a problem with Oracle 19c: the view doesn’t exist, nor does the x$qbname structure that the view is based on.

In the code above I’ve actually commented out the whole of the inline scalar subquery that translates an origin number into an origin name and reported the actual value of origin. Originally I did this to check whether it was worth spending any more time working on the code – and this is the result I got from the initial test:

SEL$1 (2 )
  SEL$5DA710D3 (19 SEL$1 ; SEL$2)
    SEL$2F1334C4 (15 SEL$5DA710D3 ) [final]
SEL$2 (2 )
  SEL$683B0107 (12 SEL$2 )
    SEL$88A77D12 (13 SEL$683B0107 )
      SEL$B186933D (18 SEL$88A77D12 ; SEL$683B0107) [final]

A quick check by eye shows that it’s got the same pattern and set of query block names that the 21c output produced so it’s clearly a step in the right direction. Now all I need is a way to translate the origin numbers into names.

I could have tried searching x$ksmfsv to see if I could spot a pointer to the relevant structure and fake my way through the whole process of creating a “nearly dynamic” performance view, but I decided the quick and dirty workaround was to dump a CSV file listing the view contents in 21c, then read the file back as an external table to copy the data into a local IOT (index organized table) called my_query_block_origin. With the inline view back in play – and the name suitably changed – the 19c and 21c queries produced the same result.
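As a rough illustration of the local lookup table, here is a simplified sketch that skips the CSV file and external table and just inserts by hand the six origin values that this particular example needs (the values are taken from the 21c listing in Footnote 1). The column sizes and the constraint name are my assumptions, not a copy of the 21c view definition.

```sql
rem     Sketch only: column sizes and constraint name are assumptions.
rem     Only the origins needed for this example are loaded.

create table my_query_block_origin (
        origin_id       number(4,0)     not null,
        name            varchar2(64)    not null,
        hint_token      varchar2(32),
        constraint mqbo_pk primary key (origin_id)
)
organization index
;

insert into my_query_block_origin values ( 2, 'PARSER',                            null);
insert into my_query_block_origin values (12, 'SUBQ INTO VIEW FOR COMPLEX UNNEST', null);
insert into my_query_block_origin values (13, 'PROJECTION VIEW FOR CVM',           null);
insert into my_query_block_origin values (15, 'SPLIT/MERGE QUERY BLOCKS',          null);
insert into my_query_block_origin values (18, 'VIEW MERGE',                        'MERGE');
insert into my_query_block_origin values (19, 'SUBQUERY UNNEST',                   'UNNEST');

commit;

rem     In the "finalans" subquery the commented-out scalar subquery
rem     would then read:
rem
rem             (
rem             select  name
rem             from    my_query_block_origin
rem             where   origin_id = origin
rem             )       origin,

select  origin_id, name, hint_token
from    my_query_block_origin
order by
        origin_id
;
```

Any origin_id that hasn't been loaded into the table would be reported with a null name, so for general use you would want the full list.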

Footnote 1

Here’s a query to show the content of that 21c view (which is fairly interesting in its own right):

set linesize 144
set pagesize 100
set trimspool on
set tabout off

column  name format a60
column  hint_token format a32

spool query_block_origin.lst

select
        origin_id,
        name,
        hint_token
from
        v$query_block_origin
/

 ORIGIN_ID NAME                                                         HINT_TOKEN
---------- ------------------------------------------------------------ --------------------------------
         0 NOT NAMED
         1 ALLOCATE
         2 PARSER
         3 HINT
         4 COPY
         5 SAVE
         6 MV REWRITE                                                   REWRITE
         7 PUSHED PREDICATE                                             PUSH_PRED
         8 STAR TRANSFORM SUBQUERY
         9 COMPLEX VIEW MERGE
        10 COMPLEX SUBQUERY UNNEST
        11 OR EXPANSION                                                 USE_CONCAT
        12 SUBQ INTO VIEW FOR COMPLEX UNNEST
        13 PROJECTION VIEW FOR CVM
        14 GROUPING SET TO UNION
        15 SPLIT/MERGE QUERY BLOCKS
        16 COPY PARTITION VIEW
        17 RESTORE
        18 VIEW MERGE                                                   MERGE
        19 SUBQUERY UNNEST                                              UNNEST
        20 STAR TRANSFORM                                               STAR_TRANSFORMATION
        21 INDEX JOIN
        22 STAR TRANSFORM TEMP TABLE
        23 MAP QUERY BLOCK
        24 VIEW ADDED
        25 SET QUERY BLOCK
        26 QUERY BLOCK TABLES CHANGED
        27 QUERY BLOCK SIGNATURE CHANGED
        28 MV UNION QUERY BLOCK
        29 SPLIT QUERY BLOCK FOR GSET-TO-UNION                          EXPAND_GSET_TO_UNION
        30 PREDICATES REMOVED FROM QUERY BLOCK                          PULL_PRED
        31 PREDICATES ADDED TO QUERY BLOCK
        32 OLD PUSHED PREDICATE                                         OLD_PUSH_PRED
        33 ORDER BY REMOVED FROM QUERY BLOCK                            ELIMINATE_OBY
        34 JOIN REMOVED FROM QUERY BLOCK                                ELIMINATE_JOIN
        35 OUTER-JOIN REMOVED FROM QUERY BLOCK                          OUTER_JOIN_TO_INNER
        36 STAR TRANSFORMATION JOINBACK ELIMINATION                     ELIMINATE_JOIN
        37 BITMAP JOIN INDEX JOINBACK ELIMINATION                       ELIMINATE_JOIN
        38 CONNECT BY COST BASED TRANSFORMATION                         CONNECT_BY_COST_BASED
        39 CONNECT BY WITH FILTERING                                    CONNECT_BY_FILTERING
        40 CONNECT BY WITH NO FILTERING                                 NO_CONNECT_BY_FILTERING
        41 CONNECT BY START WITH QUERY BLOCK
        42 CONNECT BY FULL SCAN QUERY BLOCK
        43 PLACE GROUP BY                                               PLACE_GROUP_BY
        44 CONNECT BY NO FILTERING COMBINE                              NO_CONNECT_BY_FILTERING
        45 VIEW ON SELECT DISTINCT
        46 COALESCED SUBQUERY                                           COALESCE_SQ
        47 QUERY HAS COALESCED SUBQUERIES                               COALESCE_SQ
        48 SPLIT QUERY BLOCK FOR DISTINCT AGG OPTIM                     TRANSFORM_DISTINCT_AGG
        49 CONNECT BY ELIMINATE DUPLICATES FROM INPUT                   CONNECT_BY_ELIM_DUPS
        50 CONNECT BY COST BASED TRANSFORMATION FOR WHR ONLY            CONNECT_BY_CB_WHR_ONLY
        51 TABLE EXPANSION                                              EXPAND_TABLE
        52 TABLE EXPANSION BRANCH
        53 JOIN FACTORIZATION SET QUERY BLOCK                           FACTORIZE_JOIN
        54 DISTINCT PLACEMENT                                           PLACE_DISTINCT
        55 JOIN FACTORIZATION BRANCH QUERY BLOCK
        56 TABLE LOOKUP BY NESTED LOOP QUERY BLOCK                      TABLE_LOOKUP_BY_NL
        57 FULL OUTER JOIN TRANSFORMED TO OUTER                         FULL_OUTER_JOIN_TO_OUTER
        58 LEFT OUTER JOIN TRANSFORMED TO ANTI                          OUTER_JOIN_TO_ANTI
        59 VIEW DECORRELATED                                            DECORRELATE
        60 QUERY VIEW DECORRELATED                                      DECORRELATE
        61 NOT EXISTS SQ ADDED
        62 BRANCH WITH OUTER JOIN
        63 BRANCH WITH ANTI JOIN
        64 UNION ALL FOR FULL OUTER JOIN
        65 VECTOR TRANSFORMATION                                        VECTOR_TRANSFORM
        66 VECTOR TRANSFORMATION TEMP TABLE
        67 QUERY ANSI REARCHiTECTURE                                    ANSI_REARCH
        68 VIEW ANSI REARCHiTECTURE                                     ANSI_REARCH
        69 ELIMINATION OF GROUP BY                                      ELIM_GROUPBY
        70 UAL BRANCH OF UNNESTED SUBQUERY
        71 QUERY BLOCK HAS BUSHY JOIN                                   BUSHY_JOIN
        72 SUBQUERY ELIMINATE                                           ELIMINATE_SQ
        73 OR EXPANSION UNION ALL BRANCH
        74 OR EXPANSION UNION ALL VIEW                                  OR_EXPAND
        75 DIST AGG GROUPING SETS UNION ALL TRANSFORMATION              USE_DAGG_UNION_ALL_GSETS
        76 MATERIALIZED WITH CLAUSE
        77 STATISTCS BASED TRANSFORMED QB
        78 PQ TABLE EXPANSION
        79 LEFT OUTER JOIN TRANSFORMED TO BOTH INNER AND ANTI
        80 SHARD TEMP TABLE
        81 BRANCH OF COMPLEX UNNESTED SET QUERY BLOCK
        82 DIST AGG GROUPING SETS OPTIMIZATION                          DAGG_OPTIM_GSETS


You’ll notice the highlight for origin_id 2 which has the name PARSER – that’s the (first) significant value when reporting the query block registry but take note, also, of origin_id 3 which has the name HINT. This is where the code built into 21c goes wrong. If you use the qb_name hint to name all your query blocks then their origin_id will be 3, and Oracle’s code won’t find them.

When I added the hint /*+ qb_name(main) */ to the query this is what I got from my registry query:

Query Block Registry:
---------------------

  SEL$1 (PARSER)
    SEL$7D4DB4AA (SUBQ INTO VIEW FOR COMPLEX UNNEST SEL$1)
      SEL$EFD91A2C (PROJECTION VIEW FOR CVM SEL$7D4DB4AA)
        SEL$7086F02E (VIEW MERGE SEL$EFD91A2C ; SEL$7D4DB4AA) [FINAL]

And when I also added the hint /*+ qb_name(subq) */ to the subquery the result was this:

Query Block Registry:
---------------------

An uncaught error happened in display_cursor : ORA-06502: PL/SQL: numeric or value error

I’ve said for a long time: “always name all your query blocks”. I think 21c (temporarily) demonstrates why you have two options: name ALL of them or name NONE of them. If you name just some of them you might not notice that parts of your plan don’t appear in the registry report, and I’d say it’s better to see an error than to be fooled into thinking you’ve got complete information.
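For clarity, this is roughly what the “name ALL of them” version of the test query looks like. It is a sketch based on the statement shown in Footnote 3 with the two qb_name hints described above added; it is the form that raised the ORA-06502 error in the 21.3 display_cursor() call.

```sql
rem     Sketch only: the Footnote 3 query with both query blocks named.

select  /*+ qb_name(main) */
        *
from    t1
where   owner = 'OUTLN'
and     object_name in (
                select  /*+
                                qb_name(subq)
                                unnest
                                no_semijoin
                        */
                        distinct object_name
                from   t2
                where  object_type = 'TABLE'
        )
;
```

Both query blocks would then have origin_id 3 (HINT) rather than 2 (PARSER), which is why the modified script above selects its starting rows with “origin in (2,3)”.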

Footnote 2

There’s another new option for the format parameter in 21c which is qbregistry_graph. I haven’t considered playing about with the trace file to see if I can extract and hack the SQL that generates the appropriate output (but that might change if I pick up a tool to turn the textual description into a graphic). For the registry listing above this is what the “graph” output looks like:

Query Block Registry Graph (dot format):
---------------------
digraph g{
  rankdir = TB
  "SEL$88A77D12"
  "SEL$683B0107"
  "SEL$1"
  "SEL$2"
  "SEL$B186933D" [peripheries=2]
  "SEL$5DA710D3"
  "SEL$2F1334C4" [peripheries=2]
  "SEL$683B0107" -> "SEL$88A77D12" [label="PROJECTION VIEW FOR CVM"]
  "SEL$2" -> "SEL$683B0107" [label="SUBQ INTO VIEW FOR COMPLEX UNNEST"]
  "SEL$88A77D12" -> "SEL$B186933D" [label="VIEW MERGE"]
  "SEL$1" -> "SEL$5DA710D3" [label="SUBQUERY UNNEST"]
  "SEL$5DA710D3" -> "SEL$2F1334C4" [label="SPLIT/MERGE QUERY BLOCKS"]
  "SEL$683B0107" -> "SEL$B186933D" [style=dotted]
  "SEL$2" -> "SEL$5DA710D3" [style=dotted]
  { rank = same }
  {
    rank="sink";
    rankdir = LR;
    item1 [style=invis];
    item2 [shape="plaintext" label="Participating query blocks"];
    item3 [label="&nbsp;" peripheries=2];
    item4 [shape="plaintext" label="Final query blocks"];
    item1 -> item2 [style=dotted];
    { rank=same item3 item4; }
  }
}

Footnote 3

For completeness – here’s the original SQL and plan for the statement that produced this qbregistry example:

select
        *
from    t1
where   owner = 'OUTLN'
and     object_name in (
                select  /*+
                                unnest
                                no_semijoin
                        */
                        distinct object_name
                from   t2
                where  object_type = 'TABLE'
        )
;

----------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |           |       |       |     5 (100)|          |
|   1 |  VIEW                                  | VM_NWVW_2 |     1 |   483 |     5  (20)| 00:00:01 |
|   2 |   HASH UNIQUE                          |           |     1 |   155 |     5  (20)| 00:00:01 |
|   3 |    NESTED LOOPS SEMI                   |           |     1 |   155 |     4   (0)| 00:00:01 |
|   4 |     TABLE ACCESS BY INDEX ROWID BATCHED| T1        |     1 |   128 |     2   (0)| 00:00:01 |
|*  5 |      INDEX RANGE SCAN                  | T1_I1     |     1 |       |     1   (0)| 00:00:01 |
|*  6 |     TABLE ACCESS BY INDEX ROWID BATCHED| T2        |     1 |    27 |     2   (0)| 00:00:01 |
|*  7 |      INDEX RANGE SCAN                  | T2_I2     |    48 |       |     0   (0)|          |
----------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$B186933D / "VM_NWVW_2"@"SEL$2F1334C4"
   2 - SEL$B186933D
   4 - SEL$B186933D / "T1"@"SEL$1"
   5 - SEL$B186933D / "T1"@"SEL$1"
   6 - SEL$B186933D / "T2"@"SEL$2"
   7 - SEL$B186933D / "T2"@"SEL$2"

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('21.1.0')
      DB_VERSION('21.1.0')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$B186933D")
      MERGE(@"SEL$683B0107" >"SEL$5DA710D3")
      OUTLINE_LEAF(@"SEL$2F1334C4")
      OUTLINE(@"SEL$88A77D12")
      OUTLINE(@"SEL$683B0107")
      OUTLINE(@"SEL$5DA710D3")
      UNNEST(@"SEL$2" UNNEST_INNERJ_DISTINCT_VIEW)
      OUTLINE(@"SEL$2")
      OUTLINE(@"SEL$1")
      NO_ACCESS(@"SEL$2F1334C4" "VM_NWVW_2"@"SEL$2F1334C4")
      INDEX_RS_ASC(@"SEL$B186933D" "T1"@"SEL$1" ("T1"."OWNER"))
      BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$B186933D" "T1"@"SEL$1")
      INDEX_RS_ASC(@"SEL$B186933D" "T2"@"SEL$2" ("T2"."OBJECT_TYPE"))
      BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$B186933D" "T2"@"SEL$2")
      LEADING(@"SEL$B186933D" "T1"@"SEL$1" "T2"@"SEL$2")
      USE_NL(@"SEL$B186933D" "T2"@"SEL$2")
      USE_HASH_AGGREGATION(@"SEL$B186933D" UNIQUE)
      PARTIAL_JOIN(@"SEL$B186933D" "T2"@"SEL$2")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("OWNER"='OUTLN')
   6 - filter("OBJECT_NAME"="OBJECT_NAME")
   7 - access("OBJECT_TYPE"='TABLE')

Query Block Registry:
---------------------
  SEL$1 (PARSER)
    SEL$5DA710D3 (SUBQUERY UNNEST SEL$1 ; SEL$2)
      SEL$2F1334C4 (SPLIT/MERGE QUERY BLOCKS SEL$5DA710D3) [FINAL]
  SEL$2 (PARSER)
    SEL$683B0107 (SUBQ INTO VIEW FOR COMPLEX UNNEST SEL$2)
      SEL$88A77D12 (PROJECTION VIEW FOR CVM SEL$683B0107)
        SEL$B186933D (VIEW MERGE SEL$88A77D12 ; SEL$683B0107) [FINAL]

Tables t1 and t2 are copies of the same data set, which is a subset of 100 rows from all_objects. You won’t necessarily see this plan on your systems because (even with the hints) the plan can vary depending on the number of rows with owner = ‘OUTLN’ (which is likely to be zero) or with object_type = ‘TABLE’ (which might be all of them). The script I started with was one I had used in a note I wrote about “distinct” appearing in the select list of subqueries, but the data it produced in the newer versions of Oracle was sufficiently different that I had to be a little more careful in constructing a data set that produced stable plans.

If you cross-check the Query Block Registry with the Outline Information you’ll see that the lines labelled FINAL start with the query block names that are shown as “outline_leaf” entries, and the other 5 query block names appear as “outline” entries.

Reading down the tree I then find myself struggling to interpret the QBR. I think I know what has happened, but I can’t quite manage to see exactly how the QBR is telling me that it happened.

QBR – tentative interpretation

Part of the difficulty is that the QBR seems to have a section for every initial query block in the query, so there’s (a) likely to be some overlap between sections and (b) some sequencing that means you can’t get the full picture just by reading straight from top to bottom. In this case we have two initial query blocks (the main query, implicitly named sel$1, and the subquery implicitly named sel$2), and I think the interpretation is as follows:

Starting with the sel$2 section we can see that its second line tells us that the subquery was unnested and the resulting aggregate inline view is the sole content of a query block called sel$683B0107.

Jumping backwards to the sel$1 section, its second line tells us that sel$5DA710D3 is a query block consisting of a join between t1 and the inline aggregate view.

Sticking with the sel$1 tree, we then see a query block that tells us that the optimizer has transformed an “aggregate then join” into a “join then aggregate”. sel$2F1334C4 is the query block holding nothing but a select from the view VM_NWVW_2.

Returning to the sel$2 tree, sel$88A77D12 is the resulting query block when the inline aggregate view is merged using complex view merging. This is where I get a bit stuck, because this seems to be repeating a step that we’ve handled in the sel$1 section by a different route.

The final step of the sel$2 tree is sel$B186933D, the query block where we select from the non-mergeable inline view VM_NWVW_2 that seems to have come from one of two different places.

Bottom line on this one: even though it’s an extremely simple query and I believe I understand what the execution plan is telling us about the transformations that took place, the query block registry is still something of a mystery to me.

Distributed Query

Mon, 2021-08-23 11:24

Here’s an example that appeared on the Oracle Developer Community forum about a year ago that prompted me to do a little investigative work. The question involved a distributed query that was “misbehaving” – the interesting points were the appearance of the /*+ rule */ and /*+ driving_site() */ hints in the original query when combined with a suggestion to address the problem using the /*+ materialize */ hint with factored subqueries (common table expressions – CTEs), or when combined with my suggestion to use the /*+ no_merge */ hint.

If you don’t want to read the whole article there’s a tl;dr summary just before the end.

The original question was posed with a handful of poorly constructed code fragments that were supposed to describe the problem, viz:


select /*+ DRIVING_SITE (s1) */ * from  Table1 s1 WHERE condition in (select att1 from local_table) ; -- query n°1

select /*+ RULE DRIVING_SITE (s2) */  * from  Table2 s2 where  condition in (select att1 from local_table); -- query n°2

select * from
select /*+ DRIVING_SITE (s1) */ * from  Table1 s1 WHERE condition in (select att1 from local_table) ,
select /*+ RULE DRIVING_SITE (s2) */  * from  Table2 s2 where  condition in (select att1 from local_table)
where att_table_1 = att_table_2  -- sic

The crux of the problem was that the two separate statements individually produced an acceptable execution plan but the attempt to use the queries in inline views with a join resulted in a plan that (from the description) sounded like the result of Oracle merging the two inline views and running the two IN subqueries as FILTER (existence) subqueries.

We weren’t shown any execution plans and only had the title of the question (“Distributed sql query through multiple databases”) to give us the clue that there might be three different databases involved.

Obviously there are several questions worth asking when presented with this problem. The first being “can we have a more realistic piece of code”, also “which version of Oracle”, and “where are the execution plans”. I can’t help feeling that there’s more to the problem than just the three tables that seem to be suggested by the fragments supplied.

More significant, though, was the surprise that rule and driving_site should work together. There’s a long-standing (but incorrect) assertion that “any other hint invalidates the RULE hint”. I think I’ve published an example somewhere showing that /*+ unnest */ would affect an execution plan where the optimizer still obeyed the /*+ rule */ hint, and there’s an old post on this blog which points out that transformation and optimisation are (or were, at the time) independent of each other, implying that you could combine the rule hint with “transformational” hints and still end up with a rule-based execution plan.

Despite old memories suggesting the contrary my first thought was that the rule and driving_site hints couldn’t be working together – and that made it worth running a little test. Then one of the other specialists on the forums suggested using subquery factoring with the materialize hint – and I thought that probably wouldn’t help because when you insert into a global temporary table the driving site has to become the site that holds the global temporary tables (in fact this isn’t just a feature of GTTs). So there was another thing prompting me to run a test. (And then I suggested using the /*+ no_merge */ hint – but thought I’d check if that idea was going to work before I suggested it.)

So here’s a code sample to create some data, and the first two simple queries with calls for their predicted execution plans:

rem
rem     Script:         distributed_multi.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jul 2020
rem     Purpose:
rem
rem     Last tested
rem             19.3.0.0
rem             12.2.0.1
rem             11.2.0.4
rem

rem     create public database link test@loopback using 'test';
rem     create public database link test2@loopback using 'test2';

rem     create public database link orcl@loopback using 'orcl';
rem     create public database link orcl2@loopback using 'orcl2';

rem     create public database link orclpdb@loopback using 'orclpdb';
rem     create public database link orclpdb2@loopback using 'orclpdb2';

define m_target=test@loopback
define m_target2=test2@loopback

define m_target=orcl@loopback
define m_target2=orcl2@loopback

define m_target=orclpdb@loopback
define m_target2=orclpdb2@loopback

create table t0
as
select  *
from    all_objects
where   mod(object_id,4) = 1
;

create table t1
as
select  *
from    all_objects
where   mod(object_id,11) = 0
;

create table t2
as
select  *
from    all_objects
where   mod(object_id,13) = 0
;

explain plan for
select  /*+ driving_site(t1) */
        t1.object_name, t1.object_id
from    t1@&m_target    t1
where
        t1.object_id in (
                select  t0.object_id
                from    t0
        )
;

select * from table(dbms_xplan.display);

explain plan for
select
        /*+ rule driving_site(t2) */
        t2.object_name, t2.object_id
from    t2@&m_target2   t2
where
        t2.object_id in (
                select  t0.object_id
                from    t0
        )
;

select * from table(dbms_xplan.display);

Reading from the top down – t0 is in the local database, t1 is in remote database 1, t2 is in remote database 2. I’ve indicated the creation and selection of a pair of public database links at the top of the script – in this case both of them are loopback links to the local database, but I’ve used substitution variables in the SQL to allow me to adjust which databases are the remote ones. Since there are no indexes on any of the tables the optimizer is very limited in its choice of execution plans, which are as follows in 19.3 (the orclpdb/orclpdb2 links).

First, the query against t1@orclpdb – which will run cost-based:


-----------------------------------------------------------------------------------------------
| Id  | Operation              | Name | Rows  | Bytes | Cost (%CPU)| Time     | Inst   |IN-OUT|
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT REMOTE|      |  5168 |   287K|    57   (8)| 00:00:01 |        |      |
|*  1 |  HASH JOIN SEMI        |      |  5168 |   287K|    57   (8)| 00:00:01 |        |      |
|   2 |   TABLE ACCESS FULL    | T1   |  5168 |   222K|    16   (7)| 00:00:01 | ORCLP~ |      |
|   3 |   REMOTE               | T0   | 14058 |   178K|    40   (5)| 00:00:01 |      ! | R->S |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("A1"."OBJECT_ID"="A2"."OBJECT_ID")

Remote SQL Information (identified by operation id):
----------------------------------------------------
   3 - SELECT "OBJECT_ID" FROM "T0" "A2" (accessing '!' )

Note
-----
   - fully remote statement

You’ll note that operation 3 is simply REMOTE, and t0 is the object accessed – which means this query is behaving as if the (local) t0 table is the remote one as far as the execution plan is concerned. The IN-OUT column tells us that this operation is “Remote to Serial” (R->S) and the instance called is named “!”, which is how the local database is identified in the plan from a remote database.

We can also see that the execution plan gives us the “Remote SQL Information” for operation 3 – and that’s the text of the query that gets sent by the driving site to the instance that holds the object of interest. In this case the query is simply selecting the object_id values from all the rows in t0.

Now the plan for the query against t2@orclpdb2 which includes a /*+ rule */ hint:

-----------------------------------------------------------
| Id  | Operation              | Name     | Inst   |IN-OUT|
-----------------------------------------------------------
|   0 | SELECT STATEMENT REMOTE|          |        |      |
|   1 |  MERGE JOIN            |          |        |      |
|   2 |   SORT JOIN            |          |        |      |
|   3 |    TABLE ACCESS FULL   | T2       | ORCLP~ |      |
|*  4 |   SORT JOIN            |          |        |      |
|   5 |    VIEW                | VW_NSO_1 | ORCLP~ |      |
|   6 |     SORT UNIQUE        |          |        |      |
|   7 |      REMOTE            | T0       |      ! | R->S |
-----------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("A1"."OBJECT_ID"="OBJECT_ID")
       filter("A1"."OBJECT_ID"="OBJECT_ID")

Remote SQL Information (identified by operation id):
----------------------------------------------------
   7 - SELECT /*+ RULE */ "OBJECT_ID" FROM "T0" "A2" (accessing '!' )

Note
-----
   - fully remote statement
   - rule based optimizer used (consider using cbo)

The most striking feature of this plan is that it is an RBO (rule based optimizer) plan not a cost-based plan – and the Note section confirms that observation. We can also see that the Remote SQL Information is echoing the /*+ RULE */ hint back in its query against t0. Since the query is operating rule-based the hash join mechanism is not available (it’s a costed path – it needs to know the size of the data that will be used in the build table), and that’s why the plan is using a sort/merge join.

Following the “incremental build” strategy for writing SQL all we have to do as the next step of producing the final code is put the two queries into separate views and join them:


explain plan for
select  v1.*, v2.*
from    (
        select  /*+ driving_site(t1) */
                t1.object_name, t1.object_id
        from    t1@&m_target    t1
        where
                t1.object_id in (
                        select  t0.object_id
                        from    t0
                )
        )       v1,
        (
        select
                /*+ rule driving_site(t2) */
                t2.object_name, t2.object_id
        from    t2@&m_target2 t2
        where
                t2.object_id in (
                        select  t0.object_id
                        from    t0
                )
        )       v2
where
        v1.object_id = v2.object_id
;

select * from table(dbms_xplan.display);

And here’s the execution plan – which, I have to admit, gave me a bit of a surprise on two counts when I first saw it:


-----------------------------------------------------------
| Id  | Operation              | Name     | Inst   |IN-OUT|
-----------------------------------------------------------
|   0 | SELECT STATEMENT       |          |        |      |
|   1 |  MERGE JOIN            |          |        |      |
|   2 |   MERGE JOIN           |          |        |      |
|   3 |    MERGE JOIN          |          |        |      |
|   4 |     SORT JOIN          |          |        |      |
|   5 |      REMOTE            | T2       | ORCLP~ | R->S |
|*  6 |     SORT JOIN          |          |        |      |
|   7 |      REMOTE            | T1       | ORCLP~ | R->S |
|*  8 |    SORT JOIN           |          |        |      |
|   9 |     VIEW               | VW_NSO_1 |        |      |
|  10 |      SORT UNIQUE       |          |        |      |
|  11 |       TABLE ACCESS FULL| T0       |        |      |
|* 12 |   SORT JOIN            |          |        |      |
|  13 |    VIEW                | VW_NSO_2 |        |      |
|  14 |     SORT UNIQUE        |          |        |      |
|  15 |      TABLE ACCESS FULL | T0       |        |      |
-----------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   6 - access("T1"."OBJECT_ID"="T2"."OBJECT_ID")
       filter("T1"."OBJECT_ID"="T2"."OBJECT_ID")
   8 - access("T2"."OBJECT_ID"="OBJECT_ID")
       filter("T2"."OBJECT_ID"="OBJECT_ID")
  12 - access("T1"."OBJECT_ID"="OBJECT_ID")
       filter("T1"."OBJECT_ID"="OBJECT_ID")

Remote SQL Information (identified by operation id):
----------------------------------------------------
   5 - SELECT /*+ RULE */ "OBJECT_NAME","OBJECT_ID" FROM "T2" "T2"
       (accessing 'ORCLPDB2.LOCALDOMAIN@LOOPBACK' )

   7 - SELECT /*+ RULE */ "OBJECT_NAME","OBJECT_ID" FROM "T1" "T1"
       (accessing 'ORCLPDB.LOCALDOMAIN@LOOPBACK' )

Note
-----
   - rule based optimizer used (consider using cbo)

The two surprises were that (a) the entire plan was rule-based, and (b) the driving_site() selection has disappeared from the plan.

Of course as soon as I actually started thinking about what I’d written (instead of trusting the knee-jerk “just stick the two bits together”) the flaw in the strategy became obvious.

  • Either the whole query runs RBO or it runs CBO – you can’t split the planning.
  • In the words of The Highlander “There can be only one” (driving site that is) – only one of the databases involved will decide how to decompose and distribute the query.

It’s an interesting detail that the /*+ rule */ hint seems to have pushed the whole query into the arms of the RBO despite being buried somewhere in the depths of the query rather than being in the top level query block – but we’ve seen that before in some old data dictionary views.

The complete disregard for the driving_site() hints is less interesting – there is, after all, a comment in the manuals somewhere to the effect that when two hints contradict each other they are both ignored. (But I did wonder why the Hint Report that should appear with 19.3 plans didn’t tell me that the hints had been observed but not used.)

The other problem (from the perspective of the OP) is that the two inline views have been merged so the join order no longer reflects the two isolated components we used to have. So let’s fiddle around a little bit to see how close we can get to what the OP wants. The first step would be to add the /*+ no_merge */ hint to both inline views, and eliminate one of the /*+ driving_site() */ hints to see what happens, and since we’re modern we’ll also get rid of the /*+ rule */ hint:


explain plan for
select  v1.*, v2.*
from    (
        select  /*+ qb_name(subq1) no_merge driving_site(t1) */
                t1.object_name, t1.object_id
        from    t1@&m_target    t1
        where
                t1.object_id in (
                        select  t0.object_id
                        from    t0
                )
        )       v1,
        (
        select
                /*+ qb_name(subq2) no_merge */
                t2.object_name, t2.object_id
        from    t2@&m_target2 t2
        where
                t2.object_id in (
                        select  t0.object_id
                        from    t0
                )
        )       v2
where
        v1.object_id = v2.object_id
;

select * from table(dbms_xplan.display);

-----------------------------------------------------------------------------------------------
| Id  | Operation              | Name | Rows  | Bytes | Cost (%CPU)| Time     | Inst   |IN-OUT|
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT REMOTE|      |  4342 |   669K|    72   (9)| 00:00:01 |        |      |
|*  1 |  HASH JOIN             |      |  4342 |   669K|    72   (9)| 00:00:01 |        |      |
|   2 |   VIEW                 |      |  4342 |   334K|    14   (8)| 00:00:01 |        |      |
|   3 |    REMOTE              |      |       |       |            |          |      ! | R->S |
|   4 |   VIEW                 |      |  5168 |   398K|    57   (8)| 00:00:01 |        |      |
|*  5 |    HASH JOIN SEMI      |      |  5168 |   287K|    57   (8)| 00:00:01 |        |      |
|   6 |     TABLE ACCESS FULL  | T1   |  5168 |   222K|    16   (7)| 00:00:01 | ORCLP~ |      |
|   7 |     REMOTE             | T0   | 14058 |   178K|    40   (5)| 00:00:01 |      ! | R->S |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("A2"."OBJECT_ID"="A1"."OBJECT_ID")
   5 - access("A3"."OBJECT_ID"="A6"."OBJECT_ID")

Remote SQL Information (identified by operation id):
----------------------------------------------------
   3 - EXPLAIN PLAN INTO "PLAN_TABLE" FOR SELECT /*+ QB_NAME ("SUBQ2") NO_MERGE */
       "A1"."OBJECT_NAME","A1"."OBJECT_ID" FROM  (SELECT DISTINCT "A3"."OBJECT_ID"
       "OBJECT_ID" FROM "T0" "A3") "A2","T2"@ORCLPDB2.LOCALDOMAIN@LOOPBACK "A1" WHERE
       "A1"."OBJECT_ID"="A2"."OBJECT_ID" (accessing '!' )

   7 - SELECT "OBJECT_ID" FROM "T0" "A6" (accessing '!' )

Note
-----
   - fully remote statement

In this plan we can see that the /*+ driving_site() */ hint has been applied – the plan is presented from the point of view of orclpdb (the database holding t1). The order of the two inline views has apparently been reversed as we move from the statement to its plan – but that’s just a minor side effect of the hash join (picking the smaller result set as the build table).

Operations 5 – 7 tell us that t1 is treated as the local table and used for the build table in a hash semi-join, and then t0 is accessed by a call back to our database and its result set is used as the probe table.

From operation 3 (in the body of the plan, and in the Remote SQL Information) we see that orclpdb has handed off the entire t2 query block to a remote operation – which is accessing “!”. But there’s a problem (in my opinion) in the SQL that it’s handing off – the text is NOT the text of our inline view; it’s already been through a heuristic transformation that has unnested the IN subquery of our original text into a “join distinct view” – if we had used a hint to force this transformation it would have been the /*+ unnest(UNNEST_INNERJ_DISTINCT_VIEW) */ variant.

SELECT /*+ NO_MERGE */
        "A1"."OBJECT_NAME","A1"."OBJECT_ID"
FROM
       (SELECT DISTINCT "A3"."OBJECT_ID" "OBJECT_ID" FROM "T0" "A3") "A2",
       "T2"@ORCLPDB2.LOCALDOMAIN@LOOPBACK "A1"
WHERE
        "A1"."OBJECT_ID"="A2"."OBJECT_ID"

I tried to change this by adding alternative versions of the /*+ unnest() */ hint to the original query, following the query block names indicated by the outline information (not shown), but it looks as if the code path that constructs the Remote SQL operates without considering the main query hints – perhaps the decomposition code is simply following the code path of the old heuristic “I’ll do it if it’s legal” unnest. The drawback to this is that if the original form of the text had been sent to the other site the optimizer that had to handle it could have used cost-based query transformation and might have come up with a better plan.

You may be wondering why I left the /*+ driving_site() */ hint in one of the inline views rather than inserting it in the main query block. The answer is simple – it didn’t seem to work (even in 19.3) when I put /*+ driving_site(t1@subq1) */ in the main query block.
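For reference, the variant that didn’t seem to work looked something like the following sketch. This is a reconstruction rather than the exact text I tested: it assumes the same tables, database links and substitution variables as the script above, and simply moves the hint out of the subq1 inline view into the main query block using the t1@subq1 form.

```sql
-- Sketch only: driving_site() moved from the subq1 inline view
-- to the main query block, addressing t1 by its query block name
explain plan for
select  /*+ driving_site(t1@subq1) */
        v1.*, v2.*
from    (
        select  /*+ qb_name(subq1) no_merge */
                t1.object_name, t1.object_id
        from    t1@&m_target    t1
        where
                t1.object_id in (
                        select  t0.object_id
                        from    t0
                )
        )       v1,
        (
        select
                /*+ qb_name(subq2) no_merge */
                t2.object_name, t2.object_id
        from    t2@&m_target2 t2
        where
                t2.object_id in (
                        select  t0.object_id
                        from    t0
                )
        )       v2
where
        v1.object_id = v2.object_id
;

select * from table(dbms_xplan.display);
```

If the hint had taken effect the plan would have started with SELECT STATEMENT REMOTE and carried the “fully remote statement” note, as the earlier remote plans did.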

tl;dr

The optimizer has to operate rule-based or cost-based, it can’t do a bit of both in the same query – so if you’ve got a /*+ RULE */ hint that takes effect anywhere in the query the entire query will be optimised under the rule-based optimizer.

There can be only one driving site for a query, and if you manage to get multiple driving_site() hints in a query that contradict each other the optimizer will ignore all of them.

When the optimizer decomposes a distributed query and produces non-trivial components to send to remote sites you may find that some of the queries constructed for the remote sites have been subject to transformations that you cannot influence by hinting.

Footnote

I mentioned factored subqueries and the /*+ materialize */ option in the opening notes. In plans where the attempt to specify the driving site failed (i.e. when the query ran locally) the factored subqueries did materialize. In any plans where the driving site was a remote site the factored subqueries were always inline. This may well be related to the documented (though not always implemented) restriction that temporary tables cannot take part in distributed transactions.

GTT LOBs

Sat, 2021-08-21 07:53

Searching my blog recently for some details about a temporary space problem I came across a draft I’d written a few years ago about LOBs in global temporary tables. It was a summary of an exchange from OTN that did a good job of capturing my general air of skepticism when reviewing database results. The basic problem came from a user who had discovered that when they included a LOB column in a global temporary table the LOBINDEX segment was created in the SYSTEM tablespace. It’s probably not going to be of much practical benefit to many people – but it does demonstrate a principle and a pattern of thinking, so here it is – 4 years late.

 

Like the “Earn $50M by helping me steal $100M” email the claim seemed a little suspect.  My basic approach to Oracle is: “if it looks unreasonable test it”, so I did. Here’s the first bit of SQL (which I ran on 12.1.0.2 on an empty schema):


rem
rem     Script:         gtt_lobs.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2016
rem     Purpose:
rem
rem     Last tested
rem             12.1.0.2
rem             11.2.0.4

create global temporary table gtt1(
        id      number          not null,
        v1      varchar2(10)    not null,
        c1      clob,
        constraint gtt1_pk primary key(id)
)
-- tablespace gtt_temp
;

select
        table_name, index_name, tablespace_name
from
        user_indexes
;

TABLE_NAME           INDEX_NAME                       TABLESPACE_NAME
-------------------- -------------------------------- ------------------------------
GTT1                 GTT1_PK
GTT1                 SYS_IL0000168694C00003$$         SYSTEM

Sure enough, the query against user_indexes says the LOBINDEX for the LOB in the global temporary table will be created in the SYSTEM tablespace! This is clearly ridiculous – so I’m not going to believe it until I’ve actually confirmed it. I tend to trust the data dictionary rather more than I trust the manuals and MOS – but the data dictionary is just a bunch of tables and a bit of SQL that’s been written by someone else so even the data dictionary can be wrong. It’s easy enough to test:

insert into gtt1 values(1,'a',rpad('x',4000,'x'));

select
        username, tablespace, segtype, segfile#, segblk#
from
        v$tempseg_usage
;

USERNAME                       TABLESPACE                      SEGTYPE     SEGFILE#    SEGBLK#
------------------------------ ------------------------------- --------- ---------- ----------
TEST_USER                      TEMP                            DATA             201     261632
TEST_USER                      TEMP                            INDEX            201     261760
TEST_USER                      TEMP                            LOB_DATA         201     262016
TEST_USER                      TEMP                            INDEX            201     261888

Whatever the data dictionary says, the actual segment (at file 201, block 261888 – it would be nice if the type were LOB_INDEX, but that still hasn’t been fixed, even in 21.3.0.0) has been created in the temporary tablespace. Checking the definition of the dba_indexes view, I came to the conclusion that the property column of the sys.ind$ table hadn’t had the bit set to show that it’s an index associated with a temporary table; as a result the SQL in the view definition reports tablespace 0 (SYSTEM) rather than decoding the zero to a null.

You’ll note that there’s a commented reference to “tablespace gtt_temp” in my original table creation statement. It’s not terribly well known but in recent versions of Oracle you can associate a global temporary table with a specific temporary tablespace – this gives you some scope for easily monitoring the impact of particular global temporary tables on the total I/O resource usage. After re-running the test but specifying this tablespace as the location for the GTT I got the following results:


TABLE_NAME           INDEX_NAME                       TABLESPACE_NAME
-------------------- -------------------------------- ------------------------------
GTT1                 GTT1_PK                          GTT_TEMP
GTT1                 SYS_IL0000262251C00003$$         GTT_TEMP

USERNAME                       TABLESPACE                      SEGTYPE     SEGFILE#    SEGBLK#
------------------------------ ------------------------------- --------- ---------- ----------
TEST_USER                      GTT_TEMP                        DATA             202        512
TEST_USER                      GTT_TEMP                        INDEX            202        384
TEST_USER                      GTT_TEMP                        LOB_DATA         202        128
TEST_USER                      GTT_TEMP                        INDEX            202        256

The anomaly disappears – everything is reported as being linked to the gtt_temp tablespace (and that includes the table itself, as reported in view user_tables, though I haven’t included that query in the test or results).
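The re-run can be sketched as follows. The tempfile name and size are purely illustrative assumptions (they aren’t in the original script), and the sketch assumes gtt1 doesn’t already exist; the table definition is the one from above with the tablespace clause uncommented.

```sql
-- Hypothetical file name and size: adjust for your own system
create temporary tablespace gtt_temp
        tempfile '/u01/oradata/gtt_temp01.dbf' size 100m
;

-- Same definition as before, but tied to the named temporary tablespace
create global temporary table gtt1(
        id      number          not null,
        v1      varchar2(10)    not null,
        c1      clob,
        constraint gtt1_pk primary key(id)
)
tablespace gtt_temp
;

-- The table itself should also report the named tablespace
select
        table_name, tablespace_name, temporary
from
        user_tables
where
        table_name = 'GTT1'
;
```

After this the earlier queries against user_indexes and v$tempseg_usage can be repeated unchanged.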

Footnote:

A few months after the first draft of this note I happened to rediscover it (but still failed to publish it) and after a quick search on MOS found the following bug (reported as fixed in 12.2, with a backport currently available to 11.2.0.3):

Bug 18897516 : GLOBAL TEMPORARY TABLE WITH LOBCOLUMNS CREATE LOB INDEX IN SYSTEM SCHE

Interestingly the description says: “lob index created in system tablespace” rather than “lob index incorrectly reported as being in system tablespace”. You do have to be very careful with how you describe things if you don’t want to cause confusion – this “problem” isn’t a space management threat, it’s just an irritating reporting error.

 

 

Preferences

Mon, 2021-08-09 06:58

I made a few comments in the past about setting “table preferences” for stats collection – most significantly the table_cached_blocks preference that affects the calculation of the clustering_factor of all the indexes on the table, the incremental preference for dictating the strategy used for dealing with partitioned tables, and the method_opt preference for dictating precise requirements for histograms.

If you want to check the current preferences set for a table you can query the XXXX_tab_stat_prefs views. For some reason in the dim and distant past (perhaps in a beta release before the views had been created but perhaps because the views show only the preferences that have been set) I wrote a little script to report all the possible table preferences showing both the table value and the current global value.

rem
rem     Script: get_table_prefs.sql
rem     Dated:  ???
rem     Author: Jonathan Lewis
rem
rem     Last tested
rem             19.11.0.0
rem
rem     Notes
rem     Report the table preferences for a given
rem     owner and table.
rem
rem     Needs to find a list of all legal preferences.
rem             Global prefs are in:    optstat_hist_control$ (sname, spare4)
rem             Table prefs are in:     optstat_user_prefs$ (valchar / valnum)
rem
rem     The public view is dba_tab_stat_prefs / user_tab_stat_prefs.
rem     But if a table has no prefs set there are no rows in the view
rem
rem     This script currently has to be run by sys or a user with 
rem     the select privileges on sys.optstat_hist_control$ (and
rem     execute on dbms_stats).
rem

define m_owner = '&enter_schema'
define m_table = '&enter_tablename'


<<anon_block>>
declare
        pref_count      number(2,0) := 0;
begin
        dbms_output.new_line;
        dbms_output.put_line(
                        rpad('Preference',32) || ' ' ||
                        rpad('Table value',32) || ' ' ||
                        '[Global value]'
        );
        dbms_output.put_line(
                        rpad('=',32,'=') || ' ' ||
                        rpad('=',32,'=') || ' ' ||
                        '================================'
        );
        for c1 in (
                select  sname, spare4 
                from    sys.optstat_hist_control$
                where   spare4 is not null
        ) loop
                anon_block.pref_count := anon_block.pref_count + 1;
                
                dbms_output.put_line(
                        rpad(c1.sname,32) || ' ' ||
                        rpad(dbms_stats.get_prefs(c1.sname,'&m_owner','&m_table'),32) || ' ' 
                        || '[' || c1.spare4 || ']'
                );      

        end loop;
        dbms_output.new_line;
        dbms_output.put_line('Preferences reported: ' || anon_block.pref_count);
end;
/

While I’ve hardly ever used the script – and so haven’t considered reviewing the strategy it uses – the benefit of having it around means that when I have run it I’ve occasionally discovered new preferences that I hadn’t previously noticed (and ought to investigate).

Here’s a sample of the output – from a table with no special settings for preferences:

Preference                       Table value                      [Global value]
================================ ================================ ================================
TRACE                            0                                [0]
DEBUG                            0                                [0]
SYS_FLAGS                        1                                [1]
SPD_RETENTION_WEEKS              53                               [53]
CASCADE                          DBMS_STATS.AUTO_CASCADE          [DBMS_STATS.AUTO_CASCADE]
ESTIMATE_PERCENT                 DBMS_STATS.AUTO_SAMPLE_SIZE      [DBMS_STATS.AUTO_SAMPLE_SIZE]
DEGREE                           NULL                             [NULL]
METHOD_OPT                       FOR ALL COLUMNS SIZE AUTO        [FOR ALL COLUMNS SIZE AUTO]
NO_INVALIDATE                    DBMS_STATS.AUTO_INVALIDATE       [DBMS_STATS.AUTO_INVALIDATE]
GRANULARITY                      AUTO                             [AUTO]
PUBLISH                          TRUE                             [TRUE]
STALE_PERCENT                    10                               [10]
APPROXIMATE_NDV                  TRUE                             [TRUE]
APPROXIMATE_NDV_ALGORITHM        REPEAT OR HYPERLOGLOG            [REPEAT OR HYPERLOGLOG]
ANDV_ALGO_INTERNAL_OBSERVE       FALSE                            [FALSE]
INCREMENTAL                      FALSE                            [FALSE]
INCREMENTAL_INTERNAL_CONTROL     TRUE                             [TRUE]
AUTOSTATS_TARGET                 AUTO                             [AUTO]
CONCURRENT                       OFF                              [OFF]
JOB_OVERHEAD_PERC                1                                [1]
JOB_OVERHEAD                     -1                               [-1]
GLOBAL_TEMP_TABLE_STATS          SESSION                          [SESSION]
ENABLE_TOP_FREQ_HISTOGRAMS       3                                [3]
ENABLE_HYBRID_HISTOGRAMS         3                                [3]
TABLE_CACHED_BLOCKS              1                                [1]
INCREMENTAL_LEVEL                PARTITION                        [PARTITION]
INCREMENTAL_STALENESS            ALLOW_MIXED_FORMAT               [ALLOW_MIXED_FORMAT]
OPTIONS                          GATHER                           [GATHER]
GATHER_AUTO                      AFTER_LOAD                       [AFTER_LOAD]
STAT_CATEGORY                    OBJECT_STATS, REALTIME_STATS     [OBJECT_STATS, REALTIME_STATS]
SCAN_RATE                        0                                [0]
GATHER_SCAN_RATE                 HADOOP_ONLY                      [HADOOP_ONLY]
PREFERENCE_OVERRIDES_PARAMETER   FALSE                            [FALSE]
AUTO_STAT_EXTENSIONS             OFF                              [OFF]
WAIT_TIME_TO_UPDATE_STATS        15                               [15]
ROOT_TRIGGER_PDB                 FALSE                            [FALSE]
COORDINATOR_TRIGGER_SHARD        FALSE                            [FALSE]
MAINTAIN_STATISTICS_STATUS       FALSE                            [FALSE]
AUTO_TASK_STATUS                 OFF                              [OFF]
AUTO_TASK_MAX_RUN_TIME           3600                             [3600]
AUTO_TASK_INTERVAL               900                              [900]

Preferences reported: 41

As the notes that I’ve left in-line say: this version of the script has to be run by SYS or a DBA because of the privileges required.

You might notice, by the way, that this is one of those rare cases where I’ve remembered to use a label to name the PL/SQL block, and then used the label to qualify a variable I’ve used inside the block.
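For an account without access to the SYS tables, the public view is still enough to see which preferences have been explicitly set. Here’s a minimal sketch of checking the view, setting one preference, and seeing it appear; the table name T1 and the value 16 are arbitrary choices for illustration, and the calls are the standard dbms_stats.set_table_prefs and dbms_stats.delete_table_prefs procedures.

```sql
-- Returns no rows if no preference has been set for the table
select  table_name, preference_name, preference_value
from    user_tab_stat_prefs
where   table_name = 'T1'
;

-- Set one table-level preference (value chosen purely for illustration)
execute dbms_stats.set_table_prefs(user, 'T1', 'table_cached_blocks', '16')

-- The view should now show a row for T1
select  table_name, preference_name, preference_value
from    user_tab_stat_prefs
where   table_name = 'T1'
;

-- Remove the table-level setting so the global value applies again
execute dbms_stats.delete_table_prefs(user, 'T1', 'table_cached_blocks')
```

Running the get_table_prefs script between the set and the delete should show the table value differing from the global value for that one preference.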

Sequence Accelerator

Fri, 2021-08-06 05:22

Connor McDonald has just published a blog note about a tweak to sequences that appeared in recent versions of Oracle (19.10 – see tweet from Timur Akhmadeev).

To address the problems caused by people leaving the sequence cache size at the default of 20 (leading to contention on very busy sequences – see footnote) Oracle’s internal code will now check the rate at which a sequence nextval is being called and “ignore” the cache definition, using larger and larger values to bump the sequence highwater in the updates to the seq$ table.

Connor pointed out that if you really wanted to see how big the jump might get you could crash your instance in the middle of a run, and see how large the gap in the sequence was at the next startup. But if you want to experiment a little further with the feature here’s a less painful way of doing it – enable SQL trace for just the sequence update statement – which in current versions has an SQL_ID of 4m7m0t6fjcs5x:

alter system  set events 'sql_trace[SQL:4m7m0t6fjcs5x] wait=false, bind=true';

-- wait a bit

alter system  set events 'sql_trace[SQL:4m7m0t6fjcs5x] off';

I’ve shown how to set the trace at the system level but it is possible to use the session level, and I’ve requested bind variables to be dumped on every execution of the statement. After you’ve got some trace files you can examine them to pick out the relevant values. (In a unix environment I’d use grep and awk to automate this).

Here’s a little script to create a table and sequence, enable tracing, then hammer the sequence. I’ve left everything to default so the sequence cache will be 20 and on older versions of Oracle we’d see the highwater mark of the sequence incremented by 20 on each update to seq$. I’m running 19.11.0.0.

rem
rem     Script:         trace_seq_update.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Aug 2021
rem
rem     Last tested 
rem             19.11.0.0
rem

define m_sql_id = '4m7m0t6fjcs5x'

drop table t1;
drop sequence s1;

create table t1 (n1 number (10,0));
create sequence s1;

insert into t1 values(0);
commit;

spool trace_seq_update.lst

select  cache 
from    user_sequences 
where   sequence_name = 'S1'
;


alter session  set events 'sql_trace[SQL:&m_sql_id] wait=false, bind=true';

insert into t1 select s1.nextval 
from    all_objects
where   rownum <= 5000
/

alter session  set events 'sql_trace[SQL:&m_sql_id] off';

And here’s the first extract from the trace file:

PARSING IN CURSOR #140658757581416 len=129 dep=1 uid=0 oct=6 lid=0 tim=338002229669 hv=2635489469 ad='77638310' sqlid='4m7m0t6fjcs5x'
update seq$ set increment$=:2,minvalue=:3,maxvalue=:4,cycle#=:5,order$=:6,cache=:7,highwater=:8,audit$=:9,flags=:10 where obj#=:1
END OF STMT
BINDS #140658757581416:

To find the values used to update highwater from this point onwards I just kept searching for “BINDS #140658757581416:”, stepping down to “Bind#6”, and reporting the “value=” line that was 4 lines beyond that.

If you want to repeat the tests you’ll (probably) find that your cursor number (BINDS #nnnnnnnnnnnn) is different. If you’ve done a system-wide trace, of course, you might have multiple sequences updated in the same trace file, in which case you’ll also need to report the value for “Bind#9” to separate the different sequences. Moreover, just to make automation harder, you may find that the update cursor closes and re-opens with a new cursor number from time to time.

Here’s the complete list of Bind#6 entries for my test:

 Bind#6
  oacdty=02 mxl=22(02) mxlc=00 mal=00 scl=00 pre=00
  oacflg=10 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=76723ce7  bln=22  avl=02  flg=09
  value=21

 Bind#6
  oacdty=02 mxl=22(03) mxlc=00 mal=00 scl=00 pre=00
  oacflg=10 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=76723ce7  bln=22  avl=03  flg=09
  value=221

 Bind#6
  oacdty=02 mxl=22(03) mxlc=00 mal=00 scl=00 pre=00
  oacflg=10 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=76723ce7  bln=22  avl=03  flg=09
  value=2221

 Bind#6
  oacdty=02 mxl=22(04) mxlc=00 mal=00 scl=00 pre=00
  oacflg=10 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=76723ce7  bln=22  avl=04  flg=09
  value=22221

As you can see, the highwater jumps by 20, then 200, then 2,000, then 20,000. As a preliminary hypothesis it looks as if Oracle is going to take the cache size (Bind#5) as a base, but escalate that value internally by powers of 10 until the frequency of updates to seq$ drops to an acceptable value. (The value of cache – Bind#5 – isn’t changed by this mechanism, however).
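As a quick check of that arithmetic, here is a small Python sketch that derives the jumps from the four Bind#6 values. The starting highwater of 1 is an assumption (the sequence was created with default settings), and the variable names are just for illustration.

```python
# Bind#6 (highwater) values taken from the trace extract
highwater = [21, 221, 2221, 22221]

cache = 20   # default sequence cache size (Bind#5)
start = 1    # assumption: a default sequence starts at 1

# Difference between each highwater value and the one before it
steps = [new - old for old, new in zip([start] + highwater, highwater)]

# Each step expressed as a multiple of the declared cache size
multipliers = [step // cache for step in steps]

print(steps)         # the jump in highwater at each update of seq$
print(multipliers)   # the apparent internal scaling of the cache size
```

The multipliers come out as successive powers of 10, which is what prompts the hypothesis about internal escalation.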

Connor supplies a link to the 21c documentation on the feature – but it’s a bit thin, so there’s some scope for checking if RAC or scaled/expanded sequences produce any unexpected or puzzling side-effects.

Timur’s tweet also supplied a reference to MOS Doc ID: 2790985.1 Sequence dynamic cache resizing feature which has some helpful technical details.

Footnote

I’ve written a 4-part series on sequences for Simpletalk (final part awaiting publication), starting at this URL.

Finding SQL

Thu, 2021-08-05 12:24

There are some questions about performance issues for which there is no easy answer, and sometimes the best you can do is try to generate an approximate answer then examine the results to eliminate the innocent.

Three such problems – essentially similar – appeared recently on the Oracle Developer Forum, and in this note I’ll supply a mechanism that may be a good enough step in the right direction for at least two of them. The questions were:

  • How do I find who’s been using up all the temporary space recently?
  • How do I find the execution plans for all the SQL that is called by a procedure?
  • What SQL is responsible for generating most redo?

A basic (but incomplete) strategy for attacking these questions is to think of a way to identify the sql_id and child_number for statements that might be contributing to the problem. If you can think of a suitable attack then you can get all the execution plans (or SQL Monitor reports) for those statements and examine them further.

The strategy is incomplete on two counts – first that you won’t find the perfect set of statements, you’ll get more than you need but still miss some that are relevant; second that you’ll probably have to run secondary queries to get extra details about statements that look like good culprits for the problem you’re trying to solve.

Who’s been using the temporary space

The way this question was actually posed was more like:

I’ve got a query that started crashing with Oracle error “ORA-01652: unable to extend temp segment by 256 in tablespace TEMP”, but when I check the contents of the temporary tablespace (TEMP) there’s plenty of space available. How do I fix this problem?

This often means your query has changed execution plan and picked a very bad join order with some hash (or merge) joins that have dumped a huge amount of data to disc; but it may mean that the space was being taken up by some other activity that doesn’t usually happen when you’re running your query.

So if you find that the SQL Monitor report (or simple call to dbms_xplan.display_cursor) makes you think that your query wasn’t doing anything differently, a step that may help is to find all the SQL in the library cache that might have been using a lot of temporary space.

For temporary space we can always check v$sql_workarea to identify operations from plans that ran as one-pass or multi-pass operations, and we can check their most recent “tempseg size” or maximum tempseg size. Hence:

rem
rem     Script:         check_full_2a.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Aug 2021
rem
rem     Columns of interest for temporary space:
rem     onepass_executions, multipasses_executions, last_tempseg_size, max_tempseg_size

select  
        distinct sql_id, child_number
from    v$sql_workarea 
where   onepass_executions != 0
or      multipasses_executions != 0
-- or      max_tempseg_size > 1e7
/


select
        distinct
        sql.sql_id, sql.child_number
from
        v$sql           sql,
        v$sql_workarea  war
where
        (   war.onepass_executions != 0
         or war.multipasses_executions != 0
        )
and     sql.sql_id = war.sql_id
and     sql.child_number = war.child_number
and     sql.last_active_time > sysdate - 15/1440
;

The first query here simply scans through the v$sql_workarea structure (which means it will actually thrash its way through the library cache) looking for operations that spilled to disk or (commented out) have used at least some specified amount of temporary space.

The second variation joins v$sql_workarea to v$sql so that it can restrict the chosen statements to those that were active some time in the last 15 minutes.

Obviously you will be able to think of other ways of tweaking these statements, and once you’ve got a statement expressing a suitable set of criteria you can embed it into a query that calls dbms_xplan.display_cursor() – as I demonstrated about 10 years ago – or dbms_sql_monitor.report_sql_monitor() if you’re suitably licensed.

set linesize 230
set trimspool on
set pagesize 90
set tab off

set long 20000


select
        plan_table_output
from    (
        select  
                distinct sql_id, child_number
        from    v$sql_workarea 
        where   onepass_executions != 0
        or      multipasses_executions != 0
        ) v,
        table(dbms_xplan.display_cursor(v.sql_id, v.child_number, 'memstats'))
;


select
        dbms_sql_monitor.report_sql_monitor(
                sql_id             => v.sql_id,
                start_time_filter  => sysdate - 15/1440,
                type               =>'TEXT'
        ) text_line
from    (
        select
                distinct sql.sql_id
        from
                v$sql           sql,
                v$sql_workarea  war
        where
                (   war.onepass_executions != 0
                or war.multipasses_executions != 0
                )
        and     sql.sql_id = war.sql_id
        and     sql.child_number = war.child_number
        and     sql.last_active_time > sysdate - 15/1440
        ) v
/



A couple of points to note. I’ve included the MEMSTATS format option in the call to dbms_xplan.display_cursor() so that it shows some summary information from v$sql_workarea. However, this does have a defect: it doesn’t show space used in temp by materialized “with” subqueries (CTEs) – which is where the call to dbms_sql_monitor.report_sql_monitor() helps because, if the execution was captured, it will show writes to disc in the “LOAD AS SELECT” operation under the TEMP TABLE TRANSFORMATION operation.

Here’s a sample of output I got from the two queries after forcing a nasty plan to do a big hash join that ultimately produced a small result set.

First the output from the query using package dbms_xplan:

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------
SQL_ID  1cwabt12zq6zb, child number 0
-------------------------------------
with ttemp as (  select /*+ materialize */ * from t1 ) select  /*+
no_partial_join(@sel$2 t1b) no_place_group_by(@sel$2) */
t1a.object_type,  max(t1a.object_name) from  ttemp t1a, ttemp t1b where
 t1a.object_id = t1b.object_id group by  t1a.object_type order by
t1a.object_type

Plan hash value: 1682228242

-----------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name                      | Starts | E-Rows | A-Rows |   A-Time   |  OMem |  1Mem |  O/1/M   | Max-Tmp |
-----------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                           |      1 |        |     45 |00:00:16.24 |       |       |          |         |
|   1 |  TEMP TABLE TRANSFORMATION               |                           |      1 |        |     45 |00:00:16.24 |       |       |          |         |
|   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6645_5FF4FE |      1 |        |      0 |00:00:02.53 |  2070K|  2070K|          |         |
|   3 |    TABLE ACCESS FULL                     | T1                        |      1 |   1154K|   1154K|00:00:00.55 |       |       |          |         |
|   4 |   SORT GROUP BY                          |                           |      1 |     45 |     45 |00:00:13.72 | 11264 | 11264 |     1/0/0|         |
|*  5 |    HASH JOIN                             |                           |      1 |     18M|     18M|00:00:08.63 |    58M|    12M|          |      71M|
|   6 |     VIEW                                 |                           |      1 |   1154K|   1154K|00:00:01.03 |       |       |          |         |
|   7 |      TABLE ACCESS FULL                   | SYS_TEMP_0FD9D6645_5FF4FE |      1 |   1154K|   1154K|00:00:01.03 |       |       |          |         |
|   8 |     VIEW                                 |                           |      1 |   1154K|   1154K|00:00:00.49 |       |       |          |         |
|   9 |      TABLE ACCESS FULL                   | SYS_TEMP_0FD9D6645_5FF4FE |      1 |   1154K|   1154K|00:00:00.43 |       |       |          |         |
-----------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access("T1A"."OBJECT_ID"="T1B"."OBJECT_ID")


30 rows selected.



Note the absence of any numbers in the Max-Tmp column for operation 2.

Then compare with the results below of the query using package dbms_sql_monitor:

TEXT_LINE
-----------------------------------------------------------------------
SQL Monitoring Report

SQL Text
------------------------------
with ttemp as ( select /*+ materialize */ * from t1 ) 
select /*+ no_partial_join(@sel$2 t1b) no_place_group_by(@sel$2) */ t1a.object_type, max(t1a.object_name) 
from ttemp t1a, ttemp t1b where t1a.object_id = t1b.object_id 
group by t1a.object_type order by t1a.object_type

Global Information
------------------------------
 Status              :  DONE (ALL ROWS)
 Instance ID         :  1
 Session             :  SYS (10:47180)
 SQL ID              :  1cwabt12zq6zb
 SQL Execution ID    :  16777216
 Execution Started   :  08/05/2021 17:24:56
 First Refresh Time  :  08/05/2021 17:25:00
 Last Refresh Time   :  08/05/2021 17:25:12
 Duration            :  16s
 Module/Action       :  MyModule/MyAction
 Service             :  orclpdb
 Program             :  sqlplus@linux183.localdomain (TNS V1-V3)
 Fetch Calls         :  4

Global Stats
================================================================================
| Elapsed |   Cpu   |    IO    | Fetch | Buffer | Read | Read  | Write | Write |
| Time(s) | Time(s) | Waits(s) | Calls |  Gets  | Reqs | Bytes | Reqs  | Bytes |
================================================================================
|      16 |      14 |     2.23 |     4 |  84677 |  986 | 253MB |   866 | 219MB |
================================================================================

SQL Plan Monitoring Details (Plan Hash Value=1682228242)
=====================================================================================================================================================================================================================
| Id |                 Operation                  |           Name            |  Rows   | Cost  |   Time    | Start  | Execs |   Rows   | Read | Read  | Write | Write |  Mem  | Temp  | Activity | Activity Detail |
|    |                                            |                           | (Estim) |       | Active(s) | Active |       | (Actual) | Reqs | Bytes | Reqs  | Bytes | (Max) | (Max) |   (%)    |   (# samples)   |
=====================================================================================================================================================================================================================
|  0 | SELECT STATEMENT                           |                           |         |       |         1 |    +16 |     1 |       45 |      |       |       |       |     . |     . |          |                 |
|  1 |   TEMP TABLE TRANSFORMATION                |                           |         |       |         1 |    +16 |     1 |       45 |      |       |       |       |     . |     . |          |                 |
|  2 |    LOAD AS SELECT (CURSOR DURATION MEMORY) | SYS_TEMP_0FD9D6645_5FF4FE |         |       |         5 |     +0 |     1 |        2 |      |       |   351 | 176MB |     . |     . |          |                 |
|  3 |     TABLE ACCESS FULL                      | T1                        |      1M |  3166 |         1 |     +4 |     1 |       1M |      |       |       |       |     . |     . |          |                 |
|  4 |    SORT GROUP BY                           |                           |      45 | 17727 |        11 |     +6 |     1 |       45 |      |       |       |       | 10240 |     . |          |                 |
|  5 |     HASH JOIN                              |                           |     18M |  9804 |        13 |     +4 |     1 |      18M |  561 |  66MB |   561 |  66MB |  59MB |  71MB |          |                 |
|  6 |      VIEW                                  |                           |      1M |  2966 |         1 |     +4 |     1 |       1M |      |       |       |       |     . |     . |          |                 |
|  7 |       TABLE ACCESS FULL                    | SYS_TEMP_0FD9D6645_5FF4FE |      1M |  2966 |         2 |     +3 |     1 |       1M |  351 | 176MB |       |       |     . |     . |          |                 |
|  8 |      VIEW                                  |                           |      1M |  2966 |         3 |     +6 |     1 |       1M |      |       |       |       |     . |     . |          |                 |
|  9 |       TABLE ACCESS FULL                    | SYS_TEMP_0FD9D6645_5FF4FE |      1M |  2966 |         5 |     +4 |     1 |       1M |   74 |  12MB |       |       |     . |     . |          |                 |
=====================================================================================================================================================================================================================


1 row selected.

In this report you can see that the materialization resulted in 176MB of data being written to temp at operation 2, which is actually more than was written to temp as the hash join spilled over from memory.

There are a number of defects in this call to dbms_sql_monitor – including the need for an extra cost license. Most particularly (a) the plan is captured only if the query runs for more than 6 seconds or has a parallel component and (b) we’re only passing in the SQL_ID with a time limit, so we could get several executions reported. We could refine the inputs, though, by including the sql_plan_hash_value in our query against v$sql, and we might include an end-time filter.

Whatever we do to minimise the number of plans reported the point will almost certainly come where we have to eyeball the data to see if we can identify the queries which were almost certainly running and using the temp space we needed.

How do I find the execution plans for all the SQL that is called by a procedure?

To be written

What SQL is responsible for generating most redo?

To be written

Memory_target

Thu, 2021-08-05 07:10

When defining memory usage to an Oracle instance you can specify an “SGA target” and a separate “PGA target”; alternatively you could specify a single “Memory target”. I’ve not seen many people using the second option and there are reasons why it’s not a good idea – like the side effect it has on the use of large/huge pages – but a thread appeared on the Oracle Developers’ forum recently asking why the Buffer Cache Advisory section of an AWR report would suggest increasing the buffer cache when the Memory Statistics section showed the PGA+SGA usage to be just 6GB out of a declared memory_target of 8GB – why was Oracle “wasting” (or losing) 2GB of memory that the report said could be put to good use?

My first thought was that there was a reporting error – maybe a coding error, maybe a problem of inconsistent definition, so my first piece of advice was to poke around for a little more data; and while that was going on I thought I’d run up an instance specifying a memory_target to see what funny numbers might appear.

My first attempt at instance startup failed because I was using a small machine and had allocated more than half the memory to huge pages – and Oracle didn’t want to use huge pages when the memory target was set, so it ran into various resource problems and let me know that I would need to change a few O/S settings – so I reduced the memory_target to something small enough to allow the instance to start.

This is where my memory/sga/pga settings ended when the instance finally started (cut-n-paste with some of the output deleted):

SQL> show parameter target

NAME                                 TYPE        VALUE                          
------------------------------------ ----------- ------------------------------ 
memory_max_target                    big integer 1808M                          
memory_target                        big integer 1808M                          
pga_aggregate_target                 big integer 0                              
sga_target                           big integer 0                              

The 1,808MB was what I’d set in the startup file, and here’s a little corroboration of the figures:

SQL> show sga

Total System Global Area 1895823408 bytes
Fixed Size                  9136176 bytes
Variable Size            1107296256 bytes
Database Buffers          771751936 bytes
Redo Buffers                7639040 bytes

The top line is the sum of the next 4 figures, and the numbers add up to 1,808 MB (with an error of 2,000 bytes) but isn’t it interesting that the total is reported as the Total System Global Area – apparently leaving nothing for the PGA.

How does this compare with information from the next available AWR report?

Memory Statistics
~~~~~~~~~~~~~~~~~                       Begin          End
                                 ------------ ------------
                  Host Mem (MB):      3,692.4      3,692.4
                   SGA use (MB):      1,072.0      1,072.0
                   PGA use (MB):        334.9        381.0
    % Host Mem used for SGA+PGA:        38.10        39.35

Suddenly the SGA is only 1,072MB, while the PGA is now reporting 334MB – which means that somewhere something is failing to report roughly 400MB. Maybe the “Dynamic Memory Components” section of the AWR will help (I’ve deleted all the zero lines here):

                 Begin Snap     Current         Min         Max   Oper Last Op
Component         Size (Mb)   Size (Mb)   Size (Mb)   Size (Mb)  Count Typ/Mod
--------------- ----------- ----------- ----------- ----------- ------ -------
DEFAULT buffer       704.00      704.00      704.00      752.00      0 SHR/IMM
PGA Target           736.00      736.00      736.00      736.00      0 STA/
SGA Target         1,072.00    1,072.00    1,072.00    1,072.00      0 STA/
Shared IO Pool        48.00       48.00         .00       48.00      0 GRO/IMM
java pool             16.00       16.00       16.00       16.00      0 STA/
large pool            16.00       16.00       16.00       16.00      0 STA/
shared pool          272.00      272.00      272.00      272.00      0 STA/
                          ------------------------------------------------------

That seems to be a “Current Size” total of 2,864MB – but luckily we can spot that the presence of the SGA Target in the list means we’re double counting, and the total excluding that line is 1,792MB. That’s a shortfall of 16MB from our declared 1,808MB, but that’s okay because with this small memory target Oracle is operating in 16MB granules and one granule has been reserved for the fixed memory and public redo log buffers.
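The sums quoted so far are easy to check mechanically. This Python sketch simply re-does the arithmetic using the figures copied from the “show sga” output and from the Current Size column of the Dynamic Memory Components section; the dictionary and variable names are only for illustration.

```python
MB = 1024 * 1024

# Figures from "show sga" (bytes)
show_sga = {
    "Fixed Size":       9136176,
    "Variable Size":    1107296256,
    "Database Buffers": 771751936,
    "Redo Buffers":     7639040,
}
total_sga = sum(show_sga.values())           # reported as Total System Global Area
memory_target = 1808 * MB
sga_error_bytes = memory_target - total_sga  # the "error of 2,000 bytes"

# Current Size (MB) from the Dynamic Memory Components section
current_mb = {
    "DEFAULT buffer": 704,
    "PGA Target":     736,
    "SGA Target":     1072,
    "Shared IO Pool": 48,
    "java pool":      16,
    "large pool":     16,
    "shared pool":    272,
}
all_lines_mb = sum(current_mb.values())
# SGA Target is the sum of the SGA components, so drop it to avoid double counting
without_sga_target_mb = all_lines_mb - current_mb["SGA Target"]
shortfall_mb = 1808 - without_sga_target_mb  # one 16MB granule

print(total_sga, sga_error_bytes)
print(all_lines_mb, without_sga_target_mb, shortfall_mb)
```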

So we haven’t lost a big chunk of the memory_target – what we (or the OP) are seeing is a variation in the purpose of the two different parts of the AWR. The Memory Statistics tell us about PGA being used (at the moment of the snapshot) by current sessions, not about the memory that is reserved as an “internal” PGA target. Maybe some investigation of the Memory Resize Operations would show us that Oracle has reached this distribution of memory between SGA and PGA over a period of time as a fairly stable position with some movement of granules back and forth between the two. We could also check the summary of Process Memory that appears in the AWR.

Geek stuff

I had overlooked the possibility of simply looking at the Dynamic Memory Components section of the AWR to check whether the “missing” memory was allocated to the PGA, so in the original thread on the developer forum I supplied a query that I wrote many years ago to run as sys against x$ksmge (the granule map) when a suitable v$ view didn’t exist, so just for completeness here’s that query with the results:

set linesize 156
set trimspool on
set pagesize  60
set tab off

column indx             format 999
column component        format a20
column cursize          format 9,999
column gransize         format 999,999,999,999
column grantype         format 999
column granstate        format a10

column ct               format 9,999
column total_memory     format 999,999,999,999

break on report
compute sum of total_memory on report

select
        sct.indx, sct.component, sct.cursize, 
        ge.gransize, ge.grantype, ge.granstate, ct,
        ge.gransize * sct.cursize total_memory
from
        x$kmgsct        sct,
        (
        select
                ge.grantype, ge.granstate, ge.gransize,
                count(*) ct
        from
                x$ksmge         ge
        group by
                ge.grantype, ge.granstate, ge.gransize
        )       ge
where
        ge.grantype(+) = sct.grantype
and     sct.cursize != 0
order by
        sct.indx, sct.component, ge.granstate
;

INDX COMPONENT            CURSIZE         GRANSIZE GRANTYPE GRANSTATE      CT     TOTAL_MEMORY
---- -------------------- ------- ---------------- -------- ---------- ------ ----------------
   0 shared pool               18       16,777,216        1 ALLOC          18      301,989,888
   1 large pool                 1       16,777,216        2 ALLOC           1       16,777,216
   2 java pool                  1       16,777,216        3 ALLOC           1       16,777,216
   5 SGA Target                67
   7 DEFAULT buffer cache      43       16,777,216        9 ALLOC          43      721,420,288
  15 Shared IO Pool             3       16,777,216       17 ALLOC           3       50,331,648
  20 PGA Target                46
                                                                              ----------------
sum                                                                              1,107,296,256


As you can see, this reports the SGA and PGA targets in terms of granules available, but neither reports any granules actually allocated. However, adding the 46 PGA Target granules of 16MB to the total memory of (approx) 1.1GB we can see that Oracle is clearly aware of the full 1,808 MB (less the one fixed granule), and the oddity of “lost memory” was simply about the choice of what to report.
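For anyone who wants to see the granule arithmetic spelled out, here is a short Python sketch using the CURSIZE and GRANSIZE figures from the output above. It is only a restatement of the numbers already shown; the names are illustrative.

```python
MB = 1024 * 1024
GRANULE = 16777216            # gransize from x$ksmge: 16MB

# cursize (in granules) for the components that report allocated granules
allocated = {
    "shared pool":          18,
    "large pool":           1,
    "java pool":            1,
    "DEFAULT buffer cache": 43,
    "Shared IO Pool":       3,
}
pga_target_granules = 46
sga_target_granules = 67

allocated_bytes = sum(allocated.values()) * GRANULE   # the "sum" line of the report
pga_target_mb = pga_target_granules * GRANULE // MB

# Allocated SGA components plus the PGA target, then the one fixed granule
accounted_mb = allocated_bytes // MB + pga_target_mb
with_fixed_granule_mb = accounted_mb + GRANULE // MB

print(allocated_bytes, pga_target_mb, accounted_mb, with_fixed_granule_mb)
```

Note that the 67 granules of the SGA Target are the 66 allocated granules plus the one fixed granule, which is where the 1,072MB in the AWR comes from.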

Memory Target and O/S

Just for completeness – here’s a pair of extracts from the alert log of the instance startup. The first with memory_target set, the second using the sga_target and pga_aggregate_target and leaving the memory_target unset. I’ve removed a load of lines which were simply the timestamps (separated by microseconds):

Starting ORACLE instance (normal) (OS id: 2025)
****************************************************
 /dev/shm will be used for creating SGA
Large pages will not be used. Only standard 4K pages will be used
****************************************************
**********************************************************************
Dump of system resources acquired for SHARED GLOBAL AREA (SGA)

 Per process system memlock (soft) limit = 128G
 Expected per process system memlock (soft) limit to lock
 instance MAX SHARED GLOBAL AREA (SGA) into memory: 1808M

 Available system pagesizes:
  4K, 2048K

 Supported system pagesize(s):
  PAGESIZE  AVAILABLE_PAGES  EXPECTED_PAGES  ALLOCATED_PAGES  ERROR(s)
        4K       Configured          462853          462853        NONE

 Reason for not supporting certain system pagesizes:
  2048K - Dynamic allocate and free memory regions

Note the warning: “Large pages will not be used. Only standard 4K pages will be used”. That’s half the machine memory unavailable, and lots of memory used by per-process memory maps.

Starting ORACLE instance (normal) (OS id: 2968)
****************************************************
 Sys-V shared memory will be used for creating SGA
 ****************************************************
**********************************************************************
Dump of system resources acquired for SHARED GLOBAL AREA (SGA)

 Per process system memlock (soft) limit = 128G
 Expected per process system memlock (soft) limit to lock
 instance MAX SHARED GLOBAL AREA (SGA) into memory: 1122M

 Available system pagesizes:
  4K, 2048K

 Supported system pagesize(s):
  PAGESIZE  AVAILABLE_PAGES  EXPECTED_PAGES  ALLOCATED_PAGES  ERROR(s)
        4K       Configured               4               4        NONE
     2048K             1200             561             561        NONE

Summary

While I’ve highlighted a couple of details about the differences between memory target and sga/pga target, the main point of this note comes back to something I’ve said many times in the past. When you’re looking at reports (AWR / Statspack / home-grown) you do need to understand the meaning of the figures in the report, and when they don’t seem to make sense you need to cross-check with other sources so that you can be confident that you understand what the figures are showing.

In this case the “Host Memory” usage figures for the PGA probably reflect the current (and instantaneous) usage by Oracle processes, not the granules allocated to the instance by the operating system. There are other parts of the AWR report that tell you how the granules are split between the SGA and PGA, and while I haven’t shown it, there’s a section of the report (Memory Resize Operations) that tells you if the instance has been under pressure to move memory granules between the SGA and PGA.

SQL Macro

Thu, 2021-07-22 04:18

A question came up recently on the Oracle Developer forum that tempted me into writing a short note about SQL Macro functions – a feature that was touted for 20c but which has been back-ported to the more recent releases of 19c. Specifically I set up this demo using 19.11.0.0.

The OP supplied a script to prepare some data. I’ll postpone that to the end of this note and start with variations of the query that could be used against that data set. I’ll be looking at the original query, a variant of the query that uses a pipelined function, then a variant that uses an SQL Macro function.

The requirement starts with a query to turn a pair of dates into a date range – which can be done in many ways but the OP had used a recursive “with subquery” (CTE/common table expression).

with calendar ( start_date, end_date ) as (
        select date '2021-07-01', date '2021-07-30' from dual
        union all
        select start_date + 1, end_date
        from   calendar
        where  start_date + 1 <= end_date
)
select start_date as day
from   calendar
;

Getting on to the full requirement we can use this subquery as if it were a table (or inline view) and join it to any other tables where we want data from a date range, for example:

select
        e.employee_id, c.day
from
        employees e
inner join
        (
                with calendar ( start_date, end_date ) as (
                        select date '2021-07-01', date '2021-07-30' from dual
                        union all
                        select start_date + 1, end_date
                        from   calendar
                        where  start_date + 1 <= end_date
                )
                select start_date as day
                from   calendar
        ) c
partition by
        (e.employee_id)
on      (substr(e.work_days, trunc(c.day) - trunc(c.day, 'IW') + 1, 1) = 'Y')
where
        not exists (
                select  1
                from    holidays h
                where   c.day = h.holiday_date
        )
and     not exists(
                select  1
                from    timeoff t
                where   e.employee_id = t.employee_id
                and     t.timeoff_date = c.day
        )
order by
        e.employee_id,
        c.day

If we want a report for a different month we just have to supply a different pair of dates, and we can probably work out a way of making it easy for the end-users to supply those dates as parameters to a report.
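To make the logic of the date generation and the join predicate a little more concrete, here is a Python sketch of the same two ideas. It is not part of the original solution; it assumes that work_days is a 7-character string of 'Y'/'N' flags starting on Monday (which is what the substr() with trunc(day,'IW') implies), and the sample value 'YYYYYNN' is hypothetical. The holidays and timeoff checks are left out.

```python
from datetime import date, timedelta

def calendar(start_date, end_date):
    """Mirror the recursive CTE: emit start_date, then add a day while start_date + 1 <= end_date."""
    day = start_date
    yield day
    while day + timedelta(days=1) <= end_date:
        day += timedelta(days=1)
        yield day

def works_on(work_days, day):
    """Mirror substr(work_days, trunc(day) - trunc(day,'IW') + 1, 1) = 'Y'.

    trunc(day,'IW') is the Monday of the ISO week, so the subtraction gives
    0 for Monday through 6 for Sunday, the same as date.weekday().
    """
    return work_days[day.weekday()] == "Y"

days = list(calendar(date(2021, 7, 1), date(2021, 7, 30)))
working = [d for d in days if works_on("YYYYYNN", d)]  # hypothetical Mon-Fri worker

print(len(days), len(working))
```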

The pipelined function

However, we may want to use the same little “recursive CTE” (or similar) pattern in many different reports, and ad hoc queries that users might want to write for themselves. To avoid wasting time on logic, or basic typing errors, is it possible to hide some of the complexity of the subquery structure? The answer is yes, and for a long time we could have used a “pipelined function” to do this – though we have to create a simple object type and an object table type to do so. For example:

create or replace type obj_date is object (day date);
/

create or replace type nt_date is table of obj_date;
/

create or replace function generate_dates_pipelined(
        p_from  in date,
        p_to    in date
)
return nt_date 
pipelined
is
begin
        for c1 in (
                with calendar (start_date, end_date ) as (
                        select trunc(p_from), trunc(p_to) from dual
                        union all
                        select start_date + 1, end_date
                        from   calendar
                        where  start_date + 1 <= end_date
                )
                select start_date as day
                from   calendar
        ) loop
                pipe row (obj_date(c1.day));
        end loop;

        return;

end generate_dates_pipelined;
/

I’ve started by creating an object type with a single attribute called day of type date, and an object table type of that object type. This means I can use the object type and the object table type to pass data between SQL and PL/SQL. Then I’ve created a pl/sql function that returns the object table type, but in a pipelined fashion using the pipe row() mechanism to supply the data one object at a time.

In my final SQL I can now use the table() operator to cast the result of the function call from an object table to a relational table, implicitly mapping the object attributes to their basic Oracle data types.

select
        e.employee_id, c.day
from
        employees e
inner join
        table(generate_dates_pipelined(date '2021-07-01', date '2021-07-30')) c
partition by
        (e.employee_id)
on      (substr(e.work_days, trunc(c.day) - trunc(c.day, 'IW') + 1, 1) = 'Y')
where
        not exists (
                select  1
                from    holidays h
                where   c.day = h.holiday_date
        )
and     not exists(
                select  1
                from    timeoff t
                where   e.employee_id = t.employee_id
                and     t.timeoff_date = c.day
        )
order by
        e.employee_id,
        c.day
;

I’ve replaced the 9 lines of the inline “with subquery” by a single line call:

        table(generate_dates_pipelined(date '2021-07-01', date '2021-07-30')) c

In fact the table() operator hasn’t been needed since some time in the 12c timeline, but it might be useful as a little reminder of what’s going on behind the scenes. It’s also a reminder that the data really will behave as if it’s coming from a relational table rather than a pl/sql loop.

Although this pipelined function approach can be very effective, another member of the forum pointed out that behind the scenes it depends on a pl/sql loop walking through a cursor which, in this example, was row by row processing (though it could be changed to bulk collect with a limit to improve performance a little). So we might want to look at options for doing things differently.
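As a sketch of the “bulk collect with a limit” variation mentioned above, something like the following might do. It has not been run against the forum example; the function name generate_dates_pipelined_bulk, the local collection type date_list and the limit of 100 are all assumptions made for illustration, and it relies on the obj_date and nt_date types created earlier.

```sql
create or replace function generate_dates_pipelined_bulk(
        p_from  in date,
        p_to    in date
)
return nt_date
pipelined
is
        -- same recursive CTE as before, but declared as an explicit cursor
        cursor c1 is
                with calendar (start_date, end_date) as (
                        select trunc(p_from), trunc(p_to) from dual
                        union all
                        select start_date + 1, end_date
                        from   calendar
                        where  start_date + 1 <= end_date
                )
                select start_date as day
                from   calendar
        ;

        -- hypothetical local collection type to hold each batch of dates
        type date_list is table of date;
        v_days  date_list;
begin
        open c1;
        loop
                -- fetch up to 100 rows per call (the limit is an arbitrary choice)
                fetch c1 bulk collect into v_days limit 100;
                exit when v_days.count = 0;

                for i in 1 .. v_days.count loop
                        pipe row (obj_date(v_days(i)));
                end loop;
        end loop;
        close c1;

        return;

end generate_dates_pipelined_bulk;
/
```

The calling query would be unchanged apart from the function name. Whether the array fetch makes a measurable difference for a date range this small is something you would have to test.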

The SQL Macro function

In many programming languages a “macro” is a symbol that is used as a short-hand for a longer piece of code. Even in environments like your favourite shell environment you can usually set up shorthand for longer texts that you use frequently, for example:

alias otr="cd /u01/app/oracle/diag/rdbms/or19/or19/trace"

The Oracle equivalent is a PL/SQL function (declared as a “SQL_Macro” function) that you include in your SQL statement, and at run-time Oracle will execute the function and use the text it returns to modify your statement. Here’s the macro strategy applied to the date range generation:

create or replace function generate_dates_macro(
        p_from  in date,
        p_to    in date
)
return varchar2
sql_macro
is
        v_sql varchar2(4000) := q'{
                with calendar (start_date, end_date ) as (
                        select
                                to_date('xxxx-xx-xx','yyyy-mm-dd'),
                                to_date('yyyy-yy-yy','yyyy-mm-dd')
                        from    dual
                        union all
                        select start_date + 1, end_date
                        from   calendar
                        where  start_date + 1 <= end_date
                )
                select start_date as day
                from   calendar
                }'
        ;

begin
        v_sql := replace(v_sql,'xxxx-xx-xx',to_char(p_from,'yyyy-mm-dd'));
        v_sql := replace(v_sql,'yyyy-yy-yy',to_char(p_to  ,'yyyy-mm-dd'));

--      dbms_output.put_line(v_sql);
        return v_sql;

end generate_dates_macro;
/

I’ve created a function, flagged as a sql_macro, that returns a varchar2. It has two input parameters which are declared as dates. The initial value of the variable v_sql looks very similar to the CTE I used in the original query except the two “dates” it uses are “xxxx-xx-xx” and “yyyy-yy-yy”, but in the body of the function I’ve replaced those with the text forms of the two incoming date parameters. There’s a call to dbms_output.put_line() that I’ve commented out that will show you that the final text returned by the function is:

                with calendar (start_date, end_date ) as (
                        select
                                to_date('2021-07-01','yyyy-mm-dd'),
                                to_date('2021-07-30','yyyy-mm-dd')
                        from    dual
                        union all
                        select start_date + 1, end_date
                        from   calendar
                        where  start_date + 1 <= end_date
                )
                select start_date as day
                from   calendar

So now we can rewrite the original statement as follows (with just a minor change from the pipelined version):

select
        e.employee_id, c.day
from
        employees e
inner join
        generate_dates_macro(date '2021-07-01', date '2021-07-30') c
partition by
        (e.employee_id)
on      (substr(e.work_days, trunc(c.day) - trunc(c.day, 'IW') + 1, 1) = 'Y')
where
        not exists (
                select  1
                from    holidays h
                where   c.day = h.holiday_date
        )
and     not exists(
                select  1
                from    timeoff t
                where   e.employee_id = t.employee_id
                and     t.timeoff_date = c.day
        )
order by
        e.employee_id,
        c.day
;

When we execute this statement Oracle evaluates the function, slots the generated text in place, then optimises and executes the resulting text. Interestingly the text reported by a call to dbms_xplan.display_cursor() shows the original text even though the plan clearly includes references to the table(s) in the SQL macro – a search of the library cache shows the same text, but also reveals an anonymous pl/sql block calling the SQL Macro function (in a style reminiscent of the way that row-level security (RLS, FGAC, VPD) calls a security predicate function) that is invisibly folded into a query.

declare
begin 
        :macro_text := "GENERATE_DATES_MACRO"(
                TO_DATE(' 2021-07-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'),
                TO_DATE(' 2021-07-30 00:00:00', 'syyyy-mm-dd hh24:mi:ss')
        );
end;

Here’s the execution plan for the query using the SQL Macro:

--------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                     | Name       | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                              |            |      1 |        |    41 (100)|     83 |00:00:00.01 |     130 |       |       |          |
|   1 |  SORT ORDER BY                                |            |      1 |      1 |    41   (5)|     83 |00:00:00.01 |     130 |  9216 |  9216 | 8192  (0)|
|*  2 |   FILTER                                      |            |      1 |        |            |     83 |00:00:00.01 |     130 |       |       |          |
|*  3 |    HASH JOIN ANTI                             |            |      1 |      1 |    39   (3)|     84 |00:00:00.01 |      46 |  1744K|  1744K| 1542K (0)|
|   4 |     NESTED LOOPS                              |            |      1 |      1 |    21   (0)|     88 |00:00:00.01 |      23 |       |       |          |
|   5 |      TABLE ACCESS FULL                        | EMPLOYEES  |      1 |      1 |    17   (0)|      4 |00:00:00.01 |      23 |       |       |          |
|*  6 |      VIEW                                     |            |      4 |      1 |     4   (0)|     88 |00:00:00.01 |       0 |       |       |          |
|   7 |       UNION ALL (RECURSIVE WITH) BREADTH FIRST|            |      4 |        |            |    120 |00:00:00.01 |       0 |  2048 |  2048 | 2048  (0)|
|   8 |        FAST DUAL                              |            |      4 |      1 |     2   (0)|      4 |00:00:00.01 |       0 |       |       |          |
|   9 |        RECURSIVE WITH PUMP                    |            |    120 |        |            |    116 |00:00:00.01 |       0 |       |       |          |
|  10 |     TABLE ACCESS FULL                         | HOLIDAYS   |      1 |      2 |    17   (0)|      1 |00:00:00.01 |      23 |       |       |          |
|* 11 |    INDEX UNIQUE SCAN                          | TIMEOFF_PK |     84 |      1 |     1   (0)|      1 |00:00:00.01 |      84 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter( IS NULL)
   3 - access("START_DATE"="H"."HOLIDAY_DATE")
   6 - filter(SUBSTR("E"."WORK_DAYS",TRUNC(INTERNAL_FUNCTION("START_DATE"))-TRUNC(INTERNAL_FUNCTION("START_DATE"),'fmiw')+1,1)='Y')
  11 - access("T"."EMPLOYEE_ID"=:B1 AND "T"."TIMEOFF_DATE"=:B2)

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

As you can see, even though the query as written didn’t include the recursive CTE, the recursive query against DUAL appears in the plan. In fact the plan is exactly the same as the plan for the original query with the embedded CTE, though there is one interesting little difference – the generated query block names differ between plans.

Pros and Cons

Given that this is a lightweight example of a simple use of the SQL macro there’s not really a lot that can be said when comparing pipelined functions with macro functions. Both hide complexity and give you the opportunity to optimise an awkward piece of the code that might be (in effect) a common sub-routine.

The pipelined function does have to deal with the PL/SQL to SQL interchange – but that’s not a significant feature here. The main benefits, perhaps, of the macro are that the plan shows you the table(s) that would be hidden by the pipelined function, and may allow the optimizer to get better estimates of data sizes because it will be examining real tables with real statistics rather than taking a guess at a “pickler fetch” from a collection with a black box function.

Update (pre-publication)

There is some pleasure to be had by making mistakes in public, because that’s when you can learn something new. In my example to the OP on the Developer forum I used a much messier piece of code to embed the date values into the macro string, with lots of doubled and trebled quotes, to_char() functions, and concatenation all over the place.

Alex Nuijten replied to my suggestion pointing out that this degree of complexity was not necessary, and you could reference the function’s parameters to construct the string. The only problem with that was that it hadn’t worked when I had tried it. Alex’s comment, however, also mentioned the problem and supplied the explanation: Bug 32212976: USING SCALAR ARGUMENTS IN WITH CLAUSE IN SQL TABLE MACRO RAISES ORA-06553 PLS-306. This was exactly the problem that I had been getting (the error message was “wrong number or types of arguments in call to ‘GENERATE_DATES_MACRO’”), and I hadn’t thought about searching for known bugs or patches; I just hacked my way around the problem.

Here’s an alternative macro function supplied by Alex (edited slightly to be consistent with the function and column names in my example):

create or replace function generate_dates_macro(
    p_from in date,
    p_to  in date
)
return varchar2
sql_macro
is
    v_sql varchar2(4000);
begin
  v_sql := 'select trunc (generate_dates_macro.p_from) - 1 + level as day
       from dual
       connect by level <= (generate_dates_macro.p_to - generate_dates_macro.p_from) + 1';

--  dbms_output.put_line(v_sql);
    return v_sql;

end generate_dates_macro;
/
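A quick sanity check of either version of the macro is to select from it on its own and look at the day-number expression used in the join predicate. This is only a sketch: it assumes the macro compiles on your version (i.e. you are not hitting the bug mentioned above), and the column aliases day_name and iso_day_no are made up for the example.

```sql
select
        c.day,
        to_char(c.day, 'Dy')                    as day_name,    -- depends on NLS date language
        trunc(c.day) - trunc(c.day, 'IW') + 1   as iso_day_no   -- 1 = Monday ... 7 = Sunday
from
        generate_dates_macro(date '2021-07-01', date '2021-07-07') c
order by
        c.day
;
```

Since trunc(date, 'IW') returns the Monday of the ISO week, and 1st July 2021 was a Thursday, the first row should report an iso_day_no of 4. That value is the position that the substr() call picks out of the work_days string.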

Test Code

If you want to experiment further, here’s the code to create the tables used in this demo:

rem
rem     Script:         19c_macro_2.sql
rem     Author:         Jonathan Lewis / "BeefStu"
rem     Dated:          July 2021
rem     Purpose:        
rem
rem     Last tested 
rem             19.11.0.0
rem
rem     Notes:
rem     A Macro solution to a problem that might
rem     otherwise be solved with a pipelined function
rem


drop table holidays;
drop table employees;
drop table timeoff;
drop table  emp_attendance;    
drop table absences;

drop function generate_dates_pipelined;
drop type nt_date;
drop type obj_date;

drop function generate_dates_macro;

-- @@setup

create table holidays(
        holiday_date    date,
        holiday_name    varchar2(20)
)
;

insert into holidays (holiday_date, holiday_name)
values ( to_date('2021/07/21 00:00:00', 'yyyy/mm/dd hh24:mi:ss'), 'July 21 2021') ;

create table employees(
        employee_id     number(6), 
        first_name      varchar2(20),
        last_name       varchar2(20),
        card_num        varchar2(10),
        work_days       varchar2(7)
)
;

alter table employees
        add constraint employees_pk primary key (employee_id)
;

insert into employees(employee_id, first_name, last_name, card_num, work_days)
with names as ( 
select 1, 'Jane', 'Doe', 'f123456', 'NYYYYYN' from dual 
union all 
select 2, 'Madison', 'Smith', 'r33432','NYYYYYN' from dual 
union all 
select 3, 'Justin', 'Case', 'c765341','NYYYYYN' from dual 
union all 
select 4, 'Mike', 'Jones', 'd564311','NYYYYYN' from dual 
) 
select * from names
;

create table timeoff(
        seq_num         integer generated by default as identity (start with 1) not null,
        employee_id     number(6),
        timeoff_date    date,
        timeoff_type    varchar2(1),
        constraint timeoff_chk check (timeoff_date = trunc(timeoff_date, 'dd')),
        constraint timeoff_pk primary key (employee_id, timeoff_date)
)
;

insert into timeoff (employee_id,timeoff_date,timeoff_type) 
with dts as ( 
select 1, to_date('20210726 00:00:00','yyyymmdd hh24:mi:ss'),'V'    from dual union all 
select 2, to_date('20210726 00:00:00','yyyymmdd hh24:mi:ss'),'V'    from dual union all 
select 2, to_date('20210727 00:00:00','yyyymmdd hh24:mi:ss'),'V'    from dual  
) 
select * from dts
;

create table  emp_attendance(    
        seq_num         integer  generated by default as identity (start with 1) not null,
        employee_id     number(6),
        start_date      date,
        end_date        date,
        week_number     number(2),
        create_date     date default sysdate
)
;

create table absences(
        seq_num         integer  generated by default as identity (start with 1) not null,
        employee_id     number(6),
        absent_date     date,
        constraint absence_chk check (absent_date=trunc(absent_date, 'dd')),
        constraint absence_pk primary key (employee_id, absent_date)
)
;

insert into emp_attendance (employee_id, start_date,end_date,week_number)
with dts as ( 
select 1, to_date('20210728 13:10:00','yyyymmdd hh24:mi:ss'), to_date('20210728 23:15:00','yyyymmdd hh24:mi:ss'), 30  from dual 
union all 
select 2, to_date('20210728 12:10:10','yyyymmdd hh24:mi:ss'), to_date('20210728 20:15:01','yyyymmdd hh24:mi:ss'), 30  from dual
)
select * from dts
;


Hex tip

Tue, 2021-07-20 11:40

A surprising amount of the work I do (or used to do) revolves around numbers; and once I’m outside the realm of the optimizer (i.e. getting away from simple arithmetic), one of the bits of playing with numbers that I do most often is conversion – usually decimal to hexadecimal, sometimes decimal to binary.

Here’s an example of how this helped me debug an Oracle error a few days ago. We start with someone trying to purge data from aud$ using the official dbms_audit_mgmt package, first setting the package’s db_delete_batch_size parameter to the value 100,000 then calling dbms_audit_mgmt.clean_audit_trail.

In theory this should have deleted (up to) 100,000 rows from aud$ starting from the oldest data. In practice it tried to delete far more rows, generating vast amounts of undo and redo, and locking up resources in the undo tablespace for ages. The SQL statement doing all the work looked like the following (after a little cosmetic work):

DELETE FROM SYS.AUD$ 
WHERE  DBID = 382813123 
AND    NTIMESTAMP# < to_timestamp('2020-12-17 00:00:00', 'YYYY-MM-DD HH24:MI:SS.FF')
AND    ROWNUM <= 140724603553440

That’s a rather large number in the rownum predicate, much larger than the expected 100,000. Whenever I am puzzled by very large numbers in places I’m not expecting to see them one of the first things I do to poke around is to convert the number to hexadecimal. (Although it seems a fairly random thing to do it doesn’t take very long and it produces an interesting result fairly frequently.)

140724603553440 (dec) = 0x7FFD000186A0

You may not think that the resulting hex number is very interesting – but there’s a string of zeros in the middle that is asking for a little extra poking. So let’s convert the last 8 digits (starting with those 3 zeros) back to decimal.

0x000186A0 = 100,000 (dec)

There’s an interesting coincidence – we’ve got back to the 100,000 that the OP had set as the db_delete_batch_size. Is this really a coincidence or does it tell us something about a bug? That’s easy enough to test, just try setting a couple of different values for the parameter and see if this affects the rownum predicate in a consistent fashion. Here are the results from two more test values:

1,000,000 ==> 140733194388032 (dec) = 0x7FFF000F4240 .... 0x000F4240 = 1,000,000 (dec)
   50,000 ==> 140728898470736 (dec) = 0x7FFE0000C350 .... 0x0000C350 =    50,000 (dec)
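If you don’t have a hex calculator to hand, the same conversions can be done in SQL. The following is just a sketch using the three values reported above; the CTE name samples and the column aliases are made up for the example. The mask 4294967295 is 2^32 - 1, so the bitand() call isolates the bottom 4 bytes.

```sql
with samples (batch_size, rownum_limit) as (
        select  100000, 140724603553440 from dual union all
        select 1000000, 140733194388032 from dual union all
        select   50000, 140728898470736 from dual
)
select
        batch_size,
        to_char(rownum_limit, 'fmXXXXXXXXXXXXXXXX')     as hex_limit,   -- decimal to hex
        bitand(rownum_limit, 4294967295)                as low_4_bytes  -- bottom 32 bits only
from
        samples
;
```

For each row the low_4_bytes column should come back equal to batch_size, which is the pattern described in the text.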

The top 4 digits (2 bytes) have changed, but the bottom 8 digits (4 bytes) do seem to hold the db_delete_batch_size requested. At this point I felt that we were probably seeing some sort of pointer error in a C library routine. If you examine the file $ORACLE_HOME/rdbms/admin/prvtamgt.plb you’ll find that one of the few readable lines says:

CREATE OR REPLACE LIBRARY audsys.dbms_audit_mgmt_lib wrapped

My guess was that there were probably a couple of external C routines involved, with PL/SQL wrappers in the public package; and that there was a mismatch between the declarations in C and the declarations in the PL/SQL.

It turns out that I wasn’t quite right, but I was in the right olympic stadium. This is now (unpublished) bug 33136016, and if you’ve been seeing unexpected work patterns when purging the audit trail after upgrading to 19c or later then there may be a patch for you in the not too distant future.

Quiz Night

Sun, 2021-07-11 17:41

How do you explain the apparent inconsistency between the two outputs from this tiny fragment of an SQL*Plus script (last tested 19.11.0.0):

describe t1
create table t2 as select * from t1;
describe t2

The results of the two describe commands are as follows (cut-n-paste, with no editing, including the feedback from the CTAS):

 Name                          Null?    Type
 ----------------------------- -------- --------------------
 N1                            NOT NULL NUMBER
 N2                            NOT NULL NUMBER
 V1                                     VARCHAR2(10)
 PADDING                                VARCHAR2(100)


Table created.

 Name                          Null?    Type
 ----------------------------- -------- --------------------
 N1                                     NUMBER
 N2                            NOT NULL NUMBER
 V1                                     VARCHAR2(10)
 PADDING                                VARCHAR2(100)

Answer, and comments on why it’s worth knowing, some time tomorrow (Monday).

19c tweak 2

Fri, 2021-07-09 10:40

Trying to find out why a plan had changed in the upgrade from 11g to 19c I came across this cunning little tweak that must have appeared in the 19c timeline. I’ll start with a simple query, then the execution plans (autotrace traceonly) from 19.11.0.0 – first with the parameter optimizer_features_enable set to 18.1.0, then with it set to 19.1.0. The table t1 is a copy of the first 10,000 rows of view all_objects:

SQL> alter session set optimizer_features_enable = '18.1.0';
SQL> select count(data_object_id) from t1 where f1(object_id) = 'Y';

Execution Plan
----------------------------------------------------------
Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     7 |    38  (37)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     7 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |   100 |   700 |    38  (37)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("F1"("OBJECT_ID")='Y')



SQL> alter session set optimizer_features_enable = '19.1.0';
SQL> select count(data_object_id) from t1 where f1(object_id) = 'Y';

Execution Plan
----------------------------------------------------------
Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     7 |    26   (8)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     7 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |     5 |    35 |    26   (8)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("DATA_OBJECT_ID" IS NOT NULL AND "F1"("OBJECT_ID")='Y')

Optimising with the optimizer features set back to 18.1 the cardinality estimate is 100 (that’s 1% of the rows in the table, the standard guess for “function() = constant”) with a cost of 38, of which 37% is CPU cost.

Running with the optimizer features of 19c enabled the cardinality estimate drops to 5 and the cost drops to 26 with CPU making up 8% of the cost. Where does the difference come from?

As ever you have to look at the Predicate Information. Running as 18c Oracle has decided to call my function for every row in the table; running as 19c Oracle has decided that since I’m counting non-null entries of column data_object_id it need only call the function when data_object_id is not null, so it’s introduced an extra predicate to make that happen, and that extra predicate has reduced the cardinality and cost estimates. (In my sample data set there are 9,456 nulls and 544 distinct values for data_object_id – so the difference in workload is significant. And 1% of 544 is 5, which explains the cardinality estimate.)
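The arithmetic behind the two estimates can be checked with a simple query against t1. This is only a sketch: the column aliases are made up, and it assumes the optimizer is doing nothing more subtle than taking 1% of the relevant row count and rounding.

```sql
select
        count(*)                                as total_rows,
        count(data_object_id)                   as non_null_rows,
        count(*) - count(data_object_id)        as null_rows,
        round(count(*) / 100)                   as est_without_tweak,   -- 1% of all rows
        round(count(data_object_id) / 100)      as est_with_tweak       -- 1% of non-null rows
from
        t1
;
```

With the figures quoted above (10,000 rows, 544 of them non-null) the last two columns would be 100 and 5, matching the Rows column of the two plans.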

This looks like fix control 24761824 “add is not null for high null column in set function” introduced in 19.1.0. The description suggests that the feature will only be used in cases where the column is “often” null, but we have no clue, yet, about what “often” means.
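One way to test the guess about the fix control, on a sandbox system only, is to look it up in v$system_fix_control and then switch it off for the session before re-running the query. This is a sketch under the assumption that 24761824 really is the relevant fix; the “_fix_control” parameter is a hidden parameter, so it should not be set on a production system without the approval of Oracle support.

```sql
-- what does the instance say about the fix?
select  bugno, value, optimizer_feature_enable, description
from    v$system_fix_control
where   bugno = 24761824
;

-- disable the fix for this session, then re-check the plan
alter session set "_fix_control" = '24761824:0';

set autotrace traceonly explain
select count(data_object_id) from t1 where f1(object_id) = 'Y';
set autotrace off

-- re-enable the fix
alter session set "_fix_control" = '24761824:1';
```

If the guess is right, the plan with the fix disabled should lose the extra “DATA_OBJECT_ID IS NOT NULL” predicate even with optimizer_features_enable left at its 19c setting.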

This means that there may be cases where an execution plan changes on an upgrade to 19c because a tablescan has become cheaper or a cardinality estimate has been reduced.

Just as a confirmation of how the change in plan is echoing reality, here are the execution plans pulled from memory after executing them with the statistics_level set to all to enable collection of the rowsource execution statistics. First the 18c plan, then the 19c plan:

-------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |00:00:11.14 |    1780K|
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:11.14 |    1780K|
|*  2 |   TABLE ACCESS FULL| T1   |      1 |    100 |  10000 |00:00:11.14 |    1780K|
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("F1"("OBJECT_ID")='Y')



-------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |00:00:00.46 |   97010 |
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.46 |   97010 |
|*  2 |   TABLE ACCESS FULL| T1   |      1 |      5 |    544 |00:00:00.46 |   97010 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("DATA_OBJECT_ID" IS NOT NULL AND "F1"("OBJECT_ID")='Y'))

As you can see, the buffer gets have dropped from 1,780K in 18c to 97K in 19c (mainly because the function results in a tablescan of a table of 178 blocks and the number of calls has dropped from 10,000 to 544), and the run time has dropped from 11.14 seconds to 0.46 seconds.

Code

If you want to run and refine this test, here’s the code I used to generate the data.

rem
rem     Script:         19c_not_null_tweak.sql
rem     Author:         Jonathan Lewis
rem     Dated:          July 2021
rem     Purpose:        
rem
rem     Last tested 
rem             19.11.0.0
rem

create table t1 as select * from all_objects where rownum <= 10000;
create table t2 as select * from t1;

create or replace function f1(i_obj in number) return varchar2
is
        n1 number;
begin
        select count(*) into n1 from t2 where object_id = i_obj;

        if n1 = 0 then
                return 'N';
        else
                return 'Y';
        end if;
end;
/

set autotrace traceonly explain

alter session set optimizer_features_enable = '18.1.0';
select count(data_object_id) from t1 where f1(object_id) = 'Y';

alter session set optimizer_features_enable = '19.1.0';
select count(data_object_id) from t1 where f1(object_id) = 'Y';

set autotrace off

set serveroutput off
alter session set statistics_level = all;

alter session set optimizer_features_enable = '18.1.0';
select count(data_object_id) from t1 where f1(object_id) = 'Y';
select * from table(dbms_xplan.display_cursor(format=>'allstats last'));

alter session set optimizer_features_enable = '19.1.0';
select count(data_object_id) from t1 where f1(object_id) = 'Y';
select * from table(dbms_xplan.display_cursor(format=>'allstats last'));

alter session set statistics_level = typical;
set serveroutput on

Fussy FBIs

Mon, 2021-07-05 05:19

In a recent thread on the Oracle Developer Forum a user was seeing a significant increase in time spent waiting for row locks after the number of executions of a particular “select for update” had increased from a couple of hundred per hour to a thousand per hour.

It turned out that the locking was a deliberate queueing mechanism following the basic pattern:

lock a row in the "locks" table

do some work in "another table" to flag some rows (perhaps to "own" them).

commit;

The intent was to ensure that processes did not collide (and possibly deadlock) while working on “another table”. It turned out that the increased wait time was due to an increase in the time spent between the lock and the commit; and the reason for that increase was simply a change in the execution path of a key statement executed between the two steps. The core of the work was simply the execution of one or both of two statements:

UPDATE TRAN_TAB 
SET 
        PID     = :B3,
        LOCK_ID = :B2,
        STATUS  = 'I' 
WHERE
        PID    IS NULL 
AND     STATUS = 'W' 
AND     ROWNUM <= :B1
;

UPDATE TRAN_TAB 
SET 
        PID     = :B3,
        LOCK_ID = :B2,
        STATUS  = 'I' 
WHERE
        PID    IS NULL 
AND     STATUS = 'T' 
AND     ROWNUM <= :B1
;

Originally the query had been using an index range scan on an index defined as (status, id, lock_id) but it had switched to using a tablescan because the estimated cardinality had changed from 18 rows to 3.5 million rows.

When you notice that the leading column of the index is called status you might guess (correctly) that there are just a few distinct values for the status, and just a few rows each for values ‘T’ and ‘W’ and that something unexpected had happened during statistics collection that had made Oracle “lose” sight of the special cases and treat ‘T’ (or ‘W’) as an “average” case either using “total rows / num_distinct” or “half the least popular” to estimate the cardinality. [At the time of writing it looks as if the problem appears as a side effect of “real-time statistics”.]

One fix, of course, would be to ensure that the statistics for this column never ever went wrong – and there are various ways of doing that, some more complicated and fragile than others (it’s a partitioned table and needs a suitable frequency histogram in place to get good estimates – the combination isn’t nice). Another strategy would simply be to hint the code (or add an sql_plan_baseline or sql_patch) to use the relevant index.

The nicest strategy (especially given the update to two columns out of the three in the index) might be to take advantage of function-based indexes – creating an index that would be impossible for the optimizer to avoid for these queries, that is as small and efficient as possible, and is highly unlikely to be used in the wrong circumstances. For example, a “two-index” solution:

create index tt_ft on tran_tab(
        case when status = 'T' and pid is null then 0 end
);

create index tt_fw on tran_tab(
        case when status = 'W' and pid is null then 0 end
);

or “single-index” solution:

create index tt_ftw on tran_tab(
        case when status in ('W','T') and pid is null then status end
);

The indexes hold entries only for the very small number of interesting rows, and when the status is updated the entries disappear from the index (rather than being deleted from, and re-inserted to, a very large index). Given the number of partitions in the table (ca. 100) and the very small number of rows involved, and the time-critical nature of the requirement, there’s a good case for making this a global index to avoid the need for doing lots of index probes that will find no data.

The next critical issue is that the code has to be modified to use the index – and the code has to be very precisely written. Here, from a simple model (see footnote), are a couple of examples followed by their (actual) execution plans:

select  lock_id 
from    tran_tab 
where   case when status = 'T' and pid is null then 0 end = 0 
and     rownum <= 5;

select * from table(dbms_xplan.display_cursor);


-------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |          |       |       |     6 (100)|          |
|*  1 |  COUNT STOPKEY                       |          |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| TRAN_TAB |     5 |    30 |     6   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | TT_FT    |    10 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<=5)
   3 - access("TRAN_TAB"."SYS_NC00006$"=0)


select  lock_id 
from    tran_tab 
where   case when status in ('W','T') and pid is null then status end = 'W'
;

select * from table(dbms_xplan.display_cursor);


------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |          |       |       |    11 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| TRAN_TAB |    10 |    60 |    11   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | TT_FTW   |    10 |       |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("TRAN_TAB"."SYS_NC00008$"='W')

Be Careful

The title of this piece is “Fussy FBI” – and the reason for writing it is a reminder that it’s nicer to create and index virtual columns rather than creating function-based indexes. And, if you’re on any recent version of Oracle (12c onwards) it’s a good idea to make the virtual columns invisible so that lazy code (select *, or insert without a specified list of columns, or pl/sql “insert row”) doesn’t result in an error due to the virtual column.

Take the two where clauses I’ve used above and change them slightly – in one case swapping the order of the predicates, in the other swapping the order of the values in the IN list – and the execution paths change from index range scans to tablescans.

where   case when status = 'T' and pid is null then 0 end = 0 -- index range scan
where   case when pid is null and status = 'T' then 0 end = 0 -- tablescan


where   case when status in ('W','T') and pid is null then status end = 'W' -- index range scan
where   case when status in ('T','W') and pid is null then status end = 'W' -- tablescan

When you create the function-based index Oracle may rewrite the definition into a “normalised” form – for example when I query user_ind_expressions for my tt_ftw index it turns out that the stored definition is:

CASE  WHEN (("STATUS"='W' OR "STATUS"='T') AND "PID" IS NULL) THEN "STATUS" END

But when you write a query that looks as if it should match the expression that’s visible in user_ind_expressions the optimizer won’t necessarily notice the match.
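The stored definitions can be checked with a query along the following lines. This is a minimal sketch that assumes the tran_tab table from the footnote script; the SQL*Plus formatting commands are only there because column_expression is a LONG column and may otherwise be truncated on screen.

```sql
-- SQL*Plus settings so the LONG column_expression is not truncated
set long 200
column column_expression format a80

-- One row per function-based index column on the model table
select  index_name, column_position, column_expression
from    user_ind_expressions
where   table_name = 'TRAN_TAB'
order by
        index_name, column_position
;
```

Comparing this output with the text used in the create index statement shows how far the stored form has drifted from what was originally typed.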

Summary

When you create a function-based index the expression you use in your queries must be a very good match for the expression that you used when creating the index. This is just one reason why it may be better to create a virtual column using the expression – then no-one has to remember exactly what the expression was in their queries.

Defining the virtual column as invisible is then a sensible strategy to avoid problems due to code that doesn’t specify explicit column names in all the cases where they should appear.
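As a sketch of the virtual column approach against the tran_tab model in the footnote: the column name status_wt and the index name tt_v_wt are hypothetical, and the example drops tt_ftw first on the assumption that Oracle may reject a virtual column whose expression duplicates the hidden column behind an existing function-based index.

```sql
-- Hypothetical names: status_wt (virtual column), tt_v_wt (index).
-- Assumption: the matching function-based index is dropped first, to
-- avoid a possible "duplicate column expression" error.
drop index tt_ftw;

-- Invisible, so "select *" and inserts without a column list ignore it.
alter table tran_tab add (
        status_wt invisible
        generated always as (
                case when status in ('W','T') and pid is null then status end
        ) virtual
);

create index tt_v_wt on tran_tab(status_wt);

execute dbms_stats.gather_table_stats(user,'tran_tab')

-- The query names the column, so there is no expression to mistype.
select  lock_id
from    tran_tab
where   status_wt = 'W'
;

select * from table(dbms_xplan.display_cursor);
```

The final call to dbms_xplan.display_cursor is there to confirm the access path on your own system rather than taking it on trust.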

Footnote

The following script will create the table and indexes used in this note:

rem
rem     Script:         fussy_fbi.sql
rem     Author:         Jonathan Lewis
rem     Dated:          July 2021
rem
rem     Last tested 
rem             19.3.0.0
rem
rem     Notes:
rem     You have to be careful with FBI definitions and usage.
rem     The match has to be very good.
rem

create table tran_tab (
        pid             number,
        id              number,
        lock_id         number,
        status          varchar2(1),
        padding         varchar2(100)
);

insert into tran_tab
select
        case when mod(rownum,10) = 0 then to_number(null) else rownum end,
        rownum,
        rownum,
        chr(65 + 8 * mod(rownum,4)),
        rpad('x',100)
from
        all_objects
where
        rownum <= 1e4
;

update tran_tab set status = 'T' where mod(lock_id,1000) = 0;
update tran_tab set status = 'W' where mod(lock_id, 990) = 0;

create index tt_ft on tran_tab(
        case when status = 'T' and pid is null then 0 end
);

create index tt_fw on tran_tab(
        case when status = 'W' and pid is null then 0 end
);

create index tt_ftw on tran_tab(
        case when status in ('W','T') and pid is null then status end
);

commit;

execute dbms_stats.gather_table_stats(user,'tran_tab')

set serveroutput off

prompt  ===========
prompt  Correct use
prompt  ===========

select  lock_id 
from    tran_tab 
where   case when status = 'T' and pid is null then 0 end = 0 
and     rownum <= 5;

select * from table(dbms_xplan.display_cursor);

select  lock_id 
from    tran_tab 
where   case when status in ('W','T') and pid is null then status end = 'W'
;

select * from table(dbms_xplan.display_cursor);

prompt  ==========
prompt  Failed use
prompt  ==========

select  lock_id 
from    tran_tab 
where   case when pid is null and status = 'T' then 0 end = 0 
and     rownum <= 5;

select * from table(dbms_xplan.display_cursor);

select  lock_id 
from    tran_tab 
where   case when status in ('T','W') and pid is null then status end = 'W'
;

select * from table(dbms_xplan.display_cursor);
