2014년 7월 10일 목요일

Replacement SORT operation with index

<< 인덱스를 이용한 소트 연산 대체 >>


sort merge join에서 outer 테이블 조인 컬럼에 인덱스가 있을 때 sort join 오퍼레이션 생략
서브쿼리에 사용된 테이블이 Unique 인덱스를 갖는다면 Unnesting 되었을 때,
sort unique 오퍼레이션이 생략
PK 컬럼에 대한 distinct 연산시 sort unique 오퍼레이션 생략




select distinct empno from emp;


Execution Plan
----------------------------------------------------------
Plan hash value: 179099197

---------------------------------------------------------------------------
| Id  | Operation        | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT |        |    16 |    64 |     1   (0)| 00:00:01 |
|   1 |  INDEX FULL SCAN | PK_EMP |    16 |    64 |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          2  consistent gets
          0  physical reads
          0  redo size
        649  bytes sent via SQL*Net to client
        519  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         10  rows processed


- 알티베이스의 경우

------------------------------------------------------------
PROJECT ( COLUMN_COUNT: 1, TUPLE_SIZE: 4, COST: 0.00 )
 GROUPING
  SCAN ( TABLE: EMPLOYEES, INDEX: __SYS_IDX_ID_646, FULL SCAN, ACCESS: 20, COST: 0.00 )
------------------------------------------------------------

select eno from employees group by eno;

------------------------------------------------------------
PROJECT ( COLUMN_COUNT: 1, TUPLE_SIZE: 4, COST: 0.00 )
 GROUPING
  SCAN ( TABLE: EMPLOYEES, INDEX: __SYS_IDX_ID_646, FULL SCAN, ACCESS: 20, COST: 0.00 )
------------------------------------------------------------

distinct 연산 사용시 인덱스를 Full scan하지만, grouping 연산을 추가적으로 수행함.
두개 쿼리결과와 실행계획이 동일함.





1) sort order by 대체

[region + custid] 순으로 구성된 인덱스 사용하여 sort order by 연산 대체 가능

select custid, name, resno, status, tell
from customer
where region='A'
order by custid

--------------------------------------------------------------------
|Id| Operation | Name | Rows|Bytes|Cost(%CPU)|
--------------------------------------------------------------------
| 0|SELECT STATEMENT | |40000|3515K| 1372 (1)|
| 1|TABLE ACCESS BY INDEX ROWID|CUSTOMER |40000|3515K| 1372 (1)|
| 2|INDEX RANGE SCAN |CUSTOMER_X02|40000| | 258 (1)|
--------------------------------------------------------------------

sort해야 할 대상 레코드가 무수히 많고 그 중 일부만 읽고 멈출 수 있을 때만 유용

-알티베이스의 경우




iSQL> desc customers;
[ TABLESPACE : SYS_TBS_MEM_DATA ]
[ ATTRIBUTE ]                                                       
------------------------------------------------------------------------------
NAME                                     TYPE                        IS NULL
------------------------------------------------------------------------------
CNO                                      BIGINT          FIXED       NOT NULL
C_LASTNAME                               CHAR(20)        FIXED       NOT NULL
C_FIRSTNAME                              CHAR(20)        FIXED       NOT NULL
CUS_JOB                                  VARCHAR(20)     FIXED     
CUS_TEL                                  CHAR(15)        FIXED       NOT NULL
SEX                                      CHAR(1)         FIXED     
BIRTH                                    CHAR(6)         FIXED     
POSTAL_CD                                VARCHAR(9)      FIXED     
ADDRESS                                  VARCHAR(60)     FIXED     
[ INDEX ]                                                     
------------------------------------------------------------------------------
NAME                                     TYPE     IS UNIQUE     COLUMN
------------------------------------------------------------------------------
__SYS_IDX_ID_647                         BTREE    UNIQUE        CNO ASC
CUSTOMERS_IDX_01                         BTREE                  POSTAL_CD ASC,
                                                                CNO ASC
[ PRIMARY KEY ]                                               
------------------------------------------------------------------------------
CNO

select cno, C_LASTNAME,CUS_TEL
from customers
where POSTAL_CD='20037'
order by cno

------------------------------------------------------------
PROJECT ( COLUMN_COUNT: 3, TUPLE_SIZE: 47, COST: 0.00 )
 SORT ( ITEM_SIZE: 16, ITEM_COUNT: 1, ACCESS: 1, COST: 0.00 )
  SCAN ( TABLE: CUSTOMERS, INDEX: CUSTOMERS_IDX_01, RANGE SCAN, ACCESS: 1, COST: 0.00 )
------------------------------------------------------------



2) sort group by 대체

customer 테이블에서 region이 선두 컬럼인 결합 인덱스나 단일 컬럼 인덱스를 사용할 경우
아래 쿼리에서 sort group by 연산 대체 가능


select region, avg(age), count(*)
from customer
group by region


---------------------------------------------------------------------------
| Id | Operation | Name |Rows |Bytes |Cost (%CPU)|
---------------------------------------------------------------------------
| 0 |SELECT STATEMENT | | 25 | 725 |30142 (1)|
| 1 |SORT GROUP BY NOSORT | | 25 | 725 |30142 (1)|
| 2 |TABLE ACCESS BY INDEX ROWID|CUSTOMER |1000K| 27M |30142 (1)|
| 3 |INDEX FULL SCAN |CUSTOMER_X01|1000K| |2337 (2)|
---------------------------------------------------------------------------




- 알티베이스의 경우

iSQL> select cus_job,count(*) from customers
    2 group by cus_job;
CUS_JOB               COUNT              
----------------------------------------------
PD                    1                  
WEBPD                 1                  
banker                2                  
designer              2                  
doctor                1                  
engineer              4                  
manager               2                  
planner               2                  
webmaster             2                  
                      3                  
10 rows selected.
------------------------------------------------------------
PROJECT ( COLUMN_COUNT: 2, TUPLE_SIZE: 32, COST: 0.00 )
 AGGREGATION ( ITEM_SIZE: 16, GROUP_COUNT: 10, COST: 0.00 )
  GROUPING
   SCAN ( TABLE: CUSTOMERS, INDEX: CUSTOMERS_IDX_02, FULL SCAN, ACCESS: 20, COST: 0.00 )
------------------------------------------------------------




# 인덱스가 sort 연산을 대체하지 못하는 경우


SQL> create index sal_idx on emp(sal);

Index created.

select * from emp order by sal;



Optimizer mode: ALL_ROWS
Parsing user id: 84

Rows     Row Source Operation
-------  ---------------------------------------------------
     10  SORT ORDER BY (cr=7 pr=0 pw=0 time=0 us cost=4 size=544 card=16)
     10   TABLE ACCESS FULL EMP (cr=7 pr=0 pw=0 time=360 us cost=3 size=544 card=16)




select /*+ first_rows */ * from emp order by sal;

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.00       0.00          0          7          0          10
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.00       0.00          0          7          0          10

Misses in library cache during parse: 1
Optimizer mode: FIRST_ROWS
Parsing user id: 84

Rows     Row Source Operation
-------  ---------------------------------------------------
     10  SORT ORDER BY (cr=7 pr=0 pw=0 time=0 us cost=4 size=544 card=16)
     10   TABLE ACCESS FULL EMP (cr=7 pr=0 pw=0 time=351 us cost=3 size=544 card=16)


sal 컬럼에 인덱스가 있는데도 sort order by 오퍼레이션을 수행한다.


SQL> desc emp;
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 EMPNO                                     NOT NULL NUMBER(4)
 ENAME                                              VARCHAR2(10)
 JOB                                                VARCHAR2(9)
 MGR                                                NUMBER(4)
 HIREDATE                                           DATE
 SAL                                                NUMBER(7,2)
 COMM                                               NUMBER(7,2)
 DEPTNO                                             NUMBER(2)


sal 컬럼에 not null 제약 조건이 없다.
단일 컬럼 인덱스일 때 값이 null이면 인덱스 레코드에 포함되지 않는다.
따라서 인덱스를 이용해 sort 연산을 대체한다면, 결과 오류가 발생할 수 있어
옵티마이저는 table full scan을 수행한 것이다.



SQL> alter table emp modify sal number(7,2) constraint emp_sal_nn not null;
alter table emp modify sal number(7,2) constraint emp_sal_nn not null
                                                  *
ERROR at line 1:
ORA-02296: cannot enable (SCOTT.EMP_SAL_NN) - null values found


SQL> set linesize 1000
SQL> select * from emp;

     EMPNO ENAME      JOB              MGR HIREDATE               SAL       COMM     DEPTNO
---------- ---------- --------- ---------- --------------- ---------- ---------- ----------
      8000 SMITH      SALESMAN
      7369 SMITH      CLERK           7902 17-DEC-80             2800                    20
      7566 JONES      MANAGER         7839 02-APR-81             2975                    20
      7782 CLARK      MANAGER         7839 09-JUN-81             2450                    10
      7788 SCOTT      ANALYST         7566 19-APR-87             3000                    20
      7839 KING       PRESIDENT            17-NOV-81             5000                    10
      7876 ADAMS      CLERK           7788 23-MAY-87             1100                    20
      7902 FORD       ANALYST         7566 03-DEC-81             3000                    20
      7934 MILLER     CLERK           7782 23-JAN-82             1300                    10
        10            SALESMAN

10 rows selected.

SQL> update emp set sal=3400 where empno=8000;

1 row updated.

SQL>
SQL> commit;

Commit complete.

SQL> update emp set sal=3400 where empno=10;

1 row updated.

SQL> commit;

Commit complete.


Optimizer mode: FIRST_ROWS
Parsing user id: 84

Rows     Row Source Operation
-------  ---------------------------------------------------
     10  TABLE ACCESS BY INDEX ROWID EMP (cr=6 pr=0 pw=0 time=0 us cost=2 size=544 card=16)
     10   INDEX FULL SCAN SAL_IDX (cr=2 pr=0 pw=0 time=45 us cost=1 size=0 card=16)(object id 124306)


- 알티베이스의 경우


iSQL> create index emp_idx2 on employees(salary);
Create success.
iSQL> select * from employees order by salary desc;
ENO         E_LASTNAME            E_FIRSTNAME           EMP_JOB          EMP_TEL          DNO         SALARY      SEX  BIRTH   JOIN_DATE    STATUS
----------------------------------------------------------------------------------------------------------------------------------------------------------------
20          Blake                 William               sales rep        01154112366      4002                    M          18-NOV-2006  H  
8           Wang                  Xiong                 manager          0178829663       4001                    M  810726  29-NOV-2009  H  
1           Moon                  Chan-seung            CEO              01195662365      3002                    M                       R  
10          Bae                   Elizabeth             programmer       0167452000       1003        4000        F  710213  05-JAN-2010  H
11          Liu                   Zhen                  webmaster        0114553206       1003        2750        M          28-APR-2011  H
5           Ghorbani              Farhad                PL               01145582310      3002        2500        M          20-DEC-2009  H
16          Chen                  Wei-Wei               manager          0195562100       1001        2300        F  780509               H
14          Miura                 Yuu                   PM               0197664120       1003        2003        M                       H
3           Kobain                Ken                   engineer         0162581369       1001        2000        M  650226  11-JAN-2010  H
18          Huxley                John                  planner          01755231044      4001        1900        M          30-OCT-2007  H
12          Hammond               Sandra                sales rep        0174562330       4002        1890        F  810211  14-DEC-2009  H
19          Marquez               Alvar                 sales rep        0185698550       4002        1800        M          18-NOV-2010  H
4           Foster                Aaron                 PL               0182563984       3001        1800        M  820730               H
6           Momoi                 Ryu                   programmer       0197853222       1002        1700        M  790822  09-SEP-2010  H
2           Davenport             Susan                 designer         0113654540                   1500        F  721219  18-NOV-2009  H
17          Fubuki                Takahiro              PM               0165293886       2001        1400        M  781026  07-MAY-2010  H
9           Diaz                  Curtis                planner          0165293668       4001        1200        M  660102  14-JUN-2010  H
15          Davenport             Jason                 webmaster        0119556884       1003        1000        M  901212               H
13          Jones                 Mitch                 PM               0187636550       1002        980         M  801102               H
7           Fleischer             Gottlieb              manager          0175221002       4002        500         M  840417  24-JAN-2004  H
20 rows selected.

------------------------------------------------------------
PROJECT ( COLUMN_COUNT: 11, TUPLE_SIZE: 119, COST: 0.00 )
 SCAN ( TABLE: EMPLOYEES, INDEX: EMP_IDX2, FULL SCAN DESC, ACCESS: 20, COST: 0.00 )
------------------------------------------------------------

알티베이스는 NULL 값을 인덱스에 포함시킨다. 게다가 NULL값을 가진 레코드를 인덱스 맨뒤쪽에 위치시킨다는 것을 알 수 있다.

iSQL> select * from employees where salary is null;
ENO         E_LASTNAME            E_FIRSTNAME           EMP_JOB          EMP_TEL          DNO         SALARY      SEX  BIRTH   JOIN_DATE    STATUS
----------------------------------------------------------------------------------------------------------------------------------------------------------------
1           Moon                  Chan-seung            CEO              01195662365      3002                    M                       R
8           Wang                  Xiong                 manager          0178829663       4001                    M  810726  29-NOV-2009  H
20          Blake                 William               sales rep        01154112366      4002                    M          18-NOV-2006  H
3 rows selected.
------------------------------------------------------------
PROJECT ( COLUMN_COUNT: 11, TUPLE_SIZE: 119, COST: 0.00 )
 SCAN ( TABLE: EMPLOYEES, INDEX: EMP_IDX2, RANGE SCAN, ACCESS: 3, COST: 0.00 )
------------------------------------------------------------









인덱스를 사용하지 못하는 두번째 사례

단일 컬럼 index일때 null값을 저장하지 않지만, 결합 인덱스일 때는 null값을 가진 레코드를
맨 뒤쪽에 저장함. null값부터 출력하라고 할 때는 인덱스를 이용하더라도 sort가 불가피함.

SQL> create index emp_deptno_ename on emp(deptno,ename);

Index created.

set autotrace traceonly exp;
select /*+ index(e emp_deptno_ename ) */ *
from emp e
where deptno=30
order by ename;




------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                  |     6 |   204 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| EMP              |     6 |   204 |     3   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | EMP_DEPTNO_ENAME |     6 |       |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("DEPTNO"=30)


-------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                  |     6 |   204 |     4  (25)| 00:00:01 |
|   1 |  SORT ORDER BY               |                  |     6 |   204 |     4  (25)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| EMP              |     6 |   204 |     3   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | EMP_DEPTNO_ENAME |     6 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("DEPTNO"=30)


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          1  consistent gets
          0  physical reads
          0  redo size
        799  bytes sent via SQL*Net to client
        508  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          0  rows processed







3) 인덱스를 활용한 Min, Max 구하기


주문 테이블에서 일자별 주문번호를 관리
[주문일자 + 주문번호]로 인덱스 구성되어 있음.

select nvl(max(주문번호), 0) + 1
from 주문
where 주문일자 = :주문일자

Execution Plan
-------------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS
1 0 SORT (AGGREGATE)
2 1 FIRST ROW
3 2 INDEX (RANGE SCAN (MIN/MAX)) OF '주문_PK' (INDEX (UNIQUE))




- max 함수 내 인덱스 컬럼에 상수 값을 더할 때 결과가 달라질 가능성은 없지만,
  인덱스 사용을 거부한다.

select nvl(max(주문번호 + 1), 1)
from 주문
where 주문일자 = :주문일자

Execution Plan
-------------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS
1 0 SORT (AGGREGATE)
2 1 INDEX (RANGE SCAN) OF '주문_PK' (INDEX (UNIQUE))


select nvl(max(salary),0) + 1
from employees;

NVL(MAX(SALARY),0) + 1
-------------------------
4001                 
1 row selected.
------------------------------------------------------------
PROJECT ( COLUMN_COUNT: 1, TUPLE_SIZE: 8, COST: 0.00 )
 SCAN ( TABLE: EMPLOYEES, INDEX: EMP_IDX2, RANGE SCAN DESC, ACCESS: 1, COST: 0.00 )
------------------------------------------------------------

select nvl(max(salary+1),0) + 1
from employees;
------------------------------------------------------------
PROJECT ( COLUMN_COUNT: 1, TUPLE_SIZE: 8, COST: 0.00 )
 GROUP-AGGREGATION ( ITEM_SIZE: 24, GROUP_COUNT: 1, BUCKET_COUNT: 1, ACCESS: 1, COST: 0.00 )
  SCAN ( TABLE: EMPLOYEES, FULL SCAN, ACCESS: 20, COST: 0.00 )
------------------------------------------------------------

EMP_IDX2 인덱스를 DNO+SALARY로 다시 구성함.
iSQL> desc employees;
[ TABLESPACE : SYS_TBS_MEM_DATA ]
[ ATTRIBUTE ]                                                       
------------------------------------------------------------------------------
NAME                                     TYPE                        IS NULL
------------------------------------------------------------------------------
ENO                                      INTEGER         FIXED       NOT NULL
E_LASTNAME                               CHAR(20)        FIXED       NOT NULL
E_FIRSTNAME                              CHAR(20)        FIXED       NOT NULL
EMP_JOB                                  VARCHAR(15)     FIXED     
EMP_TEL                                  CHAR(15)        FIXED     
DNO                                      SMALLINT        FIXED     
SALARY                                   NUMERIC(10, 2)  FIXED     
SEX                                      CHAR(1)         FIXED     
BIRTH                                    CHAR(6)         FIXED     
JOIN_DATE                                DATE            FIXED     
STATUS                                   CHAR(1)         FIXED     
[ INDEX ]                                                     
------------------------------------------------------------------------------
NAME                                     TYPE     IS UNIQUE     COLUMN
------------------------------------------------------------------------------
__SYS_IDX_ID_646                         BTREE    UNIQUE        ENO ASC
EMP_IDX1                                 BTREE                  DNO ASC
EMP_IDX2                                 BTREE                  DNO ASC,
                                                                SALARY ASC

EMP_IDX3                                 BTREE                  JOIN_DATE ASC,
                                                                SALARY ASC
[ PRIMARY KEY ]                                               
------------------------------------------------------------------------------

iSQL> select nvl(max(salary),0) + 1
    2 from employees
    3 where dno=4002;
NVL(MAX(SALARY),0) + 1
-------------------------
1891                 
1 row selected.
------------------------------------------------------------
PROJECT ( COLUMN_COUNT: 1, TUPLE_SIZE: 8, COST: 0.00 )
 GROUP-AGGREGATION ( ITEM_SIZE: 32, GROUP_COUNT: 1, BUCKET_COUNT: 1, ACCESS: 1, COST: 0.00 )
  SCAN ( TABLE: EMPLOYEES, INDEX: EMP_IDX2, RANGE SCAN, ACCESS: 4, COST: 0.00 )
------------------------------------------------------------

iSQL> select nvl(max(salary+1),0) + 1
    2 from employees
    3 where dno=4002;
NVL(MAX(SALARY+1),0) + 1
---------------------------
1892                 
1 row selected.
------------------------------------------------------------
PROJECT ( COLUMN_COUNT: 1, TUPLE_SIZE: 8, COST: 0.00 )
 GROUP-AGGREGATION ( ITEM_SIZE: 24, GROUP_COUNT: 1, BUCKET_COUNT: 1, ACCESS: 1, COST: 0.00 )
  SCAN ( TABLE: EMPLOYEES, INDEX: EMP_IDX2, RANGE SCAN, ACCESS: 4, COST: 0.00 )
------------------------------------------------------------




<< Sort Area를 적게 사용하도록 SQL 작성 >>


1) sort를 완료하고 나서 데이터 가공하기

select lpad(상품번호, 30) || lpad(상품명, 30) || lpad(고객ID, 10)
|| lpad(고객명, 20) || to_char(주문일시, 'yyyymmdd hh24:mi:ss')
from 주문상품
where 주문일시 between :start and :end
order by 상품번호



select lpad(상품번호, 30) || lpad(상품명, 30) || lpad(고객ID, 10)
|| lpad(상품명, 20) || to_char(주문일시, 'yyyymmdd hh24:mi:ss')
from (
select 상품번호, 상품명, 고객ID, 고객명, 주문일시
from 주문상품
where 주문일시 between :start and :end
order by 상품번호
)




2)  Top-N 쿼리


select * from (
select 거래일시, 체결건수, 체결수량, 거래대금
from 시간별종목거래
where 종목코드 = 'KR123456'
and 거래일시 >= '20080304'
order by 거래일시
)
where rownum = 10

[종목코드 + 거래일시] 순으로 인덱스 구성시 Sort order by 연산 대체함.


Execution Plan
-------------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS
1 0 COUNT (STOPKEY)
2 1 VIEW
3 2 TABLE ACCESS (BY INDEX ROWID) OF '시간별종목거래' (TABLE)
4 3 INDEX (RANGE SCAN) OF ' 시간별종목거래_PK' (INDEX (UNIQUE))

- 알티베이스의 경우


rownum 사용할 경우
iSQL> select * from employees where salary > 1 and rownum <= 3;
ENO         E_LASTNAME            E_FIRSTNAME           EMP_JOB          EMP_TEL          DNO         SALARY      SEX  BIRTH   JOIN_DATE    STATUS
----------------------------------------------------------------------------------------------------------------------------------------------------------------
7           Fleischer             Gottlieb              manager          0175221002       4002        500         M  840417  24-JAN-2004  H
13          Jones                 Mitch                 PM               0187636550       1002        980         M  801102               H
15          Davenport             Jason                 webmaster        0119556884       1003        1000        M  901212               H
3 rows selected.
------------------------------------------------------------
PROJECT ( COLUMN_COUNT: 11, TUPLE_SIZE: 119, COST: 0.00 )
 COUNTER STOPKEY
  SCAN ( TABLE: EMPLOYEES, INDEX: EMP_IDX2, RANGE SCAN, ACCESS: 3, COST: 0.00 )
------------------------------------------------------------


limit 사용할 경우
iSQL> select * from employees where salary > 1 limit 3;
ENO         E_LASTNAME            E_FIRSTNAME           EMP_JOB          EMP_TEL          DNO         SALARY      SEX  BIRTH   JOIN_DATE    STATUS
----------------------------------------------------------------------------------------------------------------------------------------------------------------
7           Fleischer             Gottlieb              manager          0175221002       4002        500         M  840417  24-JAN-2004  H
13          Jones                 Mitch                 PM               0187636550       1002        980         M  801102               H
15          Davenport             Jason                 webmaster        0119556884       1003        1000        M  901212               H
3 rows selected.
------------------------------------------------------------
PROJECT ( COLUMN_COUNT: 11, TUPLE_SIZE: 119, COST: 0.00 )
 SCAN ( TABLE: EMPLOYEES, INDEX: EMP_IDX2, RANGE SCAN, ACCESS: 3, COST: 0.00 )
------------------------------------------------------------




# TOP-N 쿼리의 sort 부하 경감 원리

[종목코드 + 거래일시] 순으로 구성된 인덱스가 없을 경우

rownum >= 10일 때, 우선 10개 레코드를 담을 배열을 할당하고, 처음 읽은 10개 레코드를 정렬된 상태로 담는다.

이후 읽는 레코드에 대해서 맨 우측에 있는 값과 비교해서 그보다 작은 값이 나타날때만 배열내에서 다시 정렬 시도.

물론 맨 우측에 있던 값은 버림.






# 효과 측정 : Top-N 쿼리가 작동할 때


create table t as select * from all_objects;

alter session set workarea_size_policy = manual;

alter session set sort_area_size = 524288;

set autotrace traceonly statistics

select count(*) from t;

select *
from (
  select * from t
  order by object_name
)
where rownum <= 10 ;









SQL> alter session set sql_trace=true;

Session altered.

SQL> select count(*) from t;

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.07          0          1          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.00       0.00          0       1106          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.01       0.08          0       1107          0           1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 84

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE (cr=1106 pr=0 pw=0 time=0 us)
  77606   TABLE ACCESS FULL T (cr=1106 pr=0 pw=0 time=23396 us cost=308 size=0 card=84006)


Statistics
----------------------------------------------------------
          5  recursive calls
          0  db block gets
       1179  consistent gets
          0  physical reads
          0  redo size
        528  bytes sent via SQL*Net to client
        519  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL> select *
  2  from (
  3    select * from t
  4    order by object_name
  5  )
  6  where rownum <= 10 ;

10 rows selected.


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          2          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.05       0.05          0       1106          0          10
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.05       0.05          0       1108          0          10

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 84

Rows     Row Source Operation
-------  ---------------------------------------------------
     10  COUNT STOPKEY (cr=1106 pr=0 pw=0 time=0 us)
     10   VIEW  (cr=1106 pr=0 pw=0 time=0 us cost=4259 size=13272948 card=84006)
     10    SORT ORDER BY STOPKEY (cr=1106 pr=0 pw=0 time=0 us cost=4259 size=13272948 card=84006)
  77606     TABLE ACCESS FULL T (cr=1106 pr=0 pw=0 time=45898 us cost=308 size=13272948 card=84006)


Statistics
----------------------------------------------------------
          7  recursive calls
          0  db block gets
       1252  consistent gets
          0  physical reads
          0  redo size
       2259  bytes sent via SQL*Net to client
        519  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
         10  rows processed

SQL> alter session set sql_trace=false;

Session altered.


- 알티베이스의 경우
create table t as select * from system_.sys_tables_;c

select *
from (
  select * from t
  order by TABLE_NAME
)
where rownum <= 10 ;

------------------------------------------------------------
PROJECT ( COLUMN_COUNT: 23, TUPLE_SIZE: 172, COST: 0.01 )
 COUNTER STOPKEY
  VIEW ( ACCESS: 10, COST: 0.19 )
   PROJECT ( COLUMN_COUNT: 23, TUPLE_SIZE: 172, COST: 0.16 )
    SORT ( ITEM_SIZE: 16, ITEM_COUNT: 284, ACCESS: 10, COST: 0.05 )
     SCAN ( TABLE: T, FULL SCAN, ACCESS: 284, COST: 0.03 )
------------------------------------------------------------

select * from t order by TABLE_NAME limit 10;

------------------------------------------------------------
PROJECT ( COLUMN_COUNT: 23, TUPLE_SIZE: 172, COST: 0.04 )
 LIMIT-SORT ( ITEM_SIZE: 16, ITEM_COUNT: 284, STORE_COUNT: 10, ACCESS: 10, COST: 0.03 )
  SCAN ( TABLE: T, FULL SCAN, ACCESS: 284, COST: 0.03 )
------------------------------------------------------------

limit 절을 명시했을 경우 LIMIT-SORT 연산자로 처리되고, SORT 연산자보다 cost가 낮다.




# 효과 측정 : Top-N 쿼리가 작동하지 않을 때

select *
from (
  select a.*, rownum no
  from (
    select * from t order by object_name
  ) a
)
where no <= 10 ;


Statistics
----------------------------------------------------------
         16  recursive calls
         21  db block gets
       1252  consistent gets
       1139  physical reads
          0  redo size
       2323  bytes sent via SQL*Net to client
        519  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          1  sorts (disk)
         10  rows processed



call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.02          0          2          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.20       0.23       1139       1106         21          10
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.20       0.26       1139       1108         21          10

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 84

Rows     Row Source Operation
-------  ---------------------------------------------------
     10  VIEW  (cr=1106 pr=1139 pw=1139 time=0 us cost=4259 size=14365026 card=84006)
  77606   COUNT  (cr=1106 pr=1139 pw=1139 time=158411 us)
  77606    VIEW  (cr=1106 pr=1139 pw=1139 time=118968 us cost=4259 size=13272948 card=84006)
  77606     SORT ORDER BY (cr=1106 pr=1139 pw=1139 time=58011 us cost=4259 size=13272948 card=84006)
  77606      TABLE ACCESS FULL T (cr=1106 pr=0 pw=0 time=31578 us cost=308 size=13272948 card=84006)


- 알티베이스의 경우

select *
from (
  select a.*, rownum no from (
  select * from t order by TABLE_NAME) a
)
where no <= 10 ;

------------------------------------------------------------
PROJECT ( COLUMN_COUNT: 24, TUPLE_SIZE: 184, COST: 0.40 )
 FILTER
  VIEW ( ACCESS: 284, COST: 0.36 )
   PROJECT ( COLUMN_COUNT: 24, TUPLE_SIZE: 184, COST: 0.32 )
    COUNTER
     VIEW ( ACCESS: 284, COST: 0.19 )
      PROJECT ( COLUMN_COUNT: 23, TUPLE_SIZE: 172, COST: 0.16 )
       SORT ( ITEM_SIZE: 16, ITEM_COUNT: 284, ACCESS: 284, COST: 0.05 )
        SCAN ( TABLE: T, FULL SCAN, ACCESS: 284, COST: 0.03 )
------------------------------------------------------------
오라클처럼 physical read와 physical write 항목은 없다.
SORT 연산 처리시 cost는 동일하지만, no <= 10 조건에 대한 fileter 비용이 추가되
TOP-N 알고리즘이 작동되는 쿼리보다 cost가 높은 편이다.

select *
from (
  select a.*, rownum no from (
  select * from t order by TABLE_NAME) a
) limit 10;

------------------------------------------------------------
PROJECT ( COLUMN_COUNT: 24, TUPLE_SIZE: 184, COST: 0.36 )
 VIEW ( ACCESS: 10, COST: 0.35 )
  PROJECT ( COLUMN_COUNT: 24, TUPLE_SIZE: 184, COST: 0.32 )
   COUNTER
    VIEW ( ACCESS: 10, COST: 0.19 )
     PROJECT ( COLUMN_COUNT: 23, TUPLE_SIZE: 172, COST: 0.16 )
      SORT ( ITEM_SIZE: 16, ITEM_COUNT: 284, ACCESS: 10, COST: 0.05 )
       SCAN ( TABLE: T, FULL SCAN, ACCESS: 284, COST: 0.03 )
------------------------------------------------------------







3) Window 함수에서 Top-N 쿼리

window sort시에도 rank()나 row_number()를 쓰면 Top-N 쿼리 알고리즘이 작동해
max() 함수를 쓸 때보다 sort 부하를 경감시켜 줌.




- 마지막 이력 레코드를 찾는 예제


# 테스트 데이터 생성
create table t
as
select 1 id, rownum seq, owner, object_name, object_type, created, status
from   all_objects ;

begin
  for i in 1..9
  loop
    insert into t
    select i+1 id, rownum seq
         , owner, object_name, object_type, created, status
    from   t
    where id  = 1;
    commit;
  end loop;
end;
/

# Disk sort를 발생시키기 위해 sort_area_size를 줄임.
alter session set workarea_size_policy = manual;
alter session set sort_area_size = 1048576;




# max 함수 사용
select id, seq, owner, object_name, object_type, created, status
from  (select id, seq
            , max(seq) over (partition by id) last_seq
            , owner, object_name, object_type, created, status
       from t)
where  seq = last_seq;


------------------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |   749K|    70M|       | 21831   (1)| 00:04:22 |
|*  1 |  VIEW               |      |   749K|    70M|       | 21831   (1)| 00:04:22 |
|   2 |   WINDOW SORT       |      |   749K|    60M|    75M| 21831   (1)| 00:04:22 |
|   3 |    TABLE ACCESS FULL| T    |   749K|    60M|       |  1924   (1)| 00:00:24 |
------------------------------------------------------------------------------------


Statistics
----------------------------------------------------------
         55  recursive calls
         11  db block gets
       7031  consistent gets
      20876  physical reads
          0  redo size
       1163  bytes sent via SQL*Net to client
        519  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          1  sorts (disk)
         10  rows processed



select id, seq, owner, object_name, object_type, created, status
from  (select id, seq
            , max(seq) over (partition by id) last_seq
            , owner, object_name, object_type, created, status
       from t)
where  seq = last_seq

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        2      0.00       0.04          0          1          0           0
Execute      2      0.00       0.00          0          0          0           0
Fetch        4      4.08       5.34      41752      14062         22          20
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        8      4.09       5.38      41752      14063         22          20

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 84

Rows     Row Source Operation
-------  ---------------------------------------------------
     10  VIEW  (cr=7031 pr=20876 pw=13907 time=0 us cost=21831 size=73411408 card=749096)
 776060   WINDOW SORT (cr=7031 pr=20876 pw=13907 time=1051897 us cost=21831 size=63673160 card=749096)
 776060    TABLE ACCESS FULL T (cr=7031 pr=0 pw=0 time=455933 us cost=1924 size=63673160 card=749096)






# rank 함수 사용

select id, seq, owner, object_name, object_type, created, status
from  (select id, seq
            , rank() over (partition by id order by seq desc) rnum
            , owner, object_name, object_type, created, status
       from   t)
where rnum = 1;




-----------------------------------------------------------------------------------------
| Id  | Operation                | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |      |   749K|    70M|       | 21831   (1)| 00:04:22 |
|*  1 |  VIEW                    |      |   749K|    70M|       | 21831   (1)| 00:04:22 |
|*  2 |   WINDOW SORT PUSHED RANK|      |   749K|    60M|    75M| 21831   (1)| 00:04:22 |
|   3 |    TABLE ACCESS FULL     | T    |   749K|    60M|       |  1924   (1)| 00:00:24 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("RNUM"=1)
   2 - filter(RANK() OVER ( PARTITION BY "ID" ORDER BY INTERNAL_FUNCTION("SEQ")
              DESC )<=1)

Statistics
----------------------------------------------------------
          5  recursive calls
         70  db block gets
       7105  consistent gets
         68  physical reads
          0  redo size
       1163  bytes sent via SQL*Net to client
        519  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          1  sorts (disk)
         10  rows processed


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.02          0          1          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      1.31       1.34         68       7031         70          10
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      1.31       1.37         68       7032         70          10

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 84

Rows     Row Source Operation
-------  ---------------------------------------------------
     10  VIEW  (cr=7031 pr=68 pw=68 time=0 us cost=21831 size=73411408 card=749096)
    180   WINDOW SORT PUSHED RANK (cr=7031 pr=68 pw=68 time=358 us cost=21831 size=63673160 card=749096)
 776060    TABLE ACCESS FULL T (cr=7031 pr=0 pw=0 time=243198 us cost=1924 size=63673160 card=749096)



- 알티베이스의 경우

iSQL> select dno, salary from
    2 (select dno, salary, max(salary) over (partition by dno) max_sal
    3 from employees
    4 ) where salary=max_sal;
DNO         SALARY    
---------------------------
1001        2300      
1002        1700      
1003        4000      
2001        1400      
3001        1800      
3002        2500      
4001        1900      
4002        1890      
            1500      
9 rows selected.
------------------------------------------------------------
PROJECT ( COLUMN_COUNT: 2, TUPLE_SIZE: 11, COST: 0.01 )
 FILTER
  VIEW ( ACCESS: 20, COST: 0.01 )
   PROJECT ( COLUMN_COUNT: 3, TUPLE_SIZE: 20, COST: 0.00 )
    WINDOW SORT ( ITEM_SIZE: 32, ITEM_COUNT: 20, ACCESS: 59, SORT_COUNT: 1, COST: 0.00 )
     SCAN ( TABLE: EMPLOYEES, FULL SCAN, ACCESS: 20, COST: 0.00 )
------------------------------------------------------------

iSQL> select dno, salary from
    2 (select dno, salary, rank() over (partition by dno order by salary) rnum
    3 from employees
    4 ) where rnum=1;
DNO         SALARY    
---------------------------
1001        2000      
1002        980       
1003        1000      
2001        1400      
3001        1800      
3002        2500      
4001        1200      
4002        500       
            1500      
9 rows selected.
------------------------------------------------------------
PROJECT ( COLUMN_COUNT: 2, TUPLE_SIZE: 11, COST: 0.01 )
 FILTER
  VIEW ( ACCESS: 20, COST: 0.01 )
   PROJECT ( COLUMN_COUNT: 3, TUPLE_SIZE: 24, COST: 0.00 )
    WINDOW SORT ( ITEM_SIZE: 24, ITEM_COUNT: 20, ACCESS: 59, SORT_COUNT: 1, COST: 0.00 )
     SCAN ( TABLE: EMPLOYEES, FULL SCAN, ACCESS: 20, COST: 0.00 )
------------------------------------------------------------

위 두 개의 실행계획을 비교하면 크게 다른 건 없다. ITEM_SIZE가 rank() 사용시 더 줄었음.

# SORT관련 연산자에 대한 설명

ITEM_SIZE : sorting을 위한 레코드 크기
ITEM_COUNT : sorting에 포함된 레코드의 개수
DISK_PAGE_COUNT : 임시 저장 테이블의 디스크 페이지 개수 (메모리 임시 테이블은 해당 정보 없음)
ACCESS : 저장된 레코드에 대한 접근 횟수
COST : 추산 비용




<< Sort Area 크기 조정 >>

1차 목표 : disk sort가 발생하지 않도록 한다.
2차 목표 : disk sort가 불가피하면 one pass sort로 처리되도록 한다.


1) PGA 메모리 관리 방식 선택
work area란?
데이터 정렬, 해시조인, 비트맵 머지, 비트맵 생성 등을 위해 사용되는 메모리 공간
관련 파라미터 : sort_area_size, hash_area_size, bitmap_merge_area_size, create_bitmap_area_size


자동 PGA메모리 관린 기능을 사용하려면 workarea_size_policy를 auto로 설정
pga_aggregate_target(인스턴스 전체적으로 이용 가능한 PGA 메모리 총량을 지정)로 지정된
수치안에서 오라클이 알아서 할당함.


alter session set workarea_size_policy = auto; 자동 PGA 메모리 관리 기능 활성화
*_area_size 파라미터 수치 모두 무시됨.


alter session set workarea_size_policy = manual;

alter session set sort_area_size = 10485760;


댓글 없음:

댓글 쓰기