Database Journal
MS SQL Oracle DB2 Access MySQL PostgreSQL Sybase PHP SQL Etc SQL Scripts & Samples Links Database Forum

» Database Journal Home
» Database Articles
» Database Tutorials
MS SQL
Oracle
DB2
MS Access
MySQL
» RESOURCES
Database Tools
SQL Scripts & Samples
Links
» Database Forum
» Sitemap
Free Newsletters:
DatabaseDaily  
News Via RSS Feed


follow us on Twitter
Database Journal |DBA Support |SQLCourse |SQLCourse2
 

Featured Database Articles

Oracle

Posted March 11, 2013

Oracle Contiguous Data Range via Full Outer Joins

By Alexander Polishchuk

Selecting a contiguous data range, when the source data range is incomplete, requires producing the missing data and merging it with the source data set. The following article describes a method of generating the required data in Oracle and merging it via full outer join or union data set operator.

Consider an example when the source table has timestamp and item count columns and contains 200,000 rows. The timestamp column values are not contiguous. Here is a data sample from this table:

 

 

Timestamp

Item Count

10/01/2012 10:00

10

10/01/2012 10:01

3

10/01/2012 10:02

4

10/01/2012 10:05

1

We can use Oracle hierarchical query to generate contiguous timestamps between the start and end timestamp. In this query pseudo column LEVEL is used as minute increment from the starting timestamp, while the timestamp difference multiplied by number of minutes in a day (1440) serves as a selection range.

SELECT TO_DATE('10-JAN-2012') + (LEVEL - 1)/1440
FROM dual
CONNECT BY LEVEL <= 1440 * (TO_DATE('11-JAN-2012') - TO_DATE('10-JAN-2012'))

Now both data sets have to be merged, so that all the data from the existing table is used and only the missing one is used from the generated one. Here is an example of such a query utilizing union data set operator.

SELECT time_stamp, item_count

FROM   items

WHERE  time_stamp >= TO_DATE('10-JAN-2012')

AND    time_stamp < TO_DATE('11-JAN-2012')

UNION ALL

SELECT ts, 0

FROM ( SELECT TO_DATE('10-JAN-2012') + (LEVEL - 1)/1440 AS ts

     FROM dual

     CONNECT BY LEVEL <= 1440 * (TO_DATE('11-JAN-2012') - TO_DATE('10-JAN-2012')))

WHERE NOT EXISTS (

     SELECT 1

     FROM   items

     WHERE  time_stamp = ts)

ORDER BY 1

Here is a solution using full outer join to merge both data sets.

SELECT ts AS time_stamp, NVL(item_count, 0) AS item_count

FROM ( SELECT TO_DATE('10-JAN-2012') + (LEVEL - 1)/1440 AS ts

     FROM dual

     CONNECT BY LEVEL <= 1440 * (TO_DATE('11-JAN-2012') - TO_DATE('10-JAN-2012')))

FULL OUTER JOIN items ON ( time_stamp = ts AND

     time_stamp >= TO_DATE('10-JAN-2012') AND

     time_stamp < TO_DATE('11-JAN-2012'))

WHERE  ts >= TO_DATE('10-JAN-2012')

AND    ts < TO_DATE('11-JAN-2012')

ORDER BY 1

Both statements return the following result set with contiguous timestamp values when running on Oracle 11gR2.

Timestamp

Item Count

10/01/2012 10:00

10

10/01/2012 10:01

3

10/01/2012 10:02

4

10/01/2012 10:03

0

10/01/2012 10:04

0

10/01/2012 10:05

1

However, it is not the case on all Oracle releases. For example, Oracle 9i full outer join returns extra rows that should not be there. A workaround for this is to use the WITH clause, but it will create temporary tables and cause additional I/O.

The following execution plan is for the union statement:

Id

Operation

Name

Rows

Bytes

Cost

%CPU

Time

0

SELECT STATEMENT

 

554

6089

9

12

00:00:01

1

 SORT ORDER BY

 

554

6089

8

38

00:00:01

2

  UNION-ALL

 

 

 

 

 

 

3

   TABLE ACCESS BY INDEX ROWID

ITEMS

553

6083

5

0

00:00:01

*4

    INDEX RANGE SCAN

IND1

553

 

3

0

00:00:01

*5

   FILTER

 

 

 

 

 

 

6

    VIEW

 

1

6

2

0

00:00:01

*7

     CONNECT BY WITHOUT FILTERING

 

 

 

 

 

 

8

      FAST DUAL

 

1

 

2

0

00:00:01

*9

    INDEX RANGE SCAN

IND1

1

8

1

0

00:00:01

Predicate Information (identified by operation id):

 

4 - access("TIME_STAMP">=TO_DATE('2012-01-10 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND

     "TIME_STAMP"<TO_DATE('2012-01-11 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

5 - filter( NOT EXISTS (SELECT 0 FROM "ITEMS" "ITEMS" WHERE "TIME_STAMP"=:B1))

7 - filter(LEVEL<=1440)

9 - access("TIME_STAMP"=:B1)

The following execution plan is for the full outer join statement:

Id

Operation

Name

Rows

Bytes

Cost

%CPU

Time

0

SELECT STATEMENT

 

1

17

5

20

00:00:01

1

  SORT ORDER BY

 

1

17

5

20

00:00:01

2

   NESTED LOOPS OUTER

 

1

17

4

0

00:00:01

*3

    VIEW

 

1

6

2

0

00:00:01

*4

     CONNECT BY WITHOUT FILTERING

 

 

 

 

 

 

5

      FAST DUAL

 

1

 

2

0

00:00:01

6

    TABLE ACCESS BY INDEX ROWID

ITEMS

1

11

2

0

00:00:01

*7

     INDEX RANGE SCAN

IND1

1

 

1

0

00:00:01

 

Predicate Information (identified by operation id):

 

3 - filter(INTERNAL_FUNCTION("from$_subquery$_001"."TS")>=

                     TO_DATE('2012-01-10 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND

     INTERNAL_FUNCTION("from$_subquery$_001"."TS")<

                     TO_DATE('2012-01-11 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

4 - filter(LEVEL<=1440)

7 - access("TIME_STAMP"(+)=INTERNAL_FUNCTION("TS"))

     filter("TIME_STAMP"(+)<TO_DATE('2012-01-11 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND

          "TIME_STAMP"(+)>=TO_DATE('2012-01-10 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

These execution plans show that the full outer join statement is better, than the union one. However, be aware that the execution plan may change in favor of union depending on data sets, query constraints, and indexes.

Using this approach allows efficient dynamic data range generation and saving the space, rather than storing unnecessary repetitive data in the database. Also, using the appropriate merging technique allows minimizing the required resources for the query execution.

About the author

Alex Polishchuk is the founder and president of Advanced Computer Consulting (www.advcomputerconsulting.com). Alex has over twenty years of professional experience designing, developing, and implementing database applications in various industries and companies ranging from small to Fortune 50 corporations. Alex’s primary area of expertise is in database performance optimization and tuning. He can be contacted at alex@advcomputerconsulting.com.



Oracle Archives

Comment and Contribute

 


(Maximum characters: 1200). You have characters left.

 

 




Latest Forum Threads
Oracle Forum
Topic By Replies Updated
Oracle Data Mining: Classification jan.hasller 0 July 5th, 07:19 AM
Find duplicates - Unique IDs Lava 5 July 2nd, 08:30 AM
no matching unique or primary key rcanter 1 April 25th, 12:32 PM
Update values of one table based on condition of values in other table using Trigger Gladiator 3 February 29th, 06:01 PM


















Thanks for your registration, follow us on our social networks to keep up-to-date