SHARE

Oracle Contiguous Data Range via Full Outer Joins

Written By

Mar 11, 2013

3 minute read

By Alexander Polishchuk

Selecting a contiguous data range, when the source data range is incomplete, requires producing the missing data and merging it with the source data set. The following article describes a method of generating the required data in Oracle and merging it via full outer join or union data set operator.

Consider an example when the source table has timestamp and item count columns and contains 200,000 rows. The timestamp column values are not contiguous. Here is a data sample from this table:

Timestamp	Item Count
10/01/2012 10:00	10
10/01/2012 10:01	3
10/01/2012 10:02	4
10/01/2012 10:05	1

We can use Oracle hierarchical query to generate contiguous timestamps between the start and end timestamp. In this query pseudo column LEVEL is used as minute increment from the starting timestamp, while the timestamp difference multiplied by number of minutes in a day (1440) serves as a selection range.

SELECT TO_DATE('10-JAN-2012') + (LEVEL - 1)/1440
FROM dual
CONNECT BY LEVEL <= 1440 * (TO_DATE('11-JAN-2012') - TO_DATE('10-JAN-2012'))

Now both data sets have to be merged, so that all the data from the existing table is used and only the missing one is used from the generated one. Here is an example of such a query utilizing union data set operator.

SELECT time_stamp, item_count

FROM   items

WHERE  time_stamp >= TO_DATE('10-JAN-2012')

AND    time_stamp < TO_DATE('11-JAN-2012')

UNION ALL

SELECT ts, 0

FROM ( SELECT TO_DATE('10-JAN-2012') + (LEVEL - 1)/1440 AS ts

     FROM dual

     CONNECT BY LEVEL <= 1440 * (TO_DATE('11-JAN-2012') - TO_DATE('10-JAN-2012')))

WHERE NOT EXISTS (

     SELECT 1

     FROM   items

     WHERE  time_stamp = ts)

ORDER BY 1

Here is a solution using full outer join to merge both data sets.

SELECT ts AS time_stamp, NVL(item_count, 0) AS item_count

FROM ( SELECT TO_DATE('10-JAN-2012') + (LEVEL - 1)/1440 AS ts

     FROM dual

     CONNECT BY LEVEL <= 1440 * (TO_DATE('11-JAN-2012') - TO_DATE('10-JAN-2012')))

FULL OUTER JOIN items ON ( time_stamp = ts AND

     time_stamp >= TO_DATE('10-JAN-2012') AND

     time_stamp < TO_DATE('11-JAN-2012'))

WHERE  ts >= TO_DATE('10-JAN-2012')

AND    ts < TO_DATE('11-JAN-2012')

ORDER BY 1

Both statements return the following result set with contiguous timestamp values when running on Oracle 11gR2.

Timestamp	Item Count
10/01/2012 10:00	10
10/01/2012 10:01	3
10/01/2012 10:02	4
10/01/2012 10:03	0
10/01/2012 10:04	0
10/01/2012 10:05	1

However, it is not the case on all Oracle releases. For example, Oracle 9i full outer join returns extra rows that should not be there. A workaround for this is to use the WITH clause, but it will create temporary tables and cause additional I/O.

The following execution plan is for the union statement:

Id	Operation	Name	Rows	Bytes	Cost	%CPU	Time
0	SELECT STATEMENT		554	6089	9	12	00:00:01
1	SORT ORDER BY		554	6089	8	38	00:00:01
2	UNION-ALL
3	TABLE ACCESS BY INDEX ROWID	ITEMS	553	6083	5	0	00:00:01
*4	INDEX RANGE SCAN	IND1	553		3	0	00:00:01
*5	FILTER
6	VIEW		1	6	2	0	00:00:01
*7	CONNECT BY WITHOUT FILTERING
8	FAST DUAL		1		2	0	00:00:01
*9	INDEX RANGE SCAN	IND1	1	8	1	0	00:00:01

Predicate Information (identified by operation id):

 

4 - access("TIME_STAMP">=TO_DATE('2012-01-10 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND

     "TIME_STAMP"<TO_DATE('2012-01-11 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

5 - filter( NOT EXISTS (SELECT 0 FROM "ITEMS" "ITEMS" WHERE "TIME_STAMP"=:B1))

7 - filter(LEVEL<=1440)

9 - access("TIME_STAMP"=:B1)

The following execution plan is for the full outer join statement:

Id	Operation	Name	Rows	Bytes	Cost	%CPU	Time
0	SELECT STATEMENT		1	17	5	20	00:00:01
1	SORT ORDER BY		1	17	5	20	00:00:01
2	NESTED LOOPS OUTER		1	17	4	0	00:00:01
*3	VIEW		1	6	2	0	00:00:01
*4	CONNECT BY WITHOUT FILTERING
5	FAST DUAL		1		2	0	00:00:01
6	TABLE ACCESS BY INDEX ROWID	ITEMS	1	11	2	0	00:00:01
*7	INDEX RANGE SCAN	IND1	1		1	0	00:00:01

 

Predicate Information (identified by operation id):

 

3 - filter(INTERNAL_FUNCTION("from$_subquery$_001"."TS")>=

                     TO_DATE('2012-01-10 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND

     INTERNAL_FUNCTION("from$_subquery$_001"."TS")<

                     TO_DATE('2012-01-11 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

4 - filter(LEVEL<=1440)

7 - access("TIME_STAMP"(+)=INTERNAL_FUNCTION("TS"))

     filter("TIME_STAMP"(+)<TO_DATE('2012-01-11 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND

          "TIME_STAMP"(+)>=TO_DATE('2012-01-10 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

These execution plans show that the full outer join statement is better, than the union one. However, be aware that the execution plan may change in favor of union depending on data sets, query constraints, and indexes.

Using this approach allows efficient dynamic data range generation and saving the space, rather than storing unnecessary repetitive data in the database. Also, using the appropriate merging technique allows minimizing the required resources for the query execution.

About the author

Alex Polishchuk is the founder and president of Advanced Computer Consulting (www.advcomputerconsulting.com). Alex has over twenty years of professional experience designing, developing, and implementing database applications in various industries and companies ranging from small to Fortune 50 corporations. Alex’s primary area of expertise is in database performance optimization and tuning. He can be contacted at alex@advcomputerconsulting.com.

Oracle Contiguous Data Range via Full Outer Joins

About the author

DatabaseJournal.com Staff

Company

Categories