Database Journal
MS SQL Oracle DB2 Access MySQL PostgreSQL Sybase PHP SQL Etc SQL Scripts & Samples Links Database Forum

» Database Journal Home
» Database Articles
» Database Tutorials
MS SQL
Oracle
DB2
MS Access
MySQL
» RESOURCES
Database Tools
SQL Scripts & Samples
Links
» Database Forum
» Slideshows
» Sitemap
Free Newsletters:
DatabaseDaily  
News Via RSS Feed


follow us on Twitter
Database Journal |DBA Support |SQLCourse |SQLCourse2
 

Featured Database Articles

Oracle

Posted October 13, 2016

A New Way To Pattern Match In Oracle 12c

By David Fitzjarrell

Oracle 12.1.0.2 provides new functionality for finding pattern matches in a data set; using the MATCH_RECOGNIZE function it's fairly easy to generate results based on a defined pattern in the data. Let's look at some examples and see how to use MATCH_RECOGNIZE to find and report user-defined patterns in a data set. The examples provided were taken from the Live SQL website (https://livesql.oracle.com) and from the Oracle documentation (https://docs.oracle.com/database/121/DWHSG/pattern.htm#DWHSG8981), where MATCH_RECOGNIZE is more thoroughly explained. This is not an exhaustive treatise on MATCH_RECOGNIZE, but an introduction to the function and some of what it can accomplish. Keeping that in mind, let's begin.

These examples begin with a table of click data, similar to what an advertising agency might collect from users clicking on an advertisement. Sessions are defined as being 10 minutes or more apart; all data within a 10-minute window for the same user is considered to be the same session. We create the table and load the data:


SQL> 
SQL> --
SQL> -- Create table and populate
SQL> --
SQL> CREATE TABLE clickdata (tstamp integer, userid varchar2(15));

Table created.

SQL> INSERT INTO clickdata VALUES(1, 'Mary');

1 row created.

SQL> INSERT INTO clickdata VALUES(2, 'Sam');

1 row created.

SQL> INSERT INTO clickdata VALUES(11, 'Mary');

1 row created.

SQL> INSERT INTO clickdata VALUES(12, 'Sam');

1 row created.

SQL> INSERT INTO clickdata VALUES(22, 'Sam');

1 row created.

SQL> INSERT INTO clickdata VALUES(23, 'Mary');

1 row created.

SQL> INSERT INTO clickdata VALUES(32, 'Sam');

1 row created.

SQL> INSERT INTO clickdata VALUES(34, 'Mary');

1 row created.

SQL> INSERT INTO clickdata VALUES(43, 'Sam');

1 row created.

SQL> INSERT INTO clickdata VALUES(44, 'Mary');

1 row created.

SQL> INSERT INTO clickdata VALUES(47, 'Sam');

1 row created.

SQL> INSERT INTO clickdata VALUES(48, 'Sam');

1 row created.

SQL> INSERT INTO clickdata VALUES(53, 'Mary');

1 row created.

SQL> INSERT INTO clickdata VALUES(59, 'Sam');

1 row created.

SQL> INSERT INTO clickdata VALUES(60, 'Sam');

1 row created.

SQL> INSERT INTO clickdata VALUES(63, 'Mary');

1 row created.

SQL> INSERT INTO clickdata VALUES(68, 'Sam');

1 row created.

SQL> commit;

Commit complete.

Using a very basic MATCH_RECOGNIZE query we return the timestamp and the user id to see how many sessions each user had:


SQL> 
SQL> --
SQL> -- Return rows that are part of the same
SQL> -- session for a login
SQL> --
SQL> -- Any session tstamp value within 10
SQL> -- minutes of the prior tstamp is considered
SQL> -- part of the same session
SQL> --
SQL> -- In MATCH_RECOGNIZE the PATTERN is the
SQL> -- sequence or data pattern we are looking for
SQL> --
SQL> -- In this case b maps to any row, which gives
SQL> -- the ability to check for matches
SQL> --
SQL> -- s* states that 0 or more rows must be mapped
SQL> -- to confirm a match
SQL> --
SQL> -- s is defined as the difference between the
SQL> -- current tstamp value and the prior tstamp
SQL> -- value and the difference must be less than
SQL> -- or equal to 10
SQL> --
SQL> -- This will give a list of the activity of the
SQL> -- different continuous sessions
SQL> --
SQL> -- The ALL ROWS PER MATCH returns every matching
SQL> -- row
SQL> --
SQL> 
SQL> SELECT
  2   tstamp,
  3   userid
  4  FROM clickdata MATCH_RECOGNIZE(
  5  	ALL ROWS PER MATCH
  6  	PATTERN (b s*)
  7  	DEFINE
  8  	    s as (s.tstamp - prev(s.tstamp) <= 10)
  9   );

    TSTAMP USERID                                                                                                                                     
---------- ---------------                                                                                                                            
         1 Mary                                                                                                                                       
         2 Sam                                                                                                                                        
        11 Mary                                                                                                                                       
        12 Sam                                                                                                                                        
        22 Sam                                                                                                                                        
        23 Mary                                                                                                                                       
        32 Sam                                                                                                                                        
        34 Mary                                                                                                                                       
        43 Sam                                                                                                                                        
        44 Mary                                                                                                                                       
        47 Sam                                                                                                                                        

    TSTAMP USERID                                                                                                                                     
---------- ---------------                                                                                                                            
        48 Sam                                                                                                                                        
        53 Mary                                                                                                                                       
        59 Sam                                                                                                                                        
        60 Sam                                                                                                                                        
        63 Mary                                                                                                                                       
        68 Sam                                                                                                                                        

17 rows selected.

Two interesting things are found in this construct, the PATTERN keyword and the DEFINE keyword. PATTERN defines a generalized pattern, using Perl-style regular expression syntax, to declare a particular pattern in the data. DEFINE sets up the definitions for the symbols used in the PATTERN statement. Since b has no conditions on it every row is returned; the conditions on s state that the difference between the current timestamp and the previous timestamp must be less than or equal to 10. There are 17 rows of data in the table and all 17 are returned with this MATCH_RECOGNIZE query. Modifying that query to find the unique session identifier for each session fails because not enough pieces of the MATCH_RECOGNZE syntax are included; it 'fails' by returning 1 for the count of each session and that isn't true:


SQL> 
SQL> --
SQL> -- The MATCH_NUMBER() Function should return the
SQL> -- session id for each different session
SQL> --
SQL> -- In this case it doesn't as we are missing a
SQL> -- piece of the puzzle
SQL> --
SQL> SELECT
  2   tstamp,
  3   userid,
  4   session_id
  5  FROM clickdata MATCH_RECOGNIZE(
  6  	MEASURES match_number() as session_id
  7  	ALL ROWS PER MATCH
  8  	PATTERN (b s*)
  9  	DEFINE
 10  	    s as (s.tstamp - prev(s.tstamp) <= 10)
 11   );

    TSTAMP USERID          SESSION_ID                                                                                                                 
---------- --------------- ----------                                                                                                                 
         1 Mary                     1                                                                                                                 
         2 Sam                      1                                                                                                                 
        11 Mary                     1                                                                                                                 
        12 Sam                      1                                                                                                                 
        22 Sam                      1                                                                                                                 
        23 Mary                     1                                                                                                                 
        32 Sam                      1                                                                                                                 
        34 Mary                     1                                                                                                                 
        43 Sam                      1                                                                                                                 
        44 Mary                     1                                                                                                                 
        47 Sam                      1                                                                                                                 

    TSTAMP USERID          SESSION_ID                                                                                                                 
---------- --------------- ----------                                                                                                                 
        48 Sam                      1                                                                                                                 
        53 Mary                     1                                                                                                                 
        59 Sam                      1                                                                                                                 
        60 Sam                      1                                                                                                                 
        63 Mary                     1                                                                                                                 
        68 Sam                      1                                                                                                                 

17 rows selected.

Proving Mary has three sessions is fairly easy:


SQL> 
SQL> --
SQL> -- As all records show 1 session this isn't accurate
SQL> -- since Mary has 3 separate sessions according to
SQL> -- the previously established criteria
SQL> --
SQL> SELECT *
  2  FROM clickdata
  3  WHERE userid = 'Mary';

    TSTAMP USERID                                                                                                                                     
---------- ---------------                                                                                                                            
         1 Mary                                                                                                                                       
        11 Mary                                                                                                                                       
        23 Mary                                                                                                                                       
        34 Mary                                                                                                                                       
        44 Mary                                                                                                                                       
        53 Mary                                                                                                                                       
        63 Mary                                                                                                                                       

7 rows selected.

The first two records belong to the first session, the third is another session and the remaining records make up the third session. Correcting the MATCH_RECOGNIZE query is a simple matter of partitioning (grouping) by userid and ordering each group by the tstamp values:


SQL> 
SQL> --
SQL> -- To 'fix' this we need to add some instructions
SQL> -- to the MATCH_RECOGNIZE function call
SQL> --
SQL> -- By partitioning  by userid  (collecting the data
SQL> -- based on the userid column) and ordering each
SQL> -- grouping by tstamp the MATCH_RECOGNIZE() call
SQL> -- can  accurately return the session id
SQL> --
SQL> SELECT
  2   tstamp,
  3   userid,
  4   session_id
  5  FROM clickdata MATCH_RECOGNIZE(
  6  	PARTITION BY userid ORDER BY tstamp
  7  	MEASURES match_number() as session_id
  8  	ALL ROWS PER MATCH
  9  	PATTERN (b s*)
 10  	DEFINE
 11  	    s as (s.tstamp - prev(s.tstamp) <=10)
 12   );

    TSTAMP USERID          SESSION_ID                                                                                                                 
---------- --------------- ----------                                                                                                                 
         1 Mary                     1                                                                                                                 
        11 Mary                     1                                                                                                                 
        23 Mary                     2                                                                                                                 
        34 Mary                     3                                                                                                                 
        44 Mary                     3                                                                                                                 
        53 Mary                     3                                                                                                                 
        63 Mary                     3                                                                                                                 
         2 Sam                      1                                                                                                                 
        12 Sam                      1                                                                                                                 
        22 Sam                      1                                                                                                                 
        32 Sam                      1                                                                                                                 

    TSTAMP USERID          SESSION_ID                                                                                                                 
---------- --------------- ----------                                                                                                                 
        43 Sam                      2                                                                                                                 
        47 Sam                      2                                                                                                                 
        48 Sam                      2                                                                                                                 
        59 Sam                      3                                                                                                                 
        60 Sam                      3                                                                                                                 
        68 Sam                      3                                                                                                                 

17 rows selected.

Now the report correctly displays that there were three sessions for each user.

Before we go further let's look at some parts of the MATCH_RECOGNIZE statement by defining the keywords being used. PARTITION groups the data by the column or columns specified, similar to the way the analytic functions work. ORDER BY orders the data in each group by the column or columns provided. The MEASURES keyword allows you to define the values you want to return and how to compute them based on the pattern being matched. The MATCH_NUMBER() function returns the number of the pattern match, starting with 1 and increasing incrementally as each pattern match is found. The FIRST and LAST keywords return the first and last values for a given pattern match, and can be used in calculations. Finally the ALL ROWS PER MATCH directive returns one row for each criteria match in the data. Using these in the following modifications to the MATCH_RECOGNIZE query produces a report of the session number, the number of events in each session, the start time and the session duration:


SQL> 
SQL> --
SQL> -- Let's expand this report
SQL> --
SQL> -- Setting up the MEASURES clause allows us
SQL> -- to return:
SQL> --
SQL> --  * the number of events per session
SQL> --  * the start time of each session
SQL> --  * the session duration
SQL> --
SQL> -- We have already successfully returned the session id
SQL> -- so now we add the count(*) for each group, the
SQL> -- start time for each group and the elapsed time for
SQL> -- each group calculated from the last tstamp for
SQL> -- each group
SQL> --
SQL> -- In one group first and last are the same so the difference
SQL> -- is 0
SQL> --
SQL> SELECT
  2   tstamp,
  3   userid,
  4   session_id,
  5   no_of_events,
  6   start_time,
  7   session_duration
  8  FROM clickdata MATCH_RECOGNIZE(
  9  	PARTITION BY userid ORDER BY tstamp
 10  	MEASURES match_number() as session_id,
 11  		 count(*) as no_of_events,
 12  		 first(tstamp) start_time,
 13  		 last(tstamp) - first(tstamp) session_duration
 14  	ALL ROWS PER MATCH
 15  	PATTERN (b s*)
 16  	DEFINE
 17  	    s as (s.tstamp - prev(s.tstamp) <= 10)
 18   );

    TSTAMP USERID          SESSION_ID NO_OF_EVENTS START_TIME SESSION_DURATION                                                                        
---------- --------------- ---------- ------------ ---------- ----------------                                                                        
         1 Mary                     1            1          1                0                                                                        
        11 Mary                     1            2          1               10                                                                        
        23 Mary                     2            1         23                0                                                                        
        34 Mary                     3            1         34                0                                                                        
        44 Mary                     3            2         34               10                                                                        
        53 Mary                     3            3         34               19                                                                        
        63 Mary                     3            4         34               29                                                                        
         2 Sam                      1            1          2                0                                                                        
        12 Sam                      1            2          2               10                                                                        
        22 Sam                      1            3          2               20                                                                        
        32 Sam                      1            4          2               30                                                                        

    TSTAMP USERID          SESSION_ID NO_OF_EVENTS START_TIME SESSION_DURATION                                                                        
---------- --------------- ---------- ------------ ---------- ----------------                                                                        
        43 Sam                      2            1         43                0                                                                        
        47 Sam                      2            2         43                4                                                                        
        48 Sam                      2            3         43                5                                                                        
        59 Sam                      3            1         59                0                                                                        
        60 Sam                      3            2         59                1                                                                        
        68 Sam                      3            3         59                9                                                                        

17 rows selected.

As mentioned above using ALL ROWS PER MATCH can produce a long report. Taking this one step further and changing ALL ROWS PER MATCH to ONE ROW PER MATCH produces a shorter, summary report of the activity:


SQL> 
SQL> --
SQL> -- One of the nice aspects of MATCH_RECOGNIZE is the
SQL> -- ability to generate summary reports
SQL> --
SQL> -- The last report was nice but a bit lengthy
SQL> --
SQL> -- Using the ONE ROW PER MATCH directive and removing
SQL> -- the tstamp column from the select statement
SQL> -- allows MATCH_RECOGNIZE to return only the summary
SQL> -- information, making a more compact and, in this case,
SQL> -- usable report
SQL> --
SQL> SELECT
  2   userid,
  3   session_id,
  4   no_of_events,
  5   start_time,
  6   session_duration
  7  FROM clickdata MATCH_RECOGNIZE(
  8  	PARTITION BY userid ORDER BY tstamp
  9  	MEASURES match_number() as session_id,
 10  		 count(*) as no_of_events,
 11  		 first(tstamp) start_time,
 12  		 last(tstamp) - first(tstamp) session_duration
 13  	ONE ROW PER MATCH
 14  	PATTERN (b s*)
 15  	DEFINE
 16  	    s as (s.tstamp - prev(s.tstamp) <= 10)
 17   );

USERID          SESSION_ID NO_OF_EVENTS START_TIME SESSION_DURATION                                                                                   
--------------- ---------- ------------ ---------- ----------------                                                                                   
Mary                     1            2          1               10                                                                                   
Mary                     2            1         23                0                                                                                   
Mary                     3            4         34               29                                                                                   
Sam                      1            4          2               30                                                                                   
Sam                      2            3         43                5                                                                                   
Sam                      3            3         59                9                                                                                   

6 rows selected.

SQL> 

Let's create another table and data set, this time for stock prices for the month of April:


SQL> 
SQL> --
SQL> -- Create table for stock data
SQL> --
SQL> -- Populate the table with data
SQL> -- for April
SQL> --
SQL> CREATE TABLE Ticker (SYMBOL VARCHAR2(10), tstamp DATE, price NUMBER);

Table created.

SQL> 
SQL> INSERT INTO Ticker VALUES('ACME', '01-Apr-11', 12);

1 row created.

SQL> INSERT INTO Ticker VALUES('ACME', '02-Apr-11', 17);

1 row created.

SQL> INSERT INTO Ticker VALUES('ACME', '03-Apr-11', 19);

1 row created.

SQL> INSERT INTO Ticker VALUES('ACME', '04-Apr-11', 21);

1 row created.

SQL> INSERT INTO Ticker VALUES('ACME', '05-Apr-11', 25);

1 row created.

SQL> INSERT INTO Ticker VALUES('ACME', '06-Apr-11', 12);

1 row created.

SQL> INSERT INTO Ticker VALUES('ACME', '07-Apr-11', 15);

1 row created.

SQL> INSERT INTO Ticker VALUES('ACME', '08-Apr-11', 20);

1 row created.

SQL> INSERT INTO Ticker VALUES('ACME', '09-Apr-11', 24);

1 row created.

SQL> INSERT INTO Ticker VALUES('ACME', '10-Apr-11', 25);

1 row created.

SQL> INSERT INTO Ticker VALUES('ACME', '11-Apr-11', 19);

1 row created.

SQL> INSERT INTO Ticker VALUES('ACME', '12-Apr-11', 15);

1 row created.

SQL> INSERT INTO Ticker VALUES('ACME', '13-Apr-11', 25);

1 row created.

SQL> INSERT INTO Ticker VALUES('ACME', '14-Apr-11', 25);

1 row created.

SQL> INSERT INTO Ticker VALUES('ACME', '15-Apr-11', 14);

1 row created.

SQL> INSERT INTO Ticker VALUES('ACME', '16-Apr-11', 12);

1 row created.

SQL> INSERT INTO Ticker VALUES('ACME', '17-Apr-11', 14);

1 row created.

SQL> INSERT INTO Ticker VALUES('ACME', '18-Apr-11', 24);

1 row created.

SQL> INSERT INTO Ticker VALUES('ACME', '19-Apr-11', 23);

1 row created.

SQL> INSERT INTO Ticker VALUES('ACME', '20-Apr-11', 22);

1 row created.

SQL> 
SQL> COMMIT;

Commit complete.

This data has a different pattern structure, where V-shaped patterns are generated by the changing stock prices. Let's use MATCH_RECOGNIZE to find and report those patterns. This first query will return the total number of patterns found in the data:


SQL> 
SQL> --
SQL> -- Use MATCH_RECOGNIZE to find areas
SQL> -- in the data where the stock price
SQL> -- 'dipped' then rose
SQL> --
SQL> -- The data provided describes three
SQL> -- patterns of dipping and rising prices
SQL> --
SQL> -- The query will return three rows,
SQL> -- one per matching pattern
SQL> --
SQL> -- The only stock symbol in the table
SQL> -- is 'ACME' but this query would return
SQL> -- one row for each pattern matched per
SQL> -- stock symbol
SQL> --
SQL> -- Here we group by symbol and order
SQL> -- by the tstamp values
SQL> --
SQL> -- The MEASURES section defines the
SQL> -- starting timestamp of the pattern,
SQL> -- the lowest value of the group and
SQL> -- the highest value after the low has
SQL> -- been reached
SQL> --
SQL> -- Note that in one case the ending
SQL> -- tstamp for one pattern is the starting
SQL> -- tstamp for the	next pattern
SQL> --
SQL> -- This query introduces a new wrinkle.
SQL> -- AFTER MATCH SKIP TO LAST UP, which
SQL> -- causes the pattern matching for the
SQL> -- next group to begin at the last UP
SQL> -- value from the current pattern
SQL> --
SQL> SELECT *
  2  FROM Ticker MATCH_RECOGNIZE (
  3  	  PARTITION BY symbol
  4  	  ORDER BY tstamp
  5  	  MEASURES  STRT.tstamp AS start_tstamp,
  6  		    LAST(DOWN.tstamp) AS bottom_tstamp,
  7  		    LAST(UP.tstamp) AS end_tstamp
  8  	  ONE ROW PER MATCH
  9  	  AFTER MATCH SKIP TO LAST UP
 10  	  PATTERN (STRT DOWN+ UP+)
 11  	  DEFINE
 12  	     DOWN AS DOWN.price < PREV(DOWN.price),
 13  	     UP AS UP.price > PREV(UP.price)
 14  	  ) MR
 15  ORDER BY MR.symbol, MR.start_tstamp;

SYMBOL     START_TST BOTTOM_TS END_TSTAM                                                                                                              
---------- --------- --------- ---------                                                                                                              
ACME       05-APR-11 06-APR-11 10-APR-11                                                                                                              
ACME       10-APR-11 12-APR-11 13-APR-11                                                                                                              
ACME       14-APR-11 16-APR-11 18-APR-11                                                                                                              

We expand the report to return more data. The comments explain what is being returned and how the MEASURES are coded:


SQL> 
SQL> --
SQL> -- This next query returns all rows for
SQL> -- each pattern match and returns a bit more
SQL> -- data
SQL> --
SQL> -- There are 5 MEASURES being used
SQL> --
SQL> -- The start timestamp
SQL> --
SQL> -- The bottom of the v-shaped pattern
SQL> --
SQL> -- The top of the v-shaped pattern
SQL> --
SQL> -- The row number to know which values
SQL> -- belong to a given pattern match
SQL> --
SQL> -- The classifier (UP, DOWN, STRT)
SQL> -- to know where the value lies in the
SQL> -- pattern
SQL> --
SQL> SELECT *
  2  FROM Ticker MATCH_RECOGNIZE (
  3  	  PARTITION BY symbol
  4  	  ORDER BY tstamp
  5  	  MEASURES  STRT.tstamp AS start_tstamp,
  6  		    FINAL LAST(DOWN.tstamp) AS bottom_tstamp,
  7  		    FINAL LAST(UP.tstamp) AS end_tstamp,
  8  		    MATCH_NUMBER() AS match_num,
  9  		    CLASSIFIER() AS var_match
 10  	  ALL ROWS PER MATCH
 11  	  AFTER MATCH SKIP TO LAST UP
 12  	  PATTERN (STRT DOWN+ UP+)
 13  	  DEFINE
 14  	     DOWN AS DOWN.price < PREV(DOWN.price),
 15  	     UP AS UP.price > PREV(UP.price)
 16  	  ) MR
 17  ORDER BY MR.symbol, MR.match_num, MR.tstamp;

SYMBOL     TSTAMP    START_TST BOTTOM_TS END_TSTAM  MATCH_NUM VAR_MATCH                           PRICE                                               
---------- --------- --------- --------- --------- ---------- ------------------------------ ----------                                               
ACME       05-APR-11 05-APR-11 06-APR-11 10-APR-11          1 STRT                                   25                                               
ACME       06-APR-11 05-APR-11 06-APR-11 10-APR-11          1 DOWN                                   12                                               
ACME       07-APR-11 05-APR-11 06-APR-11 10-APR-11          1 UP                                     15                                               
ACME       08-APR-11 05-APR-11 06-APR-11 10-APR-11          1 UP                                     20                                               
ACME       09-APR-11 05-APR-11 06-APR-11 10-APR-11          1 UP                                     24                                               
ACME       10-APR-11 05-APR-11 06-APR-11 10-APR-11          1 UP                                     25                                               
ACME       10-APR-11 10-APR-11 12-APR-11 13-APR-11          2 STRT                                   25                                               
ACME       11-APR-11 10-APR-11 12-APR-11 13-APR-11          2 DOWN                                   19                                               
ACME       12-APR-11 10-APR-11 12-APR-11 13-APR-11          2 DOWN                                   15                                               
ACME       13-APR-11 10-APR-11 12-APR-11 13-APR-11          2 UP                                     25                                               
ACME       14-APR-11 14-APR-11 16-APR-11 18-APR-11          3 STRT                                   25                                               

SYMBOL     TSTAMP    START_TST BOTTOM_TS END_TSTAM  MATCH_NUM VAR_MATCH                           PRICE                                               
---------- --------- --------- --------- --------- ---------- ------------------------------ ----------                                               
ACME       15-APR-11 14-APR-11 16-APR-11 18-APR-11          3 DOWN                                   14                                               
ACME       16-APR-11 14-APR-11 16-APR-11 18-APR-11          3 DOWN                                   12                                               
ACME       17-APR-11 14-APR-11 16-APR-11 18-APR-11          3 UP                                     14                                               
ACME       18-APR-11 14-APR-11 16-APR-11 18-APR-11          3 UP                                     24                                               

15 rows selected.

Going even further let's generate a report that displays where the row falls in the pattern (STRT, DOWN, UP), the total number of UP days in the pattern, the total number of days for each pattern, the day number in each pattern, the running price difference and the starting price for the pattern. Remember that the FINAL keyword returns the last value for each pattern:


SQL> 
SQL> --
SQL> -- This query aggregates data and returns
SQL> -- the number of days mapped to the UP pattern
SQL> -- (up_days), the total number of days each
SQL> -- for each pattern found, the day number within
SQL> -- each match (a running count that resets when
SQL> -- a new pattern match begins) and the running
SQL> -- price difference between each row in a match
SQL> --
SQL> SELECT *
  2  FROM Ticker MATCH_RECOGNIZE (
  3    PARTITION BY symbol
  4    ORDER BY tstamp
  5    MEASURES
  6  	 MATCH_NUMBER() AS match_num,
  7  	 CLASSIFIER() AS var_match,
  8  	 FINAL COUNT(UP.tstamp) AS up_days,
  9  	 FINAL COUNT(tstamp) AS total_days,
 10  	 RUNNING COUNT(tstamp) AS cnt_days,
 11  	 price - STRT.price AS price_dif
 12    ALL ROWS PER MATCH
 13    AFTER MATCH SKIP TO LAST UP
 14    PATTERN (STRT DOWN+ UP+)
 15    DEFINE
 16  	 DOWN AS DOWN.price < PREV(DOWN.price),
 17  	 UP AS UP.price > PREV(UP.price)
 18    ) MR
 19  ORDER BY MR.symbol, MR.match_num, MR.tstamp;

SYMBOL     TSTAMP     MATCH_NUM VAR_MATCH                         UP_DAYS TOTAL_DAYS   CNT_DAYS  PRICE_DIF      PRICE                                 
---------- --------- ---------- ------------------------------ ---------- ---------- ---------- ---------- ----------                                 
ACME       05-APR-11          1 STRT                                    4          6          1          0         25                                 
ACME       06-APR-11          1 DOWN                                    4          6          2        -13         12                                 
ACME       07-APR-11          1 UP                                      4          6          3        -10         15                                 
ACME       08-APR-11          1 UP                                      4          6          4         -5         20                                 
ACME       09-APR-11          1 UP                                      4          6          5         -1         24                                 
ACME       10-APR-11          1 UP                                      4          6          6          0         25                                 
ACME       10-APR-11          2 STRT                                    1          4          1          0         25                                 
ACME       11-APR-11          2 DOWN                                    1          4          2         -6         19                                 
ACME       12-APR-11          2 DOWN                                    1          4          3        -10         15                                 
ACME       13-APR-11          2 UP                                      1          4          4          0         25                                 
ACME       14-APR-11          3 STRT                                    2          5          1          0         25                                 

SYMBOL     TSTAMP     MATCH_NUM VAR_MATCH                         UP_DAYS TOTAL_DAYS   CNT_DAYS  PRICE_DIF      PRICE                                 
---------- --------- ---------- ------------------------------ ---------- ---------- ---------- ---------- ----------                                 
ACME       15-APR-11          3 DOWN                                    2          5          2        -11         14                                 
ACME       16-APR-11          3 DOWN                                    2          5          3        -13         12                                 
ACME       17-APR-11          3 UP                                      2          5          4        -11         14                                 
ACME       18-APR-11          3 UP                                      2          5          5         -1         24                                 

15 rows selected.

Remember that the original pattern to match was a V-shape, and there were three. Changing the query to look for W shapes, where the end of the previous V shape is the start of the next V shape, we find only one pattern in the data:


SQL> 
SQL> --
SQL> -- This query looks for W shapes, of which
SQL> -- there is one in the data
SQL> --
SQL> -- The query was changed to look for two
SQL> -- consecutive down/up patterns where the
SQL> -- ending value of the first down/up is the
SQL> -- starting value of the next down/up
SQL> -- pattern
SQL> --
SQL> SELECT *
  2  FROM Ticker MATCH_RECOGNIZE (
  3    PARTITION BY symbol
  4    ORDER BY tstamp
  5    MEASURES
  6  	 MATCH_NUMBER() AS match_num,
  7  	 CLASSIFIER()  AS  var_match,
  8  	 STRT.tstamp AS start_tstamp,
  9  	 FINAL LAST(UP.tstamp) AS end_tstamp
 10    ALL ROWS PER MATCH
 11    AFTER MATCH SKIP TO LAST UP
 12    PATTERN (STRT DOWN+ UP+ DOWN+ UP+)
 13    DEFINE
 14  	 DOWN AS DOWN.price < PREV(DOWN.price),
 15  	 UP AS UP.price > PREV(UP.price)
 16    ) MR
 17  ORDER BY MR.symbol, MR.match_num, MR.tstamp;

SYMBOL     TSTAMP     MATCH_NUM VAR_MATCH                      START_TST END_TSTAM      PRICE                                                         
---------- --------- ---------- ------------------------------ --------- --------- ----------                                                         
ACME       05-APR-11          1 STRT                           05-APR-11 13-APR-11         25                                                         
ACME       06-APR-11          1 DOWN                           05-APR-11 13-APR-11         12                                                         
ACME       07-APR-11          1 UP                             05-APR-11 13-APR-11         15                                                         
ACME       08-APR-11          1 UP                             05-APR-11 13-APR-11         20                                                         
ACME       09-APR-11          1 UP                             05-APR-11 13-APR-11         24                                                         
ACME       10-APR-11          1 UP                             05-APR-11 13-APR-11         25                                                         
ACME       11-APR-11          1 DOWN                           05-APR-11 13-APR-11         19                                                         
ACME       12-APR-11          1 DOWN                           05-APR-11 13-APR-11         15                                                         
ACME       13-APR-11          1 UP                             05-APR-11 13-APR-11         25                                                         

9 rows selected.

SQL> 

Why weren't there two W-shaped patterns? The last V-shape didn't start at the same point as the end of the prior V-shape so it didn't make a second W-shaped pattern in the data.

MATCH_RECOGNIZE is a powerful and useful addition to Oracle 12.1.0.x as it can enable you to define and report patterns in your data. It's ideal for data warehousing applications and as an analytical tool to report repeating patterns in user data. Everyone may not get benefit from MATCH_RECOGNIZE but, for those who do, it's a welcome change to an already exceptional RDBMS.

See all articles by David Fitzjarrell



Oracle Archives

Comment and Contribute

 


(Maximum characters: 1200). You have characters left.

 

 




Latest Forum Threads
Oracle Forum
Topic By Replies Updated
Oracle Data Mining: Classification jan.hasller 0 July 5th, 07:19 AM
Find duplicates - Unique IDs Lava 5 July 2nd, 08:30 AM
no matching unique or primary key rcanter 1 April 25th, 12:32 PM
Update values of one table based on condition of values in other table using Trigger Gladiator 3 February 29th, 06:01 PM