Synopsis. Oracle 10g has extended the already robust features of Structured Query Language (SQL) with a plethora of intriguing new capabilities. This article – the final in this series – reviews the new features that Oracle 10g provides for advanced data modeling and inter-row calculations via the new MODEL clause.
The previous article in this series reviewed Oracle 10g's enhancements to analytical functions, including the use of partitioned outer joins to “densify” data within multiple dimensions. As that article promised, I will now delve into the new MODEL clause that grants any Oracle developer or DBA the power to produce complex inter-row calculations using just the database engine, thus eliminating the need to massage data output via a third-party tool.
As I mentioned in my previous article, these two scenarios were among the more complex I had seen recently:
My first inclination would have been to tell my developers that they should only use Oracle to aggregate the required data, then export the data and whip up the final results through a third-party tool - perhaps nothing more complex than a Microsoft Excel spreadsheet - to accomplish these analyses. However, that was before I explored Oracle 10g's new MODEL clause.
Let's explored how I used the MODEL clause to solve the first scenario above. I will modify the SH.SALES_MIDWEST_ONLY view created for the previous article so that view now selects a slightly different subset of products, but still within the Midwestern states. Listing 3.1 shows the revised view.
Next, I will construct some sample SQL to demonstrate the new MODEL clause:
SELECT state, prod, total_sales
FROM sh.sales_midwest_only
WHERE prod BETWEEN 125 AND 130
GROUP BY state, prod
MODEL
PARTITION BY (state)
DIMENSION BY (prod)
MEASURES (SUM(sold) AS total_sales)
IGNORE NAV
UNIQUE DIMENSION
RULES UPSERT
(
total_sales[99910] =
total_sales[prod=125] + total_sales[prod=126],
total_sales[99920] =
total_sales[prod=127]
+ total_sales[prod=128]
+ total_sales[prod=129]
+ total_sales[prod=130],
total_sales[99999] =
total_sales[99920]
- total_sales[99910]
)
ORDER BY state, prod;
When this SQL statement is executed, Oracle retrieves the selected result set, places the result set into memory and then allows the MODEL clause to divvy up all returned rows into a multi-dimensional array. For example, a three-dimensional array can be visualized as a cube that has been cut into several horizontal slices, vertical slices, and layers. Continuing this cubist analogy (apologies to Pablo Picasso!), the MODEL clause also lets me apply rules that perform inter-row calculations on the data that is stored at each intersection of these slices and layers. (For the sake of simplicity and my own sanity, I promise to keep all examples under four dimensions.)
But wait – it gets better! MODEL also lets me create additional sets of values – for example, brand new rows that include the same columns as the other rows in the result set – by direct assignment of values to the new row's cells. I find it useful to picture this feature just as if I'd inserted a new row into an Excel spreadsheet, and then added a formula, perhaps a SUM() or just simple addition or subtraction, to calculate values into that new row's cells based on the contents of other cells in the spreadsheet.
Since this may be the first time that you have seen the MODEL clause in action, I will break down this rather complex statement into its components. Let's first take a look at the statements that classify the returned columns of this query into three distinct types:
Enforcing Result Set Uniqueness. Two statements let me control how to handle uniqueness within the returned result set. UNIQUE DIMENSION, the default, tells Oracle that the columns specified in the PARTITION BY and DIMENSION BY columns must form a unique key for the query. In this example, however, I have specified UNIQUE SINGLE REFERENCE to tell Oracle to only check single-cell references on the right-hand side of the rule instead of the entire result set.
Handling Missing Values. What happens when there are no values present at an intersection? MODEL also provides the IGNORE_NAV statement to tell Oracle to ignore missing (i.e. null and absent) values. Since I specified IGNORE_NAV in this example, Oracle will place a zero in any numeric columns, 01-JAN-2000 for dates, an empty string (not a NULL!) for character datatypes, and a NULL value for any other datatypes. If I had not specified IGNORE_NAV, Oracle would instead assume the default, KEEP_NAV, was in force and would instead return NULLs for both null and absent values.
Next, I will turn my attention to how the MODEL clause defines the business rules for calculating values, and how those rules will be applied to the result set:
Applying Rules to Measure Columns. Each value that is found in a MEASURE column can be operated upon by one or more rules. Much like a simple algebraic equation, a rule has two components:
Rule Processing Order. MODEL also lets me define the order in which rules can be applied to the result set. Since I specified the default method, SEQUENTIAL ORDER, in this query, Oracle will apply the rules to the result set in the order that the rules have been specified. However, if AUTOMATIC ORDER is picked instead, Oracle will select which rule is to be processed based on the order of the individual rules' dependency. It is important to note that if AUTOMATIC ORDER is selected, the end result is that a cell will be assigned a value just once. On the other hand, when SEQUENTIAL ORDER is specified, a cell could be assigned a value several times based on the order of the rules being applied, and the value based on the last applied rule becomes the final value.
Processing Rule Assignments. MODEL provides two methods for processing these value assignments:
Cell Assignments. The most complex set of statements describe which cells should participate in the rules, so naturally I have saved them for last. Fortunately, there are just two types, symbolic and positional.
total_sales[99910] =
total_sales[prod=125] + total_sales[prod=126],
tells Oracle to create a new row in the total_sales dimension that's indexed by value 99910, and add all values in the rows indexed by Product ID 125 and Product ID 126 into the columns in that new row.
Listing 3.2 shows the original SQL statement, and the results returned from its execution.
Another important difference in symbolic versus positional referencing is the way NULL values are treated. For example, if I was referencing a single cell in a two-dimensional array and I used a symbolic reference, the rule would not be applied; but if I used a positional reference for that same single cell, the rule would be applied. A slightly modified version of the previous query that uses positional rather than symbolic references is shown in Listing 3.3, and it effectively demonstrates this difference.
Finally, the MODEL clause offers the ITERATE directive to perform the same calculation a specific number of times until either the maximum iterations have been completed, or a threshold value has been reached that forces the iteration to be terminated. To complete the example solution for my second scenario – calculating outstanding cash balances based on the value of particular assets -- I have constructed a new table, SH.BALANCE_SHEETS, and populated it with some sample data representing several balance sheet accounts (see Listing 3.4).
Here are the business rules my users have supplied for projecting cash balances:
This would be relatively simple to do in an Excel spreadsheet, of course, but with ITERATE I can now also perform it right on the Oracle server. Here is the SQL statement I constructed against SH.BALANCE_SHEETS to accomplish this:
SELECT
acct#, fb
FROM sh.balance_sheets
MODEL
DIMENSION BY (acct#)
MEASURES (balance fb)
RULES ITERATE (3)
(
fb['3020'] =
((fb['1010'] + fb['2010'] - fb[2020] + fb[2030]) * 0.15)
,fb['2030'] = ((fb['1010'] + fb['2010'] - fb[2020]) * 0.075)
,fb['2020'] = (fb['1010'] + fb['2010'] * 0.30)
);
Listing 3.5 shows the results of executing this query for one, two, and three iterations.
Oracle 10g's new inter-row calculation capabilities significantly expand the already-powerful suite of analytical functions so that any developer or Oracle DBA can perform complex, advanced data modeling and reporting in multiple dimensions, all without the use of third-party software to perform these analyses. Oracle 10g's extensions to Structured Query Language have insured it is among the most robust, flexible, and fully featured of any of the relational premier database management systems.
While there is no substitute for direct experience, reading the manual is not a bad idea, either. I have drawn upon the following Oracle 10g documentation for the deeper technical details of this article:
B10736-01 Oracle Database Data Warehousing Guide, Chapter 22
B10750-01 Oracle Database New Features Guide
B10759-01 SQL Reference