Database Journal
MS SQL Oracle DB2 Access MySQL PostgreSQL Sybase PHP SQL Etc SQL Scripts & Samples Links Database Forum

» Database Journal Home
» Database Articles
» Database Tutorials
MS SQL
Oracle
DB2
MS Access
MySQL
» RESOURCES
Database Tools
SQL Scripts & Samples
Links
» Database Forum
» Sitemap
Free Newsletters:
DatabaseDaily  
News Via RSS Feed


follow us on Twitter
Database Journal |DBA Support |SQLCourse |SQLCourse2
 

Featured Database Articles

MS SQL

Posted Jan 5, 2009

Grouping with SQL Server 2008

By Don Schlichting

Introduction

Grouping records is the process of combining like SELECT columns together. Once combined, Aggregate functions such as SUM and AVERAGE can be performed on the set. This article will examine several SQL Server 2008 grouping functions including GROUP BY, GROUPING SETS, CUBE, and ROLLUP. In addition, we’ll demonstrate how to use SUM and COUNT in a WHERE clause after rows have been grouped.

DISTINCT

Before beginning a review of grouping functions, we’ll introduce the TSQL keyword DISTINCT because one of first questions usually asked when discussing grouping is “what is the difference between GROUP BY and DISTINCT?” DISTINCT is a TSQL keyword that will remove duplicates from a record set. For example, in the following table of first names, the return of a DISTINCT statement is shown.

table of first names

SELECT DISTINCT FirstName from Names;

the return of a DISTINCT statement

So although there are five rows in the table, only three records are returned. The DISTINCT keyword has removed any duplicates. It may be helpful to think of DISTINCT as meaning “the removing of duplicate rows”. The rows have not been grouped in the sense that we could ask “how many Bob first names are there.” For that we would need a “GROUP BY” statement rather than DISTINCT. DISTINCT simply removes duplicates and presents a list of what remains.

GROUP BY

The TSQL key word GROUP UP combines like selected rows together in a summary row. Here is the above statement rewritten with a GROUP BY.

SELECT FirstName FROM Names GROUP BY FirstName;

results of DISTINCT statement rewritten with a GROUP BY

The results are identical to the DISTINCT example. The reason for using GROUP BY rather than DISTINCT is the ability to use Aggregate functions such as COUNT and SUM. Here is the same query as above but including a count of how many times each name appears.

SELECT FirstName, COUNT(*) FROM Names GROUP BY FirstName;

results of SELECT FirstName, COUNT(*) FROM Names GROUP BY FirstName; including a count of how many times each name appears

Aggregates

There are several aggregate commands that are frequently used with GROUP BY.

AVG – Returns the average value of a grouped column.

COUNT – Returns the number of times the time appears in a column.

MAX – Returns the single greatest value in the grouped column.

MIN - Returns the least great value in the grouped column.

SUM – Returns the calculated total.

For a complete list of Aggregates, see Books On Line: Aggregate Functions Transact-SQL.

GROUP BY

This next example will be run on the SQL 2008 Adventure Works LT database. Adventure Works is a free test database that can be downloaded from Microsoft.

When multiple columns are selected, they must also be included in the grouping. For example, if both the First Name and Last Names were selected, then the grouping would need to include them as well, as shown below.

SELECT FirstName, LastName, COUNT(*) AS Nbr
FROM SalesLT.Customer 
GROUP BY FirstName, LastName;

multiple columns are selected, they must also be included in the grouping

As an aside, Aggregates can be used without an explicit GROUP BY statement, creating an implied GROUP BY as shown below:

SELECT  COUNT(*) AS Nbr
FROM SalesLT.Customer 

results of SELECT  COUNT(*) AS Nbr FROM SalesLT.Customer

For this to work though, only the Aggregate can be Selected, no other columns.

GROUPING SETS

GROUPING SETS are a new feature of SQL Server 2008. Using them will allow multiple groupings to be returned in one record set. This example will group twice, once on SalesPerson , the second time on the table as a whole.

SELECT SalesPerson, COUNT(*) AS Nbr
FROM SalesLT.Customer 
GROUP BY 
 GROUPING SETS
 (
  (SalesPerson),
  ()
  );

GROUPING SETS allow multiple groupings to be returned in one record set

The last row with a SalesPerson name of NULL is the total Count. We could rewrite our statement using CASE to replace the NULL value with a more readable output.

 SELECT  [Sales Rep] =
 CASE WHEN SalesPerson IS NULL THEN 'Total'
 ELSE SalesPerson
 END 
 , COUNT(*) AS Nbr
FROM SalesLT.Customer 
GROUP BY 
 GROUPING SETS
 (
  (SalesPerson),
  ()
  );

using CASE to replace the NULL value with a more readable output

In this next example, based on the “Address” table, the Grouping Set creates three summary levels. Giving us the number of records located in the same City + State, the count of same State, then a Grand Total of all.

SELECT StateProvince, City, COUNT(StateProvince) AS Nbr
FROM SalesLT.Address
GROUP BY 
 GROUPING SETS
 (
 (StateProvince, City),
 (StateProvince),
 ()
 );

the Grouping Set creates three summary levels

UNION

UNION is a TSQL keyword for combining multiple Select statements into one result set. UNION was one of the methods used for creating summary rows before GROUPING SETS existed. Below is the above State + City example but rewritten using a UNION.

SELECT StateProvince, City, COUNT(StateProvince) AS Nbr
FROM SalesLT.Address
GROUP BY StateProvince, City
UNION 
SELECT StateProvince, NULL AS City, COUNT(StateProvince) AS Nbr
FROM SalesLT.Address
GROUP BY StateProvince
UNION 
SELECT NULL AS StateProvince, NULL AS City, COUNT(StateProvince) AS Nbr
FROM SalesLT.Address
ORDER BY StateProvince DESC, City DESC

Each statement combined with a UNION must have the same number of columns. That’s why the last two groups use NULL as column placeholders.

CUBE and ROLLUP

Before GROUPING SETS, CUBE and ROLLUP were also used to create summary rows. Both are extensions of the GROUP BY statement. The example below using ROLLUP creates a summary row for each group of First Names and a Total Row at the bottom.

SELECT FirstName, LastName, COUNT(*) AS Nbr
FROM SalesLT.Customer 
GROUP BY FirstName, LastName
 WITH ROLLUP

ROLLUP creates a summary row for each group of First Names and a Total Row at the bottom

WHERE - HAVING

Aggregates such as COUNT can be used as a filtering clause, like a WHERE statement, in addition to being included in a SELECT list. Because aggregates are derived from a GROUP BY, their syntax is different from a regular WHERE statement. Instead, there is a special keyword, HAVING. As an example, a statement selecting only Cities that have COUNTs greater than two is shown below.

SELECT City, COUNT(City) AS NbrCity
FROM SalesLT.Address
GROUP BY City
HAVING COUNT(City) > 2

results of a statement selecting only Cities that have COUNTs greater than two

Conclusion

Grouping in SQL Server allow a single statement to return aggregated totals as well as discreet rows in one record set. The new GROUPING SETS command creates TSQL statements that are more readable and easily understood compared with multiple UNION statements or CUBE and ROLLUP. The DISTINCT keyword, while handy for use in a SELECT returning only unique value, cannot be used for totaling. Lastly, to filter on GROUPED totals, use the HAVING clause.

» See All Articles by Columnist Don Schlichting



MS SQL Archives

Comment and Contribute

 


(Maximum characters: 1200). You have characters left.

 

 




Latest Forum Threads
MS SQL Forum
Topic By Replies Updated
SQL 2005: SSIS: Error using SQL Server credentials poverty 3 August 17th, 07:43 AM
Need help changing table contents nkawtg 1 August 17th, 03:02 AM
SQL Server Memory confifuration bhosalenarayan 2 August 14th, 05:33 AM
SQL Server Primary Key and a Unique Key katty.jonh 2 July 25th, 10:36 AM