When someone mentions the term “reusable query”, the first thought that comes to mind is usually stored procedures. While they are an integral part of writing reusable code, they are only one ingredient in a bigger pie. Other key players include views and native as well as user-defined functions. This article will explain how all of these elements fit together so that your SELECT statements can be applied to a greater variety of queries.
The Role of Views
Views are actually used for several very different purposes, such as to simplify complex schemas and/or queries, and to implement security. One way that views contribute to security is to hide auditing fields from developers. A third possibility is to reduce the number of columns in order to improve performance. The idea would be to reference only indexed fields that are blazingly fast to search on. In practice, that can be tricky to implement because you have to be sure that you won’t need to access one of the hidden columns. Our main interest here is the use of views to simulate Joins between two or more tables to reduce the complexity of our queries. In Clearing a Path through the 3NF Join Jungle, we saw how a typical user profile in a database normalized to third normal form (3NF) could require up to six joins:
select * from Users u inner join UserPhoneNumbers upn on u.user_id = upn.user_id inner join UserScreenNames usn on u.user_id = usn.user_id inner join UserAffiliations ua on u.user_id = ua.user_id inner join Affiliations a on a.affiliation_id = ua.affiliation_id inner join UserWorkHistory uwh on u.user_id = uwh.user_id inner join Affiliations wa on uwh.affiliation_id = wa.affiliation_id
Here’s how you could replace this common lookup with a view:
CREATE VIEW `vusers` AS select * from Users u inner join UserPhoneNumbers upn on u.user_id = upn.user_id inner join UserScreenNames usn on u.user_id = usn.user_id inner join UserAffiliations ua on u.user_id = ua.user_id inner join Affiliations a on a.affiliation_id = ua.affiliation_id inner join UserWorkHistory uwh on u.user_id = uwh.user_id inner join Affiliations wa on uwh.affiliation_id = wa.affiliation_id;
Now, we can retrieve a user profile with the following simple SELECT statement:
select * from vusers u where u.user_id = 100
Know your Built-in MySQL Functions
One of my articles on crosstab queries, MySQL Prepared Statements to Generate Crosstab SQL, demonstrated the use of several MySQL functions, including CONCAT() and GROUP_CONCAT(). The GROUP_CONCAT() function in particular, was utilized to aggregate all the rows from the a table and return the collection of SELECT list expressions that made up the horizontal axis of the crosstab. In effect, it is what made the porting of the crosstab SELECT statement into a stored procedure possible.
Other functions like Count(), Month(), and MonthName(), as well as filtering statements such as CASE WHEN ELSE all contributed to making the code more generic.
Collect Your own Functions
Other times, when MySQL’s built-in functions don’t cover what you’re looking for, you are presented with an opportunity to write your own user function. At times such as these, it’s helpful to think of the function that serves a general purpose. Here is a convenience function that checks whether or not a mandatory stored procedure parameter is assigned. I don’t want to allow an empty string or spaces so this function checks for those as well as NULL:
BEGIN DECLARE isEmpty TINYINT; SET isEmpty = (param_name IS NULL or char_length(trim(param_name)) = 0); RETURN isEmpty; END
Notice that within my function I am calling the char_length() and trim() built-in functions. I can and do use this generic function all the time now.
Before you go off and write dozens of functions, make sure that there isn’t already something similar on the Web. There are plenty of MySQL function repository sites such as www.mysqludf.org that might just have exactly what you’re looking for!
Stored Procedures
There’s no question that stored procedures are a great way to both modularize and centralize your code. However, you should bear in mind that moving SQL code to a stored proc does not in-and-of itself render it more generic or reusable. Take for example, the following statement from the Use Derived Tables in Your MySQL Queries to Improve Performance article, which generates a report that tallies the total number of bonuses that each employee received last year:
SELECT e.name, e.salary, COUNT(b.bonus_id) AS 'Total Bonuses' FROM employees e LEFT OUTER JOIN (SELECT emp_id, bonus_id FROM bonuses WHERE YEAR(award_date) = 2009) AS b ON e.id = b.emp_id GROUP BY e.id;
It’s easy enough to convert it to a stored procedure:
CREATE PROCEDURE `p_2009_bonuses_lst`() LANGUAGE SQL NOT DETERMINISTIC CONTAINS SQL SQL SECURITY DEFINER BEGIN SELECT e.name, e.salary, COUNT(b.bonus_id) AS 'Total Bonuses' FROM employees e LEFT OUTER JOIN (SELECT emp_id, bonus_id FROM bonuses WHERE YEAR(award_date) = 2009) AS b ON e.id = b.emp_id GROUP BY e.id; END;
Now the clients can run this proc to their hearts’ content…that is until next year! Are developers really dumb enough to create something this limiting? Well, I can remember a time that my team was! We generated many reports that were supposed to be one-time jobs, before we caught on that the clients seemed to ask for very similar reports around the same time every year! We now abide by a variation of a great line from Momento, “don’t believe [their] lies!”
Here is the proc again with the hard-coded date removed:
CREATE PROCEDURE `p_yearly_bonuses_lst`(IN `@year` INT) LANGUAGE SQL NOT DETERMINISTIC CONTAINS SQL SQL SECURITY DEFINER BEGIN SELECT e.name, e.salary, COUNT(b.bonus_id) AS 'Total Bonuses' FROM employees e LEFT OUTER JOIN (SELECT emp_id, bonus_id FROM bonuses WHERE YEAR(award_date) = @year) AS b ON e.id = b.emp_id GROUP BY e.id; END;
As a forward thinking developer, you might ask yourself if we can do even better. If we had been so proactive, we too would have realized that the clients were bound to come back asking for more flexible start and end dates, which they did a short time later, when they requested that the date range coincide with the fiscal year!
From that point on, all of our reports have had start and end date parameters, whether clients want them or not! MySQL has a wonderful BETWEEN operator, which is ideal for comparing values against a range. Here is how to use it against a start and end date:
CREATE PROCEDURE `p_bonuses_lst`( IN `@StartDate` DATETIME, IN `@EndDate` DATETIME ) LANGUAGE SQL NOT DETERMINISTIC CONTAINS SQL SQL SECURITY DEFINER BEGIN SELECT e.name, e.salary, COUNT(b.bonus_id) AS 'Total Bonuses' FROM employees e LEFT OUTER JOIN (SELECT emp_id, bonus_id FROM bonuses WHERE award_date Between @StartDate AND @EndDate) AS b ON e.id = b.emp_id GROUP BY e.id; END;
Conclusion
Today we saw how views, native and user-defined functions, as well as stored procedures all play a role in making your SELECT queries more generic and reusable. One final subject, which was not presented here, is the adherence to coding templates and standards. They also play a part because the structure of your code should remain more-or-less constant across tables and even databases. The reasoning is that if the inclusion of several steps in a certain order worked once, they’ll likely work again. By the same token, sometimes breaking a tested structure introduces bugs. So once you’ve got some best practices, follow them!