Generate Test Data Quickly With Cross Joins
July 17, 2000

Introduction
Need to rough up some bulk test data in a hurry? A carefully thought-out Cross Join could be the answer. Take any SQL query that joins two or more tables, delete the joining clause, and what do you get? In SQL terms you get a Cross Join; in relational database theory you get a Cartesian Product. Whatever you call it, you usually end up with far more rows than you wanted, and most of them make no sense. Although Cross Join queries are not normally much use, with a bit of thought we can use them to quickly create large amounts of useful test data.

A simple example
Take the following query:
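A minimal sketch of such a query, wrapped in Python's standard sqlite3 module so it can be run without a database server. The derived-table aliases flintstones_1 and flintstones_2 come from the text; the individual names, the column names and the choice of SQLite are assumptions made for illustration.

```python
import sqlite3

# Assumed data: four first names and two surnames, chosen for illustration.
QUERY = """
SELECT flintstones_1.first_name, flintstones_2.last_name
FROM (
    SELECT 'Fred' AS first_name
    UNION ALL SELECT 'Wilma'
    UNION ALL SELECT 'Barney'
    UNION ALL SELECT 'Betty'
) AS flintstones_1
CROSS JOIN (
    SELECT 'Flintstone' AS last_name
    UNION ALL SELECT 'Rubble'
) AS flintstones_2
ORDER BY first_name, last_name
"""


def cross_join_names():
    """Run the cross join in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


rows = cross_join_names()
for first_name, last_name in rows:
    print(first_name, last_name)
print(len(rows), "rows")  # 4 first names x 2 surnames
```

There is no joining clause between the two derived tables, so every first name is paired with every surname.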
This will produce 8 rows, the product of the four rows in the first derived table (flintstones_1) and the two rows in the second derived table (flintstones_2):
Needless to say, not all the above are real Flintstones, but that is not the point. The point is that we have a cheap and cheerful way of generating multiple unique names. For a small extra investment we can generate eighteen, not eight, unique names:
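One way to reach eighteen names is six first names against three surnames. The sketch below assumes that split, and the extra names are illustrative. It also uses a small hypothetical helper, derived_table, that builds each derived table from a Python list so that adding a name is a one-word change.

```python
import sqlite3


def derived_table(column, values):
    """Build '(SELECT ? AS column UNION ALL SELECT ? ...)' with one placeholder per value."""
    selects = [f"SELECT ? AS {column}"] + ["SELECT ?"] * (len(values) - 1)
    return "(" + " UNION ALL ".join(selects) + ")"


# Assumed data: two extra first names and one extra surname.
first_names = ["Fred", "Wilma", "Pebbles", "Barney", "Betty", "Bamm-Bamm"]
last_names = ["Flintstone", "Rubble", "Slate"]

query = (
    "SELECT flintstones_1.first_name, flintstones_2.last_name "
    f"FROM {derived_table('first_name', first_names)} AS flintstones_1 "
    f"CROSS JOIN {derived_table('last_name', last_names)} AS flintstones_2"
)

conn = sqlite3.connect(":memory:")
# Placeholders are bound in order of appearance: first names, then surnames.
names = conn.execute(query, first_names + last_names).fetchall()
conn.close()

print(len(names), "unique names")  # 6 x 3
```

Three extra values more than double the output, which is the appeal of the technique: the cost grows by addition while the result grows by multiplication.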
As many tables as you need can be Cross Joined. The number of rows returned is the product of the row counts of the joined tables, so it grows very quickly with each table you add. This simple query generates 27 mostly-fake politicians with middle names:
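As a rough sketch of a three-way Cross Join, the following uses three small tables of three rows each. The table names, column names and the names themselves are assumptions chosen for illustration, and SQLite stands in for the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE first_names  (first_name  TEXT);
CREATE TABLE middle_names (middle_name TEXT);
CREATE TABLE last_names   (last_name   TEXT);
INSERT INTO first_names  VALUES ('George'), ('Bill'), ('Ronald');
INSERT INTO middle_names VALUES ('Herbert'), ('Jefferson'), ('Wilson');
INSERT INTO last_names   VALUES ('Bush'), ('Clinton'), ('Reagan');
""")

# No join conditions at all: every first name meets every middle name
# and every surname.
politicians = conn.execute("""
SELECT first_name, middle_name, last_name
FROM first_names
CROSS JOIN middle_names
CROSS JOIN last_names
""").fetchall()
conn.close()

print(len(politicians))  # 3 * 3 * 3 = 27
```

Only three of the 27 combinations pair each name with its usual partners; the rest are the "mostly-fake" part.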
A more practical example
In the following query I have raided a few more US sitcoms to build a query that generates no fewer than 150 unique authors in the PUBS database. Note that I have serialised the two parts of the data that make up the author ID (and the phone number) to keep them unique, but I have chosen -55- as the center portion of all my generated IDs (010-55-0010, for example). None of the rows in the initial authors table matched this pattern, so this gives me an at-a-glance way of identifying my auto-generated authors.
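The idea of serialising the ID can be sketched as follows. This is a scaled-down illustration under several assumptions: it produces 30 authors rather than 150, the authors table is a simplified stand-in with only a few columns, the names and the phone format are invented, and the zero-padding relies on SQLite's printf function, which would need a different expression on other database engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified stand-in for the authors table (assumed subset of columns).
CREATE TABLE authors (
    au_id    TEXT PRIMARY KEY,
    au_lname TEXT NOT NULL,
    au_fname TEXT NOT NULL,
    phone    TEXT NOT NULL,
    contract INTEGER NOT NULL
);

-- Each name carries a serial number n that feeds the ID and the phone.
CREATE TABLE first_names (n INTEGER, au_fname TEXT);
CREATE TABLE last_names  (n INTEGER, au_lname TEXT);
INSERT INTO first_names VALUES
    (1, 'Ross'), (2, 'Rachel'), (3, 'Monica'),
    (4, 'Chandler'), (5, 'Joey'), (6, 'Phoebe');
INSERT INTO last_names VALUES
    (1, 'Geller'), (2, 'Green'), (3, 'Bing'),
    (4, 'Tribbiani'), (5, 'Buffay');

-- The first-name serial fills the leading part of the ID, the surname
-- serial fills the trailing part, and -55- marks the row as generated.
INSERT INTO authors (au_id, au_lname, au_fname, phone, contract)
SELECT printf('%03d-55-%04d', f.n * 10, l.n * 10),
       l.au_lname,
       f.au_fname,
       printf('555 %03d-%04d', f.n * 10, l.n * 10),
       0
FROM first_names AS f
CROSS JOIN last_names AS l;
""")

# The -55- marker makes the generated rows easy to select (or delete) later.
generated = conn.execute(
    "SELECT au_id, au_fname, au_lname, phone FROM authors "
    "WHERE au_id LIKE '___-55-____' ORDER BY au_id"
).fetchall()
conn.close()

print(len(generated), "generated authors")
print(generated[0])
```

Because each (first name, surname) pair has a distinct pair of serial numbers, the primary key cannot collide, and the same LIKE pattern can be used to clean up the test rows afterwards.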
Summary
The principle will work for any test data provided you construct your query carefully - you can generate multiple orders for multiple books across multiple stores for multiple dates. The data will exhibit a regular pattern, rather than real-world randomness, but in most cases that will not be a problem.
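As a closing illustration of the orders idea, the sketch below crosses stores, titles and dates to produce one sales row per combination. All table names, IDs, dates and the quantity formula are hypothetical and chosen only to show the regular pattern the generated data will have.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical lookup tables; IDs and dates are illustrative only.
CREATE TABLE test_stores (n INTEGER, stor_id  TEXT);
CREATE TABLE test_titles (n INTEGER, title_id TEXT);
CREATE TABLE test_dates  (n INTEGER, ord_date TEXT);
INSERT INTO test_stores VALUES (1, 'S001'), (2, 'S002');
INSERT INTO test_titles VALUES (1, 'T001'), (2, 'T002'), (3, 'T003');
INSERT INTO test_dates  VALUES
    (1, '2000-07-01'), (2, '2000-07-08'),
    (3, '2000-07-15'), (4, '2000-07-22');

-- One row per store, title and date; the serial numbers build a unique
-- order number and a predictable quantity.
CREATE TABLE test_sales AS
SELECT s.stor_id,
       'TEST-' || s.n || '-' || t.n || '-' || d.n AS ord_num,
       d.ord_date,
       t.title_id,
       (s.n + t.n + d.n) * 5 AS qty
FROM test_stores AS s
CROSS JOIN test_titles AS t
CROSS JOIN test_dates AS d;
""")

sales = conn.execute(
    "SELECT stor_id, ord_num, ord_date, title_id, qty FROM test_sales"
).fetchall()
conn.close()

print(len(sales), "sales rows")  # 2 stores x 3 titles x 4 dates
```

The quantities follow a fixed formula rather than anything random, which is the regular pattern mentioned above.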