SHARE
Facebook X Pinterest WhatsApp

Generate Test Data Quickly With Cross Joins

Written By
thumbnail
Neil Boyle
Neil Boyle
Jul 18, 2000

Introduction

Need to rough up some bulk test data in a hurry? A carefully thought-out
Cross Join could be the answer.

Take any SQL query that joins two or more tables, delete the joining clause,
and what do you get? In SQL terms you get a Cross Join, in relational database
theory you get a Cartesian Product. Whatever you call it, you usually end up
with far more rows than you wanted, and most of them make no sense. Although
Cross Join queries are not normally much use, with a bit of thought we can use
them to quickly create large amounts of useful test data.

A simple example

Take the following query:

select * from
(
    select “Fred” as fName union 
    select “Wilma” union
    select “Barney” union
    select “Betty”
) as flintstones_1 CROSS JOIN
(
    select “Flintstone” as lName union
    select “Rubble” 
) as flintstones_2

This will produce 8 rows – the result of multiplying the four rows in the
first derived table (flintstones_1) against the two rows in the second derived
table (flintstones_2):

fName  lName 
—— ———- 
Betty  Rubble
Betty  Flintstone
Barney Rubble
Barney Flintstone
Wilma  Rubble
Wilma  Flintstone
Fred   Rubble
Fred   Flintstone

(8 row(s) affected)

Needless to say, not all the above are real Flintstones, but that is not the
point. The point is that we have a cheap and cheerful way of generating multiple
unique names. For a small extra investment we can generate eighteen, not eight,
unique names:

select * from
(
    select “Fred” as fName union 
    select “Wilma” union
    select “Barney” union
    select “Betty” union
    select “Al” union
    select “Peggy” 

) as characters_1 CROSS JOIN
(
    select “Flintstone” as lName union
    select “Rubble” union
    select “Bundy”
) as characters_2

As many tables as you need can be Cross Joined to generate
exponentially-large amounts of test data. This simple query generates 27
mostly-fake politicians with middle names:

select * from
(
    select “Harry” as fName union 
    select “Winston” union 
    select “Vladimir” 
) as polit1 CROSS JOIN
(
    select “S ” as mName union 
    select “Spencer” union 
    select “Ilich” 
) as polit2 CROSS JOIN
(
    select "Trueman" as lName union 
    select “Churchill” union 
    select “Lenin” 
) as polit3 

A more practical example

In the following query I have raided a few more US Sitcoms to make a simple
query that will generate no less than 150 unique authors in the PUBS database.
Note that I have serialised the two parts of the data that will make up the
author ID (and the phone number) to keep them unique, but I have chosen -55- to
be the center portion of all my generated IDs (010-55-0010 for example) There
were none in the initial authors table that matched this pattern so this gives
me an at-a-glance way of identifying my auto-generated authors.

insert authors 
select au_id1 + ‘-‘ + au_id2 as au_ud, 
    fName, 
    lName,
    au_id1 + ‘ 5’ + au_id2 as phone,
    ‘Test address for ‘ + fName + ‘ ‘ + lName, 
    ‘London’,
    ‘UK’,
    ‘12345’,
    1
from
(
    select ‘009’ as au_id1, ‘Fred’ as fName union 
    select ‘010’, ‘Wilma’ union
    select ‘012’, ‘Barney’ union
    select ‘013’, ‘Betty’ union
    select ‘014’, ‘Al’ union
    select ‘015’, ‘Peggy’ union
    select ‘016’, ‘Frasier’ union
    select ‘017’, ‘Niles’ union
    select ‘018’, ‘Homer’ union
    select ‘019’, ‘Marge’ union
    select ‘020’, ‘Hawkeye’ union
    select ‘021’, ‘Trapper’ union
    select ‘024’, ‘Sam’ union
    select ‘025’, ‘Diane’ union
    select ‘026’, ‘Rebecca’
) as test_authors_part_1 CROSS JOIN
(
    select ’55-0010′ as au_id2, ‘Flintstone’ as lName union
    select ’55-0021′, ‘Rubble’ union
    select ’55-0022′, ‘Bundy’ union
    select ’55-0023′, ‘Crane’ union
    select ’55-0024′, ‘Simpson’ union
    select ’55-0025′, ‘Pierce’ union
    select ’55-0026′, ‘John’ union
    select ’55-0028′, ‘Malone’ union
    select ’55-0029′, ‘Chambers’ union
    select ’55-0030′, ‘Howe’
) as test_authors_part_2

Summary

The principle will work for any test data provided you construct your query
carefully – you can generate multiple orders for multiple books across multiple
stores for multiple dates. The data will exhibit a regular pattern, rather than
real-world randomness, but in most cases that will not be a problem.

Recommended for you...

Best Online Courses to Learn SQL
Ronnie Payne
Sep 23, 2022
Best Courses for Database Administrators
Ronnie Payne
Jul 22, 2022
Tip 74 – Changing Cost Threshold for Parallelism
Gregory Larsen
Feb 24, 2021
How Many Databases Can You Name?
Brad Jones
May 11, 2020
Database Journal Logo

DatabaseJournal.com publishes relevant, up-to-date and pragmatic articles on the use of database hardware and management tools and serves as a forum for professional knowledge about proprietary, open source and cloud-based databases--foundational technology for all IT systems. We publish insightful articles about new products, best practices and trends; readers help each other out on various database questions and problems. Database management systems (DBMS) and database security processes are also key areas of focus at DatabaseJournal.com.

Property of TechnologyAdvice. © 2025 TechnologyAdvice. All Rights Reserved

Advertiser Disclosure: Some of the products that appear on this site are from companies from which TechnologyAdvice receives compensation. This compensation may impact how and where products appear on this site including, for example, the order in which they appear. TechnologyAdvice does not include all companies or all types of products available in the marketplace.