In our most recent article published on this forum, we have presented a procedure for uploading content of an arbitrarily chosen table from an on-premise SQL Server database to an Azure PaaS-based SQL Database. We have accomplished this by implementing a custom SQL Server Integration Services-based project (created by using SQL Server Data Tools - Business Intelligence for Visual Studio 2012). While this approach provides the most flexibility when handling data extraction, transformation, and loading, its inherent complexity might not be justified in more simplistic scenarios. In this article, we will present an alternative method of dealing with data migration to the Microsoft cloud that addresses this concern by employing the bcp utility, which not only eliminates the need for any custom development, but also rivals SSIS in terms of its data transfer efficiency.
Just as before, we will need to assume that our sample target SQL Database with all of its objects (but no data) already exists, since the bcp utility can neither be used to copy on-premise database schema to Windows Azure nor detect and mitigate compatibility issues that might be discovered during such transition (which, incidentally, is also a shortcoming of SSIS). In addition, we should ensure that our source database is Azure PaaS-compliant. In order to satisfy these requirements, when preparing our on-premise test environment, we can leverage the procedure for generating and enforcing SQL Database compliant AdventureWorks2012 database schema that we have already described in detail in the Migrating SQL Server Database to Windows Azure - Resolving Incompatibility Issues article published previously on this forum. When setting up our Azure-resident target, we will take advantage (for the sake of simplicity) of a copy of the pre-packaged Adventure Works for Azure SQL Database available for download from the CodePlex Web site. While its original intent is to automate the tasks of creating the database, configuring its schema, and populating it with sample data, we will customize it slightly to perform only the second of these tasks.
Let's start by creating our sample target database. In the Windows Azure Management Portal, navigate to the entry labeled SQL Databases. Once you select it, switch to the Servers view and click on the Add button at the bottom of the window to create a new Azure-based SQL Server instance (assuming that you have not done this yet). Alternately, you can click on the New button in the lower left corner to create a new SQL Database (you can actually create a new server as part of the New SQL Database wizard). Note that each server relies on SQL Authentication to validate incoming connections, so when setting up a new server you will be prompted to specify Login Name and Password for the administrative account (assigned the sysadmin fixed server role). You will also need to designate the region where the server should be hosted by selecting it from the corresponding listbox and enabling the Allow Windows Azure Services to access the server checkbox (in order to make SQL Databases hosted on the server manageable via the Windows Azure Management Portal). When setting up a new database, choosing the Custom Create option will allow you to designate the desired database edition and the maximum size (Web 1 GB will more than suffice for the purpose of this exercise) as well as its collation, in addition to its name (which we will set to AdventureWorks2012_BCP) and the hosting SQL Server instance.
Once these steps are completed, extract the content of the downloaded Adventure Works for Azure SQL Database file to a temporary folder on your administrative computer. You will find there an executable
ExecuteSQL.exe, a batch file
CreateAdventureWorksForSQLAzure.cmd, as well as a couple of T-SQL scripts named
CreateAdventureWorksForSQLAzure_DB.sql (the last three of these files reside in the AdventureWorks subfolder). As you can easily determine based on the content of the second of them, you can configure the schema of our target database by running the following command from the root of the temporary folder (where server_name, admin_name, and passwordcorrespond respectively to the name of your Azure SQL Server instance and its administartive credentials):
ExecuteSQL.exe server_name.database.windows.net AdventureWorks2012_BCP admin_name password .\AdventureWorks\AdventureWorks2012ForSQLAzure_Schema.sql
After implementing the source database by following the aforementioned article (Migrating SQL Server Database to Windows Azure - Resolving Incompatibility Issues), we are ready to focus on specifics of the bulk copy import. The bcp is a command line utility incorporated into the SQL Server 2012 Feature Pack (more specifically, its Connectivity Feature Pack) available from the Microsoft Download Center. (Note that it has dependency on Microsoft SQL Server Native Client, conveniently included in the same Feature Pack). Alternatively, it can also be installed as part of the SQL Server 2012 Management Tools from the SQL Server installation media.
The bcp utility facilitates data transfers by leveraging SQL Server API (similarly to the methodology employed by BACPAC-driven transfers, which we recently covered on this forum). The transfer takes place in two stages, with the first one involving an export from a source data set (which can be scoped to a table, a view, or results of a T-SQL query) into a data file and the second one carrying out an import of this file into a target table. The syntax of the respective commands (applicable within the context of this article) uses a format resembling the following:
bcp data_set out data_file.dat –S server_name –T -n -q
bcp data_set in data_file.dat –S tcp:server_name.database.windows.net –U admin_username –P admin_password –n -q –h ”TABLOCK”
...where the in and out designate direction of data transfer (from the point of view of the SQL Server instance), data_set represents a fully qualified identifier (consisting of three parts database_name.schema.object) of a table, view, or a T-SQL query (the last of these is applicable to export only) containing data to be copied, server_name is an on-premise or Azure-based instance of SQL Server, and admin_username combined with admin_password constitute the SQL Server authentication credentials. (In the example listed above, we assume that during initial export we are leveraging integrated Windows security, as indicated by the -T switch). The remaining parameters of the bcp utility (relevant to the topics covered here) include the following:
- -N or -n : expedites transfer of non-character data by avoiding converting it to and from character format. Character data gets stored (respectively, depending on the case of the parameter) in Unicode (to accommodate extended character sets and differing code pages across databases) or non-Unicode format. Note that applying this parameter automatically limits the amount of data format checks (to further improve performance), so you should either verify that source and target tables actually match or use a format file that accounts for any differences (for more detail refer to SQL Server 2012 Books Online).
- -q : invokes the SET QUOTED_IDENTIFIERS ON statement when establishing the connection to a source or target SQL Server instance. This is necessary in situations where the value of the data_setparameter contains a space or a quotation mark.
- -h : allows you to optimize the import operation by specifying hints to be used when loading data into a target table or view. In this particular context, you might want to consider using the following:
- –h “TABLOCK” : enforces table level lock for the duration of the load (rather than defaulting to less efficient row-level lock, as determined by the table option table lock on bulk load).
- –h “ORDER(column[ASC|DESC])” : improves the performance of the load operation by indicating the sort order of data in the data file, assuming that such data is sorted according to the clustered index on the column of a target table (incidentally, the presence of such index is one of the requirements of Azure-based SQL Databases).
- -b : increases resiliency of the load operation by dividing the content of the data file into separate, smaller batches of arbitrarily chosen size. Since import of each batch constitutes a separate transaction, failure of any of them does not affect the others.
- -F and -L : give you the ability to perform concurrent imports, whose individual scope is identified by the first (-F) and last (-L) data row numbers within the source data file.
- -E : intended to preserve an IDENTITY primary key from the source table when importing its content into the target (without it, new identity values are automatically calculated and inserted during data load).
- -d : can be used to specify a source or target database to connect to, in lieu of the first segment in the three-part data_set identifier (if you decide to include it, you should remove the database_name from database_name.schema.object).
To further optimize the bulk copy process, it might be appropriate to apply temporary changes to the target database (including, for example, disabling non-clustered indexes or triggers) and re-enable them once the load is completed. In addition, in scenarios where the source and target tables do not match, you will likely have to resort to configuring a bcp format file, that accounts for their differences. For more information regarding this subject, refer to SQL Server Books Online.
It is worth noting that the pre-packaged Adventure Works for Azure SQL Database available for download from the CodePlex Web site we leveraged earlier takes advantage of the bcp utility to populate SQL Database once its schema is configured. You can identify individual steps necessary to complete this process by reviewing content of the CreateAdventureWorksForSQLAzure.cmd batch file located in the AdventureWorks folder within the compressed downloadable archive.
See all articles by Marcin Policht