In a recent article I introduced a cloud-based approach to Extraction, Transformation, and Loading (ETL) that relies on SQL Server Integration Services (SSIS) runtime of Azure Data Factory. As I pointed out, this approach benefits from the managed nature and cloud-related advantages of Azure Data Factory (such as scalability and availability with minimized infrastructure requirements), while at the same time it allows customers to leverage investment in their existing SSIS package libraries. In this article, we will step through the process of implementing this functionality.
As explained previously, the functionality referenced above relies on three primary architectural components:
- Azure Data Factory, which constitutes a cloud-based integration service, responsible for implementing data-driven workflows, including automation and transformation tasks.
- SSISDB catalog, which provides storage for SSIS packages and the corresponding metadata, implemented as Azure SQL Database (in the form of a single database, a member of an elastic pool, or a Managed Instance). Note that the choice of SSISDB catalog implies the use of the SSIS Project Deployment Model. For file system hosted packages, it is possible to use the Package Deployment Model (for a list of differences that distinguishes it from the legacy Package Deployment Model, refer to Microsoft Docs).
- Azure SSIS Integration Runtime (Azure SSIS IR) component of Azure Data Factory, which serves as the runtime engine for package execution.
The most straightforward way to provision Azure Data Factory is available via the graphical interface of the Azure portal. During provisioning, you will need to assign a globally unique name to the new instance of Data Factory and provide the name of a new or existing resource group to host its resource, the Azure region where the Data Factory metadata will reside (note that this does not impose restrictions on the locations of data stores accessible via Data Factory), and the version number. The supported versions include V1 and V2, but you should consider choosing the latter (for differences between the two versions, refer to Microsoft Docs). You also have the option to enable Git integration (Azure DevOps Git or GitHub) to facilitate collaboration, source control, change tracking, as well as continuous integration and continuous deployment (CI/CD).
The purpose of the next step is to prepare for creation of an Azure SQL Database-based SSISDB catalog. While Azure SSIS Integration Runtime has the ability to automatically create the SSISDB database, you need to first ensure that the underlying database engine is available. To accomplish this, you should deploy either a logical Azure SQL Database server or an Azure SQL Database Managed instance in the region hosting the Data Factory deployment (in order to minimize operational latency and charges associated with egress network traffic).
The choice of the database implementation (a stand-alone Azure SQL Database, a member of an elastic pool, or a Managed Instance) depends to some extent on the desired level of isolation and the need to connect to on-premises data stores. In particular, Managed Instance deploys directly into Azure virtual networks, while single instances and pooled Azure SQL Database support direct virtual network connectivity via service endpoints. Similarly, Azure SSIS Integration Runtime can be joined to a virtual network. In hybrid scenarios (where some of linked services reside in corporate data centers), you have the option of connecting that virtual network to your on-premises environment via Site-to-Site VPN, eliminating the need for deploying Self Hosted Integration Runtime.
With the Azure SQL Database-related prerequisites satisfied, you are ready to provision Azure SSIS Integration Runtime. This step is performed directly from within the Azure Data Factory interface, either via the Configure SSIS Integration Runtime option in the Data Factory overview pane or via the Integration Runtime tab of the Authoring UI. Both options lead to the Integration Runtime Setup blade, prompting you to specify the following configuration settings:
- Name and Description: These are arbitrary values, with name uniqueness enforced within the same resource group
- Location: This value designates an Azure region hosting the runtime (you should select the same region that will host the SSISDB database)
- Node Size and Node Number: Integration runtime is implemented as a cluster of Azure VMs. You can adjust the compute resources available in the cluster by scaling it vertically (changing the node size) or horizontally (changing the number of nodes)
- Edition/License: The runtime can be based on either SQL Server Standard or Enterprise, with the latter offering more advanced functionality, such as additional connectors (Oracle, Teradata, SAP BW, SQL Server Analysis Services, and Azure Analysis Services) as well as Fuzzy Grouping and Fuzzy Lookup transformations (for the comprehensive list of features available in the Enterprise Edition, refer to Microsoft Docs).
- Save Money: Customers with Software Assurance might be able to realize significant savings by taking advantage of Azure Hybrid Benefit option if they have existing SQL Server licenses assigned to on-premises workloads.
- Create SSIS catalog (SSISDB): Enabling this checkbox indicates your intent to implement Project Deployment Model, which requires specifying additional parameters of the target SSISDB catalog, including:
- Subscription
- Location (this should match the Azure region hosting the integration runtime)
- Catalog Database Server Endpoint (fully qualified domain name of the target Azure SQL Database logical server or Managed Instance)
- Admin Username and Admin Password (alternatively, you can use Azure AD authentication with the managed identity associated with the Azure Data Factory)
- Catalog Database Service Tier (determines the performance characteristics of the database)
- Maximum Parallel Executions per Node: Your choice should reflect the intended workload characteristics, with the low number geared towards resource intensive packages and a large number appropriate for lightweight, simultaneously executing packages.
- Custom Setup Container SAS URI: This option facilitates more advanced, custom deployments based on scripts and associated artifacts residing in Azure Blob Storage
- Select a VNet for your Azure-SSIS Integration Runtime to join and allow ADF to create certain network resources: By enabling this option, you allow for virtual network-based connectivity between the runtime and SSIS DB. This option also provides the ability to connect to on-premises data by leveraging hybrid connectivity (such as Site-to-Site VPN) without the need for Self-Hosted Integration Runtime.
- Set up Self-Hosted Integration Runtime as a proxy for your Azure-SSIS Integration Runtime: Alternatively, in hybrid scenarios, you can eliminate the need for cross-premises connectivity and provide access to on-premises data via Self-Hosted Integration Runtime.
This concludes our overview of installation of SSIS Integration Runtime. In upcoming articles, I will show you how to author and execute SSIS packages that take advantage of the capabilities of SSIS-IR.