Introduction
In a previous article we discussed Modern Data Warehouse designs patterns and components. In this article we will discuss two more modern design patterns to handle your scenarios; 1) Advanced Analytics on big data 2) Real time analytics.
Advanced Analytics on big data and Real-time analytics are prime business needs these days and require a modern design using the latest technology components. Microsoft Azure provides a set of technology components to meet all your needs. Advanced Analytics can be achieved using Azure Machine Learning tools. A modern design helps to build and deploy custom machine learning models. These models can transform data into actionable insight. In the situation of analyzing and processing data from the Internet of Things (IoT), Weblog and live streaming data, a real-time analytics design pattern can be used.
Design of Advanced Analytics on Big Data
The Modern design of Advanced Analytics on big data integrates structured, semi-structured and unstructured data from various data sources using Azure Data Factory and stores it in Azure storage, Azure Data Lake or Azure Blob Storage. This is a common data ingest process like other data warehouse design patterns. In the next part of the design, once data is stored in Azure storage, Azure Databricks can be leveraged to clean and transform the unstructured data and combine the data with other available structured data from an operational database or data warehouse. Databricks gives flexibility to build and deploy machine learning models on-premises or in the cloud. The Azure Databricks capabilities can be explored more to derive deeper insight from data using Python, R or Scala with inbuilt notebook experiences in Azure Databricks. Now, data can be used various ways from here:
- Data can be moved to Azure SQL Data Warehouse to access using Azure native connectors
- Power BI users can take advantage of Databricks to perform root cause determination and raw data analysis
- Business users can execute ad hoc queries directly on data within Azure Databricks
- Data can be moved from Databricks to Azure Cosmos DB to make it accessible to web and mobile applications for greater insight
Figure 1 Advanced Analytics on Big Data Design
Design of Real-time Analytics
The modern design of Real-time Analytics starts with two parts of the data ingest process; one is ingestion of live streaming data using the Apache Kafka cluster in Azure HDInsight and the second is get the all structured/semi-structured data to Azure Data Lake or Azure Blob Storage using Azure Data Factory. In the next part of the design, data can be prepared and trained using Azure Databricks. Azure Databricks can be leveraged to clean and transform the live streaming data and combine the data with structured data from an operational database or data warehouse. The inbuilt notebook experiences in Azure Databricks can be used to apply machine learning/deep learning techniques to derive deeper insights from this data. Afterwards, data can be accessed in various ways using other Azure components.
Data can be moved at any scale between Azure SQL Data Warehouse and Azure Databricks using native connectors. A set of analytical dashboards can be built, and reports can be embedded on top of Azure SQL Data Warehouse data to give insight to business users within the organization. Also, Azure Analysis Services can serve this data to other users.
Power BI users can use Azure Databricks and Azure HDInsight to perform raw data analysis and root cause determination. In this design, Azure Cosmos DB is also leveraged to take the insight from Azure Databricks to Azure Cosmos DB, to make data accessible through real time mobile and web applications.
Figure 2 Real-time Analytics Design
Components of Advanced Analytics on Big Data and Real-time Analytics
The complete design consists of various technology components. We have leveraged many Azure components in the two design patterns above. Some of the Azure components are commonly used and discussed in a previous article.
We have leveraged Azure Cosmos DB in Advanced Analytics on Big Data design to get a greater insight and make data accessible to web and mobile applications.
Azure Cosmos DB is a globally distributed, multi-model database service and provides native support for NoSQL choices. It offers turnkey global distribution across any number of Azure regions and elastically scales throughput and storage worldwide.
We have leveraged Azure HDInsight with Kafka cluster in Real-time Analytics design to ingest the live streaming data.
Azure HDInsight is a fully managed, full spectrum open-source analytics service for widespread open-source frameworks such as Hadoop, Kafka, Storm, Spark, R, Hive, etc.
Summary
The modern data warehouse designs support various types of business needs, including changes in data behavior, real time analytics on live streaming data and lambda architecture to serve multiple purposes with the source data. Also, selection of technology components is important to meet your business needs to build a flexible, high performing and scalable solution. Microsoft Azure provides a full set of technology components to build the hub for all types of data—structured, unstructured, or streaming—to develop transformative solutions like Business Intelligence and reporting, advanced analytics on big data, and real-time analytics.