Moving to the cloud requires reconsidering some of the choices made for on-premises data handling. This module introduces the Azure services that can be used for data processing and compares them to the traditional on-premises data stack. It also provides a brief introduction to Azure and the Azure portal.
This module discusses the different types of storage available in Azure Storage as well as Data Lake Storage. It also covers some of the tools for loading and managing files in Azure Storage and Data Lake Storage.
When data is stored and analyzed on-premises, you typically use an ETL tool such as SQL Server Integration Services. But what if the data is stored in the Azure cloud? Then you can use Azure Data Factory, the cloud-based ETL service. First we need to get used to the terminology; then we can start creating the proper objects in the portal.
This module dives into the process of building a Data Factory pipeline from scratch. The most common activities are illustrated. The module also focuses on how to work with variables and parameters to make the pipelines more dynamic.
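As a minimal sketch of how parameters and variables make a pipeline dynamic, the Python snippet below assembles a pipeline definition as a plain dictionary and prints it as JSON. The pipeline name `CopySalesFiles`, the parameter `sourceFolder`, the variable `targetPath` and the activity name `SetTargetPath` are all hypothetical, and the layout is an assumption modelled on the JSON that Data Factory shows in its code view; check the exact property casing against an exported pipeline.

```python
import json


def build_pipeline(name: str, default_folder: str) -> dict:
    """Return a pipeline definition with one parameter and one variable.

    All names used here are illustrative assumptions, not part of the course.
    """
    return {
        "name": name,
        "properties": {
            # Parameters are supplied by the caller when the pipeline is triggered.
            "parameters": {
                "sourceFolder": {"type": "string", "defaultValue": default_folder}
            },
            # Variables can be set and read while the pipeline runs.
            "variables": {"targetPath": {"type": "String"}},
            "activities": [
                {
                    "name": "SetTargetPath",
                    "type": "SetVariable",
                    "typeProperties": {
                        "variableName": "targetPath",
                        # The expression is evaluated at run time, so the same
                        # pipeline can process a different folder on each run.
                        "value": {
                            "value": "@concat('raw/', pipeline().parameters.sourceFolder)",
                            "type": "Expression",
                        },
                    },
                }
            ],
        },
    }


pipeline = build_pipeline("CopySalesFiles", "sales")
print(json.dumps(pipeline, indent=2))
```

Because the folder is a parameter rather than a hard-coded value, one pipeline definition can be reused for several sources by passing a different `sourceFolder` on each run.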
With Data Flows, data can be transformed without the need to learn another tool (such as Databricks or Spark). Both Data Flows and Wrangling Data Flows are covered.
Data Factory needs integration runtimes to control where the code executes. This module walks you through the three types of integration runtimes: Azure, Azure-SSIS and self-hosted.
Once development has finished, the pipelines need to be deployed and scheduled for execution. Monitoring the deployed pipelines for failures, errors or performance problems is another crucial topic discussed in this module.
An easy way to create a business intelligence solution in the cloud is to take SQL Server, familiar to many Microsoft BI developers, and run it in the cloud. Backups and high availability are handled automatically, and nearly all the skills and tools we used with a local SQL Server carry over to this cloud-based solution.
Azure Synapse Analytics is a suite of services aimed at loading, storing and querying large volumes of data. It allows both Spark and SQL users to interact with the data.
Azure SQL Databases are limited in compute power since they run on a single machine, and their size is limited to the terabyte range. Provisioned SQL pools in Azure Synapse Analytics (formerly known as Azure SQL Data Warehouse) are aimed at analytical workloads on data volumes hundreds of times larger than what Azure SQL Databases can handle. At the same time, we can keep using the familiar T-SQL query language and connect traditional applications such as Excel and Management Studio to this service. Storage and compute can be scaled independently.
Azure Databricks allows us to use the power of Spark without the configuration hassle of Hadoop clusters. Using popular languages such as Python, SQL and R, data can be loaded, visualized, transformed and analyzed via interactive notebooks.
There are many ways to access data in Azure Databricks, from uploading small files via the portal, over ad-hoc connections, up to mounting Azure Blob storage or data lakes. The files can also be treated as a table, providing easy access. Another point of attention in this module is dealing with malformed input data.
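To make the malformed-input problem concrete, here is a minimal plain-Python sketch that separates well-formed rows from malformed ones instead of failing on the first bad record. It does not use the Spark API; Spark's file readers offer a comparable choice through their `mode` option (`PERMISSIVE`, `DROPMALFORMED`, `FAILFAST`). The two-column layout (name, age) and the sample data are assumptions made for illustration.

```python
import csv
import io


def split_rows(text: str, expected_columns: int):
    """Split CSV text into (good, bad) rows rather than raising on bad input."""
    good, bad = [], []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        try:
            if len(row) != expected_columns:
                raise ValueError("wrong number of columns")
            # Assumed schema: a name followed by an integer age.
            good.append((row[0], int(row[1])))
        except ValueError:
            # Keep the raw row so it can be inspected or reprocessed later.
            bad.append(row)
    return good, bad


sample = "alice,30\nbob,not_a_number\ncarol\ndave,41\n"
good, bad = split_rows(sample, expected_columns=2)
print(good)
print(bad)
```

Keeping the rejected rows, rather than silently dropping them, makes it possible to report on data quality and to reload the records once the source has been fixed.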
Once the Databricks solution has been tested, it needs to be scheduled for execution. This can be done either with jobs in Azure Databricks or via Data Factory. In the latter case you need to be able to pass variables from Data Factory into Databricks. Azure Databricks widgets make this possible.
Analysis Services is Microsoft's OLAP (cube) technology. The latest version, Analysis Services Tabular, can also run as a database-as-a-service. This is ideal for loading the cleaned, pre-processed data produced by other Azure services and caching it, which leads to faster reporting. The data can also be enriched with KPIs, translations, derived measures, etc.
Processing real-time events is the main goal of Azure Stream Analytics. In this module events are received from an Event Hub input, processed by a SQL query and sent to a destination.
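A typical streaming query groups events into fixed, non-overlapping time windows and aggregates each window. The plain-Python sketch below imitates that idea by counting events per device in 10-second tumbling windows. It is only an illustration of the concept, not Stream Analytics code: the event layout (a timestamp in seconds and a device name) is assumed, and the window-boundary convention is simplified.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Count events per (window start, device) in non-overlapping windows."""
    counts = Counter()
    for timestamp, device in events:
        # Every timestamp belongs to exactly one window.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, device)] += 1
    return dict(counts)


# Hypothetical events: (seconds since start, device name).
events = [
    (0, "sensor-a"),
    (3, "sensor-a"),
    (9, "sensor-b"),
    (10, "sensor-a"),
    (14, "sensor-b"),
    (21, "sensor-a"),
]
result = tumbling_window_counts(events, window_seconds=10)
for (window_start, device), count in sorted(result.items()):
    print(window_start, device, count)
```

In a real streaming job the input is unbounded, so the service emits the result of each window when that window closes instead of waiting for all events to arrive.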
A modern data warehouse lets you bring together all your data at any scale. It offers insights through analytical dashboards, operational reports and advanced analytics. Microsoft Azure offers a broad range of services, such as Azure Data Factory, Azure Data Lake, Azure Databricks and Azure Synapse Analytics, that help you build your data warehouse in the cloud. This training covers all aspects of designing and implementing a data warehouse on Microsoft Azure. Participants will leave the training with hands-on experience with the Microsoft Azure services used to explore, prepare, manage and serve data for immediate BI or machine-learning needs.
This course is aimed at developers and administrators who are considering migrating existing data solutions to the Microsoft Azure cloud. Some familiarity with relational database systems such as SQL Server is helpful. Prior knowledge of Azure is not required.