Part 1: Data Engineering
The modern data warehouse
Moving to the cloud requires us to reconsider some of the choices made for on-premises data handling. This module
introduces the different services in Azure that can be used for data processing, and compares them to the
traditional on-premises data stack.
- From traditional to modern data warehouse
- Lambda architecture
- Overview of Big Data-related Azure services
- Getting started with Azure
Staging data in Azure
This module discusses the different types of storage available in Azure Storage, as well as Azure Data Lake Storage. It also covers
some of the tools to load and manage files in Azure Storage and Data Lake Storage:
- Introduction to Azure Blob Storage
- Comparing Azure Data Lake Storage Gen2 with traditional Blob Storage
- Tools for uploading data
- Storage Explorer, AzCopy, ADLCopy, PolyBase
Using Azure Data Factory for ETL
When data is stored and analyzed on-premises, we typically use ETL tools such as SQL Server Integration Services. But what if the data
is stored in the cloud? Then we need Azure Data Factory, the cloud-based ETL service. First we need to get used to
its terminology; then we can start creating the proper objects in the portal.
- Data Factory V2 terminology
- The Data Factory wizard
- Developing Data Factory pipelines in the browser
- Creating Data Factory Data flows
- Setup of Integration Runtimes
- Debugging, scheduling and monitoring Data Factory pipelines
Azure Synapse Analytics
Azure SQL databases are limited in compute power because they run on a single machine, and their size
is limited to the terabyte range. Azure Synapse Analytics (formerly Azure SQL Data Warehouse) is a service aimed at analytical workloads on data
volumes hundreds of times larger than Azure SQL databases can handle. At the same time, we can keep
using the familiar T-SQL query language and connect familiar applications such as Excel and
SQL Server Management Studio to interact with this service. Storage and compute can be scaled independently of each other.
- Architecture of Azure Synapse Analytics
- Loading data via PolyBase
- CTAS and CETAS
- Setting up table distributions
- Performance monitoring and tuning
Advanced data processing with Databricks
Azure Databricks allows us to use the power of Spark without the configuration hassle of Hadoop clusters. Data
can be loaded, visualized, transformed and analyzed using popular languages such as Python, SQL and R.
- Introduction to Azure Databricks
- Cluster setup
- Databricks Notebooks and widgets
- Connecting to Azure Storage and Data Warehouse
- Processing Spark DataFrames in Python
- Using Spark SQL
- Scheduling Databricks jobs
Modeling data with Azure Analysis Services
Analysis Services is Microsoft's OLAP (cube) technology. The latest version, Analysis Services Tabular, can
also run as a database-as-a-service. This makes it ideal for loading and caching the cleaned, pre-processed data produced by other
Azure services, which leads to faster reporting. The data can also be enriched
with KPIs, translations, derived measures, etc. In this module we take a brief look at how an Analysis Services
model can be created and deployed to the cloud, but for a more in-depth discussion we refer to the
- Online Analytical Processing
- Azure Analysis Services Tabular
- Creating an Analysis Services model
- Model deployment
- Model management
Part 2: Machine Learning
Introduction to Machine Learning
This classroom training does not require participants to be familiar with Machine Learning. This introductory module gives
all participants a common foundation for the rest of the training by discussing the basic concepts of Machine Learning:
- What is Machine Learning?
- What questions can Machine Learning answer?
- Machine Learning Methodology
- Data preparation
- Data Modeling
- Model evaluation
Tools for Machine Learning in Azure
This introductory chapter presents the different tools available to (citizen) data scientists for doing Machine Learning in Microsoft Azure.
- Overview of Machine Learning in Azure
- Machine Learning with pretrained models
- Using Transfer Learning
- Graphical Approaches to Machine Learning
- Machine Learning using Coding Approaches
Azure Cognitive Services
For many years, Business Intelligence focused on turning data stored in structured, relational databases into insights or actionable information.
However, there is plenty of useful data that is less easy to access, such as plain text, images and phone recordings. Azure Cognitive Services provides web services,
hosted in Microsoft Azure, that convert these sources into a format that is easier to analyze (mostly JSON documents). In this chapter we give an overview of the different
Cognitive Services. Some of these are ready-made, whereas others are customizable.
- Overview of Cognitive Services
- Pre-trained Services
- Customizable Services
- Getting Started with LUIS
Azure Machine Learning: Automated ML
Azure Machine Learning is a service that helps bring Machine Learning to the enterprise level. It contains tools
for professional data scientists as well as citizen data scientists. One of the tools that may be especially useful for citizen data scientists is
Automated ML, which automates the Machine Learning process so that little time investment, programming skill or
domain knowledge is needed.
- Introduction to Azure Machine Learning
- Architecture of Azure Machine Learning
- Important concepts in Azure Machine Learning
- What is Automated Machine Learning?
- Building Automated ML models
- Deploying and consuming Automated ML Models
Azure Machine Learning: Designer
A second tool available in Azure Machine Learning is the Designer. It allows you to visually connect modules
into Machine Learning pipelines using a drag-and-drop approach. A module is an operation that you perform on your data,
such as transforming data, training an algorithm, scoring new data or validating a model.
- What is the Designer?
- Modules for loading data
- Preprocessing data
- Training Machine Learning Models
- Testing Machine Learning Models
- Deploying models
AI features in Power BI
Power BI is a very popular tool for visualizing data. Recently, more and more features have been added that allow for
more advanced data analysis. Among others, Cognitive Services and deployed Azure Machine Learning models can be consumed in Power BI Dataflows and Power Query.
- Introduction to Power BI
- Using Cognitive Services or Deployed Azure ML models in Power Query
- Using Cognitive Services or Deployed Azure ML models in Power BI Dataflows
- More Machine Learning options in Power BI
Microsoft's Big Data solution is a collection of scalable Azure services to load, store and analyze
data in the cloud. Although each of these services can be used independently,
they are often combined to process data in the cloud. This course consists of two parts, which can also be
followed separately: three days of Data
Engineering (Microsoft Azure Big Data for Data Engineers) and two days of
Machine Learning (Machine Learning for the Citizen Data Scientist).
The Data Engineering part focuses on preparing data for reporting and analysis. How can data be loaded
from on-premises systems into the cloud? What storage options does the cloud offer? How can data be
transformed to simplify reporting and analysis?
In the Machine Learning part, we first discuss the basic concepts of Machine Learning. After that,
we go over the no-code tools available in Microsoft Azure and see how they can be used to create and deploy Machine Learning models.