Data Engineering and Citizen Data Science with Microsoft Azure

5 days


Part 1: Data Engineering

The modern data warehouse

Moving to the cloud requires reconsidering some of the choices made for on-premises data handling. This module introduces the different services in Azure that can be used for data processing, and compares them to the traditional on-premises data stack.

  • From traditional to modern data warehouse
  • Lambda architecture
  • Overview of Big Data related Azure services
  • Getting started with Azure

Staging data in Azure

This module discusses the different types of storage available in Azure Storage as well as Azure Data Lake Storage. It also covers some of the tools to load and manage files in both.

  • Introduction to Azure Blob Storage
  • Compare Azure Data Lake Storage Gen 2 with traditional blob storage
  • Tools for uploading data
  • Storage Explorer, AzCopy, AdlCopy, PolyBase

Using Azure Data Factory for ETL

When data is stored and analysed on-premises, we typically use ETL tools such as SQL Server Integration Services. But what if the data is stored in the cloud? Then we need Azure Data Factory, the cloud-based ETL service. First we need to get used to the terminology; then we can start creating the proper objects in the portal.

  • Data Factory V2 terminology
  • The Data Factory wizard
  • Developing Data Factory pipelines in the browser
  • Creating Data Factory Data flows
  • Setup of Integration Runtimes
  • Debugging, scheduling and monitoring DF pipelines

Azure Synapse Analytics

SQL databases are limited in compute power because they run on a single machine, and their size is limited to the terabyte range. Azure Synapse Analytics (previously Azure SQL Data Warehouse) is a service aimed at analytical workloads on data volumes hundreds of times larger than what Azure SQL databases can handle. At the same time we can keep using the familiar T-SQL query language and connect traditional applications such as Excel and Management Studio to the service. Storage and compute can be scaled independently.

  • Architecture of Azure Synapse Analytics
  • Loading data via PolyBase
  • CTAS and CETAS
  • Setting up table distributions
  • Indexing
  • Partitioning
  • Performance monitoring and tuning
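
The idea behind table distributions can be illustrated with a minimal Python sketch. This is not Synapse code: it only mimics how rows could be spread over the 60 distributions of a dedicated SQL pool, either by hashing a key column or in round-robin fashion. The `customer_id` column, the sample rows and the use of CRC32 as the hash function are assumptions made for illustration.

```python
import zlib
from collections import Counter
from itertools import cycle

NUM_DISTRIBUTIONS = 60  # a dedicated SQL pool spreads each table over 60 distributions


def hash_distribute(rows, key):
    """Assign each row to a distribution based on a hash of its key column.

    CRC32 is only a stand-in; the real service uses its own hash function.
    """
    placement = {}
    for i, row in enumerate(rows):
        digest = zlib.crc32(str(row[key]).encode("utf-8"))
        placement[i] = digest % NUM_DISTRIBUTIONS
    return placement


def round_robin_distribute(rows):
    """Assign rows to distributions in turn, ignoring their content."""
    slots = cycle(range(NUM_DISTRIBUTIONS))
    return {i: next(slots) for i, _ in enumerate(rows)}


# Hypothetical sales rows: 1000 orders from 50 customers.
rows = [{"order_id": n, "customer_id": n % 50} for n in range(1000)]

hashed = hash_distribute(rows, "customer_id")
robin = round_robin_distribute(rows)

# With hashing, all rows of one customer land in the same distribution,
# so an aggregation per customer can be computed locally.
customer_7 = {hashed[i] for i, r in enumerate(rows) if r["customer_id"] == 7}

# Round robin spreads rows evenly but scatters each customer.
robin_counts = Counter(robin.values())
```

The trade-off shown here is the one to keep in mind when choosing a distribution: hashing keeps related rows together but can become skewed when a few key values dominate, while round robin is always even but keeps no related rows together.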

Advanced data processing with Databricks

Azure Databricks allows us to use the power of Spark without the configuration hassle of Hadoop clusters. Using popular languages such as Python, SQL and R, data can be loaded, visualized, transformed and analyzed via interactive notebooks.

  • Introduction to Azure Databricks
  • Cluster setup
  • Databricks Notebooks and widgets
  • Connecting to Azure Storage and Data Warehouse
  • Processing Spark Dataframes in Python
  • Using Spark SQL
  • Scheduling Databricks jobs

Modeling data with Azure Analysis Services

Analysis Services is Microsoft's OLAP (cube) technology. The latest version, Analysis Services Tabular, can also run as a database-as-a-service. This makes it ideal for loading and caching the cleaned, pre-processed data produced by other Azure services, which leads to faster reporting. The data can also be enriched with KPIs, translations, derived measures, etc. In this module we take a brief look at how an Analysis Services model can be created and deployed to the cloud; for a more in-depth discussion we refer to the Analysis Services Tabular training.

  • Online Analytical Processing
  • Azure Analysis Services Tabular
  • Creating an Analysis Services model
  • Model deployment
  • Processing
  • Model management

Part 2: Citizen Data Science

Introduction to Machine Learning

This classroom training does not require people to be familiar with Machine Learning. This introductory module makes sure all participants have a common ground for diving into the rest of the training by discussing the basic concepts of Machine Learning.

  • What is Machine Learning?
  • What questions can Machine Learning answer?
  • Machine Learning Methodology
  • Data preparation
  • Data Modeling
  • Model evaluation
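
The three methodology steps listed above (data preparation, data modeling, model evaluation) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not course material: the data is randomly generated, the model is a straight line fitted by ordinary least squares, and the function names are made up for this example.

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

# 1. Data preparation: made-up (x, y) pairs with some noise,
#    shuffled and split into a training set and a test set.
data = [(x, 3.0 * x + 10.0 + random.gauss(0, 2.0)) for x in range(100)]
random.shuffle(data)
train, test = data[:80], data[80:]


# 2. Data modeling: fit y = slope * x + intercept by ordinary least squares.
def fit_line(points):
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in points)
    variance = sum((x - mean_x) ** 2 for x, _ in points)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(train)


# 3. Model evaluation: measure the error on data the model has not seen.
def mean_absolute_error(points, slope, intercept):
    return sum(abs(y - (slope * x + intercept)) for x, y in points) / len(points)


test_error = mean_absolute_error(test, slope, intercept)
print(f"slope={slope:.2f} intercept={intercept:.2f} test MAE={test_error:.2f}")
```

Evaluating on rows that were held back from training is the essential point: an error measured on the training data itself would give an overly optimistic picture of the model.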

Tools for Machine Learning in Azure

This chapter introduces the different tools that are available for (citizen) data scientists to do Machine Learning in Microsoft Azure.

  • Overview of Machine Learning in Azure
  • Machine Learning with pretrained models
  • Using Transfer Learning
  • Graphical Approaches to Machine Learning
  • Machine Learning using Coding Approaches

Azure Cognitive Services

For many years, Business Intelligence focused on turning data stored in structured, relational databases into insights or actionable information. There is, however, plenty of useful data that is less easy to access, such as plain text, images and phone recordings. Cognitive Services provides web services hosted in Microsoft Azure that convert these sources into a format that is easier to analyze (mostly JSON documents). In this chapter we give an overview of the different cognitive services. Some of these are ready-made, whereas others are customizable.

  • Overview of Cognitive Services
  • Pre-trained Services
  • Customizable Services
  • Getting Started with LUIS
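
To show why a JSON document is an easier format to analyze, the following sketch flattens such a document into rows. The response body and its field names (`documents`, `sentiment`, `score`) are hypothetical and do not reproduce the schema of any specific Cognitive Services API.

```python
import json

# Hypothetical JSON document, as a text-analysis web service might return it.
# The field names are illustrative, not an actual Cognitive Services schema.
response_body = """
{
  "documents": [
    {"id": "1", "sentiment": "positive", "score": 0.93},
    {"id": "2", "sentiment": "negative", "score": 0.12},
    {"id": "3", "sentiment": "positive", "score": 0.81}
  ]
}
"""


def to_rows(body):
    """Flatten the JSON document into a list of (id, sentiment, score) tuples."""
    parsed = json.loads(body)
    return [(d["id"], d["sentiment"], d["score"]) for d in parsed["documents"]]


rows = to_rows(response_body)

# Once flattened, the result can be filtered like any other tabular data.
positive_ids = [doc_id for doc_id, sentiment, _ in rows if sentiment == "positive"]
```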

Azure Machine Learning: Automated ML

Azure Machine Learning is a service that helps to bring Machine Learning to the enterprise level. This service contains tools for professional data scientists as well as citizen data scientists. One tool that may be especially useful for the latter is Automated ML, which automates the Machine Learning process and requires little time investment, programming skill or domain knowledge.

  • Introduction to Azure Machine Learning
  • Architecture of Azure Machine Learning
  • Important concepts in Azure Machine Learning
  • What is Automated Machine Learning?
  • Building Automated ML models
  • Deploying and consuming Automated ML Models

Azure Machine Learning Service: Designer

A second tool available in Azure Machine Learning is the Designer. It allows you to visually connect modules into Machine Learning pipelines using a drag-and-drop approach. A module is an operation that you apply to your data, such as transforming data, training an algorithm, scoring new data or validating a model.

  • What is the Designer?
  • Modules for Loading data
  • Preprocessing data
  • Training Machine Learning Models
  • Testing Machine Learning Models
  • Deploying models
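
Conceptually, a pipeline of connected modules behaves like a chain of functions, where each module receives the output of the previous one. The sketch below is plain Python rather than anything produced by the Designer; the module names, the sample rows and the threshold rule that stands in for a trained model are all made up for illustration.

```python
def drop_missing(rows):
    """Data transformation module: remove rows with a missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def scale_amount(rows):
    """Data transformation module: scale 'amount' relative to the largest value."""
    highest = max(r["amount"] for r in rows)
    return [{**r, "amount_scaled": r["amount"] / highest} for r in rows]


def score(rows):
    """Scoring module: a fixed threshold stands in for a trained model."""
    return [{**r, "flagged": r["amount_scaled"] > 0.5} for r in rows]


def run_pipeline(rows, modules):
    """Pass the data through each module in order."""
    for module in modules:
        rows = module(rows)
    return rows


raw = [
    {"customer": "A", "amount": 120.0},
    {"customer": "B", "amount": None},
    {"customer": "C", "amount": 30.0},
]

result = run_pipeline(raw, [drop_missing, scale_amount, score])
```

Because every module takes rows in and hands rows out, modules can be reordered or swapped without touching the others, which is what the drag-and-drop canvas makes visible.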

AI features in Power BI

Power BI is a very popular tool for visualizing data. Lately, more and more features have been added that allow for more advanced data analysis. Among other things, Cognitive Services and deployed Azure Machine Learning models can be consumed in Power BI Dataflows and Power Query.

  • Introduction to Power BI
  • Using Cognitive Services or Deployed Azure ML models in Power Query
  • Using Cognitive Services or Deployed Azure ML models in Power BI Dataflows
  • More Machine Learning options in Power BI

Microsoft's Big Data solution is a collection of scalable Azure services to load, store and analyze data in the cloud. Although each of these services can be used independently, they will often be used together to process data in the cloud. This course consists of 2 parts that can also be followed independently: 3 days of Data Engineering (Microsoft Azure Big Data for Data Engineers) and 2 days of Machine Learning (Machine Learning for the Citizen Data Scientist).

In the data engineering part the focus is on preparing data for reporting and analysis: how data can be loaded from on-premises into the cloud, what the different storage options in the cloud are, and how data can be transformed to simplify reporting and analysis.

In the Machine Learning part, we will first discuss the basic concepts of Machine Learning. After that, we will go over the available no-code Microsoft Azure tools, and see how we can use them to create and deploy Machine Learning Models.

This course is aimed at developers, administrators and project managers who are developing new data-centric applications in the Microsoft Azure cloud. Some familiarity with relational database systems such as SQL Server is helpful. No prior knowledge of Azure or Machine Learning is required.

© 2020 U2U All rights reserved.