
Microsoft Azure Big Data for Data Engineers

3 days
Training code: UADE

Preparing enterprise data for analysis and reporting requires multiple steps: uploading the data to a central staging location, converting it into the proper format, cleansing and, if needed, pre-aggregating it, and preparing it in a tabular format so that analysts and report developers can get started with it.

In this course you will see the Microsoft Azure services that allow you to run these processes on large-scale data in the cloud.

The modern data warehouse

The cloud requires us to reconsider some of the choices made for on-premises data handling. This module introduces the different Azure services that can be used for data processing and compares them to the traditional on-premises data stack.

  • From traditional to modern data warehouse
  • Lambda architecture
  • Overview of Big Data related Azure services
  • Getting started with Azure

Staging data in Azure

This module discusses the different types of storage available in Azure Storage as well as Azure Data Lake Storage. It also covers some of the tools to load and manage files in Azure Storage and Data Lake Storage.

  • Introduction to Azure Blob Storage
  • Comparing Azure Data Lake Storage Gen2 with traditional Blob Storage
  • Tools for uploading data
  • Storage Explorer, AZCopy, ADLCopy, PolyBase
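
As an illustration of the staging step, here is a minimal sketch of uploading a local file to Blob Storage with the azure-storage-blob Python SDK; the connection string, container and blob names are placeholders, and in practice tools such as AZCopy or Storage Explorer cover the same task.

  # Minimal sketch (assumes azure-storage-blob v12 is installed and the
  # placeholder connection string is replaced by a real one)
  from azure.storage.blob import BlobServiceClient

  connection_string = "<storage-account-connection-string>"  # placeholder
  service = BlobServiceClient.from_connection_string(connection_string)
  blob = service.get_blob_client(container="staging", blob="sales/2019/sales.csv")

  # Upload the local file, replacing the blob if it already exists
  with open("sales.csv", "rb") as data:
      blob.upload_blob(data, overwrite=True)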

Using Azure Data Factory for ETL

When data is stored and analysed on-premises, we typically use ETL tools such as SQL Server Integration Services. But what if the data is stored in the cloud? Then we need Azure Data Factory, the cloud-based ETL service. First we get used to the terminology, then we can start creating the proper objects in the portal.

  • Data Factory V2 terminology
  • The Data Factory wizard
  • Developing Data Factory pipelines in the browser
  • Creating Data Factory Data flows
  • Setup of Integration Runtimes
  • Debugging, scheduling and monitoring DF pipelines
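
To give an idea of what Data Factory objects look like outside the portal, the sketch below defines a pipeline with a single Copy activity through the azure-mgmt-datafactory Python SDK. The subscription, resource group, factory and dataset names are placeholders, the referenced datasets and linked services are assumed to exist already, and a recent (azure.identity-based) version of the SDK is assumed.

  from azure.identity import DefaultAzureCredential
  from azure.mgmt.datafactory import DataFactoryManagementClient
  from azure.mgmt.datafactory.models import (
      PipelineResource, CopyActivity, DatasetReference, BlobSource, BlobSink)

  client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

  # A Copy activity that moves data between two pre-defined blob datasets
  copy_staged = CopyActivity(
      name="CopyStagedSales",
      inputs=[DatasetReference(reference_name="StagingSalesCsv")],
      outputs=[DatasetReference(reference_name="CuratedSalesCsv")],
      source=BlobSource(),
      sink=BlobSink())

  # Publish the pipeline to the (placeholder) data factory
  client.pipelines.create_or_update(
      "rg-bigdata", "df-bigdata", "StageSalesPipeline",
      PipelineResource(activities=[copy_staged]))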

Azure Data Warehouse

Azure SQL Databases are limited in compute power since they run on a single machine, and their size is limited to the terabyte range. Azure Data Warehouse is a service aimed at analytical workloads on data volumes hundreds of times larger than what Azure SQL Databases can handle. Yet at the same time we can keep using the familiar T-SQL query language, and we can connect traditional applications such as Excel and Management Studio to this service. Storage and compute can be scaled independently.

  • Architecture of Azure Data Warehouse
  • Loading data via PolyBase
  • CTAS and CETAS
  • Setting up table distributions
  • Indexing
  • Partitioning
  • Performance monitoring and tuning
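
As a taste of CTAS, the sketch below connects to the warehouse with pyodbc and creates a hash-distributed table with a clustered columnstore index from a staging table. The server, database, credentials and table names are placeholders, and the stage.Sales table is assumed to have been loaded beforehand (for example via PolyBase).

  import pyodbc

  # Placeholder connection details; autocommit avoids wrapping the DDL in a transaction
  conn = pyodbc.connect(
      "Driver={ODBC Driver 17 for SQL Server};"
      "Server=tcp:mydwserver.database.windows.net,1433;"
      "Database=mydw;Uid=loader;Pwd=<password>;Encrypt=yes;",
      autocommit=True)

  # CTAS creates and loads the distributed table in a single statement
  conn.cursor().execute("""
      CREATE TABLE dbo.FactSales
      WITH (DISTRIBUTION = HASH(CustomerKey), CLUSTERED COLUMNSTORE INDEX)
      AS
      SELECT CustomerKey, ProductKey, OrderDate, SalesAmount
      FROM stage.Sales;
  """)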

Advanced data processing with Databricks

Azure Databricks allows us to use the power of Spark without the configuration hassle of Hadoop clusters. Using popular languages such as Python, SQL and R, data can be loaded, visualized, transformed and analyzed via interactive notebooks.

  • Introduction to Azure Databricks
  • Cluster setup
  • Databricks Notebooks
  • Connecting to Azure Storage and Data Warehouse
  • Processing Spark DataFrames in Python
  • Using Spark SQL
  • Scheduling Databricks jobs
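
The sketch below shows what such a notebook cell could look like: staged CSV files are read from Azure Data Lake Storage Gen2, cleaned, and written back as Parquet. The spark session is predefined in Databricks notebooks; the storage account and container names are placeholders, and access to them (for example via an access key or service principal) must be configured beforehand.

  from pyspark.sql import functions as F

  # Read the staged CSV files from the (placeholder) data lake container
  sales = (spark.read
           .option("header", "true")
           .option("inferSchema", "true")
           .csv("abfss://staging@mydatalake.dfs.core.windows.net/sales/*.csv"))

  # Basic cleansing: drop duplicate rows and keep only positive sales amounts
  cleaned = (sales
             .dropDuplicates()
             .filter(F.col("SalesAmount") > 0))

  # Write the result back to a curated zone as Parquet
  (cleaned.write
   .mode("overwrite")
   .parquet("abfss://curated@mydatalake.dfs.core.windows.net/sales/"))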

Modeling data with Azure Analysis Services

Analysis Services is Microsoft's OLAP (cube) technology. The latest version, Analysis Services Tabular, can also run as a database-as-a-service. This makes it ideal for loading and caching the cleaned, pre-processed data produced by other Azure services, which leads to faster reporting. The data can also be enriched with KPIs, translations, derived measures, etc. In this module we take a brief look at how an Analysis Services model can be created and deployed to the cloud; for a more in-depth discussion we refer to the Analysis Services Tabular training.

  • Online Analytical Processing
  • Analysis Services Tabular
  • Creating a model on top of Azure Storage or Azure Data Warehouse
  • Model deployment
  • Processing
  • Model management
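
As an illustration of processing, the sketch below triggers a full refresh of a deployed tabular model through the Azure Analysis Services REST API. The region, server and model names are placeholders, and token must hold a valid Azure AD access token for the Analysis Services resource; the refresh itself runs asynchronously.

  import requests

  token = "<azure-ad-access-token>"  # placeholder Azure AD bearer token
  url = ("https://westeurope.asazure.windows.net/servers/myaasserver"
         "/models/SalesModel/refreshes")

  # Request a full, transactional refresh of the model
  response = requests.post(
      url,
      headers={"Authorization": f"Bearer {token}"},
      json={"Type": "Full", "CommitMode": "transactional", "MaxParallelism": 2})

  response.raise_for_status()
  # The refresh runs asynchronously; this URL can be polled for its status
  print(response.headers.get("Location"))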

Data lives at different locations, both on-premises and in the cloud. This training focuses on how to upload, transform and manage large volumes of data in the Azure cloud. This can be a preprocessing step that allows business intelligence people to build reports and analyses on top of this data, or it can be a first step towards data science. Note that this training does not focus on data science; if you are interested in that, check out our Azure Data Science training.

This course is aimed at developers, BI developers and project managers who are considering migrating existing data solutions to the Microsoft Azure cloud. Some familiarity with relational database systems such as SQL Server is handy. Prior knowledge of Azure is not required.
