Call Us: +32 2 466 00 16
Email: info@u2u.be
Follow Us:

Data Engineering with Azure Synapse Analytics

3 days
uasams
3 days

Upcoming Sessions

Date: currently not scheduled

Format: Classroom

Price: 0€

Subscribe to waiting list

Date: currently not scheduled

Format: Classroom

Price: 0€

Subscribe to waiting list

Interested in a private company training? Request it here.

The modern data warehouse

The cloud requires to reconsider some of the choices made for on-premisses data handling. This module introduces the concept of a data lake and the data lakehouse. It also introduces the different services in Azure that can be used for data processing, and compares them to the traditional on-premisses data stack. Finally, it provides a brief intro in Azure and the use of the Azure portal.

  • From traditional to modern data warehouse
  • Comparing data warehouse with data lake
  • Lambda architecture
  • Overview of Big Data related Azure services
  • Getting started with the Azure Portal
  • LAB: Navigating the Azure Portal

Overview of Azure Synapse Analytics (ASA)

Synapse Analytics is the cornerstone service for the data engineer. It encompasses pipelines to copy data, Spark and SQL to transform and query data, Data Explorer for near realtime analysis and data exploration and Power BI for reporting. This module provides a brief introduction into this service.

  • The different components of Synapse Analytics
  • Provisioning a Synapse Analytics workspace
  • Github setup
  • Navigating Synapse Analytics Studio
  • LAB: Provision Synapse Analytics workspace

Azure Synapse Analytics Pipelines

When the data is stored on-premisses, you typically use ETL tools such as SQL Server Integration Services for loading and transforming data. But what if the data is stored in the Azure cloud? Then you can use pipelines in Azure Synapse Analytics. This service is nearly identical to Azure Data Factory, the cloud-based ETL service. First we need to get used to the terminology, then we can start creating the proper objects in the portal.

  • Introducing Azure Synapse Analytics Pipelines
  • Pipeline terminology
  • Creating Pipelines, Linked Services and Datasets
  • Copying data with the Copy Data wizard
  • LAB: Migrating data with Data Factory Wizard

Creating Data Flows in Synapse Analytics Pipelines

With Data flows data can be transformed without the need to learn about another tool (such as Databricks or Spark). Both Data flows as well as the Power Query activity are covered.

  • From ELT to ETL
  • Creating Data flows
  • Sizing the Spark cluster
  • Running and Profiling Data flows
  • Exploring the Power Query activity
  • LAB: Transforming data with a Data flow

Configuring Integration Runtimes

Pipelines need integration runtimes to control where the code executes. This module walks you through the 3 types of Integration Runtimes: Azure, SSIS and self-hosted runtimes.

  • Integration runtime overview
  • Controlling the Azure Integration Runtime
  • Setup self-hosted Integration Runtimes
  • Lift and shift SSIS packages in Data Factory

Deploying and monitoring pipelines

Once development has finished the pipelines need to be deployed and scheduled for execution. Monitoring the deployed pipelines for failure, errors or just performance is another crucial topic discussed in this module.

  • Adding triggers to pipelines
  • Deploying pipelines
  • Monitoring pipeline executions
  • Restart failed pipelines
  • LAB: Monitoring pipeline runs

Azure Synapse Analytics Serverless SQL

Once data has been loaded into the data lake, the next step is to cleanse the data, pre-aggregate the data and perform other steps to make the data accessible to reporting and analytical tools. Dependant on the transformations required and the skills of the data engineer, the SQL dialect common to the Microsoft data stack (T-SQL) could play an important role. This module first introduces the scenarios where the move from an Azure SQL Database into Synapse databases could be useful, introduces briefly the two different types of SQL databases, and then focusses more deeply on the Synapse Analytics Serverless databases.

  • When Azure SQL Databases reach their limits
  • Provisioned versus Serverless Synapse Analytics databases
  • Creating and accessing Serverless databases
  • Using OPENROWSET for data access
  • Creating external tables
  • LAB: Querying data via Azure Synapse Analytics Serverless databases

Azure Synapse Analytics Provisioned SQL Pools

Since Serverless SQL Pools don't store data in a proprietary format, they lack features such as indexes, update statements etc. This is where Provisioned SQL Pools in Azure Synapse Analytics (formerly known as Azure Data Warehouse) can come to the rescue.

  • Architecture of Provisioned SQL Pools
  • Loading data via PolyBase
  • CTAS and CETAS
  • Setting up table distributions
  • Indexing
  • Partitioning
  • Performance monitoring and tuning
  • LAB: Loading and querying data in Provisioned SQL Pools

Getting started with Spark

Although SQL is a very powerful language to access and manipulate data, it has its limitations. Complex data wrangling, advanced statistics or machine learning are ill-suited tasks for SQL. For this purpose Spark is better suited. It's a divide-and-conquer framework for data access, transformation and querying which relies on programming languages such as Scala and Python. Spark can be used in Synapse Analytics as well as in Azure Databricks, a popular service which integrates with Synapse Analytics pipelines as well.

  • Introduction Spark framework
  • Spark Cluster setup
  • Getting started with Notebooks
  • LAB: Getting started with Spark cluster and notebooks

Accessing data in Synapse Analytics Spark

Spark doesn't have a proprietary data storage option, but consumes and produces regular files stored in Azure Storage. This module covers how to access and manipulate data stored in the Synapse Analytics data lake or other Azure storage locations.

  • Uploading data
  • Connecting to Azure Storage and Data Lake Gen 2
  • Dealing with malformed data
  • Processing Spark Dataframes in Python
  • Using Spark SQL
  • Working with Delta Lake
  • LAB: Processing data on a Spark cluster

Deploying an Azure Databricks or Spark solution

Once the Databricks solution has been tested it need to be scheduled for execution. This can be done either with jobs in Azure Databricks or via Synapse Analytics Pipelines. In the latter case you need to be able to pass on variables. Azure Databricks widgets will make this possible.

  • Azure Databricks jobs
  • Working with Databricks Widgets
  • Calling Databricks Notebooks from within Azure Data Factory pipelines
  • LAB: Widgets in Azure DataBricks

Handling large volumes of data requires different skills: One must master storage options, tools to upload data performant, handling failed uploads, and convert data in a format appropriate for reporting and analysis. In the Microsoft Azure stack, Synapse Analytics is the cornerstone service for the data engineer. It encompasses pipelines to copy data, Spark and SQL to transform and query data, Data Explorer for near realtime analysis and data exploration, and Power BI for reporting.

This training teaches how to use Synapse Analytics to design, build and maintain a modern data lake architecture. The training also includes a few other Azure services which come in handy when working with Synapse Analytics, such as Azure Data Vault for handling authentication, Azure SQL Database for dealing with smaller datasets and Azure Databricks as an improved Spark engine.

This course focusses on developers and administrators who are considering migrating existing data solutions to the Microsoft Azure cloud, or start designing new data oriented solutions in the Azure cloud. Some familiarity with relational database systems such as SQL Server is handy. Prior knowledge of Azure is not required.

Contact Us
  • Address:
    U2U nv/sa
    Z.1. Researchpark 110
    1731 Zellik (Brussels)
    BELGIUM
  • Phone: +32 2 466 00 16
  • Email: info@u2u.be
  • Monday - Friday: 9:00 - 17:00
    Saturday - Sunday: Closed
Say Hi
© 2022 U2U All rights reserved.