Azure Databricks allows us to use the power of Apache Spark without the hassle of manually creating and configuring Spark clusters. In this chapter you will learn how to set up an Azure Databricks environment and work with Databricks workspaces.
Databricks does not come with its own cloud object storage. When you use Databricks on the Azure platform, it stores its data and metadata in one or more Azure Data Lake Gen2 storage accounts.
Unity Catalog provides centralized access control, auditing, lineage, and data discovery capabilities across Databricks workspaces. In this chapter you will learn how to set up and configure a Unity Catalog metastore for your workspaces.
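As a taste of what this looks like in practice, here is a minimal sketch of creating a catalog and granting access with Unity Catalog SQL commands, run from a notebook; the catalog, schema, and group names are placeholders.

```python
# Hypothetical example: securing data with Unity Catalog from a notebook.
# `spark` is provided by the Databricks notebook runtime; names are placeholders.
spark.sql("CREATE CATALOG IF NOT EXISTS sales")
spark.sql("CREATE SCHEMA IF NOT EXISTS sales.bronze")

# Grant a group the right to see the catalog and read a schema
spark.sql("GRANT USE CATALOG ON CATALOG sales TO `data-engineers`")
spark.sql("GRANT SELECT ON SCHEMA sales.bronze TO `data-analysts`")
```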
Databricks compute refers to the selection of computing resources available in the Databricks workspace. Users need access to compute to run data engineering, data science, and data analytics workloads, such as production ETL pipelines, streaming analytics, ad-hoc analytics, and machine learning. Learn about the different types of compute that can be provisioned in Azure Databricks.
Data can be loaded, visualized, transformed, and analyzed via interactive notebooks using popular languages such as Python, SQL, and R.
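For illustration, a minimal notebook sketch that loads a CSV file, aggregates it, and visualizes the result; the file path and column names are placeholders.

```python
# Minimal notebook sketch: load, transform, and visualize data.
# `spark` and `display` come with the Databricks notebook runtime;
# the path and columns below are placeholders for your own data.
df = (spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("/Volumes/main/default/raw/sales.csv"))  # hypothetical path

summary = df.groupBy("country").sum("revenue")  # hypothetical columns
display(summary)  # renders an interactive table/chart in the notebook
```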
There are many ways to access data in Azure Databricks, from uploading small files via the portal, over ad-hoc connections, up to mounting Azure Storage accounts or data lakes. Files can also be treated as tables, providing easy access.
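As a hedged sketch, reading directly from an Azure Data Lake Gen2 container via an abfss:// URI and registering the result as a table; the storage account, container, and table names are placeholders, and credentials are assumed to be handled through Unity Catalog or a service principal.

```python
# Read Parquet files straight from Azure Data Lake Gen2 (names are placeholders)
df = spark.read.parquet(
    "abfss://landing@mystorageaccount.dfs.core.windows.net/orders/"
)

# Expose the files as a table so they can be queried with plain SQL
df.write.saveAsTable("main.bronze.orders")
```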
Delta Lake is an optimized storage layer that provides the foundation for storing data and tables in a Databricks lakehouse. Learn how to create, query, and optimize Delta tables in a Databricks lakehouse.
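A short sketch of that workflow: creating, querying, and optimizing a Delta table. Table and column names are placeholders; OPTIMIZE and ZORDER are Delta Lake maintenance commands.

```python
# Create a Delta table (names and columns are placeholders)
spark.sql("""
    CREATE TABLE IF NOT EXISTS main.silver.orders (
        order_id    BIGINT,
        customer_id BIGINT,
        amount      DECIMAL(10, 2),
        order_date  DATE
    ) USING DELTA
""")

# Query it like any other table
spark.sql("SELECT order_date, SUM(amount) FROM main.silver.orders GROUP BY order_date")

# Compact small files and co-locate rows that are frequently filtered together
spark.sql("OPTIMIZE main.silver.orders ZORDER BY (customer_id)")
```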
The lakehouse architecture and Databricks SQL Warehouse bring cloud data warehousing capabilities to your data lakes. A SQL warehouse is a compute resource that lets you run SQL commands on objects within Databricks SQL. Learn about the available warehouse types and how to query them.
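For example, a SQL warehouse can be queried from Python with the databricks-sql-connector package (pip install databricks-sql-connector); the hostname, HTTP path, and token below are placeholders taken from your warehouse's connection details.

```python
from databricks import sql

# Connection details are placeholders; copy them from your SQL warehouse
with sql.connect(
    server_hostname="adb-1234567890123456.7.azuredatabricks.net",
    http_path="/sql/1.0/warehouses/abcdef1234567890",
    access_token="<personal-access-token>",
) as connection:
    with connection.cursor() as cursor:
        cursor.execute("SELECT current_catalog(), current_schema()")
        print(cursor.fetchall())
```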
Microsoft Power BI is a business analytics tool that provides interactive visualizations with self-service business intelligence capabilities, enabling end users to create reports and dashboards. You can connect Power BI Desktop to your Databricks clusters and Databricks SQL warehouses.
Databricks Lakeflow is a new solution that contains everything you need to build and operate production data pipelines. It includes native, highly scalable connectors for databases like SQL Server and for enterprise applications like Salesforce and SharePoint. Users can transform data in batch and streaming using standard SQL and Python with declarative ETL pipelines.
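A hedged sketch of such a declarative pipeline (the programming model formerly known as Delta Live Tables): each decorated function defines a table, and the pipeline engine resolves the dependencies between them. Paths and names are placeholders.

```python
import dlt
from pyspark.sql.functions import col

@dlt.table(comment="Raw orders ingested from cloud storage")
def orders_bronze():
    # Hypothetical source path
    return spark.read.format("json").load("/Volumes/main/default/raw/orders/")

@dlt.table(comment="Cleaned orders with positive amounts only")
def orders_silver():
    # The engine infers that orders_silver depends on orders_bronze
    return dlt.read("orders_bronze").where(col("amount") > 0)
```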
Lakeflow Connect offers simple and efficient connectors to ingest data from local files, popular enterprise applications, databases, cloud storage, message buses, and more. Learn how to efficiently ingest data using managed connectors for database systems like SQL Server, and with unmanaged connectors from cloud storage or event systems using Auto Loader.
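To give a flavor of the Auto Loader approach, a hedged sketch of incremental file ingestion with the cloudFiles source, which discovers new files as they arrive; all paths and table names are placeholders.

```python
# Incrementally ingest new CSV files with Auto Loader (paths are placeholders)
stream = (spark.readStream
              .format("cloudFiles")
              .option("cloudFiles.format", "csv")
              .option("cloudFiles.schemaLocation",
                      "/Volumes/main/default/checkpoints/orders_schema")
              .load("abfss://landing@mystorageaccount.dfs.core.windows.net/orders/"))

(stream.writeStream
       .option("checkpointLocation", "/Volumes/main/default/checkpoints/orders")
       .trigger(availableNow=True)  # process all pending files, then stop
       .toTable("main.bronze.orders"))
```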
Lakeflow Jobs is workflow automation for Databricks, providing orchestration for data processing workloads so that you can coordinate and run multiple tasks as part of a larger workflow. You can optimize and schedule the execution of frequent, repeatable tasks and manage complex workflows.
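As an illustration, a hedged sketch of defining a scheduled job with the Databricks Python SDK (pip install databricks-sdk); the notebook path, cluster id, and cron schedule are placeholders.

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()  # reads host/token from environment or a config profile

job = w.jobs.create(
    name="nightly-orders-etl",
    tasks=[
        jobs.Task(
            task_key="ingest",
            notebook_task=jobs.NotebookTask(
                notebook_path="/Workspace/etl/ingest_orders"  # placeholder
            ),
            existing_cluster_id="<cluster-id>",  # placeholder
        )
    ],
    schedule=jobs.CronSchedule(
        quartz_cron_expression="0 0 2 * * ?",  # every night at 02:00
        timezone_id="Europe/Brussels",
    ),
)
print(f"Created job {job.job_id}")
```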
Databricks is a data analytics platform powered by Apache Spark for data engineering, data science, and machine learning. This training teaches how to use Azure Databricks to design and build a data lakehouse architecture.
No prior knowledge of Azure Databricks is required.