Python for Data Engineers: From Syntax to Solutions

3 days

UPDE

Getting Started with Python

Python is a high-level, interpreted, interactive and object-oriented scripting language. This chapter introduces the history of Python and shows how to install Python and run your first lines of Python code. Many editors are available for writing Python code, but this course focuses on Visual Studio Code. We'll also cover modern Python tooling, including uv, a fast Python package installer and project manager.

Introduction to Python
Installing Python
Executing Python Code from the Command Shell
Python and Visual Studio Code
Working with packages in Python
Working with Virtual Environments in Python
Modern Python tooling with uv
Interactive development in Jupyter notebooks
LAB: Installing Python and executing code
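After creating a virtual environment (for example with python -m venv .venv or uv venv), it helps to confirm that your code really runs inside it. The sketch below relies on the standard-library behaviour that, inside a venv-style environment, sys.prefix points at the environment while sys.base_prefix still points at the base installation. The helper name in_virtual_environment is our own and not part of Python.

```python
import platform
import sys


def in_virtual_environment() -> bool:
    """Return True if this interpreter runs inside a venv-style virtual environment."""
    # Inside a virtual environment, sys.prefix is the environment folder,
    # while sys.base_prefix still refers to the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print(f"Python {platform.python_version()} at {sys.executable}")
    print("Inside a virtual environment:", in_virtual_environment())
```

Running the script once with the environment activated and once without is a quick way to see the difference.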

Basic Language Constructs in Python

To keep code readable and maintainable, it is important to break it up into reusable components such as functions and classes. This chapter covers the building blocks you need for that: variables, data types, collections, control flow and functions.

Introduction to writing Python code
Declaring and Using Variables
Data Types in Python
Working with Lists, Tuples, Sequences and Dictionaries
Basic Programming Constructs in Python
Declaring and executing Functions
LAB: Writing basic Python code
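Even a tiny function combines most of the constructs listed above. The sketch below uses a hypothetical summarize_orders helper and made-up sample data to show a list of tuples, a dictionary, a for loop, an if statement and a function with a default parameter value.

```python
def summarize_orders(orders, min_amount=0.0):
    """Sum order amounts per customer, skipping orders below min_amount.

    orders is a list of (customer, amount) tuples.
    """
    totals = {}  # dictionary: customer -> total amount
    for customer, amount in orders:  # tuple unpacking inside a for loop
        if amount < min_amount:
            continue  # skip orders that are too small
        totals[customer] = totals.get(customer, 0.0) + amount
    return totals


orders = [("alice", 120.0), ("bob", 15.5), ("alice", 30.0), ("carol", 5.0)]
print(summarize_orders(orders))                   # {'alice': 150.0, 'bob': 15.5, 'carol': 5.0}
print(summarize_orders(orders, min_amount=10.0))  # {'alice': 150.0, 'bob': 15.5}
```

Passing min_amount by keyword keeps the call readable and lets callers rely on the default when they do not need a threshold.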

Working with Classes and Objects

Python classes provide all the standard features of Object-Oriented Programming. Classes can inherit from base classes and define constructors to initialize objects. Modern Python also offers dataclasses, which simplify class definitions by generating methods such as __init__, __repr__ and __eq__ automatically.

Introduction to Object-Oriented Programming
Defining and instantiating Classes in Python
Working with Constructors
Instance and Class Variables
Inheritance in Python
Working with Access Modifiers (naming conventions for protected and private members)
Python dataclasses for simplified class definitions
LAB: Working with classes and objects
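The sketch below touches the topics above in one place: a plain class with a constructor, a class variable and instance variables, a subclass that extends the constructor with super(), and a dataclass. The class names DataSource, CsvSource and Column are hypothetical examples. Note that Python has no enforced access modifiers; a leading underscore such as _connected only signals "internal" by convention.

```python
from dataclasses import dataclass, field


class DataSource:
    """Plain class: explicit constructor, class and instance variables."""

    source_count = 0  # class variable, shared by all instances

    def __init__(self, name: str):
        self.name = name         # instance variable
        self._connected = False  # leading underscore: internal by convention only
        DataSource.source_count += 1

    def connect(self) -> None:
        self._connected = True

    def describe(self) -> str:
        state = "connected" if self._connected else "disconnected"
        return f"{self.name} ({state})"


class CsvSource(DataSource):
    """Subclass that extends the base-class constructor via super()."""

    def __init__(self, name: str, path: str):
        super().__init__(name)
        self.path = path

    def describe(self) -> str:  # overrides the base-class method
        return f"CSV {super().describe()} from {self.path}"


@dataclass
class Column:
    """Dataclass: __init__, __repr__ and __eq__ are generated automatically."""

    name: str
    dtype: str = "string"
    tags: list[str] = field(default_factory=list)  # mutable defaults need a factory


src = CsvSource("sales", "data/sales.csv")
src.connect()
print(src.describe())              # CSV sales (connected) from data/sales.csv
print(Column("amount", "double"))  # Column(name='amount', dtype='double', tags=[])
```

The dataclass needs no hand-written constructor or __repr__, which is why it is the usual choice for classes that mainly hold data.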

Using and Creating Modules

Modules in Python are reusable code libraries, and Python ships with a large number of built-in modules. Learn how to import existing modules and create your own.

Introduction to Modules
Importing Modules
Creating Modules
LAB: Using and creating Modules
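The sketch below shows three common ways to import from the standard library, plus the if __name__ == "__main__" guard that lets one file act both as an importable module and as a script. If you saved the functions in a file such as data_utils.py (a hypothetical name), another file could reuse them with import data_utils; the guarded code would then not run on import.

```python
# Three common ways to import from Python's standard library
import datetime as dt              # import a module under an alias
import statistics                  # import a whole module
from collections import Counter    # import a single name from a module


def describe_values(values):
    """Return basic statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


def most_common_word(text):
    """Return the most frequent word in a piece of text."""
    word, _count = Counter(text.lower().split()).most_common(1)[0]
    return word


# Code under this guard runs only when the file is executed as a script,
# not when another file imports it as a module.
if __name__ == "__main__":
    print(describe_values([3, 1, 4, 1, 5]))
    print(most_common_word("spark pandas spark delta"))
    print("Generated on", dt.date.today().isoformat())
```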

Data Processing and Cleansing using Pandas

Pandas is a Python library that makes loading and transforming data a lot easier. As long as all your data fits in the memory of a single machine, Pandas is your friend.

What is Pandas?
Introducing Pandas Data Structures
Reading data with Pandas
Indexing in a DataFrame
Creating and deleting columns
Filtering and Replacing data
Sorting and Ranking data
Grouping and aggregating data
Regular Expressions
LAB: Working with Pandas
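The sketch below walks through several of the topics above on a small, made-up dataset: reading CSV data, replacing missing values, extracting values with a regular expression into a new column, deleting a column, filtering rows, and grouping, aggregating and sorting. It assumes pandas is installed (for example with pip install pandas or uv add pandas); the column names and values are invented for illustration.

```python
from io import StringIO

import pandas as pd

# Small inline CSV standing in for a real file; all data is made up.
CSV_DATA = """order_id,customer,country,amount,order_code
1,Alice,BE,120.50,ORD-2023-001
2,Bob,NL,,ORD-2023-002
3,Carol,BE,75.00,ORD-2024-003
4,Dave,FR,210.25,ORD-2024-004
5,Eve,NL,33.10,ORD-2024-005
"""

# Reading data: read_csv accepts a file path or any file-like object.
df = pd.read_csv(StringIO(CSV_DATA))

# Replacing data: the empty amount is read as NaN; replace it with 0.
df["amount"] = df["amount"].fillna(0.0)

# Regular expressions: extract the year from the order code into a new column.
df["year"] = df["order_code"].str.extract(r"ORD-(\d{4})-", expand=False).astype(int)

# Deleting a column that is no longer needed.
df = df.drop(columns=["order_code"])

# Filtering: keep only orders with a positive amount.
paid = df[df["amount"] > 0]

# Grouping, aggregating and sorting: total and average amount per country.
summary = (
    paid.groupby("country")["amount"]
    .agg(["sum", "mean"])
    .sort_values("sum", ascending=False)
)
print(summary)
```

Chaining groupby, agg and sort_values keeps the transformation readable as a single pipeline, a style that carries over well to PySpark DataFrames.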

From Python and Pandas to Apache Spark

With Pandas your code typically runs on a single machine, so as your data volumes grow you will eventually hit memory and CPU limits. Apache Spark is an analytics engine for large-scale distributed data processing and machine learning. PySpark is the Python API for Apache Spark, which lets you write Spark applications in Python. In Azure, Spark is available in Azure Synapse Analytics and Azure Databricks.

Introducing Apache Spark
The SparkSession, SparkContext and SQLContext objects
An introduction to Resilient Distributed Datasets (RDD)
Convert a Pandas DataFrame to/from a PySpark DataFrame
Working with Parquet files
Working with DataFrames in PySpark
Data Cleansing using PySpark
Grouping and aggregating data in PySpark
Joining DataFrames
Using SQL to select and manipulate data
LAB: Data manipulation in Apache Spark

Building a Lakehouse using Delta Lake

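Parquet is a very popular file format in the Big Data community because it stores large volumes of data in a compact, columnar form that is efficient to query. However, Parquet files cannot be modified in place: updating or deleting rows means rewriting files. To address this, Databricks developed Delta Lake, an open-source storage layer that stores data as Parquet files together with a transaction log, adding support for updates, deletes, merges and time travel. This module explains the Delta format and shows how to use it in Spark.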

What Is a Lakehouse?
Introduction to Delta Lake
Creating tables
Partitioning data in tables
Reading table data
Query older snapshots of a table (Time Travel)
Insert, Update, Delete and Merge table data
Retrieving table metadata
Altering table metadata
Configuring Change Data Feed
LAB: Modifying data using Delta Lake

Python plays a crucial role in data engineering, data science and AI development. Its versatility and extensive libraries, such as Pandas and PySpark, make it well suited to large-scale data processing, building data pipelines and extracting insights from data. In this course, participants will gain a solid understanding of Python.

They will acquire the skills and knowledge needed to use Python effectively, from basic syntax to real-world solutions. During the course, participants will get hands-on experience with Pandas, PySpark, Delta Lake...

This course is aimed at data engineers, data scientists and AI developers with little or no experience with Python. General programming experience may come in handy.
