Python for Data and AI Engineers

4 days


Getting Started with Python

Python is a high-level, interpreted, interactive, object-oriented scripting language. This chapter introduces the history of Python and shows how to install Python and run your first lines of Python code. Quite a few editors are available for writing Python code, but this course focuses on using Visual Studio Code as a code editor for Python. We'll also cover modern Python tooling, including uv, a fast Python package installer and project manager.

  • Introduction to Python
  • Installing Python
  • Executing Python Code from the Command Shell
  • Python and Visual Studio Code
  • Working with packages in Python
  • Working with Virtual Environments in Python
  • Modern Python tooling with uv
  • Interactive development in Jupyter notebooks
  • LAB: Installing Python and executing code
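As a first taste of what the lab covers, a minimal script like the sketch below can confirm which interpreter you are running; save it as, say, `hello.py` and run it with `python hello.py` from the command shell.

```python
import sys

# Inspect the interpreter you are running: its location and version.
print(sys.executable)
print(f"Python {sys.version_info.major}.{sys.version_info.minor}")

# A first line of Python code: assign a value, then print it.
greeting = "Hello, Python!"
print(greeting)
```

The same lines can also be typed one by one into the interactive interpreter (`python` with no arguments) or a Jupyter notebook cell.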

Basic Language Constructs in Python

To build code that remains readable and maintainable, it is important to be able to break up code into reusable components such as functions and classes.

  • Introduction to writing Python code
  • Declaring and Using Variables
  • Data Types in Python
  • Working with Lists, Tuples, Sequences and Dictionaries
  • Basic Programming Constructs in Python
  • Declaring and executing Functions
  • LAB: Writing basic Python code
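The topics above come together in small programs like this sketch, which combines a list, a tuple, a dict comprehension and a function definition.

```python
def word_lengths(words):
    """Return a dict mapping each word to its length."""
    return {word: len(word) for word in words}

fruits = ["apple", "fig", "banana"]   # list: a mutable sequence
point = (3, 4)                        # tuple: an immutable sequence
lengths = word_lengths(fruits)        # dict built inside a function

for fruit, length in lengths.items():
    print(fruit, length)
```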

Working with Classes and Objects

Python classes provide all the standard features of object-oriented programming: classes can inherit from base classes and define constructors to initialize objects. This chapter also covers modern Python features such as dataclasses, which simplify class creation by generating common methods automatically.

  • Introduction to Object-Oriented Programming
  • Defining and instantiating Classes in Python
  • Working with Constructors
  • Instance and Class Variables
  • Inheritance in Python
  • Working with Access Modifiers
  • Python dataclasses for simplified class definitions
  • LAB: Working with classes and objects
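A short sketch of the dataclass and inheritance topics: the `@dataclass` decorator generates `__init__`, `__repr__` and `__eq__`, and a subclass inherits both fields and methods.

```python
from dataclasses import dataclass

@dataclass
class Person:
    """A dataclass: __init__, __repr__ and __eq__ are generated for us."""
    name: str
    age: int

    def greet(self):
        return f"Hi, I'm {self.name}"

@dataclass
class Employee(Person):
    """Inherits the fields and methods of Person, adds one of its own."""
    salary: float = 0.0

e = Employee("Ada", 36, 5200.0)
print(e)          # auto-generated __repr__
print(e.greet())  # method inherited from Person
```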

Object-Oriented Programming in Python

Building on the fundamentals of classes and objects, this chapter explores more sophisticated OOP concepts and design patterns commonly used in enterprise Python development. Topics include abstract base classes, metaclasses and decorators, which are particularly relevant for data engineering frameworks and application development.

  • Abstract base classes
  • Multiple inheritance
  • Class and static methods advanced usage
  • Metaclasses and class creation customization
  • Context managers and the with statement
  • LAB: Building extensible data processing frameworks
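The abstract-base-class pattern behind extensible frameworks can be sketched as follows (the `Transformer` and `run_pipeline` names are illustrative, not part of any library): subclasses must implement the abstract method before they can be instantiated.

```python
from abc import ABC, abstractmethod

class Transformer(ABC):
    """Base class for pipeline steps: subclasses must implement transform."""

    @abstractmethod
    def transform(self, record: dict) -> dict:
        ...

class UppercaseNames(Transformer):
    def transform(self, record):
        return {**record, "name": record["name"].upper()}

def run_pipeline(records, steps):
    """Apply each step's transform to every record, in order."""
    for step in steps:
        records = [step.transform(r) for r in records]
    return records

rows = run_pipeline([{"name": "ada"}], [UppercaseNames()])
print(rows)
```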

Using and Creating Modules

Modules in Python are reusable code libraries, and Python ships with a large number of built-in modules. Learn how to create and import modules.

  • Introduction to Modules
  • Importing Modules
  • Creating Modules
  • LAB: Using and creating Modules
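Importing from the standard library looks like the sketch below; since a module is just a `.py` file, the `__name__` guard at the bottom is the usual idiom that lets one file be both imported and run directly.

```python
# Importing built-in modules from Python's standard library
import math
from statistics import mean

radii = [1.0, 2.0, 3.0]
areas = [math.pi * r ** 2 for r in radii]
average_area = mean(areas)

# This block runs only when the file is executed directly,
# not when another module imports it.
if __name__ == "__main__":
    print(f"Average area: {average_area:.2f}")
```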

Model Validation with Pydantic

Pydantic is a powerful Python library that uses Python type annotations to validate data and manage settings. It provides runtime type checking and automatic data conversion, making it essential for building robust data pipelines and APIs. This chapter covers how to define data models, validate complex data structures, and handle validation errors effectively.

  • Introduction to Pydantic and data validation
  • Defining BaseModel classes and field types
  • Working with built-in validators and custom validation
  • Configuration and settings management
  • Validation error handling and custom error messages
  • LAB: Implementing data validation pipelines in a web API
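A minimal sketch of these ideas, assuming Pydantic v2: a `BaseModel` with typed fields, a custom `field_validator`, automatic coercion of a string to an int, and a caught `ValidationError`.

```python
from pydantic import BaseModel, ValidationError, field_validator

class User(BaseModel):
    name: str
    age: int

    @field_validator("age")
    @classmethod
    def age_must_be_non_negative(cls, value):
        if value < 0:
            raise ValueError("age must be non-negative")
        return value

# In the default (lax) mode, the string "36" is coerced to the int 36.
user = User(name="Ada", age="36")
print(user.age)

# Invalid data raises a ValidationError with structured error details.
try:
    User(name="Bob", age=-1)
except ValidationError as exc:
    print(exc.errors()[0]["msg"])
```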

Unit Testing in Python

Testing is a critical aspect of software development that ensures code reliability and maintainability. Python provides excellent testing frameworks, with pytest being the most popular choice for its simplicity and powerful features. This chapter covers writing effective unit tests, mocking dependencies, and implementing test-driven development practices for data engineering and app development.

  • Introduction to unit testing concepts
  • Getting started with pytest framework
  • Writing test functions and organizing test files
  • Mocking external dependencies and APIs
  • Code coverage analysis and reporting
  • LAB: Implementing unit tests for Python applications
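A sketch of pytest's conventions (the `add` function is a made-up example): test files are named `test_*.py`, test functions are named `test_*`, plain `assert` statements are enough, and `pytest.raises` checks for expected exceptions.

```python
# test_calculator.py -- pytest discovers files named test_*.py
# and functions named test_*; plain assert statements suffice.
import pytest

def add(a, b):
    return a + b

def test_add_integers():
    assert add(2, 3) == 5

def test_add_strings_concatenate():
    assert add("data", "frame") == "dataframe"

def test_add_incompatible_types_raises():
    with pytest.raises(TypeError):
        add(2, None)

# Run from the shell with:  pytest test_calculator.py -v
```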

Data Processing and Cleansing using Pandas

Pandas is a Python library that makes loading and transforming data much easier. As long as all your data fits in memory, Pandas is your friend.

  • What is Pandas
  • Introducing Pandas Data Structures
  • Reading data with Pandas
  • Indexing in a DataFrame
  • Creating and deleting columns
  • Filtering and Replacing data
  • Sorting and Ranking data
  • Grouping and aggregating data
  • Regular Expressions
  • LAB: Working with Pandas
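Several of the topics above in one small sketch (the data is invented for illustration): cleansing a missing value, filtering with a boolean mask, and grouping with an aggregation.

```python
import pandas as pd

df = pd.DataFrame({
    "city":  ["Brussels", "Ghent", "Brussels", "Ghent", "Antwerp"],
    "sales": [100, 80, 150, 120, None],
})

df["sales"] = df["sales"].fillna(0)         # cleanse missing values
big = df[df["sales"] > 100]                 # filter with a boolean mask
totals = df.groupby("city")["sales"].sum()  # group and aggregate
print(totals.sort_values(ascending=False))
```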

From Python and Pandas to Apache Spark

With Pandas you typically run code on a single machine, which means that as your data volumes grow you will hit memory and CPU constraints. PySpark is the Python API for Apache Spark, an analytical processing engine for large-scale distributed data processing and machine learning applications. In Azure it is available in Azure Synapse Analytics and Azure Databricks.

  • Introducing Apache Spark
  • The SparkSession, SparkContext and SQLContext objects
  • An introduction to Resilient Distributed Datasets (RDD)
  • Convert a Pandas DataFrame to/from a PySpark DataFrame
  • Working with Parquet files
  • Working with DataFrames in PySpark
  • Data Cleansing using PySpark
  • Grouping and aggregating data in PySpark
  • Joining DataFrames
  • Using SQL to select and manipulate data
  • LAB: Data manipulation in Apache Spark

Building a Lakehouse using Delta Lake

Parquet is a very popular data format in the big data community, since it can store large volumes of data in a compact, easy-to-query way. But it doesn't allow you to modify your data. Therefore Databricks developed a variant called Delta. This module explains the Delta format and shows how to use it in Spark.

  • What Is a Lakehouse?
  • Introduction to Delta Lake
  • Creating tables
  • Partitioning data in tables
  • Reading table data
  • Query older snapshots of a table (Time Travel)
  • Insert, Update, Delete and Merge table data
  • Retrieving table metadata
  • Altering table metadata
  • Configuring Change Data Feed
  • LAB: Modifying data using Delta Lake

Python plays a crucial role in data engineering, data science and AI development thanks to its versatility, its extensive libraries such as Pandas and PySpark, and its ability to handle large-scale data processing. This makes it an indispensable tool for extracting insights and building data pipelines. In this course, participants will gain a solid understanding of Python.

They will acquire the necessary skills and knowledge to utilize Python effectively, from basic syntax to implementing real-world solutions. During the course participants will get hands-on experience with Pandas, PySpark, Delta Lake...

This course is targeted at data engineers, data scientists and AI developers with no or little experience with Python. Familiarity with programming in general might come in handy.

Contact Us
  • Address:
    U2U nv/sa
    Z.1. Researchpark 110
    1731 Zellik (Brussels)
    BELGIUM
  • Phone: +32 2 466 00 16
  • Email: info@u2u.be
  • Monday - Friday: 9:00 - 17:00
    Saturday - Sunday: Closed
© 2025 U2U All rights reserved.