Getting Started with Python
Python is a high-level, interpreted, interactive and object-oriented scripting language. This chapter introduces
the history of Python and shows how to install Python and run your first lines of Python code. Many editors are
available for writing Python code, but this course focuses on Visual Studio Code as the code editor for Python.
- Introduction to Python
- Installing Python
- Executing Python Code from the Command Shell
- Python and Visual Studio Code
- Working with packages in Python
- Working with Virtual Environments in Python
- Interactive development in Jupyter notebooks
- LAB: Installing Python and executing code
Basic Language Constructs in Python
To build code that remains readable and maintainable, it is important to be able to break code up into reusable
components such as functions and classes.
- Introduction to writing Python code
- Declaring and Using Variables
- Data Types in Python
- Working with Lists, Tuples, Sequences and Dictionaries
- Basic Programming Constructs in Python
- Declaring and executing Functions
- LAB: Writing basic Python code
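As a rough illustration of the topics above, the following minimal sketch combines a variable, a list, a dictionary and a reusable function. The function name `average` and the sample readings are made up for this example and are not part of the course material.

```python
def average(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


# Hypothetical sample data: a dictionary mapping sensor names to lists of readings.
readings = {
    "sensor_a": [1.0, 2.0, 3.0],
    "sensor_b": [10.0, 20.0],
}

# Reuse the function for every entry instead of repeating the calculation.
averages = {name: average(values) for name, values in readings.items()}

for name, value in averages.items():
    print(f"{name}: {value}")
```

Putting the calculation in a function keeps the loop short and means the logic only has to be changed in one place.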
Working with Classes and Objects
Python classes provide all the standard features of object-oriented programming. Classes can
inherit from other base classes, have constructors for the initialization of objects...
- Introduction to Object-Oriented Programming
- Defining and instantiating Classes in Python
- Working with Constructors
- Instance and Class Variables
- Inheritance in Python
- Working with Access Modifiers
- LAB: Working with classes and objects
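The topics above could look roughly like this in code. This is only a sketch: the class names `Dataset` and `CsvDataset` and their attributes are hypothetical and chosen purely for illustration.

```python
class Dataset:
    """Base class with a constructor, an instance variable and a class variable."""

    count = 0  # class variable: shared by all instances

    def __init__(self, name):
        self.name = name      # instance variable: unique per object
        Dataset.count += 1    # keep track of how many objects were created

    def describe(self):
        return f"Dataset {self.name}"


class CsvDataset(Dataset):
    """Derived class that inherits from Dataset and extends it."""

    def __init__(self, name, delimiter=","):
        super().__init__(name)  # run the base class constructor first
        self.delimiter = delimiter

    def describe(self):
        # Reuse the base class behavior and add CSV-specific details.
        return f"{super().describe()} (CSV, delimiter {self.delimiter!r})"


sales = CsvDataset("sales", delimiter=";")
print(sales.describe())
print(Dataset.count)
```

Calling `super().__init__()` in the derived constructor ensures the base class still initializes its own state.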
Using and Creating Modules
Modules in Python are reusable code libraries, and Python ships with a large number of built-in modules.
Learn how to create and import modules.
- Introduction to Modules
- Importing Modules
- Creating Modules
- LAB: Using and creating Modules
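A minimal sketch of both ideas: importing a built-in module and creating your own. In practice a module is simply a `.py` file next to your script; here the file is written to a temporary directory only so that the example is self-contained. The module name `greetings` and its `hello` function are hypothetical.

```python
import importlib
import math  # a module from the standard library
import sys
import tempfile
from pathlib import Path

# Create a tiny module file; normally you would write greetings.py by hand.
module_dir = Path(tempfile.mkdtemp())
(module_dir / "greetings.py").write_text(
    'def hello(name):\n'
    '    return f"Hello, {name}!"\n'
)

# Make the directory importable, then import the new module by name.
sys.path.insert(0, str(module_dir))
importlib.invalidate_caches()
greetings = importlib.import_module("greetings")

message = greetings.hello("Python")
root = math.sqrt(16)

print(message)
print(root)
```

When the module file lives in the same folder as your script, a plain `import greetings` is enough and the `sys.path` and `importlib` steps are not needed.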
Data Processing and Cleansing using Pandas
- What is Pandas
- Introducing Pandas Data Structures
- Reading data with Pandas
- Indexing in a DataFrame
- Creating and deleting columns
- Filtering and Replacing data
- Sorting and Ranking data
- Grouping and aggregating data
From Python and Pandas to Apache Spark
With Pandas you typically run code on a single machine. This means that as your data volumes grow, you will
run into memory and CPU constraints.
PySpark is the Python API for Apache Spark and lets you run Python applications on a Spark cluster.
Apache Spark is an analytical processing engine for large-scale distributed data processing and machine learning.
In Azure it is available in Azure Synapse Analytics and Azure Databricks.
- Introducing Apache Spark
- The SparkSession, SparkContext and SQLContext objects
- An introduction to Resilient Distributed Datasets (RDD)
- Convert a Pandas DataFrame to/from a PySpark DataFrame
- Reading and writing data using DataFrames
- Working with DataFrames in PySpark
- Data Cleansing using PySpark
- Grouping and aggregating data in PySpark
- Joining DataFrames
- Using SQL to select and manipulate data
Building a Lakehouse using Delta Lake
- What Is a Lakehouse?
- Introduction to Delta Lake
- Creating tables
- Partitioning data in tables
- Reading table data
- Query older snapshots of a table (Time Travel)
- Insert, Update, Delete and Merge table data
- Retrieving table metadata
- Altering table metadata
- Configuring Change Data Feed
Python plays a crucial role in data engineering due to its versatility, extensive libraries such as Pandas,
and its ability to handle large-scale data processing, making it an indispensable tool for extracting insights
and building data pipelines.
In this course, participants will gain a solid understanding of Python.
They will acquire the necessary skills and knowledge to utilize Python effectively for data engineering tasks,
from basic syntax to implementing real-world solutions. During the course, participants will get hands-on
experience with Pandas, PySpark, Delta Lake...
This course is targeted at data engineers with little or no experience with Python.