Python is a high-level, interpreted, interactive, and object-oriented scripting language. This chapter introduces the history of Python, how to install it, and how to run your first lines of Python code. Many editors are available for writing Python code, but this course focuses on Visual Studio Code as a code editor for Python. We'll also cover modern Python tooling, including uv, a fast Python package installer and project manager.
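To give a flavor of what those first lines of Python look like, here is a minimal sketch covering variables, f-strings, and a list comprehension (the names are purely illustrative):

```python
# A first taste of Python: variables, string formatting, and a loop in one line
greeting = "Hello, Python"
version_major = 3
message = f"{greeting} {version_major}"      # f-strings embed values in text

# A list comprehension builds a list in a single expression
squares = [n * n for n in range(5)]          # [0, 1, 4, 9, 16]
print(message)
print(squares)
```

In Visual Studio Code you can run such a file with a single click, or paste the lines into the interactive Python prompt.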
To build code that remains readable and maintainable, it is important to be able to break code up into reusable components such as functions and classes.
Python classes provide all the standard features of object-oriented programming. Classes can inherit from base classes, define constructors for the initialization of objects, and leverage modern Python features like dataclasses for simplified class creation with automatic method generation.
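As a small sketch of what the dataclass feature brings (the class names here are illustrative): the `@dataclass` decorator generates the constructor, `repr`, and equality methods from the type annotations, and inheritance works as with ordinary classes.

```python
from dataclasses import dataclass

@dataclass
class Vehicle:
    brand: str
    wheels: int = 4          # field with a default value

@dataclass
class Motorcycle(Vehicle):   # inherits fields from the base dataclass
    wheels: int = 2          # override the inherited default

car = Vehicle("Toyota")              # __init__ was generated automatically
bike = Motorcycle("Ducati")
print(car)                           # __repr__ was generated automatically
print(car == Vehicle("Toyota"))      # __eq__ compares field values
```

Without `@dataclass`, each of these methods would have to be written by hand.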
Building on the fundamentals of classes and objects, this chapter explores sophisticated OOP concepts and design patterns commonly used in enterprise Python development. Topics include abstract base classes, metaclasses, and decorators, which are particularly relevant for data engineering frameworks and app development.
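A minimal sketch of two of these concepts combined, with hypothetical class and function names: an abstract base class that enforces an interface, and a decorator that wraps a method to count its calls.

```python
from abc import ABC, abstractmethod
import functools

def log_calls(func):
    """Decorator: wrap a function and count how often it is called."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.call_count += 1
        return func(*args, **kwargs)
    wrapper.call_count = 0
    return wrapper

class Extractor(ABC):
    """Abstract base class: subclasses must implement extract()."""
    @abstractmethod
    def extract(self):
        ...

class CsvExtractor(Extractor):
    @log_calls
    def extract(self):
        return ["row1", "row2"]

extractor = CsvExtractor()
rows = extractor.extract()
# Instantiating Extractor() directly raises TypeError, because it is abstract
```

This pattern of an abstract "contract" plus cross-cutting decorators is common in data engineering frameworks, where many concrete extractors share one interface.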
Modules in Python are reusable code libraries, and Python ships with a large number of built-in modules. Learn how to create and import modules.
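A quick sketch of the import syntax using two modules from the standard library:

```python
import math                   # import a full module; use names as math.<name>
from datetime import date     # import a single name from a module

radius = 2.0
area = math.pi * radius ** 2  # combine module attributes with your own code
today = date.today()

# Your own code becomes a module too: a file my_utils.py in the same folder
# can be imported with `import my_utils`.
```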
Pydantic is a powerful Python library that uses Python type annotations for data validation and settings management. It provides runtime type checking and automatic data conversion, making it essential for building robust data pipelines and APIs. This chapter covers how to define data models, validate complex data structures, and handle validation errors effectively.
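A minimal sketch of such a data model, assuming Pydantic v2 and using an invented `Reading` model for illustration:

```python
from pydantic import BaseModel, ValidationError

class Reading(BaseModel):
    sensor_id: int
    value: float
    unit: str = "celsius"     # optional field with a default

# Compatible strings are converted to the annotated types automatically
reading = Reading(sensor_id="7", value="21.5")

# Input that cannot be converted raises a ValidationError
try:
    Reading(sensor_id="not-a-number", value=1.0)
except ValidationError as exc:
    errors = exc.errors()     # structured details: which field failed, and why
```

The same model class can be reused to parse JSON payloads in APIs and to validate records in data pipelines.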
Testing is a critical aspect of software development that ensures code reliability and maintainability. Python provides excellent testing frameworks, with pytest being the most popular choice for its simplicity and powerful features. This chapter covers writing effective unit tests, mocking dependencies, and implementing test-driven development practices for data engineering and app development.
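As a small sketch of what a pytest test module looks like (the function under test, `apply_discount`, is hypothetical): pytest discovers functions whose names start with `test_` and runs each one, reporting any failed `assert`.

```python
# test_prices.py — run with: pytest test_prices.py
import pytest

def apply_discount(price: float, percent: float) -> float:
    """Return the price after applying a percentage discount."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)

def test_apply_discount():
    assert apply_discount(100.0, 25) == 75.0

def test_apply_discount_rejects_bad_percent():
    # pytest.raises asserts that the expected exception is raised
    with pytest.raises(ValueError):
        apply_discount(100.0, 150)
```

In practice the code under test lives in its own module and is imported into the test file.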
Pandas is a Python library that makes loading and transforming data much easier. As long as all your data fits in memory, Pandas is your friend.
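A small sketch of a typical Pandas transformation, with invented sample data: build a DataFrame, then aggregate it with a group-by.

```python
import pandas as pd

# A DataFrame holds tabular data in memory, column by column
df = pd.DataFrame({
    "city": ["Ghent", "Antwerp", "Ghent"],
    "sales": [120, 80, 60],
})

# Group rows by city and sum the sales per group
totals = df.groupby("city", as_index=False)["sales"].sum()
print(totals)
```

In real pipelines the DataFrame would typically be loaded from a file or database, for example with `pd.read_csv`.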
With Pandas you typically run code on a single machine. This means that as your data volumes grow, you will hit memory and CPU constraints. PySpark is the Python API for Apache Spark, letting you run Python applications on a Spark cluster. Apache Spark is an analytics engine for large-scale distributed data processing and machine learning applications. In Azure it is available in Azure Synapse Analytics and Azure Databricks.
Parquet is a very popular data format in the Big Data community, since it can store large volumes of data in a compact and easy-to-query way. But it doesn't allow you to modify your data. So a variant has been developed by Databricks, called Delta. This module explains the Delta format and shows how to use it in Spark.
Python plays a crucial role in data engineering, data science, and AI development thanks to its versatility, extensive libraries such as Pandas and PySpark, and its ability to handle large-scale data processing. This makes it an indispensable tool for extracting insights and building data pipelines. In this course, participants will gain a solid understanding of Python.
They will acquire the necessary skills and knowledge to utilize Python effectively, from basic syntax to implementing real-world solutions. During the course participants will get hands-on experience with Pandas, PySpark, Delta Lake...
This course is targeted at data engineers, data scientists, and AI developers with little or no experience with Python. Familiarity with programming in general might come in handy.