Azure Databricks allows us to use the power of Apache Spark without the hassle of manually creating and configuring Spark clusters. In this chapter you will learn how to set up an Azure Databricks environment and work with Databricks workspaces.
Databricks does not come with its own cloud object storage. When you use Databricks on the Azure platform, it stores its data and metadata in one or more Azure Data Lake Gen2 storage accounts.
Unity Catalog provides centralized access control, auditing, lineage, and data discovery capabilities across Databricks workspaces. In this chapter you will learn how to set up and configure a Unity Catalog metastore for your workspaces.
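As a taste of what this looks like in practice, here is a minimal sketch of creating a catalog and granting access with Unity Catalog SQL commands, run from a notebook; the catalog, schema, and group names are placeholders.

```python
# Hypothetical example: securing data with Unity Catalog from a notebook.
# `spark` is provided by the Databricks notebook runtime; names are placeholders.
spark.sql("CREATE CATALOG IF NOT EXISTS sales")
spark.sql("CREATE SCHEMA IF NOT EXISTS sales.bronze")

# Grant a group the right to see the catalog and read a schema
spark.sql("GRANT USE CATALOG ON CATALOG sales TO `data-engineers`")
spark.sql("GRANT SELECT ON SCHEMA sales.bronze TO `data-analysts`")
```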
Databricks compute refers to the selection of computing resources available in the Databricks workspace. Users need access to compute to run data engineering, data science, and data analytics workloads, such as production ETL pipelines, streaming analytics, ad-hoc analytics, and machine learning. Learn about the different types of compute that can be provisioned in Azure Databricks.
Data can be loaded, visualized, transformed, and analyzed via interactive notebooks using popular languages such as Python, SQL, and R.
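For illustration, a minimal notebook sketch that loads a CSV file, aggregates it, and visualizes the result; the file path and column names are placeholders.

```python
# Minimal notebook sketch: load, transform, and visualize data.
# `spark` and `display` come with the Databricks notebook runtime;
# the path and columns below are placeholders for your own data.
df = (spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("/Volumes/main/default/raw/sales.csv"))  # hypothetical path

summary = df.groupBy("country").sum("revenue")  # hypothetical columns
display(summary)  # renders an interactive table/chart in the notebook
```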
There are many ways to access data in Azure Databricks, from uploading small files via the portal, over ad-hoc connections, up to mounting Azure Storage accounts or data lakes. Files can also be treated as tables, providing easy access.
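As a hedged sketch, reading directly from an Azure Data Lake Gen2 container via an abfss:// URI and registering the result as a table; the storage account, container, and table names are placeholders, and credentials are assumed to be handled through Unity Catalog or a service principal.

```python
# Read Parquet files straight from Azure Data Lake Gen2 (names are placeholders)
df = spark.read.parquet(
    "abfss://landing@mystorageaccount.dfs.core.windows.net/orders/"
)

# Expose the files as a table so they can be queried with plain SQL
df.write.saveAsTable("main.bronze.orders")
```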
Delta Lake is an optimized storage layer that provides the foundation for storing data and tables in a Databricks lakehouse. Learn how to create, query, and optimize Delta tables in a Databricks lakehouse.
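A short sketch of that workflow: creating, querying, and optimizing a Delta table. Table and column names are placeholders; OPTIMIZE and ZORDER are Delta Lake maintenance commands.

```python
# Create a Delta table (names and columns are placeholders)
spark.sql("""
    CREATE TABLE IF NOT EXISTS main.silver.orders (
        order_id    BIGINT,
        customer_id BIGINT,
        amount      DECIMAL(10, 2),
        order_date  DATE
    ) USING DELTA
""")

# Query it like any other table
spark.sql("SELECT order_date, SUM(amount) FROM main.silver.orders GROUP BY order_date")

# Compact small files and co-locate rows that are frequently filtered together
spark.sql("OPTIMIZE main.silver.orders ZORDER BY (customer_id)")
```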
The lakehouse architecture and Databricks SQL Warehouse bring cloud data warehousing capabilities to your data lakes. A SQL warehouse is a compute resource that lets you run SQL commands on objects within Databricks SQL. Learn about the available warehouse types and how to query them.
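For example, a SQL warehouse can be queried from Python with the databricks-sql-connector package (pip install databricks-sql-connector); the hostname, HTTP path, and token below are placeholders taken from your warehouse's connection details.

```python
from databricks import sql

# Connection details are placeholders; copy them from your SQL warehouse
with sql.connect(
    server_hostname="adb-1234567890123456.7.azuredatabricks.net",
    http_path="/sql/1.0/warehouses/abcdef1234567890",
    access_token="<personal-access-token>",
) as connection:
    with connection.cursor() as cursor:
        cursor.execute("SELECT current_catalog(), current_schema()")
        print(cursor.fetchall())
```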
Microsoft Power BI is a business analytics tool that provides interactive visualizations with self-service business intelligence capabilities, enabling end users to create reports and dashboards. You can connect Power BI Desktop to your Databricks clusters and Databricks SQL warehouses.
Databricks Lakeflow is a new solution that contains everything you need to build and operate production data pipelines. It includes native, highly scalable connectors for databases like SQL Server and for enterprise applications like Salesforce and SharePoint. Users can transform data in batch and streaming using standard SQL and Python with declarative ETL pipelines.
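A hedged sketch of such a declarative pipeline (the programming model formerly known as Delta Live Tables): each decorated function defines a table, and the pipeline engine resolves the dependencies between them. Paths and names are placeholders.

```python
import dlt
from pyspark.sql.functions import col

@dlt.table(comment="Raw orders ingested from cloud storage")
def orders_bronze():
    # Hypothetical source path
    return spark.read.format("json").load("/Volumes/main/default/raw/orders/")

@dlt.table(comment="Cleaned orders with positive amounts only")
def orders_silver():
    # The engine infers that orders_silver depends on orders_bronze
    return dlt.read("orders_bronze").where(col("amount") > 0)
```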
Lakeflow Connect offers simple and efficient connectors to ingest data from local files, popular enterprise applications, databases, cloud storage, message buses, and more. Learn how to efficiently ingest data using managed connectors for database systems like SQL Server, and with unmanaged connectors from cloud storage or event systems using Auto Loader.
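To give a flavor of the Auto Loader approach, a hedged sketch of incremental file ingestion with the cloudFiles source, which discovers new files as they arrive; all paths and table names are placeholders.

```python
# Incrementally ingest new CSV files with Auto Loader (paths are placeholders)
stream = (spark.readStream
              .format("cloudFiles")
              .option("cloudFiles.format", "csv")
              .option("cloudFiles.schemaLocation",
                      "/Volumes/main/default/checkpoints/orders_schema")
              .load("abfss://landing@mystorageaccount.dfs.core.windows.net/orders/"))

(stream.writeStream
       .option("checkpointLocation", "/Volumes/main/default/checkpoints/orders")
       .trigger(availableNow=True)  # process all pending files, then stop
       .toTable("main.bronze.orders"))
```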
Lakeflow Jobs is workflow automation for Databricks, providing orchestration for data processing workloads so that you can coordinate and run multiple tasks as part of a larger workflow. You can optimize and schedule the execution of frequent, repeatable tasks and manage complex workflows.
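As an illustration, a hedged sketch of defining a scheduled job with the Databricks Python SDK (pip install databricks-sdk); the notebook path, cluster id, and cron schedule are placeholders.

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()  # reads host/token from environment or a config profile

job = w.jobs.create(
    name="nightly-orders-etl",
    tasks=[
        jobs.Task(
            task_key="ingest",
            notebook_task=jobs.NotebookTask(
                notebook_path="/Workspace/etl/ingest_orders"  # placeholder
            ),
            existing_cluster_id="<cluster-id>",  # placeholder
        )
    ],
    schedule=jobs.CronSchedule(
        quartz_cron_expression="0 0 2 * * ?",  # every night at 02:00
        timezone_id="Europe/Brussels",
    ),
)
print(f"Created job {job.job_id}")
```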
Databricks is a data analytics platform powered by Apache Spark for data engineering, data science, and machine learning. This training teaches how to use Azure Databricks to design and build a data lakehouse architecture.
No prior knowledge of Azure Databricks is required.