Moving to the cloud requires reconsidering some of the choices made for on-premises data handling. This module introduces the concepts of the data lake and the data lakehouse. It also introduces the different Azure services that can be used for data processing and compares them to the traditional on-premises data stack.
This module discusses the different types of storage accounts available in Azure Storage. It also covers some of the tools used to load and manage files in an Azure Storage account.
Synapse Analytics is the cornerstone service for the data engineer. It encompasses pipelines to copy data; Spark and SQL to transform and query data; Data Explorer for near real-time analysis and data exploration; and Power BI for reporting. This module provides a brief introduction to this service. You will see how to configure Azure Synapse Link for Microsoft Dataverse to ingest your Microsoft Dataverse data in near real-time into a data lake hosted by Azure Synapse. You will also see some more advanced options, such as how to partition the data during ingestion.
With Data Flows, data can be validated and transformed without having to learn another tool (such as Databricks or Spark). Using Data Flows, you can transform and combine the ingested Microsoft Dataverse data into a business-ready format.
Once data has been loaded into the data lake, the next step is to cleanse it, pre-aggregate it, and perform other steps to make it accessible to reporting and analytical tools. Depending on the transformations required and the skills of the data engineer, the SQL dialect common to the Microsoft data stack (T-SQL) can play an important role here. You will learn about the concept of external tables and how to create and configure them. Once you have mastered external tables, you will see that Azure Synapse Link for Dataverse automatically creates external tables for the Dataverse tables you choose to replicate.
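To give a feel for what such an external table definition looks like, here is a minimal Python sketch that composes the T-SQL a data engineer (or Azure Synapse Link for Dataverse) would run against the serverless SQL pool. The table, column, data source, and file format names are hypothetical examples; the DDL shape follows the serverless SQL CREATE EXTERNAL TABLE syntax.

```python
# Sketch: compose a CREATE EXTERNAL TABLE statement for a Synapse serverless
# SQL pool. An external table stores no data itself; it points at files in the
# data lake via LOCATION, DATA_SOURCE, and FILE_FORMAT. All concrete names
# below (account, DataLakeSource, CsvFormat) are hypothetical.

def external_table_ddl(table: str, columns: dict, location: str,
                       data_source: str, file_format: str) -> str:
    """Render a CREATE EXTERNAL TABLE statement as a T-SQL string."""
    cols = ",\n    ".join(f"[{name}] {sqltype}" for name, sqltype in columns.items())
    return (
        f"CREATE EXTERNAL TABLE [{table}] (\n"
        f"    {cols}\n"
        f") WITH (\n"
        f"    LOCATION = '{location}',\n"          # path relative to the data source
        f"    DATA_SOURCE = [{data_source}],\n"    # an EXTERNAL DATA SOURCE created earlier
        f"    FILE_FORMAT = [{file_format}]\n"     # an EXTERNAL FILE FORMAT created earlier
        f");"
    )

ddl = external_table_ddl(
    table="account",
    columns={"accountid": "UNIQUEIDENTIFIER", "name": "NVARCHAR(160)"},
    location="/dataverse/account/*.csv",
    data_source="DataLakeSource",
    file_format="CsvFormat",
)
print(ddl)
```

Once such a statement has been executed, the files behind it can be queried with plain SELECT statements as if they were an ordinary table.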
Spark doesn't have a proprietary data storage option; it consumes and produces regular files stored in Azure Storage. This module covers how to access and manipulate data stored in the Synapse Analytics data lake or other Azure Storage locations. Apache Spark for Azure Synapse also comes with a Common Data Model (CDM) connector that allows you to easily read and transform the Dataverse data that is ingested into the data lake.
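Since Spark reads and writes plain files in the lake, everything starts with addressing those files. A minimal sketch of how an Azure Data Lake Storage Gen2 location is addressed with an `abfss://` URI (the storage account and container names below are hypothetical):

```python
# Sketch: build the abfss:// URI that Spark uses to reach files in an
# Azure Data Lake Storage Gen2 account. Account/container names are examples.

def abfss_path(container: str, account: str, relative_path: str) -> str:
    """Compose an ADLS Gen2 URI: abfss://<container>@<account>.dfs.core.windows.net/<path>."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{relative_path}"

path = abfss_path("datalake", "mystorageacct", "dataverse/account")
print(path)  # abfss://datalake@mystorageacct.dfs.core.windows.net/dataverse/account

# In a Synapse Spark notebook (where a `spark` session is provided), such a
# path could then be consumed and produced directly, e.g.:
#   df = spark.read.parquet(path)
#   df.write.parquet(abfss_path("datalake", "mystorageacct", "curated/account"))
```

The same URI scheme works for CSV, JSON, Parquet, and Delta files, which is why Spark needs no storage of its own.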
Handling large volumes of data requires a range of skills: one must master storage options, upload data performantly, handle failed uploads, and convert data into a format appropriate for reporting and analysis. In the Microsoft Azure stack, Synapse Analytics is the cornerstone service for the data engineer. It encompasses pipelines to copy data; Spark and SQL to transform and query data; Data Explorer for near real-time analysis and data exploration; and Power BI for reporting. Microsoft Dataverse securely stores and manages the data used by business applications. As Microsoft Dataverse holds critical business data, you will almost always want to load this data into a data lake. This is where Synapse Link for Microsoft Dataverse comes in: a managed service that ingests Dataverse data in near real-time into a data lake. Once the data lands in the data lake, you can use the services provided by Azure Synapse Analytics to transform and cleanse it and build either a logical or a physical data warehouse on top of it.
This training teaches how to use Synapse Analytics to design, build, and maintain a modern data lake architecture. The training also covers a few other Azure services that come in handy when working with Synapse Analytics, such as Azure Key Vault for handling authentication secrets, Azure SQL Database for dealing with smaller datasets, and Azure Databricks as an improved Spark engine.
This course focuses on developers and administrators who are considering migrating existing data solutions to the Microsoft Azure cloud, or who are designing new data-oriented solutions in the Azure cloud. Some familiarity with relational database systems such as SQL Server is helpful. Prior knowledge of Azure is not required.