Big Data Analysis with the Microsoft AI Platform

4 days
Training code
UADATA

Introduction to the Cortana Intelligence Suite

Cortana Intelligence Suite (CIS) is a collection of Azure services, so before we get started with them we first discuss what we mean by Big Data and why we want to use Big Data technology. Since CIS is part of the Azure stack, we also introduce Azure in general.

  • What is Big Data?
  • Overview of Microsoft Azure
  • The Azure Management Portals
  • Cortana Intelligence Suite Components

Storing your data in Azure Storage

Azure Storage is a kind of file share that can be used by many Azure services, including those in the CIS. Often the output of one CIS component is stored in Azure Storage before being consumed by another component. In this module you will learn about the different types of storage available in Azure Storage, and you will become familiar with some of the tools to load and manage files in Azure Storage.

  • Microsoft Azure Storage Concepts: Storage accounts and Containers
  • Azure blob storage
  • Tools for storing data in Azure Storage
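
As a first, hands-on taste of blob storage, the sketch below uploads a local file to a container from Python. It assumes the azure-storage-blob (v12) package; the connection string, container name and file names are placeholders.

```python
# Minimal sketch: upload a local file to Azure Blob Storage.
# Assumes the azure-storage-blob (v12) package; names and paths are placeholders.
from azure.storage.blob import BlobServiceClient
from azure.core.exceptions import ResourceExistsError

conn_str = "<your-storage-account-connection-string>"
service = BlobServiceClient.from_connection_string(conn_str)

# A container groups blobs inside a storage account.
container = service.get_container_client("raw-data")
try:
    container.create_container()
except ResourceExistsError:
    pass  # container was already there

# Upload a local CSV file as a block blob.
with open("sales.csv", "rb") as data:
    container.upload_blob(name="sales/2018/sales.csv", data=data, overwrite=True)

print("Blobs in container:", [b.name for b in container.list_blobs()])
```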

Azure SQL Database

An easy way to create a business intelligence solution in the cloud is to take SQL Server -- familiar to many Microsoft BI developers -- and run it in the cloud. Backups and high availability happen automatically, and we can use nearly all the skills and tools we used on a local SQL Server on this cloud-based solution as well.

  • Azure SQL Database feature set
  • Basic, Standard, Premium and Premium RS tiers
  • Comparing performance: DTUs, transaction rates and benchmarks
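
To show that familiar tooling keeps working against the cloud database, here is a minimal sketch that connects to an Azure SQL Database from Python with pyodbc, just as you would for an on-premises SQL Server. The server, database, credentials, table and ODBC driver name are placeholder assumptions.

```python
# Minimal sketch: query an Azure SQL Database with pyodbc.
# Server, database, credentials, table and driver name are placeholder assumptions.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver.database.windows.net;"
    "DATABASE=SalesDb;"
    "UID=bi_reader;PWD=<password>;"
    "Encrypt=yes;"
)

cursor = conn.cursor()
cursor.execute(
    "SELECT TOP 5 CustomerName, TotalDue FROM Sales.Orders ORDER BY TotalDue DESC"
)
for row in cursor.fetchall():
    print(row.CustomerName, row.TotalDue)

conn.close()
```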

Azure SQL Data Warehouse

Azure SQL Databases have their limitations in compute power since they run on a single machine, and their size is limited to 4 TB per database. Azure SQL Data Warehouse is a service aimed at analytical workloads on data volumes hundreds of times larger than what Azure SQL Databases can handle. Yet we can keep on using the familiar T-SQL query language, and we can connect traditional applications such as Excel and Management Studio to interact with this service. Moreover, storage and compute can be scaled independently.

  • What is Azure SQL Data Warehouse?
  • Setup
  • Creating and distributing tables
  • Loading data via external tables and PolyBase
  • Elasticity versus Performance tier
  • Monitoring and performance tuning
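
To give a feel for table distribution and PolyBase loading, the sketch below runs two hedged T-SQL statements from Python via pyodbc: one creates a hash-distributed table, the other loads it from an assumed external table with CTAS. All connection details and object names are placeholders.

```python
# Minimal sketch: create a hash-distributed table in Azure SQL Data Warehouse
# and load it from an external (PolyBase) table with CTAS.
# Connection details and all object names are placeholder assumptions.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=mydwserver.database.windows.net;DATABASE=SalesDW;"
    "UID=loader;PWD=<password>;Encrypt=yes;",
    autocommit=True,
)
cursor = conn.cursor()

# Distribute the fact table on a high-cardinality key to spread it over the
# compute nodes; smaller lookup tables could use ROUND_ROBIN or REPLICATE instead.
cursor.execute("""
CREATE TABLE dbo.FactSales
(
    SaleId      BIGINT NOT NULL,
    CustomerKey INT    NOT NULL,
    SaleDate    DATE   NOT NULL,
    Amount      DECIMAL(18, 2) NOT NULL
)
WITH (DISTRIBUTION = HASH(CustomerKey), CLUSTERED COLUMNSTORE INDEX);
""")

# PolyBase: ext.StagedSales is assumed to be an external table pointing at files
# in Azure Storage; CTAS pulls those files into a distributed internal table.
cursor.execute("""
CREATE TABLE dbo.FactSales_Load
WITH (DISTRIBUTION = HASH(CustomerKey), CLUSTERED COLUMNSTORE INDEX)
AS SELECT * FROM ext.StagedSales;
""")
```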

Azure Analysis Services

Analysis Services is Microsoft's OLAP (cube) technology. The latest version, Analysis Services Tabular, can also run as a database-as-a-service. This makes it ideal for loading and caching the cleaned, pre-processed data produced by other Cortana Intelligence components, which leads to faster reporting. The data can also be enriched with KPIs, translations, derived measures, etc. In this module we take a brief look at how an Analysis Services model can be created and deployed to the cloud; for a more in-depth discussion we refer to the Analysis Services Tabular training.

  • Creating a cloud-based Analysis Services server
  • Deploying Power BI models
  • Deploying from Visual Studio
  • Maintenance

Azure Data Lake Store and Analytics

Azure Data Lake Store (ADLS) and Azure Data Lake Analytics (ADLA) are like bread and butter: you can use them separately, but they are often used together. Azure Data Lake Store is comparable to Azure Storage, but it has a few features that make it better suited for Big Data projects, which makes it ideal for setting up a data lake. To turn the 'raw' data in our data lake into something 'pure' and consumable, we need to apply some cleansing and/or analytics to it, and that is where Azure Data Lake Analytics comes into play. Using a 'Unified SQL' language (U-SQL), it lets us mix the relational SQL language with the object-oriented C# language to convert raw data into analytical results.

  • What is a data lake?
  • Setting up Azure Data Lake Storage
  • Loading data
  • Setting up Azure Data Lake Analytics
  • Getting started with U-SQL
  • EXTRACT, SELECT, INSERT and OUTPUT
  • U-SQL projects in Visual Studio
  • Running U-SQL jobs locally
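
As a small illustration of the EXTRACT/SELECT/OUTPUT pattern, the sketch below assembles a U-SQL script in Python and writes it to a .usql file; the input path, column schema and output path are assumptions, and the script would then be submitted from Visual Studio, the Azure portal or the Azure CLI.

```python
# Minimal sketch: generate a U-SQL script showing EXTRACT -> SELECT -> OUTPUT.
# File paths and the column schema are placeholder assumptions; submit the
# resulting .usql file from Visual Studio, the Azure portal or the Azure CLI.
usql_script = """
@searchlog =
    EXTRACT UserId   int,
            Start    DateTime,
            Region   string,
            Query    string,
            Duration int
    FROM "/input/SearchLog.tsv"
    USING Extractors.Tsv();

@totals =
    SELECT Region,
           SUM(Duration) AS TotalDuration
    FROM @searchlog
    GROUP BY Region;

OUTPUT @totals
    TO "/output/TotalDurationPerRegion.csv"
    USING Outputters.Csv(outputHeader: true);
"""

with open("TotalDurationPerRegion.usql", "w") as f:
    f.write(usql_script)
```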

Cosmos DB

Cosmos DB is a NoSQL solution with a schema-on-read approach based on JSON. It evolved from the former DocumentDB database. It supports many APIs, so you can treat it as a MongoDB store, a Cassandra store, a graph database, etc. Very flexible for application developers, and a great source of BI data!

  • What is Cosmos DB
  • Setting up a database
  • Partitioning
  • Request Units (RUs)
  • Tools: emulator and data migration tool
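
The sketch below shows the basic Cosmos DB workflow from Python with the azure-cosmos (v4) package: create a partitioned container, insert a JSON document and query it back with SQL. The endpoint, key, names and partition key path are placeholder assumptions.

```python
# Minimal sketch: create a partitioned Cosmos DB container, insert a JSON
# document and query it with SQL. Uses the azure-cosmos (v4) package;
# endpoint, key and all names are placeholder assumptions.
from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient("https://myaccount.documents.azure.com:443/",
                      credential="<account-key>")

db = client.create_database_if_not_exists("SalesDb")
container = db.create_container_if_not_exists(
    id="Orders",
    partition_key=PartitionKey(path="/customerId"),  # spreads data over partitions
    offer_throughput=400,                            # provisioned Request Units/second
)

# Schema-on-read: the document is just JSON, no table definition needed.
container.upsert_item({"id": "order-001", "customerId": "c-42", "amount": 99.95})

for item in container.query_items(
    "SELECT c.id, c.amount FROM c WHERE c.customerId = 'c-42'",
    enable_cross_partition_query=True,
):
    print(item)
```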

Azure Data Catalog

How do you find all the relevant data that your business stores, spread over sometimes hundreds of databases, cubes and reports? To help you in this task, you need a database of databases, which stores only metadata such as table and column names, descriptions, etc. That is exactly what the Azure Data Catalog is all about, and in this module you learn how to create, fill and query this catalog.

  • What is a data catalog
  • Creating an Azure Data Catalog
  • The Azure Data Catalog portal
  • Collecting and uploading metadata

Azure Data Factory

Not only do we want to store data and run analyses on it, we also need a scheduler to move our data to the proper services and then run the relevant analysis on top of it. When the data is stored and analyzed on premises we typically use ETL tools such as SQL Server Integration Services for this. But what if the data is stored in the cloud? Then we need Azure Data Factory, the cloud-based ETL service. First we get used to the terminology, then we start creating the proper objects in the portal, with the wizard, or in Visual Studio.

  • Introducing Data Factories
  • Creating linked services and data sets
  • Combining activities into pipelines
  • Build a complete flow with the wizard
  • Using Visual Studio to create or modify data factories
  • Monitoring and managing data factories
  • Data Factory V2 improvements

Azure Event Hubs

All the topics covered so far mainly focus on analyzing data at rest. But what if you want to analyze a never-ending stream of incoming events, such as in Internet of Things (IoT) applications? In this module we focus on buffering and timestamping streams of incoming events. The next module, on Azure Stream Analytics, shows how to analyze these streams of events in an easy way: Microsoft extended the T-SQL language with a few temporal concepts such as sliding windows, with which we can develop an event processing application in a matter of minutes.

  • Collecting streams of events
  • Setting up Azure Service Bus and Event Hubs
  • Managing Event Hubs
  • Consumer groups
  • Sending and consuming events
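
As a minimal sketch of the sending side, the code below pushes a small batch of JSON events to an event hub with the azure-eventhub (v5) Python package; the connection string, hub name and event fields are placeholder assumptions.

```python
# Minimal sketch: send a batch of JSON events to an Azure Event Hub.
# Uses the azure-eventhub (v5) package; connection string, hub name and
# event fields are placeholder assumptions.
import json
from azure.eventhub import EventHubProducerClient, EventData

producer = EventHubProducerClient.from_connection_string(
    conn_str="<event-hub-namespace-connection-string>",
    eventhub_name="sensor-readings",
)

with producer:
    batch = producer.create_batch()
    for reading in [{"deviceId": "d-01", "temperature": 21.5},
                    {"deviceId": "d-02", "temperature": 22.1}]:
        batch.add(EventData(json.dumps(reading)))
    producer.send_batch(batch)  # events are buffered and timestamped by the hub
```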

Azure Stream Analytics

  • Real-time analytics and event handling
  • Create Azure Stream Analytics jobs
  • Configure security
  • Connecting inputs and outputs
  • Writing Stream Analytics queries (a sample query is sketched below)
  • Scaling
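
To make the windowing idea concrete, the sketch below shows, as a Python string, what a Stream Analytics query with a tumbling window could look like. The input and output aliases and the event fields are placeholder assumptions, and the query itself would be pasted into the job's query editor.

```python
# Minimal sketch: a Stream Analytics query that averages sensor readings per
# device over 60-second tumbling windows. Input/output aliases and field names
# are placeholder assumptions; the query is meant for the job's query editor.
stream_analytics_query = """
SELECT
    deviceId,
    AVG(temperature) AS avgTemperature,
    System.Timestamp  AS windowEnd
INTO
    [powerbi-output]
FROM
    [eventhub-input] TIMESTAMP BY eventTime
GROUP BY
    deviceId,
    TumblingWindow(second, 60)
"""

print(stream_analytics_query)
```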

HDInsight

For many people Big Data processing is synonymous with Hadoop. This open-source big data ecosystem is very popular, and it is part of the Azure stack under the name HDInsight. In this module we mainly focus on how to set up HDInsight, discuss the data storage options and illustrate the more popular Hadoop frameworks such as Hive, Pig and Spark. HDInsight is a big collection of complex tools, so don't expect to become an expert in each of these. If you're new to Hadoop, this module gives you an overview so you know what is possible. If you are a data scientist with Hadoop experience, it shows you enough to get started with it on the Azure stack.

  • Setting up an HDInsight cluster
  • Tools for loading data
  • Map-Reduce and YARN
  • Hive
  • Pig
  • Spark
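
As a hedged taste of the Spark part, the sketch below runs a small PySpark aggregation over a CSV file, the way it could run on an HDInsight Spark cluster; the storage path and column names are placeholder assumptions.

```python
# Minimal sketch: a small PySpark aggregation, as it could run on an HDInsight
# Spark cluster. The input path and column names are placeholder assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("sales-by-region").getOrCreate()

# Read a CSV file; on HDInsight this path would point at the cluster's
# attached Azure Storage (wasb) or Data Lake Store.
sales = spark.read.csv("wasb:///data/sales.csv", header=True, inferSchema=True)

# Aggregate: total amount per region, largest first.
totals = (sales.groupBy("region")
               .agg(F.sum("amount").alias("total_amount"))
               .orderBy(F.desc("total_amount")))

totals.show()
spark.stop()
```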

Cognitive Services

Analyzing nicely formatted tables is easy, but what if the data at hand consists of scanned invoices, security camera footage, etc.? With Azure Cognitive Services we can convert difficult data formats such as photos, audio or video into a structured representation, which can then be used in the remainder of the analysis.

  • What are cognitive services
  • Overview of the cognitive services
  • Customizable versus non-customizable cognitive services
  • Configuring LUIS for language understanding
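
The sketch below calls the Computer Vision analyze endpoint with Python's requests package to turn a photo into structured JSON (a caption and tags). The region, API version, subscription key and image URL are placeholder assumptions, so check the current API reference before relying on the exact endpoint.

```python
# Minimal sketch: describe and tag a photo with the Computer Vision API.
# Region, API version, subscription key and image URL are placeholder
# assumptions; verify the endpoint against the current API reference.
import requests

endpoint = "https://westeurope.api.cognitive.microsoft.com/vision/v2.0/analyze"
headers = {
    "Ocp-Apim-Subscription-Key": "<your-cognitive-services-key>",
    "Content-Type": "application/json",
}
params = {"visualFeatures": "Description,Tags"}
body = {"url": "https://example.com/invoice-photo.jpg"}

response = requests.post(endpoint, headers=headers, params=params, json=body)
response.raise_for_status()

analysis = response.json()  # structured JSON: captions, tags, confidences
print(analysis["description"]["captions"])
print([tag["name"] for tag in analysis["tags"]])
```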

Azure Machine Learning

Just remembering a bunch of things doesn't make somebody smart; it is the ability to learn from 'old knowledge' and apply it to unseen situations that does. That is exactly the purpose of machine learning. With Azure Machine Learning Microsoft created a framework that is easy enough for non-programmers to build machine learning models with a simple GUI, while machine learning experts can also use their Python or R skills to do very advanced things that go beyond the scope of the GUI. Another great feature of Azure Machine Learning is deployment: once you have trained the right model, a few clicks (and zero coding!) create a web service so that you can call your model from nearly any application.

  • Getting started in ML Studio
  • Accessing data sets
  • Using R and Python scripts
  • Exploring the different modeling techniques for classification, regression, clustering and collaborative filtering
  • Model training
  • Scoring datasets
  • Evaluate the models
  • Create a scoring experiment
  • Create and configure web service
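
To make the train/score/evaluate cycle concrete, the sketch below performs the same steps locally with scikit-learn on a toy dataset. It only illustrates the concepts behind the ML Studio modules, not the Azure Machine Learning service itself; the dataset and algorithm choice are illustrative assumptions.

```python
# Minimal sketch: the train -> score -> evaluate cycle that ML Studio models
# visually, illustrated locally with scikit-learn on a toy dataset.
# The dataset and algorithm choice are illustrative assumptions only.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score

# Access a data set and split it into training and scoring (test) parts.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Model training: a two-class classification model.
model = LogisticRegression(max_iter=5000)
model.fit(X_train, y_train)

# Scoring: predict labels and class probabilities for unseen rows.
predictions = model.predict(X_test)
probabilities = model.predict_proba(X_test)[:, 1]

# Evaluation: compare predictions against the known labels.
print("Accuracy:", accuracy_score(y_test, predictions))
print("AUC     :", roc_auc_score(y_test, probabilities))
```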

Cortana Intelligence Suite (CIS) is Microsoft's Big Data solution: a collection of Azure services to load, store and analyze large volumes of data in the cloud. Although each of these services can be used independently from the others, they will often be used together to process data in the cloud.

First we investigate how data can be stored. We look into the traditional solutions using Azure Storage and Azure SQL Databases. But we also investigate newer technologies such as Azure SQL Data Warehouse and Data Lake Storage for dealing with large and very large volumes of data. For data that is less structured, the NoSQL Cosmos DB can be used.

The next step is analyzing the data. We discuss HDInsight with its traditional Hadoop technologies such as Hive and Pig, but we also touch upon Azure Data Lake Analytics, which introduces U-SQL as its new data query language. Azure Machine Learning is crucial for doing more advanced analysis on large volumes of data. Azure Stream Analytics is also discussed, for analyzing streams of events (together with Event Hubs to capture large volumes of incoming events).

We must also pay attention to how data can be loaded into cloud storage in an automated fashion using Azure Data Factory.

Finally we take a brief look at how the results of these analyses can be used in Power BI as a reporting tool. Azure Analysis Services also comes into the picture, as we often need it as a fast and user-friendly cache of the data.

All these technologies are introduced and demonstrated, and participants also get hands-on labs on each of them.

This course focuses on developers, administrators and project managers who are considering migrating existing databases or developing new data-centric applications in the Microsoft Azure cloud. Some familiarity with relational database systems such as SQL Server is helpful. Prior knowledge of Azure is not required.

© 2018 U2U All rights reserved.