Introduction to Cortana Intelligence Suite
Cortana Intelligence Suite (CIS) is a collection of Azure services, so before we can get started with these we must first
discuss what we see as Big Data, and why we want to use Big Data technology. Since CIS is part of the Azure stack
we also introduce Azure in general.
- What is Big Data?
- Overview of Microsoft Azure
- The Azure Management Portals
- Cortana Intelligence Suite Components
Storing your data in Azure Storage
Azure Storage works like a file share that can be used by many of the Azure services, including the CIS. Often the output
of one CIS component is stored in Azure Storage before being consumed by another component. In this module you
will learn about the different types of storage available in Azure Storage. You will also become familiar with
some of the tools to load and manage files in Azure Storage.
- Microsoft Azure Storage Concepts: Storage accounts and Containers
- Azure blob storage
- Tools for storing data in Azure Storage
Azure SQL Database
An easy way to create a business intelligence solution in the cloud is by taking SQL Server (familiar to many Microsoft
BI developers) and running it in the cloud. Backup and high availability happen automatically, and we can use nearly
all the skills and tools we used on a local SQL Server on this cloud-based solution as well.
- Azure SQL Database feature set
- Basic, Standard, Premium and Premium RS tier
- Comparing performance: DTUs, transaction rates and benchmarks
Azure Data Warehouse
Azure SQL Databases have their limitations in compute power since they run on a single machine, and their size is limited to 4 TB per database. Azure Data Warehouse is a service aimed at analytical
workloads on data volumes hundreds of times larger than what Azure SQL Databases can handle. Yet at the same time
we can keep on using the familiar T-SQL query language, and we can connect traditional applications such as Excel
and Management Studio to interact with this service. Unlike Azure SQL Database, storage and compute can be scaled independently.
- What is Azure Data Warehouse?
- Creating and distributing tables
- Loading data via external tables and PolyBase
- Elasticity versus Performance tier
- Monitoring and performance tuning
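To give a feel for what distributing a table means, the sketch below assigns rows to distributions by hashing one column. It is an illustration only: the function name `distribute` and the sample rows are hypothetical, and CRC32 is a stand-in, not the hash function the service uses. The value 60 reflects the fixed number of distributions in Azure SQL Data Warehouse.

```python
import zlib
from collections import defaultdict

NUM_DISTRIBUTIONS = 60  # Azure SQL Data Warehouse spreads each table over 60 distributions


def distribute(rows, key):
    """Group rows by distribution number, derived from a hash of the chosen column."""
    buckets = defaultdict(list)
    for row in rows:
        # CRC32 is a stand-in hash for illustration; the service uses its own function.
        bucket = zlib.crc32(str(row[key]).encode("utf-8")) % NUM_DISTRIBUTIONS
        buckets[bucket].append(row)
    return dict(buckets)


# Hypothetical sample data: two sales for customer 1, one for customer 2.
sales = [
    {"customer_id": 1, "amount": 10.0},
    {"customer_id": 2, "amount": 25.5},
    {"customer_id": 1, "amount": 7.25},
]
placement = distribute(sales, "customer_id")
```

Rows with the same value in the distribution column always land in the same distribution, which is why the choice of that column matters for joins and aggregations.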
Azure Analysis Services
Analysis Services is Microsoft's OLAP (cube) technology. The latest version, Analysis Services Tabular, can also run as a database-as-a-service. This is ideal for loading and caching the cleaned, pre-processed data produced by other Cortana Intelligence components, which leads to faster reporting. The data can also be enriched with KPIs, translations, derived measures etc. In this module we take a brief look at how an Analysis Services model can be created and deployed to the cloud; for a more in-depth discussion we refer to the Analysis Services Tabular training.
- Creating a cloud based Analysis Server
- Deploying Power BI models
- Deploying from Visual Studio
Azure Data Lake Store and Analytics
Azure Data Lake Store (ADLS) and Azure Data Lake Analytics (ADLA) are like bread and butter: you can use them separately,
but they are often used together. Azure Data Lake Store is comparable to Azure Storage, but it has a few features which make
it better suited for Big Data projects. This makes it ideal for setting up a data lake. But to turn the 'raw' data in our
data lake into something 'pure' and consumable, we need to apply some cleansing and/or analytics to it. And that is where
Azure Data Lake Analytics comes into play. Using a 'Unified SQL' language (U-SQL) it allows us to use a mixture of the relational
language SQL and the object-oriented C# language to convert raw data into analysis results.
- What is a data lake?
- Setting up Azure Data Lake Store
- Loading data
- Setting up Azure Data Lake Analytics
- Getting started with U-SQL
- EXTRACT, SELECT, INSERT and OUTPUT
- U-SQL projects in Visual Studio
- Running U-SQL jobs locally
Azure Cosmos DB
Cosmos DB is a NoSQL solution, with a schema-on-read approach based on JSON. It is the successor to the former DocumentDB database. It supports many APIs, so you can treat it as a MongoDB, Cassandra or graph database, among others. This makes it very flexible for application developers, and a great source of BI data!
- What is Cosmos DB?
- Setting up a database
- Request Units
- Tools: emulator and data migration tool
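Schema-on-read means that documents are stored as they arrive and a structure is only applied when they are read. The sketch below illustrates the idea with plain JSON strings and the Python standard library; it does not use the Cosmos DB SDK, and the documents and the function `read_orders` are hypothetical.

```python
import json

# Hypothetical documents: no shared schema is enforced when they are stored.
raw_documents = [
    '{"id": "1", "type": "order", "total": 40.5}',
    '{"id": "2", "type": "order", "total": 12.0, "coupon": "SPRING"}',
    '{"id": "3", "type": "customer", "name": "Ann"}',
]


def read_orders(documents):
    """Apply a schema while reading: keep orders and default the optional coupon field."""
    orders = []
    for text in documents:
        doc = json.loads(text)
        if doc.get("type") == "order":
            orders.append(
                {"id": doc["id"], "total": doc["total"], "coupon": doc.get("coupon")}
            )
    return orders


orders = read_orders(raw_documents)
```

The reader, not the store, decides which fields matter and how missing ones are handled.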
Azure Data Catalog
How do you track down all the relevant data that your business stores, spread over sometimes hundreds of databases, cubes
and reports? To help you with this task, you need a database of databases, which stores only metadata such as
table and column names, descriptions etc. This is exactly what the Azure Data Catalog is all about, and in this
module you learn how to create, fill and query this catalog.
- What is a data catalog?
- Creating an Azure Data Catalog
- The Azure Data Catalog portal
- Collecting and uploading meta-data
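The idea of a database that stores only metadata can be sketched with the Python standard library. The example below harvests table and column names from a SQLite database and searches them; it stands in for the concept only and does not use the Azure Data Catalog API. The tables and the function `collect_metadata` are hypothetical.

```python
import sqlite3


def collect_metadata(conn):
    """Return (table, column, declared type) for every table in a SQLite database."""
    entries = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields one row per column: (cid, name, type, notnull, default, pk)
        for _cid, column, col_type, *_rest in conn.execute(f'PRAGMA table_info("{table}")'):
            entries.append((table, column, col_type))
    return entries


# Hypothetical source database with two small tables.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
source.execute("CREATE TABLE invoice (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")

catalog = collect_metadata(source)

# Querying the catalog: which columns mention "customer"?
matches = [entry for entry in catalog if "customer" in entry[1]]
```

Only names and types are collected; the data itself stays in the source system.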
Azure Data Factory
Not only do we want to store data and run analyses on it, we also need a scheduler to move our data to the proper services
and then run the relevant analyses on top of it. When the data is stored and analysed on-premises we typically
use ETL tools such as SQL Server Integration Services for this. But what if the data is stored in the cloud?
Then we need Azure Data Factory, the cloud-based ETL service. First we need to get used to the terminology, then
we can start creating the proper objects in the portal, using the wizard or in Visual Studio.
- Introducing Data Factories
- Creating linked services and data sets
- Combining activities into pipelines
- Build a complete flow with the wizard
- Using Visual Studio to create or modify data factories
- Monitoring and managing data factories
- Data Factory V2 improvements
Azure Event Hubs
All the topics covered so far mainly focus on analyzing data at rest. But what if you want to analyze a never-ending stream
of incoming events, such as in Internet of Things (IoT) applications? In this module we focus on buffering and
timestamping streams of incoming events. The next module is on Azure Stream Analytics and shows how to analyze
these streams of events in an easy way. Microsoft extended the T-SQL language with a few temporal concepts such
as sliding windows. With these we can develop an event processing application in a matter of minutes.
- Collecting streams of events
- Setting up Azure Service Bus and Event Hubs
- Managing Event Hubs
- Consumer groups
- Sending and consuming events
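The role of consumer groups can be illustrated with a toy in-memory stream in which each group keeps its own read position. This is a simplified sketch, not the Event Hubs client library: the class `EventStream`, its methods and the group names are hypothetical, and real Event Hubs additionally spread events over partitions.

```python
class EventStream:
    """Toy append-only stream in which every consumer group tracks its own offset."""

    def __init__(self):
        self._events = []
        self._offsets = {}  # consumer group name -> index of the next event to read

    def send(self, event):
        self._events.append(event)

    def receive(self, group, max_events=10):
        """Return the next batch for this group and advance only this group's offset."""
        start = self._offsets.get(group, 0)
        batch = self._events[start:start + max_events]
        self._offsets[group] = start + len(batch)
        return batch


stream = EventStream()
for reading in (21.5, 21.7, 22.1):
    stream.send(reading)

dashboard_first = stream.receive("dashboard", max_events=2)
archive_all = stream.receive("archive")
dashboard_rest = stream.receive("dashboard")
```

Both groups see every event, each at its own pace, so a slow archiving job does not hold back a live dashboard.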
Azure Stream Analytics
- Real-time analytics and event handling
- Create Azure Stream Analytics jobs
- Configure security
- Connecting inputs and outputs
- Writing Stream Analytics queries
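To make the idea of temporal windows concrete, the sketch below counts events per device in fixed, non-overlapping (tumbling) windows, which is the simplest relative of the sliding window. It is plain Python rather than the Stream Analytics query language, and the function name, the timestamps in seconds and the device names are hypothetical.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Count events per device in fixed, non-overlapping time windows."""
    counts = Counter()
    for timestamp, device in events:
        # Every timestamp belongs to exactly one window, identified by its start time.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, device)] += 1
    return dict(counts)


# Hypothetical events: (timestamp in seconds, device name).
events = [
    (0, "sensor-a"),
    (3, "sensor-a"),
    (4, "sensor-b"),
    (11, "sensor-a"),
    (12, "sensor-a"),
]
result = tumbling_window_counts(events, window_seconds=10)
```

In a streaming engine the same grouping is evaluated continuously as events arrive, instead of over a finished list.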
Azure HDInsight
For many people Big Data processing is synonymous with Hadoop. This open source big data ecosystem is very popular, and is
part of the Azure stack under the name HDInsight. In this module we mainly focus on how to set up HDInsight, discuss
the data storage options and illustrate the more popular Hadoop frameworks such as Hive, Pig and Spark. HDInsight
is a big collection of complex tools, so don't expect to become an expert in each of these. If you're new to Hadoop,
it gives you an overview so that you know what is possible. If you are a data scientist with Hadoop experience,
it shows you enough to know how to get started with this on the Azure stack.
- Setting up an HDInsight cluster
- Tools for loading data
- Map-Reduce and YARN
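The Map-Reduce model can be sketched in a few lines with the classic word count. This runs in a single Python process and only mirrors the three phases (map, shuffle, reduce) that Hadoop distributes over a cluster; the function names and input lines are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


lines = ["big data in the cloud", "Big Data on Hadoop"]
mapped = [pair for line in lines for pair in map_phase(line)]
word_counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
```

Because each map call looks at one line and each reduce call at one key, both phases can run in parallel on many machines.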
Azure Cognitive Services
Analyzing nicely formatted tables is easy, but what if the data at hand consists of scanned invoices, security camera footage etc.? With Azure Cognitive Services we can convert difficult data formats such as photo, audio or video into a structured representation, which can then be used in the remainder of the analysis.
- What are cognitive services?
- Overview of the cognitive services
- Customizable versus non-customizable cognitive services
- Configuring LUIS for language understanding
Azure Machine Learning
Just remembering a bunch of things doesn't make somebody smart; the skill to learn from 'old knowledge' and apply it
to unseen situations is what makes somebody smart. That's exactly the purpose of machine learning. In Azure
Machine Learning Microsoft created a framework that is easy enough for non-programmers to build machine learning
models with a simple GUI. But machine learning experts can use their Python or R skills as well to
do very advanced things in Azure Machine Learning that go beyond the scope of the GUI. Another great feature
of Azure Machine Learning is the deployment feature: once you have trained the right model, with a few clicks (and
zero coding!) you create a web service so that you can call your model from nearly any application!
- Getting started in ML Studio
- Accessing data sets
- Using R and Python scripts
- Exploring the different modeling techniques for classification, regression, clustering and collaborative filtering
- Model training
- Scoring datasets
- Evaluate the models
- Create a scoring experiment
- Create and configure web service
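The steps of training a model, scoring unseen data and evaluating the result can be sketched without any machine learning library. The example below uses a nearest-centroid classifier, chosen here only because it fits in a few lines; it is not a technique prescribed by the course, and the data and function names are hypothetical.

```python
from statistics import mean


def train(samples):
    """Learn one centroid (mean feature vector) per label from labeled examples."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }


def score(model, features):
    """Predict the label whose centroid is closest to the given features."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))

    return min(model, key=lambda label: squared_distance(model[label]))


def evaluate(model, samples):
    """Return the fraction of labeled samples that the model predicts correctly."""
    correct = sum(score(model, features) == label for features, label in samples)
    return correct / len(samples)


# Hypothetical data: the 'old knowledge' to learn from and unseen cases to test on.
training = [((1.0, 1.0), "small"), ((1.5, 2.0), "small"),
            ((8.0, 9.0), "large"), ((9.0, 8.5), "large")]
unseen = [((2.0, 1.5), "small"), ((7.5, 8.0), "large")]

model = train(training)
accuracy = evaluate(model, unseen)
```

Evaluating on data that was not used for training is what tells us whether the model generalizes to unseen situations.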
Cortana Intelligence Suite (CIS) is Microsoft's Big Data solution: a collection of Azure services to load, store and analyze
large volumes of data in the cloud. Although each of these services can be used independently of one another, they
will often be used together to process data in the cloud.
First we investigate how data can be stored. We look into the traditional solutions using Azure Storage and Azure SQL Databases.
But we also investigate newer technologies such as Azure SQL Data Warehouse and Data Lake Storage for dealing
with large and very large volumes of data. For data that is less structured, the NoSQL database Cosmos DB can be used.
The next step is analyzing the data. We discuss HDInsight with its traditional Hadoop technologies such as Hive and Pig,
but we also touch upon Azure Data Lake Analytics which introduces U-SQL as its new data query language. Azure
Machine Learning is crucial to do more advanced analysis on large volumes of data. Azure Stream Analytics
is also discussed as a way to analyze streams of events (together with Event Hubs to capture large volumes of incoming events).
We must also pay attention to how data can be loaded into cloud storage in an automated fashion using Azure Data Factory.
Finally we take a brief look at how the results of these analyses can be used in Power BI as a reporting tool. Azure Analysis Services also comes into the picture, as we often need it as a fast and user-friendly cache of the data.
All these technologies are introduced and demonstrated, but participants will also have hands-on labs on each of these technologies.