Introduction to Cortana Intelligence Suite
Cortana Intelligence Suite (CIS) is a collection of Azure services. Before we get started with these, we first
discuss what we mean by Big Data and why we want to use Big Data technology. Since CIS is part of the Azure stack,
we also introduce Azure in general.
- What is Big Data?
- Overview of Microsoft Azure
- The Azure Management Portals
- Cortana Intelligence Suite Components
Storing your data in Azure Storage
Azure Storage is a kind of file share that can be used by many of the Azure services, including CIS. Often the output
of one CIS component is stored in Azure Storage before being consumed by another component. In this module you
will learn about the different types of storage available in Azure Storage. You will also become familiar with
some of the tools to load and manage files in Azure storage, such as Visual Studio and the Microsoft Azure Storage
So even though Azure Storage is not part of CIS, it is essential to know it in order to use CIS.
- The advantages of storing data in the Cloud
- Microsoft Azure Storage Concepts
- Working with Azure Tables
- Azure blob storage
- Tools for storing data in Azure Storage
Azure SQL Database
An easy way to create a business intelligence solution in the cloud is to take SQL Server, familiar to many Microsoft
BI developers, and run it in the cloud. Backup and high availability happen automatically, and we can use nearly
all the skills and tools we used on a local SQL Server on this cloud-based solution as well. This module shows
you how to get started with Azure SQL Databases. It is not only relevant as a service on its own: in later
modules you will discover that many CIS services can access the data in Azure SQL Databases as well.
- Azure SQL Database feature set
- Connecting your apps with Azure SQL Database
- Migrating data to Azure SQL Database
- Basic, Standard and Premium tier
- Comparing performance: DTUs, transaction rates and benchmarks
- Elastic Database Pools
- High Availability and Disaster Recovery
Azure Data Warehouse
Azure SQL Databases have their limitations as well: you can't buy storage without also buying extra compute, and vice versa.
Also, a single database can only grow to 1 TB in size. Azure Data Warehouse is a service aimed at analytical
workloads on data volumes hundreds of times larger than what Azure SQL Databases can handle. Yet at the same time
we can keep using the familiar T-SQL query language, and we can connect traditional applications such as Excel
and Management Studio to this service. In addition, storage and compute can be scaled independently and
nearly instantly, so we pay only for what we need.
You will learn how to set up tables to get the best performance out of Azure Data Warehouse, see efficient
data loading techniques via CTAS (Create Table As Select) and PolyBase and get a basic insight in performance
- What is Azure Data Warehouse?
- Creating tables
- Loading data via external tables and PolyBase
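The CTAS (Create Table As Select) pattern named in this module can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module as a local stand-in, because SQLite also accepts CREATE TABLE ... AS SELECT. This is an assumption made for illustration only: it is not Azure Data Warehouse syntax, and the table and column names (staging_sales, sales_by_region) are hypothetical.

```python
import sqlite3


def ctas_demo():
    """Build a summary table from a staging table with one CTAS statement."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical staging table holding freshly loaded rows.
    conn.execute("CREATE TABLE staging_sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO staging_sales VALUES (?, ?)",
        [("North", 100.0), ("North", 50.0), ("South", 75.0)],
    )
    # CTAS: create and fill the target table in a single statement.
    conn.execute(
        """
        CREATE TABLE sales_by_region AS
        SELECT region, SUM(amount) AS total_amount
        FROM staging_sales
        GROUP BY region
        """
    )
    rows = conn.execute(
        "SELECT region, total_amount FROM sales_by_region ORDER BY region"
    ).fetchall()
    conn.close()
    return rows
```

The point of the pattern is that the target table is created and populated in one set-based step, rather than being created empty and then filled row by row.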
Azure Data Lake Store and Analytics
Azure Data Lake Store (ADLS) and Azure Data Lake Analytics (ADLA) are like bread and butter: you can use them separately,
but they are often used together. Azure Data Lake Store is comparable to Azure Storage, but it has a few features
which make it better suited for Big Data projects, such as support for much larger data volumes, Azure Active
Directory integration and accessibility from Hadoop systems. This makes it ideal for setting up a data lake:
a place to store lots of historical data that may not be properly formatted for analysis yet. But we rarely
drink directly from a lake, do we? To turn the 'raw' data in our data lake into something 'pure' and consumable,
we need to apply some cleansing and/or analytics to it. That is where Azure Data Lake Analytics comes
into play. Its 'Unified SQL' language (U-SQL) allows us to use a mixture of the relational language SQL
and the object-oriented C# language to turn raw data into analysis results.
- What is a data lake?
- Set up Azure Data Lake Store
- Loading data
- Set up Azure Data Lake Analytics
- Getting started with U-SQL
- EXTRACT, SELECT, INSERT and OUTPUT
- U-SQL projects in Visual Studio
- Running U-SQL jobs locally
Azure Data Catalog
How do you find all the relevant data that your business stores, spread over sometimes hundreds of databases, cubes
and reports? To help you in this task, you need a database of databases, which stores only meta-data such as
table and column names, descriptions etc. This is exactly what the Azure Data Catalog is all about, and in this
module you learn how to create, fill and query this catalog.
- What is a data catalog?
- Creating an Azure Data Catalog
- The Azure Data Catalog portal
- Collecting and uploading meta-data
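The idea of a "database of databases" that holds only meta-data can be sketched in a few lines. The example below is a toy illustration under stated assumptions: it harvests table and column names from an in-memory SQLite database using Python's standard sqlite3 module and searches them. It does not use the Azure Data Catalog API, and the function names (harvest_metadata, search_catalog, build_demo_catalog) and the demo tables are hypothetical.

```python
import sqlite3


def harvest_metadata(conn, source_name):
    """Collect one entry per column; no data rows are ever read."""
    entries = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        for _cid, column, col_type, *_rest in conn.execute(
            f'PRAGMA table_info("{table}")'
        ):
            entries.append(
                {"source": source_name, "table": table,
                 "column": column, "type": col_type}
            )
    return entries


def search_catalog(catalog, term):
    """Return entries whose table or column name contains the search term."""
    term = term.lower()
    return [
        e for e in catalog
        if term in e["table"].lower() or term in e["column"].lower()
    ]


def build_demo_catalog():
    """Create a hypothetical source database and harvest its meta-data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (customer_id INTEGER, email TEXT)")
    conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
    catalog = harvest_metadata(conn, "demo_db")
    conn.close()
    return catalog
```

Searching this catalog for "customer" finds every column that belongs to, or refers to, customers, without touching the underlying data.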
Azure Data Factory
Not only do we want to store data and run analyses on it, we also need a scheduler to move our data to the proper services
and then run the relevant analyses on top of it. When the data is stored and analysed on-premises we typically
use ETL tools such as SQL Server Integration Services for this. But what if the data is stored in the cloud?
Then we need Azure Data Factory, the cloud-based ETL service. First we need to get used to the terminology, then
we can start creating the proper objects in the portal, using the wizard or in Visual Studio.
- Introducing Data Factories
- Creating linked services and data sets
- Combining activities into pipelines
- Build a complete flow with the wizard
- Using Visual Studio to create or modify data factories
- Monitoring and managing data factories
Azure Event Hubs
All the topics covered so far mainly focus on analyzing data at rest. But what if you want to analyze a never-ending stream
of incoming events, such as in Internet of Things (IoT) applications? In this module we focus on buffering and
timestamping streams of incoming events. The next module is on Azure Stream Analytics and shows how to analyze
these streams of events in an easy way. Microsoft extended the T-SQL language with a few temporal concepts such
as sliding windows. With these we can develop an event processing application in a matter of minutes.
- Collecting streams of events
- Set up Azure Service Bus and Event Hubs
- Managing Event Hubs
- Consumer groups
- Sending and consuming events
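The buffering, timestamping and consumer group concepts listed above can be sketched with a small in-memory model. This is a toy illustration, not the Event Hubs SDK: partitions, retention and checkpoint storage are deliberately left out, and all names (ToyEventBuffer, BufferedEvent, the "dashboard" and "archive" groups) are hypothetical.

```python
import time
from dataclasses import dataclass, field


@dataclass
class BufferedEvent:
    sequence_number: int   # position of the event in the buffer
    enqueued_time: float   # timestamp assigned when the event arrives
    body: str


@dataclass
class ToyEventBuffer:
    events: list = field(default_factory=list)
    offsets: dict = field(default_factory=dict)  # consumer group -> next index

    def send(self, body, now=None):
        """Append an event and stamp it with its arrival time."""
        stamp = time.time() if now is None else now
        event = BufferedEvent(len(self.events), stamp, body)
        self.events.append(event)
        return event

    def receive(self, consumer_group, max_events=10):
        """Return the next batch for this group and advance only its offset."""
        start = self.offsets.get(consumer_group, 0)
        batch = self.events[start:start + max_events]
        self.offsets[consumer_group] = start + len(batch)
        return batch
```

A short usage example: after three calls to `send`, `receive("dashboard", max_events=2)` returns the first two events, while `receive("archive")` still returns all three, because each consumer group reads the same buffered stream at its own pace.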
Azure Stream Analytics
- Real-time analytics and event handling
- Create Azure Stream Analytics jobs
- Configure security
- Connecting inputs and outputs
- Writing Stream Analytics queries
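Temporal windows are the core idea behind stream queries. As a minimal sketch, the function below counts events per key in fixed, non-overlapping (tumbling) windows, which is the simplest window type and is used here instead of a sliding window for brevity. It is plain Python rather than the Stream Analytics query language, and the function name and event format (timestamp in seconds, key) are assumptions made for illustration.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per (window start, key).

    events: iterable of (timestamp_seconds, key) pairs.
    Each event falls into exactly one window of length window_seconds.
    """
    counts = defaultdict(int)
    for timestamp, key in events:
        # Round the timestamp down to the start of its window.
        window_start = int(timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)
```

For example, with 10-second windows, events at 0 s and 3 s land in the window starting at 0, while events at 10 s and 19 s land in the window starting at 10.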
HDInsight
For many people Big Data processing is synonymous with Hadoop. This open source big data ecosystem is very popular, and is
part of the Azure stack under the name HDInsight. In this module we mainly focus on how to set up HDInsight, discuss
the data storage options and illustrate the more popular Hadoop frameworks such as Hive, Pig and Spark. HDInsight
is a big collection of complex tools, so don't expect to become an expert in each of these. If you're new to Hadoop,
the module gives you an overview of what is possible. If you are a data scientist with Hadoop experience,
it shows you enough to know how to get started with this on the Azure stack.
- Setting up an HDInsight cluster
- Tools for loading data
- Map-Reduce and YARN
Azure Machine Learning
Just remembering a bunch of things doesn't make somebody smart; the skill to learn from 'old knowledge' and apply it
to unseen situations is what makes somebody smart. That's exactly the purpose of machine learning. There are
many frameworks to do machine learning, such as MLlib in Spark, Mahout in Hadoop, or R in SQL Server. In Azure
Machine Learning Microsoft created a framework that is easy enough for non-programmers, who can use a simple
GUI to build machine learning models. But machine learning experts can use their Python or R skills as well to
do very advanced things in Azure Machine Learning that go beyond the scope of the GUI. Another great feature
of Azure Machine Learning is the deployment feature: once you have trained the right model, with a few clicks (and
zero coding!) you create a web service such that you can call your model from nearly any application!
On this last day of the Cortana Intelligence training we start by looking into the basics of machine learning,
then you will learn how to solve classification and regression problems, and we conclude by creating and consuming
machine learning web services.
- Getting started in ML Studio
- Accessing data sets
- Data preparation: Filters, Manipulation, Sample and Split, Scale and reduce
- R and Python scripts
- Feature selection
- Exploring the different modeling techniques for classification, regression, clustering and collaborative filtering
- Model training
- Scoring datasets
- Evaluate the models
- Create a scoring experiment
- Create and configure web service
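The split, train, score and evaluate steps listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions: it uses a simple nearest-centroid classifier written from scratch, not Azure Machine Learning modules or its API, and all function names and the sample data are hypothetical.

```python
import random


def split_rows(rows, train_fraction=0.75, seed=0):
    """Shuffle reproducibly, then split into training and test rows."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def train_centroids(train_rows):
    """Train: compute the mean feature vector (centroid) of each label."""
    sums, counts = {}, {}
    for features, label in train_rows:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def score(model, features):
    """Score: predict the label whose centroid is closest to the features."""
    def squared_distance(centroid):
        return sum((f - c) ** 2 for f, c in zip(features, centroid))
    return min(model, key=lambda label: squared_distance(model[label]))


def evaluate(model, test_rows):
    """Evaluate: fraction of test rows that are predicted correctly."""
    correct = sum(1 for features, label in test_rows
                  if score(model, features) == label)
    return correct / len(test_rows)
```

A typical run chains the steps: `train, test = split_rows(rows)`, then `model = train_centroids(train)`, then `evaluate(model, test)`. Keeping the test rows out of training is what lets the evaluation say something about unseen situations.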
Cortana Intelligence Suite (CIS) is a collection of Azure services to load, store and analyze data in the cloud. Although
each of these services can be used independently of one another, they will often be chained together to process
data in the cloud.
In this course we first look into how data can be loaded into cloud storage in an automated fashion using Azure Event Hubs
and Azure Data Factory.
Then we investigate how data can be stored. We look into the traditional solutions using Azure Storage and Azure SQL Databases.
But we also investigate newer technologies such as Azure SQL Data Warehouse and Data Lake Storage for dealing
with large and very large volumes of data.
The next step is analyzing the data. We discuss HDInsight with its traditional Hadoop technologies such as Hive and Pig,
but we also touch upon Azure Data Lake Analytics, which introduces U-SQL as its new data query language. Azure
Machine Learning is crucial for more advanced analysis on large volumes of data. We also discuss Azure Stream
Analytics for analyzing streams of events.
Finally we take a brief look at how the results of these analyses can be used in Power BI as a reporting tool.
All these technologies are introduced and demonstrated, and participants will also work through hands-on labs on each of them.