Big Data Analysis with Azure Cortana Intelligence Suite

4 days
Training code: uadata

Upcoming sessions

25 Sep 2017
20 Nov 2017
15 Jan 2018

Day 1

Introduction to Cortana Intelligence Suite

Cortana Intelligence Suite (CIS) is a collection of Azure services. Before we can get started with these, we must first discuss what we mean by Big Data and why we want to use Big Data technology. Since CIS is part of the Azure stack, we also introduce Azure in general.

  • What is Big Data?
  • Overview of Microsoft Azure
  • Pricing
  • The Azure Management Portals
  • Cortana Intelligence Suite Components

Storing your data in Azure Storage

Azure Storage is like a file share that can be used by many of the Azure services, including CIS. Often the output of one CIS component is stored in Azure Storage before being consumed by another component. In this module you will learn about the different types of storage available in Azure Storage. You will also become familiar with some of the tools to load and manage files in Azure Storage, such as Visual Studio and the Microsoft Azure Storage Explorer.
So even though Azure Storage is not part of CIS, you need to know it in order to use CIS.

  • The advantages of storing data in the Cloud
  • Microsoft Azure Storage Concepts
  • Working with Azure Tables
  • Azure blob storage
  • Tools for storing data in Azure Storage

Azure SQL Database

An easy way to create a business intelligence solution in the cloud is to take SQL Server, familiar to many Microsoft BI developers, and run it in the cloud. Backup and high availability happen automatically, and we can use nearly all the skills and tools we used on a local SQL Server on this cloud-based solution as well. This module shows you how to get started with Azure SQL Databases. It is not only relevant as a service on its own: in later modules you will discover that many services from CIS can access the data in Azure SQL Databases as well.

  • Azure SQL Database feature set
  • Connecting your apps with Azure SQL Database
  • Migrating data to Azure SQL Database
  • Basic, Standard and Premium tier
  • Comparing performance: DTUs, transaction rates and benchmarks
  • Elastic Database Pools
  • High Availability and Disaster Recovery

Day 2

Azure Data Warehouse

Azure SQL Databases have their limitations as well: you can't buy storage without also buying extra compute, and vice versa. Also, a single database can only grow to 1 TB in size. Azure Data Warehouse is a service aimed at analytical workloads on data volumes hundreds of times larger than what Azure SQL Databases can handle. At the same time we can keep using the familiar T-SQL query language, and we can connect traditional applications such as Excel and Management Studio to this service. Storage and compute can be scaled independently and nearly instantly, so we pay only for what we need!
You will learn how to set up tables to get the best performance out of Azure Data Warehouse, see efficient data loading techniques via CTAS (Create Table As Select) and PolyBase, and gain basic insight into performance tuning.

  • What is Azure Data Warehouse?
  • Setup
  • Creating tables
  • Loading data via external tables and PolyBase
  • Performance
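
The CTAS (Create Table As Select) pattern mentioned above can be tried locally. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in, since SQLite also supports CREATE TABLE ... AS SELECT. It is not Azure Data Warehouse syntax: there the statement additionally takes a WITH clause for options such as the table distribution. The table and column names are made up for illustration.

```python
import sqlite3

# In-memory SQLite database used as a local stand-in (assumption: this is
# NOT Azure Data Warehouse, it only illustrates the CTAS idea).
conn = sqlite3.connect(":memory:")

# Hypothetical staging table with a few raw rows.
conn.execute("CREATE TABLE staging_sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO staging_sales VALUES (?, ?)",
    [("North", 120.0), ("South", 80.0), ("North", 30.0), ("South", 20.0)],
)

# CTAS: create and populate the target table in a single statement.
conn.execute(
    """
    CREATE TABLE sales_by_region AS
    SELECT region, SUM(amount) AS total_amount
    FROM staging_sales
    GROUP BY region
    """
)

rows = conn.execute(
    "SELECT region, total_amount FROM sales_by_region ORDER BY region"
).fetchall()
print(rows)  # [('North', 150.0), ('South', 100.0)]
```

The point of the pattern is that the target table is defined by the query that fills it, so loading and transforming happen in one step.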

Azure Data Lake Store and Analytics

Azure Data Lake Store (ADLS) and Azure Data Lake Analytics (ADLA) are like bread and butter: you can use them separately, but they are often used together. Azure Data Lake Store is comparable to Azure Storage, but it has a few features that make it better suited for Big Data projects, such as support for much larger data volumes, Azure Active Directory integration and accessibility from Hadoop systems. This makes it ideal for setting up a data lake: a place to store lots of historical data that may not be properly formatted for analysis yet. But hey, we rarely drink directly from a lake, do we? So to turn the 'raw' data in our data lake into something 'pure' and consumable, we need to apply some cleansing and/or analytics to it. That is where Azure Data Lake Analytics comes into play. Using a 'Unified SQL' language (U-SQL), it allows us to mix the relational language SQL with the object-oriented C# language to turn raw data into analysis results.

  • What is a data lake?
  • Setting up Azure Data Lake Store
  • Loading data
  • Setting up Azure Data Lake Analytics
  • Getting started with U-SQL
  • EXTRACT, SELECT, INSERT and OUTPUT
  • U-SQL projects in Visual Studio
  • Running U-SQL jobs locally

Azure Data Catalog

How do you find all the relevant data that your business stores, spread over what may be hundreds of databases, cubes and reports? To help you with this task, you need a database of databases, which stores only meta-data such as table and column names, descriptions etc. This is exactly what the Azure Data Catalog is all about, and in this module you learn how to create, fill and query this catalog.

  • What is a data catalog?
  • Creating an Azure Data Catalog
  • The Azure Data Catalog portal
  • Collecting and uploading meta-data
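
The "database of databases" idea can be sketched in a few lines of Python. The example below is an illustration only and does not use the Azure Data Catalog API: it harvests table and column names from a local SQLite database (a stand-in for a real data source) and searches them. The names `collect_metadata`, `search` and `sales_db` are hypothetical.

```python
import sqlite3


def collect_metadata(conn, source_name):
    """Return one catalog entry per column: (source, table, column, type)."""
    entries = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        for _cid, column, col_type, *_rest in conn.execute(
            f'PRAGMA table_info("{table}")'
        ):
            entries.append((source_name, table, column, col_type))
    return entries


def search(catalog, term):
    """Find entries whose table or column name contains the search term."""
    term = term.lower()
    return [e for e in catalog if term in e[1].lower() or term in e[2].lower()]


# Hypothetical source database; only its structure matters, not its rows.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT)")
source.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, total REAL)"
)

catalog = collect_metadata(source, "sales_db")
hits = search(catalog, "customer")
for hit in hits:
    print(hit)
```

Note that the catalog holds only meta-data: no rows from the source tables are copied, which is what keeps such a catalog small even when the sources are large.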

Day 3

Azure Data Factory

Not only do we want to store data and run analysis on it, we also need a scheduler to move our data to the proper services and then run the relevant analysis on top of it. When the data is stored and analysed on-premises we typically use ETL tools such as SQL Server Integration Services for this. But what if the data is stored in the cloud? Then we need Azure Data Factory, the cloud-based ETL service. First we need to get used to the terminology, then we can start creating the proper objects in the portal, using the wizard or in Visual Studio.

  • Introducing Data Factories
  • Creating linked services and data sets
  • Combining activities into pipelines
  • Build a complete flow with the wizard
  • Using Visual Studio to create or modify data factories
  • Monitoring and managing data factories

Azure Event Hubs

All the topics covered so far mainly focus on analyzing data at rest. But what if you want to analyze a never-ending stream of incoming events, such as in Internet of Things (IoT) applications? In this module we focus on buffering and timestamping streams of incoming events. The next module, on Azure Stream Analytics, shows how to analyze these streams of events in an easy way. Microsoft extended the T-SQL language with a few temporal concepts such as sliding windows. With these we can develop an event processing application in a matter of minutes.

  • Collecting streams of events
  • Setting up Azure Service Bus and Event Hubs
  • Managing Event Hubs
  • Consumer groups
  • Sending and consuming events

Azure Stream Analytics

  • Real-time analytics and event handling
  • Create Azure Stream Analytics jobs
  • Configure security
  • Connecting inputs and outputs
  • Writing Stream Analytics queries
  • Scaling
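
To make the idea of a sliding window over an event stream concrete, here is a small Python sketch. It is a simplified illustration, not the Stream Analytics query language: for every incoming event it counts how many events fall in the time window ending at that event. The function name `sliding_window_counts` and the sample events are assumptions made for this example.

```python
from collections import deque


def sliding_window_counts(events, window_seconds):
    """For each event, count the events in the window ending at that event.

    `events` is an iterable of (timestamp_seconds, payload) pairs in
    timestamp order. The window covers (t - window_seconds, t].
    """
    window = deque()
    results = []
    for timestamp, _payload in events:
        window.append(timestamp)
        # Evict events that have fallen out of the window.
        while window[0] <= timestamp - window_seconds:
            window.popleft()
        results.append((timestamp, len(window)))
    return results


# Hypothetical event stream: (timestamp in seconds, payload).
events = [(0, "a"), (2, "b"), (5, "c"), (11, "d"), (12, "e"), (30, "f")]
counts = sliding_window_counts(events, window_seconds=10)
print(counts)
# [(0, 1), (2, 2), (5, 3), (11, 3), (12, 3), (30, 1)]
```

Because only the timestamps inside the current window are kept in memory, the same approach works on a stream that never ends.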

HDInsight

For many people Big Data processing is synonymous with Hadoop. This open source big data ecosystem is very popular, and is part of the Azure stack under the name HDInsight. In this module we mainly focus on how to set up HDInsight, discuss the data storage options and illustrate the more popular Hadoop frameworks such as Hive, Pig and Spark. HDInsight is a big collection of complex tools, so don't expect to become an expert in each of these. If you're new to Hadoop, it gives you an overview so that you know what is possible. If you are a data scientist with Hadoop experience, it shows you enough to know how to get started with this on the Azure stack.

  • Setting up an HDInsight cluster
  • Tools for loading data
  • Map-Reduce and YARN
  • Hive
  • Pig
  • Spark

Day 4

Azure Machine Learning

Just remembering a bunch of things doesn't make somebody smart; the skill to learn from 'old knowledge' and apply it to unseen situations does. That's exactly the purpose of machine learning. There are many frameworks for machine learning, such as MLlib in Spark, Mahout in Hadoop, or R in SQL Server. With Azure Machine Learning, Microsoft created a framework that is easy enough for non-programmers to build machine learning models with a simple GUI. But machine learning experts can use their Python or R skills as well to do very advanced things in Azure Machine Learning that go beyond the scope of the GUI. Another great feature of Azure Machine Learning is deployment: once you have trained the right model, with a few clicks (and zero coding!) you create a web service so that you can call your model from nearly any application!
On this last day of the Cortana Intelligence training we start by looking into the basics of machine learning, then you will learn how to solve classification and regression problems, and we conclude by creating and consuming machine learning web services.

  • Getting started in ML Studio
  • Accessing data sets
  • Data preparation: Filters, Manipulation, Sample and Split, Scale and reduce
  • R and Python scripts
  • Feature selection
  • Exploring the different modeling techniques for classification, regression, clustering and collaborative filtering
  • Model training
  • Scoring datasets
  • Evaluate the models
  • Create a scoring experiment
  • Create and configure web service
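
The train, score and evaluate steps listed above can be illustrated without any ML framework. The sketch below is plain Python and does not use Azure Machine Learning or its modules: it trains a nearest-centroid classifier (a deliberately simple stand-in technique) on made-up two-feature data, then scores held-out rows it has never seen. All names and data values are hypothetical.

```python
from collections import defaultdict


def train(rows):
    """Learn one centroid (mean feature vector) per label."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for (x, y), label in rows:
        acc = sums[label]
        acc[0] += x
        acc[1] += y
        acc[2] += 1
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}


def score(model, features):
    """Predict the label whose centroid is closest to the features."""
    x, y = features
    return min(
        model,
        key=lambda label: (x - model[label][0]) ** 2 + (y - model[label][1]) ** 2,
    )


def evaluate(model, rows):
    """Return the fraction of rows that are classified correctly."""
    correct = sum(1 for features, label in rows if score(model, features) == label)
    return correct / len(rows)


# Made-up data set: ((feature_1, feature_2), label).
data = [
    ((1.0, 1.0), "small"), ((1.2, 0.8), "small"),
    ((0.8, 1.1), "small"), ((1.1, 1.3), "small"),
    ((5.0, 5.0), "large"), ((5.2, 4.8), "large"),
    ((4.9, 5.3), "large"), ((5.1, 5.1), "large"),
]

# Split: hold out every fourth row so the model is tested on unseen data.
training_set = [row for i, row in enumerate(data) if i % 4 != 3]
test_set = [row for i, row in enumerate(data) if i % 4 == 3]

model = train(training_set)
accuracy = evaluate(model, test_set)
prediction = score(model, (4.0, 4.5))
print(accuracy, prediction)  # 1.0 large
```

Keeping the test rows out of training is what distinguishes learning from merely remembering: the accuracy is measured on situations the model has not seen before.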

Cortana Intelligence Suite (CIS) is a collection of Azure services to load, store and analyze data in the cloud. Although each of these services can be used independently from one another, they will often be chained together to process data in the cloud.

In this course we first look into how data can be loaded into cloud storage in an automated fashion using Azure Event Hubs and Azure Data Factory.

Then we investigate how data can be stored. We look into the traditional solutions using Azure Storage and Azure SQL Databases. But we also investigate newer technologies such as Azure SQL Data Warehouse and Azure Data Lake Store for dealing with large and very large volumes of data.

The next step is analyzing the data. We discuss HDInsight with its traditional Hadoop technologies such as Hive and Pig, but we also touch upon Azure Data Lake Analytics, which introduces U-SQL as its new data query language. Azure Machine Learning is crucial for more advanced analysis on large volumes of data. Azure Stream Analytics is also discussed as a way to analyze streams of events.

Finally we take a brief look at how the results of these analyses can be used in Power BI as a reporting tool.

All these technologies are introduced and demonstrated, but participants will also have hands-on labs on each of these technologies.

This course is aimed at developers, administrators and project managers who are considering migrating existing databases or developing new data-centric applications in the Microsoft Azure cloud. Some familiarity with relational database systems such as SQL Server is helpful. Prior knowledge of Azure is not required.

© 2017 U2U All rights reserved.