Data mining introduction
- What is data mining
- Why and when to use
The CRISP-DM methodology
The Cross-Industry Standard Process for Data Mining (CRISP-DM) is a methodology for running and documenting the enterprise data mining process. It is technology independent. This module introduces the methodology in general; each of the following modules corresponds to one phase of CRISP-DM.
- The need for a methodology
- CRISP-DM reference model
- Introducing the 6 phases
- Alternative methodologies
Business and Data understanding
We must first be able to identify useful data mining goals in our business. In this module you learn about the most common data mining goals, such as regression and classification.
- Business objectives
- Data mining goals
- Data collection
- Exploring and validating data
Since data is the key ingredient in any data mining process, we must make sure it is of good quality. This module explains how the concept of data quality differs between data mining and other business intelligence processes.
- Data selection
- Data cleaning
- Feature extraction
- Data integration and formatting
Modeling is the actual process of building mathematical models based on the data. You will get an overview of the different modeling techniques, such as decision trees, neural networks, logistic regression, association rules and more!
- Modeling techniques
- Test design
- Model building
- Model assessment
- Data sources and data source views
- Decision trees
- Naïve Bayes
- Association rules
- Neural networks
- Time series
- Sequence clustering
Before a model can be used in production, we must first be sure it is good enough. Statistical measurements can establish this, but human inspection can be important as well.
- Model testing
- Model filtering
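As an illustration of the "statistical measurements" mentioned above, here is a minimal Python sketch that compares a model's predictions with the known outcomes of a hold-out set and derives accuracy, precision and recall. The course itself uses the evaluation tools in SQL Server Analysis Services; the function names and the small churn data set below are hypothetical and serve only to show the idea.

```python
from collections import Counter


def confusion_counts(actual, predicted, positive):
    """Count true/false positives and negatives for one class of interest."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["tp" if a == positive else "fp"] += 1
        else:
            counts["fn" if a == positive else "tn"] += 1
    return counts


def evaluate(actual, predicted, positive):
    """Return accuracy, precision and recall for the given positive class."""
    c = confusion_counts(actual, predicted, positive)
    total = sum(c.values())
    predicted_pos = c["tp"] + c["fp"]
    actual_pos = c["tp"] + c["fn"]
    return {
        "accuracy": (c["tp"] + c["tn"]) / total,
        # Guard against division by zero when a class never occurs.
        "precision": c["tp"] / predicted_pos if predicted_pos else 0.0,
        "recall": c["tp"] / actual_pos if actual_pos else 0.0,
    }


# Hypothetical hold-out data: did the customer churn or stay?
actual = ["churn", "stay", "stay", "churn", "stay", "churn", "stay", "stay"]
predicted = ["churn", "stay", "churn", "stay", "stay", "churn", "stay", "stay"]

metrics = evaluate(actual, predicted, positive="churn")
print(metrics)
```

Accuracy alone can be misleading when one outcome is rare, which is why precision and recall are reported alongside it. Human inspection of the model remains a useful complement to these numbers.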
Data mining clients
For most of this training we use Visual Studio and Management Studio to build our models, but the applications that query these models will be different. In this module we first show the Excel add-in for creating and consuming data mining models. Then we illustrate the integration with Reporting Services. We end by showing how .NET programmers can build applications on top of these models as well.
- Using the Excel data mining add-in
- Introduction to Data Mining Extensions (DMX)
- Building DMX queries
- DMX in Reporting Services
- DMX in .NET applications
The business world is full of uncertainties. Nobody knows which customers are going
to switch to a competitor or how sales will evolve over the coming months. This
is why companies create models that help them tame this uncertainty. Data mining
(or predictive analytics) is one of the techniques that help companies build such
models, which in turn support decision makers in their daily work. But data mining
can do more than that: data quality control, data cleansing, analyzing social
media, and so on. The list of machine learning applications is nearly endless.
The goal of this course is twofold:
- Introduce participants to the world of data mining by means of a methodology:
data mining is much more than running a data mining tool; it also involves data
preparation, feature selection and extraction, evaluation of the data mining models, and more.
To achieve this, we study and apply the CRISP-DM data mining methodology in this
course. You can't become a data scientist without a scientific methodology!
- Become familiar with the SQL Server data mining tool: study
and apply the different data mining modeling techniques (decision trees, neural
networks, ...) and the model evaluation tools available in SQL Server Analysis Services.
This course is intended for people with no prior data mining knowledge who want
to understand when data mining can be used, and how to use it with SQL Server Analysis
Services. The target audience consists of BI developers who plan to develop data mining
solutions, as well as project managers who need to understand the key aspects of
building a data mining solution.
Prior knowledge of Analysis Services Multidimensional is not needed, but we assume familiarity with
relational databases. Small parts of this course use Excel, .NET coding skills and Reporting
Services; at least a passive knowledge of these technologies is useful for
following the whole course.