Description

The underlying patterns in your data hold vital insights; unearth them with cutting-edge clustering and classification techniques in R

This course is a complete guide to both supervised and unsupervised learning using R. It covers all the main aspects of practical data science; the aim is that, having taken it, you will not need other courses or books on R-based data science. In this age of big data, companies across the globe use R to sift through the avalanche of information at their disposal. By becoming proficient in unsupervised and supervised learning in R, you can give your company a competitive edge and take your career to the next level.

Over the course of her research, the author realized that almost all the R data science courses and books out there do not take account of the multidimensional nature of the topic. This course gives you a robust grounding in the main aspects of machine learning: clustering and classification. The author digs deep into R's machine learning features and gives you a thorough grounding in data science. You will go all the way from reading and cleaning data to implementing powerful machine learning algorithms and evaluating their performance in R.

The following topics will be covered:

• A full introduction to the R framework for data science
• Data structures and reading data in R, including CSV, Excel, and HTML data
• How to pre-process and clean data (removing NAs/no data) and visualize it
• Machine learning, supervised learning, and unsupervised learning in R
• Model building and selection, and much more

The course will help you implement methods using real data obtained from different sources. Many courses use made-up data that does not empower students to implement R-based data science in real life. After taking this course, you will be able to use data science packages such as caret to work with real data in R, and you will understand concepts such as unsupervised learning, dimension reduction, and supervised learning.
All the code and supporting files for this course are available at https://github.com/PacktPublishing/Clustering-and-Classification-with-Machine-Learning-in-R

What You Will Learn

Read data into the R environment from different sources
Carry out basic data pre-processing and wrangling in RStudio
Implement unsupervised/clustering techniques such as K-means clustering
Implement dimension reduction techniques (PCA) and feature selection
Implement supervised learning/classification techniques such as Random Forests
Evaluate model performance and learn the best practices for evaluating machine learning model accuracy

Audience

This course is for students interested in getting started with data science applications in the RStudio environment, students who want to learn how to implement unsupervised learning on real data, and anyone with prior exposure to R who wants to get started with practical data science.

Approach

Every video is packed with hands-on instructions and clear explanations. Real data is used throughout to demonstrate how to apply these techniques to your own data.

Key Features

Provides in-depth training in everything you need to know to get started with practical R data science
Jargon-free and suitable for people who have a non-mathematical background
In-depth coverage of the latest unsupervised and supervised techniques

GitHub Repo

https://github.com/packtpublishing/clustering-and-classification-with-machine-learning-in-r

About the Author

Minerva Singh

Minerva Singh is a PhD graduate from Cambridge University, where she specialized in Tropical Ecology. She is also a part-time Data Scientist. Her research involves extensive data analysis, including spatial data analysis, for which she prefers a combination of free tools: R, QGIS, and Python. She does most of her spatial data analysis work using R and QGIS; apart from being free, these are very powerful tools for data visualization, processing, and analysis. She also holds an MPhil degree in Geography and Environment from Oxford University. She has honed her statistical and data analysis skills through several MOOCs, including The Analytics Edge and Statistical. In addition to spatial data analysis, she is also proficient in statistical analysis, machine learning, and data mining.

Course Outline

1. Introduction to the Course

   1. Welcome to Clustering & Classification with Machine Learning in R
   2. Installing R and R Studio

2. Read in Data from Different Sources in R

   1. Read in CSV & Excel Data
   2. Read in Unzipped Folder
   3. Read in Online CSV
   4. Read in Googlesheets
   5. Read in Data from Online HTML Tables-Part 1
   6. Read in Data from Online HTML Tables-Part 2
   7. Read Data from a Database

3. Data Pre-processing and Visualization

   1. Remove Missing Values
   2. More Data Cleaning
   3. Introduction to dplyr for Data Summarizing-Part 1
   4. Introduction to dplyr for Data Summarizing-Part 2
   5. Exploratory Data Analysis (EDA): Basic Visualizations with R
   6. More Exploratory Data Analysis with xda
   7. Data Exploration & Visualization With dplyr & ggplot2
   8. Associations Between Quantitative Variables- Theory
   9. Testing for Correlation
   10. Evaluate the Relation Between Nominal Variables
   11. Cramer's V for Examining the Strength of Association Between Nominal Variables

4. Machine Learning for Data Science

   1. How is Machine Learning Different from Statistical Data Analysis?
   2. What is Machine Learning (ML) About? Some Theoretical Pointers

5. Unsupervised Learning in R

   1. K-Means Clustering
   2. Other Ways of Selecting Cluster Numbers
   3. Fuzzy K-Means Clustering
   4. Weighted k-means
   5. Partitioning Around Medoids (PAM)
   6. Hierarchical Clustering in R
   7. Expectation-Maximization (EM) in R
   8. DBSCAN Clustering in R
   9. Cluster a Mixed Dataset
   10. Should We Even Do Clustering?
   11. Assess Clustering Performance
   12. Which Clustering Algorithm to Choose?

6. Feature/Dimension Reduction

   1. Dimension Reduction-theory
   2. Principal Component Analysis (PCA)
   3. More on PCA
   4. Multidimensional Scaling
   5. Singular Value Decomposition (SVD)

7. Feature Selection to Select the Most Relevant Predictors

   1. Removing Highly Correlated Predictor Variables
   2. Variable Selection Using LASSO Regression
   3. Variable Selection with FSelector
   4. Boruta Analysis for Feature Selection

8. Supervised Learning Theory

   1. Some Basic Supervised Learning Concepts
   2. Pre-processing for Supervised Learning

9. Supervised Learning: Classification

   1. What are GLMs?
   2. Logistic Regression Models as Binary Classifiers
   3. Binary Classifier with PCA
   4. Some Pointers on Evaluating Accuracy
   5. Obtain Binary Classification Accuracy Metrics
   6. More on Binary Accuracy Measures
   7. Linear Discriminant Analysis
   8. Our Multi-class Classification Problem
   9. Classification Trees
   10. More on Classification Tree Visualization
   11. Classification with Party Package
   12. Decision Trees
   13. Random Forest (RF) Classification
   14. Examine Individual Variable Importance for Random Forests
   15. GBM Classification
   16. Support Vector Machines (SVM) for Classification
   17. More SVM for Classification
   18. Variable Importance in SVM Modelling with rminer

10. Additional Lectures

   1. Fuzzy C-Means Clustering

Clustering and Classification with Machine Learning in R

By Packt
