PySpark and AWS: Master Big Data with PySpark and AWS

  • 30 Day Money Back Guarantee
  • Completion Certificate
  • 24/7 Technical Support

Highlights

  • On-Demand course
  • 16 hours 10 minutes
  • All levels

Description

The course is crafted to reflect the most in-demand workplace skills. It will help you understand all the essential concepts and methodologies of PySpark. The course provides a detailed grounding in the basics, which will help you make quick progress and build well beyond what you have learned.

Python and Apache Spark are among the hottest buzzwords in the Big Data analytics industry, and PySpark brings them together as the Python API for Apache Spark. In this course, you'll start right from the basics and proceed to advanced levels of data analysis. From cleaning data to building features and implementing machine learning (ML) models, you'll learn how to execute end-to-end workflows using PySpark.

Throughout the course, you'll use PySpark to perform data analysis. You'll explore Spark RDDs, DataFrames, and a bit of Spark SQL, along with the transformations and actions that can be performed on data using RDDs and DataFrames. You'll also explore the Spark and Hadoop ecosystems and their underlying architecture, and you'll use the Databricks environment to run your Spark scripts and get to know it along the way.

Finally, you'll get a taste of Spark on the AWS cloud. You'll see how to leverage AWS storage, databases, and compute services, and how Spark can communicate with different AWS services to fetch the data it needs. By the end of this course, you'll be able to understand and implement the concepts of PySpark and AWS to solve real-world problems. The code bundles are available here: https://github.com/PacktPublishing/PySpark-and-AWS-Master-Big-Data-with-PySpark-and-AWS
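
As a flavour of the transformation-and-action model described above, here is a minimal sketch. It is not taken from the course materials: the data and names are illustrative, and it assumes a local pyspark installation (pip install pyspark).

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("transformations-demo").getOrCreate()

# RDD side: map/filter are lazy transformations; collect() is the action
# that actually triggers computation.
rdd = spark.sparkContext.parallelize([1, 2, 3, 4, 5])
squares = rdd.map(lambda x: x * x).filter(lambda x: x > 4)
print(squares.collect())  # [9, 16, 25]

# DataFrame side: the same lazy model behind a columnar, SQL-like API;
# show() is the action here.
df = spark.createDataFrame([("alice", 34), ("bob", 28)], ["name", "age"])
df.filter(F.col("age") > 30).show()

spark.stop()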

What You Will Learn

Learn the importance of Big Data
Explore the Spark and Hadoop architecture and ecosystem
Learn about PySpark DataFrames and DataFrame actions
Use PySpark DataFrame transformations
Apply collaborative filtering to develop a recommendation system using ALS models (see the sketch below)
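
As a hedged illustration of the ALS item above (not the course's own code), here is a minimal collaborative-filtering sketch using PySpark's built-in ALS estimator, with made-up ratings data:

from pyspark.sql import SparkSession
from pyspark.ml.recommendation import ALS

spark = SparkSession.builder.appName("als-demo").getOrCreate()

# Made-up (user, item, rating) triples; a real run would load a ratings dataset.
ratings = spark.createDataFrame(
    [(0, 0, 4.0), (0, 1, 2.0), (1, 1, 3.0), (1, 2, 5.0), (2, 0, 1.0)],
    ["userId", "movieId", "rating"],
)

# Factorise the ratings matrix; coldStartStrategy="drop" avoids NaN
# predictions for users/items unseen during training.
als = ALS(userCol="userId", itemCol="movieId", ratingCol="rating",
          rank=5, maxIter=5, coldStartStrategy="drop")
model = als.fit(ratings)

# Top-2 item recommendations per user.
model.recommendForAllUsers(2).show(truncate=False)

spark.stop()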

Audience

This course requires Python programming experience as a prerequisite.

Approach

In this learning-by-doing course, every theoretical explanation is followed by practical implementation. At the end of each concept, homework, tasks, activities, and quizzes (with solutions) are assigned to evaluate and reinforce what you have learned. Most of these activities are coding-based, as the aim is to get you up and running with implementations.

Key Features

  • Relate the concepts and practical aspects of Spark and AWS to real-world problems
  • Implement any project that requires PySpark knowledge from scratch
  • Know the theory and practical aspects of PySpark and AWS

Github Repo

https://github.com/PacktPublishing/PySpark-and-AWS-Master-Big-Data-with-PySpark-and-AWS

About the Author

AI Sciences

AI Sciences is a group of experts, PhDs, and artificial intelligence practitioners from fields including computer science, machine learning, and statistics. Some of them work at major companies such as Amazon, Google, Facebook, Microsoft, KPMG, BCG, and IBM. AI Sciences produces a series of courses dedicated to beginners and newcomers on the techniques and methods of machine learning, statistics, artificial intelligence, and data science. They aim to help those who wish to understand these techniques more easily, starting with less theory and less extended reading. Today, they publish more comprehensive courses on specific topics for wider audiences. Their courses have successfully helped more than 100,000 students master AI and data science.

Course Outline

1. Introduction
2. Introduction to Hadoop, Spark Ecosystems, and Architectures
3. Spark RDDs
4. Spark DFs
5. Collaborative Filtering
6. Spark Streaming
7. ETL Pipeline
8. Project - Change Data Capture / Replication Ongoing

About The Provider

Packt
Birmingham
Founded in 2004 in Birmingham, UK, Packt’s mission is to help the world put software to work in new ways, through the delivery of effective learning and i...