
Master Big Data Ingestion and Analytics with Flume, Sqoop, Hive and Spark



  • 30 Day Money Back Guarantee
  • Completion Certificate
  • 24/7 Technical Support

Highlights

  • On-Demand course

  • 5 hours 38 minutes

  • All levels

Description

A complete course on Sqoop, Flume, and Hive: Ideal for achieving CCA175 and Hortonworks Spark Certification

In this course, you will start by learning about the Hadoop Distributed File System (HDFS) and the most common Hadoop commands required to work with HDFS. Next, you'll be introduced to Sqoop Import, which will help you gain insights into the lifecycle of the Sqoop command and how to use the import command to migrate data from MySQL to HDFS, and from MySQL to Hive. You will then get up to speed with Sqoop Export for migrating data effectively, along with using Apache Flume to ingest data.

As you progress, you will delve into Apache Hive: external and managed tables, working with different file formats, and Parquet and Avro. Toward the concluding section, you will focus on Spark DataFrames and Spark SQL. By the end of this course, you will have gained comprehensive insights into big data ingestion and analytics with Flume, Sqoop, Hive, and Spark.

All code and supporting files are available at https://github.com/PacktPublishing/Master-Big-Data-Ingestion-and-Analytics-with-Flume-Sqoop-Hive-and-Spark
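As a taste of the MySQL-to-HDFS migration covered early in the course, the sketch below assembles a typical Sqoop import command. The connection string, username, table, and target directory are hypothetical placeholders, not values from the course.

```python
# Sketch: building the argv for a MySQL -> HDFS Sqoop import.
# All connection details below are illustrative; adjust to your cluster.
def sqoop_import_cmd(db, table, target_dir, user="retail_user", num_mappers=4):
    """Return the command-line argument list for a basic Sqoop import."""
    return [
        "sqoop", "import",
        "--connect", f"jdbc:mysql://localhost:3306/{db}",
        "--username", user,
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", str(num_mappers),
    ]

cmd = sqoop_import_cmd("retail_db", "orders", "/user/cloudera/orders")
print(" ".join(cmd))
```

Running the printed command on a cluster with Sqoop installed would launch `--num-mappers` parallel map tasks, each importing a slice of the table.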

What You Will Learn

Explore the Hadoop Distributed File System (HDFS) and commands
Get to grips with the lifecycle of the Sqoop command
Use the Sqoop Import command to migrate data from MySQL to HDFS and Hive
Understand split-by and boundary queries
Use the incremental mode to migrate data from MySQL to HDFS
Employ Sqoop Export to migrate data from HDFS to MySQL
Discover Spark DataFrames and gain insights into working with different file formats and compression
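The "split-by and boundary queries" item above can be sketched in plain Python: Sqoop runs a boundary query (`SELECT MIN(col), MAX(col)`) on the split-by column, then slices that range into one interval per mapper. The numbers below are illustrative, not from the course.

```python
# Sketch of how Sqoop divides a table among mappers using the split-by
# column: the [min, max] range from the boundary query is cut into
# roughly equal [start, end) slices, one per mapper.
def split_ranges(lo, hi, num_mappers):
    """Mimic Sqoop's numeric split calculation."""
    size = (hi - lo + 1) / num_mappers
    ranges = []
    for i in range(num_mappers):
        start = int(lo + i * size)
        # the last slice is extended to cover the true maximum
        end = int(lo + (i + 1) * size) if i < num_mappers - 1 else hi + 1
        ranges.append((start, end))
    return ranges

# e.g. the boundary query returned MIN(order_id)=1, MAX(order_id)=100
print(split_ranges(1, 100, 4))  # -> [(1, 26), (26, 51), (51, 76), (76, 101)]
```

This is why a skewed or sparse split-by column leads to unbalanced mappers: the slicing is done on the value range, not on row counts.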

Audience

This course is for anyone who wants to learn Sqoop and Flume, and for those looking to achieve the CCA175 or HDP certification.

Approach

A complete course packed with step-by-step instructions, working examples, and helpful advice. This course is systematically divided into small sections that will help you understand each part individually and learn at your own pace.

Key Features

  • Learn Sqoop, Flume, and Hive and successfully achieve CCA175 and Hortonworks Spark Certification

  • Understand the Hadoop Distributed File System (HDFS), along with exploring Hadoop commands to work effectively with HDFS

GitHub Repo

https://github.com/packtpublishing/master-big-data-ingestion-and-analytics-with-flume-sqoop-hive-and-spark

About the Author

Navdeep Kaur

Navdeep Kaur - Technical Trainer. Navdeep Kaur is a big data professional with 11 years of industry experience across different technologies and domains. She has a keen interest in providing training in new technologies. She holds the CCA175 Hadoop and Spark Developer certification and the AWS Solutions Architect certification. She loves guiding people and helping them achieve new goals.

Course Outline

1. Hadoop Introduction

1. HDFS and Hadoop Commands

Hadoop Introduction: HDFS and Hadoop Commands


2. Sqoop Import

1. Sqoop Introduction

Sqoop Import: Sqoop Introduction

2. Managing Target Directories

Sqoop Import: Managing Target Directories

3. Working with Different File Formats

Sqoop Import: Working with Different File Formats

4. Working with Different Compressions

Sqoop Import: Working with Different Compressions

5. Conditional Imports

Sqoop Import: Conditional Imports

6. Split-by and Boundary Queries

Sqoop Import: Split-by and Boundary Queries

7. Field Delimiters

Sqoop Import: Field Delimiters

8. Incremental Appends

Sqoop Import: Incremental Appends

9. Sqoop Hive Import

Sqoop Import: Sqoop Hive Import

10. Sqoop List Tables/Database

Sqoop Import: Sqoop List Tables/Database

11. Sqoop Import Practice1

Sqoop Import: Sqoop Import Practice1

12. Sqoop Import Practice2

Sqoop Import: Sqoop Import Practice2

13. Sqoop Import Practice3

Sqoop Import: Sqoop Import Practice3
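The "Incremental Appends" lesson above boils down to a simple filter, sketched here in plain Python: with `--incremental append`, only rows whose check column exceeds `--last-value` are imported, and the new high-water mark is saved for the next run. The sample rows are hypothetical.

```python
# Sketch of Sqoop's --incremental append logic: import only rows newer
# than the saved high-water mark, then record the new maximum.
rows = [
    {"id": 1, "name": "alice"},
    {"id": 2, "name": "bob"},
    {"id": 3, "name": "carol"},
]
last_value = 1  # high-water mark saved by the previous import

new_rows = [r for r in rows if r["id"] > last_value]   # rows to import
next_last_value = max(r["id"] for r in rows)           # stored for next run

print(len(new_rows), next_last_value)  # -> 2 3
```

Sqoop stores `next_last_value` for you when the job is saved with the metastore; otherwise you must pass it manually on the next run.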


3. Sqoop Export

1. Export from HDFS to MySQL

Sqoop Export: Export from HDFS to MySQL

2. Export from Hive to MySQL

Sqoop Export: Export from Hive to MySQL


4. Apache Flume

1. Flume Introduction & Architecture

Apache Flume: Flume Introduction & Architecture

2. Exec Source and Logger Sink

Apache Flume: Exec Source and Logger Sink

3. Moving data from Twitter to HDFS

Apache Flume: Moving data from Twitter to HDFS

4. Moving data from NetCat to HDFS

Apache Flume: Moving data from NetCat to HDFS

5. Flume Interceptors

Apache Flume: Flume Interceptors

6. Flume Interceptor Example

Apache Flume: Flume Interceptor Example

7. Flume Multi-Agent Flow

Apache Flume: Flume Multi-Agent Flow

8. Flume Consolidation

Apache Flume: Flume Consolidation
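A minimal Flume agent of the kind built in the NetCat-to-HDFS lesson wires one source, one channel, and one sink together in a properties file. The agent name, port, and HDFS path below are illustrative choices, not taken from the course.

```properties
# Sketch of a single-agent Flume config: NetCat source -> memory channel -> HDFS sink.
# Start it with: flume-ng agent --name a1 --conf-file netcat-hdfs.conf
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# NetCat source: listens on a TCP port and turns each line into an event
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# In-memory channel buffering events between source and sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000

# HDFS sink: writes events as plain text files under the given path
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /user/flume/events
a1.sinks.k1.hdfs.fileType = DataStream

# Wire source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```

The same three-part pattern (source, channel, sink) extends to the multi-agent and consolidation flows later in the section; only the types and the wiring change.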


5. Apache Hive

1. Hive Introduction

Apache Hive: Hive Introduction

2. Hive Database

Apache Hive: Hive Database

3. Hive Managed Tables

Apache Hive: Hive Managed Tables

4. Hive External Tables

Apache Hive: Hive External Tables

5. Hive Inserts

Apache Hive: Hive Inserts

6. Hive Analytics

Apache Hive: Hive Analytics

7. Working with Parquet

Apache Hive: Working with Parquet

8. Compressing Parquet

Apache Hive: Compressing Parquet

9. Working with Fixed File Format

Apache Hive: Working with Fixed File Format

10. Alter Command

Apache Hive: Alter Command

11. Hive String Functions

Apache Hive: Hive String Functions

12. Hive Date Functions

Apache Hive: Hive Date Functions

13. Hive Partitioning

Apache Hive: Hive Partitioning

14. Hive Bucketing

Apache Hive: Hive Bucketing
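The payoff of the "Hive Partitioning" lesson can be sketched without a cluster: each partition is a directory named `column=value`, so a filter on the partition column prunes whole directories instead of scanning every file. The paths below are illustrative.

```python
# Sketch of Hive partition pruning: a WHERE clause on the partition
# column selects matching directories, skipping all other data entirely.
partitions = [
    "/warehouse/orders/order_month=2014-01",
    "/warehouse/orders/order_month=2014-02",
    "/warehouse/orders/order_month=2014-03",
]

def prune(partitions, column, value):
    """Keep only the partition directories matching column=value."""
    return [p for p in partitions if f"{column}={value}" in p]

print(prune(partitions, "order_month", "2014-02"))
```

Bucketing, covered next, subdivides data *within* a partition by hashing a column, which helps with sampling and joins rather than with pruning.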


6. Spark Introduction

1. Spark Introduction

Spark Introduction: Spark Introduction

2. Resilient Distributed Datasets

Spark Introduction: Resilient Distributed Datasets

3. Cluster Overview

Spark Introduction: Cluster Overview

4. Directed Acyclic Graph (DAG) & Stages

Spark Introduction: Directed Acyclic Graph (DAG) & Stages


7. Spark Transformations & Actions

1. Map/FlatMap Transformation

Spark Transformations & Actions: Map/FlatMap Transformation

2. Filter/Intersection

Spark Transformations & Actions: Filter/Intersection

3. Union/Distinct Transformation

Spark Transformations & Actions: Union/Distinct Transformation

4. GroupByKey/ Group people based on Birthday months

Spark Transformations & Actions: GroupByKey/ Group people based on Birthday months

5. ReduceByKey / Total Number of students in each Subject

Spark Transformations & Actions: ReduceByKey / Total Number of students in each Subject

6. SortByKey / Sort students based on their rollno

Spark Transformations & Actions: SortByKey / Sort students based on their rollno

7. MapPartition / MapPartitionWithIndex

Spark Transformations & Actions: MapPartition / MapPartitionWithIndex

8. Change number of Partitions

Spark Transformations & Actions: Change number of Partitions

9. Join / Join email address based on customer name

Spark Transformations & Actions: Join / Join email address based on customer name

10. Spark Actions

Spark Transformations & Actions: Spark Actions
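The per-key transformations in this section share one idea, shown here in plain Python for the course's "total number of students in each subject" example: values are grouped by key, then combined pairwise with the supplied function. The sample pairs are hypothetical.

```python
# Plain-Python sketch of Spark's reduceByKey: group values by key,
# then fold each group with the given function.
from collections import defaultdict
from functools import reduce

pairs = [("maths", 1), ("physics", 1), ("maths", 1), ("maths", 1)]

def reduce_by_key(pairs, fn):
    grouped = defaultdict(list)
    for k, v in pairs:
        grouped[k].append(v)
    return {k: reduce(fn, vs) for k, vs in grouped.items()}

print(reduce_by_key(pairs, lambda a, b: a + b))  # -> {'maths': 3, 'physics': 1}
```

In real Spark, `reduceByKey` combines values on each partition before shuffling, which is why it is preferred over `groupByKey` followed by a reduce.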


8. Spark RDD Practice

1. Scala Tuples

Spark RDD Practice: Scala Tuples

2. Extract Error Logs from log files

Spark RDD Practice: Extract Error Logs from log files

3. Frequency of word in Text File

Spark RDD Practice: Frequency of word in Text File

4. Population of each City

Spark RDD Practice: Population of each City

5. Orders placed by Customers

Spark RDD Practice: Orders placed by Customers

6. Movie Average Rating greater than 3

Spark RDD Practice: Movie Average Rating greater than 3
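The "frequency of each word in a text file" exercise follows the classic RDD pipeline: flatMap (split lines into words), map (word to `(word, 1)`), reduceByKey (sum counts). The sketch below mirrors those stages in plain Python; the sample lines stand in for `sc.textFile(...)` output.

```python
# Word-count as an RDD-style pipeline, sketched in plain Python.
from collections import Counter

lines = ["spark makes big data simple", "big data big insights"]

words = [w for line in lines for w in line.split()]   # flatMap
counts = Counter(words)                               # map + reduceByKey

print(counts["big"], counts["data"])  # -> 3 2
```

The other practice problems in this section (population per city, orders per customer, average ratings) are the same pipeline with different keys and combine functions.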


9. Spark Dataframes & Spark SQL

1. Dataframe Intro

Spark Dataframes & Spark SQL: Dataframe Intro

2. DataFrame from JSON Files

Spark Dataframes & Spark SQL: DataFrame from JSON Files

3. Dataframe from Parquet Files

Spark Dataframes & Spark SQL: Dataframe from Parquet Files

4. Dataframe from CSV Files

Spark Dataframes & Spark SQL: Dataframe from CSV Files

5. Dataframe from Avro/XML Files

Spark Dataframes & Spark SQL: Dataframe from Avro/XML Files

6. Working with Different Compressions

Spark Dataframes & Spark SQL: Working with Different Compressions

7. DataFrame API Part1

Spark Dataframes & Spark SQL: DataFrame API Part1

8. DataFrame API Part2

Spark Dataframes & Spark SQL: DataFrame API Part2

9. Spark SQL

Spark Dataframes & Spark SQL: Spark SQL

10. Working with Hive Tables in Spark

Spark Dataframes & Spark SQL: Working with Hive Tables in Spark
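To preview what the DataFrame lessons involve, the sketch below mimics `spark.read.json` followed by a filter and select in plain Python: each JSON line becomes a row, and DataFrame operations map onto comprehensions. The records and the age threshold are hypothetical.

```python
# Plain-Python sketch of a Spark DataFrame pipeline:
# read JSON lines -> filter rows -> select a column.
import json

json_lines = [
    '{"name": "alice", "age": 34}',
    '{"name": "bob", "age": 19}',
]

rows = [json.loads(line) for line in json_lines]        # like spark.read.json
adults = [r["name"] for r in rows if r["age"] >= 21]    # filter + select

print(adults)  # -> ['alice']
```

In Spark the same pipeline is lazy and distributed, and the equivalent query can also be written in Spark SQL against a registered temporary view.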

Course Content

  1. Master Big Data Ingestion and Analytics with Flume, Sqoop, Hive and Spark

About The Provider

Packt
Birmingham
Founded in 2004 in Birmingham, UK, Packt’s mission is to help the world put software to work in new ways, through the delivery of effective learning and i...
Read more about Packt
