Big Data for Architects

Highlights

  • On-Demand course

  • 8 hours 19 minutes

  • All levels

Description

This course will help you explore the world of Big Data technologies and frameworks. You will develop skills that will help you to pick the right Big Data technology and framework for your job and build the confidence to design robust Big Data pipelines.

Do you want a guide that will help you pick the right Big Data technology for your project? Or do you want a solid understanding of Big Data architecture and pipelines? This course covers both. After outlining the course structure and learning objectives, it takes you through the steps needed to set up the environment. Next, you will understand the Big Data logical architecture, study the evolution of Big Data technologies, and explore Big Data pipelines. You will then become familiar with ingestion frameworks such as Kafka, Flume, NiFi, and Sqoop, and learn about key storage frameworks such as HDFS, HBase, Kudu, and Cassandra. Finally, you will go through the various data formats and uncover key data processing and data analysis frameworks. By the end of this course, you will have a good understanding of Big Data architecture and technologies and will have developed the skills to build real-world Big Data pipelines. All the resources and support files for this course are available at https://github.com/PacktPublishing/Big-Data-for-Architects

What You Will Learn

Create a Google account and a Dataproc cluster
Understand the Big Data architecture and pipelines
Study factors to consider while comparing ingestion frameworks
Gain a solid understanding of storage frameworks
Distinguish between text and binary data formats
Find the key differences between the Spark, Tez, and Flink frameworks
Build a scalable Extract, Transform, Load (ETL) pipeline with Kafka Connect

Audience

If you are a software engineer looking to build Big Data pipelines or planning to take certification exams such as CCA175 or CCA159, this video course is for you. A basic understanding of Big Data is needed to get started with this course.

Approach

With the help of simple explanations, whiteboard sessions, and interesting activities, this course will familiarize you with the Big Data architecture and technologies. It will give you the confidence to design Big Data pipelines using modern frameworks.

Key Features

  • Get a holistic picture of the Big Data ecosystem
  • Become an expert in choosing Big Data technology as per the requirements
  • Get ready to build end-to-end Big Data batch and streaming pipelines

GitHub Repo

https://github.com/PacktPublishing/Big-Data-for-Architects

About the Author

Bhavuk Chawla

Bhavuk Chawla has over 16 years of experience in IT and more than 8 years of experience implementing Cloud/ML/AI/Big Data Science-related projects. He is an official instructor for Google, Confluent, and Cloudera. He has delivered and continues to deliver training sessions at various companies, including Google Singapore, Microsoft Bengaluru (Bangalore), Starbucks Coffee Seattle, Adobe India, EMEA Region, and more. Cloudera recognized him as Instructor of the Year 2016 (APAC) for the exceptionally high ratings he received in various training sessions.

Course Outline

1. Introduction

1. Course Structure and Approach

This video highlights the course structure and explains how to approach the course.

2. Course Pre-Requisites

This video focuses on the course pre-requisites.

3. Course Audience

This video focuses on the course audience.

4. About the Author

This video introduces you to the author.


2. Setting Up the Environment

1. Setting up a Google Cloud Account

This video demonstrates how to set up a Google Cloud account.

2. Creating a Dataproc Cluster

This video explains how to create a Dataproc cluster.

3. Google Cloud Platform (GCP) Account Best Practices

This video focuses on the best practices when using a GCP account.


3. Holistic View of Architectures and Pipelines

1. Big Data Logical Architecture

This video explains the Big Data logical architecture.

2. Evolution of Big Data Technologies

This video focuses on the evolution of Big Data technologies.

3. Key Big Data Architectures

This video explains key Big Data architectures.

4. Typical Big Data Batch Pipeline

This video introduces you to the typical Big Data batch pipeline.

5. Typical Big Data Streaming Pipeline

This video explains the typical Big Data streaming pipeline.

6. Example 01: Big Data Streaming Pipeline

This video presents an example of a Big Data streaming pipeline.

7. Example 02: Big Data Streaming Pipeline

This video presents another example of a Big Data streaming pipeline.


4. Key Ingestion/Dataflow Frameworks

1. Factors to Consider while Comparing Ingestion Frameworks

This video highlights the factors to consider while comparing ingestion frameworks.

2. Kafka Versus Flume

This video highlights the difference between Kafka and Flume.

3. NiFi Versus Kafka

This video outlines the differences between NiFi and Kafka.

4. Sqoop Versus Flume

This video explains the difference between Sqoop and Flume.

5. Sqoop Versus Kafka Connect

This video highlights the difference between Sqoop and Kafka Connect.

6. Installing NiFi

This video demonstrates how to install NiFi.

7. Installing Kafka

This video explains how to install Kafka.

8. Hands-on Kafka and NiFi Integration Background

This video provides a background of Kafka and NiFi integration.

9. Integrating Kafka and NiFi

This video shows how to integrate Kafka and NiFi.


5. Key Storage Frameworks

1. Factors to Consider while Comparing Storage Frameworks

This video highlights the factors to consider while comparing storage frameworks.

2. Hadoop Distributed File System (HDFS) Versus HBase

This video highlights the difference between HDFS and HBase.

3. HBase Versus Kudu

This video explains the difference between HBase and Kudu.

4. Hadoop Distributed File System (HDFS) Versus Kudu

This video outlines the differences between HDFS and Kudu.

5. HBase Versus Cassandra

This video highlights the difference between HBase and Cassandra.


6. Data Formats

1. Text Versus Binary

This video highlights the difference between text and binary data formats.

2. Interoperability

This video focuses on interoperability.

3. Row-Oriented Versus Column-Oriented

This video explains the difference between row-oriented and column-oriented data formats.

4. Splittable Formats

This video introduces you to splittable formats.

5. Schema Evolution

This video focuses on schema evolution.

6. Comparing Data Formats

This video compares data formats.

7. Installing Sqoop on a Dataproc Cluster

This video demonstrates how to install Sqoop on a Dataproc cluster.

8. Hands-on Big Data Batch Pipeline Using the Avro Format

This video focuses on building a Big Data batch pipeline using the Avro format.


7. Key Data Processing Frameworks

1. Factors to Consider while Comparing Processing Frameworks

This video highlights the factors to consider while comparing processing frameworks.

2. MapReduce (MR) Versus Spark Logical Architecture

This video highlights the difference between MR and Spark logical architecture.

3. MapReduce (MR) Versus Spark Performance

This video compares the performance of MR and Spark.

4. Spark Versus Tez

This video explains the difference between Spark and Tez.

5. Spark Versus Flink

This video highlights the difference between Spark and Flink.

6. Kafka Streams Versus Spark Streaming

This video highlights the difference between Kafka Streams and Spark Streaming.

7. Spark 2.x Streaming Versus Spark 1.x Streaming

This video outlines the differences between Spark 2.x Streaming and Spark 1.x Streaming.

8. Spark Core Versus Spark Structured Query Language (SQL)

This video explains the difference between Spark Core and Spark SQL.

9. Integrating Kafka and Spark Streaming

This video demonstrates how to integrate Kafka and Spark Streaming.


8. Key Data Analysis Frameworks

1. Factors to Consider while Comparing Analysis Frameworks

This video highlights the factors to consider while comparing analysis frameworks.

2. Hive Versus Impala

This video highlights the difference between Hive and Impala.

3. Hive Versus Pig

This video outlines the differences between Hive and Pig.

4. Hive Versus Spark Structured Query Language (SQL)

This video explains the difference between Hive and Spark SQL.

5. Hive Versus Hive Live Long and Process (LLAP) Versus Impala

This video highlights the difference between Hive, Hive LLAP, and Impala.

6. Hive Versus KSQL

This video outlines the differences between Hive and KSQL.

7. KSQL Versus ksqlDB

This video explains the difference between KSQL and ksqlDB.

8. Hands-On KSQL

This video explains how to work with KSQL.

9. Writing to a Stream and Table Using KSQL

This video demonstrates how to write to a stream and table using KSQL.

10. Streaming Extract, Transform, Load (ETL) Pipeline Background

This video provides background on the streaming ETL pipeline.

11. Building a Scalable Extract, Transform, Load (ETL) Pipeline with Kafka Connect - Part 1

This is the first part of the two-part video that shows how to build a scalable ETL pipeline with Kafka Connect.

12. Building a Scalable Extract, Transform, Load (ETL) Pipeline with Kafka Connect - Part 2

This is the second part of the two-part video that demonstrates how to build a scalable ETL pipeline with Kafka Connect.


9. Delta Lake

1. Delta Architecture

This video explains the Delta architecture in detail.

2. Why Delta Lake

This video explains why Delta Lake is important.

3. Challenges with Delta Lake

This video talks about the different challenges with Delta Lake.

4. Delta Lake Demo

This video presents a demonstration of Delta Lake.


10. Additional Material

1. Solr Versus Elasticsearch

This video highlights the difference between Solr and Elasticsearch.

2. Cloudera Search Versus Solr

This video outlines the differences between Cloudera Search and Solr.

3. Oozie Versus Airflow

This video explains the difference between Oozie and Airflow.

4. KSQL Versus KStreams

This video highlights the difference between KSQL and KStreams.


11. Summary

1. Conclusion

This video provides the course conclusion.

About The Provider

Packt
Birmingham
Founded in 2004 in Birmingham, UK, Packt’s mission is to help the world put software to work in new ways, through the delivery of effective learning and i...