Real-Time Stream Processing Using Apache Spark 3 for Python Developers

  • 30 Day Money Back Guarantee
  • Completion Certificate
  • 24/7 Technical Support

Highlights

  • On-Demand course

  • 4 hours 34 minutes

  • All levels

Description

Get to grips with real-time stream processing using PySpark and Spark Structured Streaming, and apply that knowledge to build stream processing solutions. This course is example-driven and follows a working session-like approach.

Take your first steps towards discovering, learning, and using Apache Spark 3.0. This carefully structured course takes a live coding approach and explains all the core concepts needed along the way. We begin with real-time stream processing concepts and the Spark Structured Streaming APIs and architecture. We then work with file streams and the Kafka source, and integrate Spark with Kafka. Next, we cover stateless and stateful streaming transformations, followed by windowing aggregates with Spark streams, and then watermarking and state cleanup. After that, we look at streaming joins and aggregations, including handling memory problems with streaming joins. Finally, you will learn to create arbitrary streaming sinks. By the end of this course, you will be able to create real-time stream processing applications using Apache Spark. All the resources for the course are available at https://github.com/PacktPublishing/Real-time-stream-processing-using-Apache-Spark-3-for-Python-developers
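
To give a flavour of the APIs the course works with, here is a minimal sketch of a Structured Streaming job that reads a file stream and prints the results to the console. The schema, paths, and application name below are illustrative assumptions, not code taken from the course.

    # A minimal Structured Streaming sketch: read a JSON file stream and write to the console.
    # Schema, directory paths, and the app name are assumptions for illustration only.
    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StructField, StringType, IntegerType

    spark = (SparkSession.builder
             .appName("FileStreamSketch")          # hypothetical app name
             .master("local[3]")
             .getOrCreate())

    # Streaming file sources need an explicit schema; this one is made up.
    schema = StructType([
        StructField("InvoiceNumber", StringType()),
        StructField("Quantity", IntegerType()),
    ])

    raw_df = (spark.readStream
              .format("json")
              .schema(schema)
              .load("input/"))                     # hypothetical input directory

    query = (raw_df.writeStream
             .format("console")
             .outputMode("append")
             .option("checkpointLocation", "chk-point-dir")   # hypothetical checkpoint path
             .start())

    query.awaitTermination()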

What You Will Learn

Explore stateless and stateful streaming transformations
Apply windowing aggregates using Spark streams (see the sketch after this list)
Learn watermarking and state cleanup
Implement streaming joins and aggregations
Handle memory problems with streaming joins
Learn to create arbitrary streaming sinks
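
As a taste of the windowing and watermarking topics above, the following is a hedged sketch of a tumbling-window aggregate with a watermark for state cleanup. It uses Spark's built-in rate source so the example is self-contained; the window and watermark durations are arbitrary choices, and the course works with real file and Kafka data instead.

    # Sketch: tumbling-window aggregate with a watermark so Spark can discard old window state.
    # The rate source emits (timestamp, value) rows and stands in for real event data here.
    from pyspark.sql import SparkSession, functions as F

    spark = (SparkSession.builder
             .appName("WindowingSketch")           # hypothetical app name
             .master("local[3]")
             .getOrCreate())

    events = (spark.readStream
              .format("rate")
              .option("rowsPerSecond", 10)
              .load())

    # The watermark bounds how late events may arrive, letting Spark clean up old window state.
    windowed = (events
                .withWatermark("timestamp", "1 minute")
                .groupBy(F.window("timestamp", "30 seconds"))
                .agg(F.sum("value").alias("total_value")))

    query = (windowed.writeStream
             .format("console")
             .outputMode("update")
             .option("checkpointLocation", "chk-windowed")    # hypothetical checkpoint path
             .start())

    query.awaitTermination()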

Audience

This course is designed for software engineers and architects who want to design and develop big data engineering projects using Apache Spark. It is also suitable for programmers and developers who aspire to grow into data engineering with Apache Spark.

For this course, you need to know Spark fundamentals and have some exposure to the Spark DataFrame APIs. You should also know Kafka fundamentals and have a working knowledge of Apache Kafka, as well as programming knowledge of Python.

Approach

This course is example-driven and follows a working session-like approach. It delivers live coding sessions and explains the concepts along the way.

Key Features

  • Learn real-time stream processing concepts

  • Understand Spark Structured Streaming APIs and architecture

  • Work with file streams, the Kafka source, and Spark-Kafka integration (sketched below)
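
For the Kafka integration feature, here is a sketch of what reading from and writing back to Kafka looks like with Structured Streaming. The broker address and topic names are assumptions, and the spark-sql-kafka-0-10 package must be on the classpath.

    # Sketch: Spark-Kafka integration. Broker address and topic names are assumed values.
    from pyspark.sql import SparkSession, functions as F

    spark = (SparkSession.builder
             .appName("KafkaIntegrationSketch")    # hypothetical app name
             .master("local[3]")
             .getOrCreate())

    kafka_df = (spark.readStream
                .format("kafka")
                .option("kafka.bootstrap.servers", "localhost:9092")   # assumed broker
                .option("subscribe", "invoices")                       # assumed input topic
                .option("startingOffsets", "earliest")
                .load())

    # Kafka delivers key and value as binary; cast both to strings before processing.
    messages = kafka_df.select(
        F.col("key").cast("string").alias("key"),
        F.col("value").cast("string").alias("value"))

    query = (messages.writeStream
             .format("kafka")
             .option("kafka.bootstrap.servers", "localhost:9092")
             .option("topic", "processed-invoices")                    # assumed output topic
             .option("checkpointLocation", "chk-kafka")                # hypothetical checkpoint path
             .start())

    query.awaitTermination()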

Github Repo

https://github.com/PacktPublishing/Real-time-stream-processing-using-Apache-Spark-3-for-Python-developers

About the Author

Scholar Nest

ScholarNest is a small team of people passionate about helping others learn and grow in their careers by bridging the gap between their existing and required skills. Together, they have over 40 years of experience in IT as developers, architects, consultants, trainers, and mentors. They have worked with international software services organizations on various data-centric and big data projects. They are firm believers in lifelong continuous learning and skill development, and to popularize the importance of continuous learning they started publishing free training videos on their YouTube channel, recording their ongoing learning under the Learning Journal banner.

Course Outline

1. Before you Start
2. Setup your Environment
3. Getting started with Spark Structured Streaming
4. Spark Streaming with Kafka
5. Windowing and Aggregates
6. Stream Processing and Joins
7. Keep Learning

About The Provider

Packt
Birmingham
Founded in 2004 in Birmingham, UK, Packt’s mission is to help the world put software to work in new ways, through the delivery of effective learning and i...
