36 Apache Spark courses

DP-900T00 Microsoft Azure Data Fundamentals

By Nexus Human

Duration: 1 Day (6 CPD hours)

This course is intended for individuals who want to learn the fundamentals of database concepts in a cloud environment, gain basic skills in cloud data services, and build their foundational knowledge of cloud data services within Microsoft Azure.

Overview:
* Describe core data concepts
* Identify considerations for relational data on Azure
* Describe considerations for working with non-relational data on Azure
* Describe an analytics workload on Azure

In this course, students will gain foundational knowledge of core data concepts and related Microsoft Azure data services. Students will learn about core data concepts such as relational, non-relational, big data, and analytics, and build their foundational knowledge of cloud data services within Microsoft Azure. Students will explore fundamental relational data concepts and relational database services in Azure, Azure storage for non-relational data, and the fundamentals of Azure Cosmos DB. They will also learn about large-scale data warehousing, real-time analytics, and data visualization.

1 - EXPLORE CORE DATA CONCEPTS
* Identify data formats
* Explore file storage
* Explore databases
* Explore transactional data processing
* Explore analytical data processing

2 - EXPLORE DATA ROLES AND SERVICES
* Explore job roles in the world of data
* Identify data services

3 - EXPLORE FUNDAMENTAL RELATIONAL DATA CONCEPTS
* Understand relational data
* Understand normalization
* Explore SQL
* Describe database objects

4 - EXPLORE RELATIONAL DATABASE SERVICES IN AZURE
* Describe Azure SQL services and capabilities
* Describe Azure services for open-source databases

5 - EXPLORE AZURE STORAGE FOR NON-RELATIONAL DATA
* Explore Azure Blob Storage
* Explore Azure Data Lake Storage Gen2
* Explore Azure Files
* Explore Azure Tables

6 - EXPLORE FUNDAMENTALS OF AZURE COSMOS DB
* Describe Azure Cosmos DB
* Identify Azure Cosmos DB APIs

7 - EXPLORE FUNDAMENTALS OF LARGE-SCALE DATA WAREHOUSING
* Describe data warehousing architecture
* Explore data ingestion pipelines
* Explore analytical data stores

8 - EXPLORE FUNDAMENTALS OF REAL-TIME ANALYTICS
* Understand batch and stream processing
* Explore common elements of stream processing architecture
* Explore Azure Stream Analytics
* Explore Apache Spark on Microsoft Azure

9 - EXPLORE FUNDAMENTALS OF DATA VISUALIZATION
* Describe Power BI tools and workflow
* Describe core concepts of data modeling
* Describe considerations for data visualization
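To give a flavour of the Spark-on-Azure topic in module 8, here is a minimal PySpark sketch that reads CSV data from Azure Data Lake Storage Gen2. The storage account, container, key, and path are placeholders, and it assumes the hadoop-azure (ABFS) connector is on the Spark classpath; the course itself is conceptual and does not require writing this code.

```python
from pyspark.sql import SparkSession

# Minimal sketch: read CSV data from Azure Data Lake Storage Gen2 with PySpark.
# The storage account, container, and access key below are placeholders.
spark = SparkSession.builder.appName("adls-example").getOrCreate()

spark.conf.set(
    "fs.azure.account.key.mystorageacct.dfs.core.windows.net",
    "<storage-account-access-key>",
)

df = spark.read.csv(
    "abfss://mycontainer@mystorageacct.dfs.core.windows.net/sales/2024/*.csv",
    header=True,
    inferSchema=True,
)
df.groupBy("region").count().show()
```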

DP-900T00 Microsoft Azure Data Fundamentals
Delivered Online. Two days, Jun 24th, 13:00 + 3 more
£595

PySpark and AWS: Master Big Data with PySpark and AWS

By Packt

The course is crafted to reflect the most in-demand workplace skills. It will help you understand all the essential concepts and methodologies of PySpark, and it provides a thorough compilation of the basics, designed to help you progress quickly and then build well beyond them.
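As a taste of the material, here is a minimal sketch of the kind of basic PySpark-on-AWS job the course builds towards: reading CSV data from S3 and aggregating it. The bucket and column names are placeholders, and it assumes the hadoop-aws connector and AWS credentials are configured.

```python
from pyspark.sql import SparkSession

# Minimal sketch: a PySpark job that reads CSV data from S3 and aggregates it.
# Bucket and column names are placeholders.
spark = SparkSession.builder.appName("pyspark-aws-example").getOrCreate()

orders = spark.read.csv("s3a://my-bucket/orders/*.csv",
                        header=True, inferSchema=True)

# Keep valid orders, then total spend per customer.
(orders
 .filter(orders.amount > 0)
 .groupBy("customer_id")
 .sum("amount")
 .show(10))
```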

PySpark and AWS: Master Big Data with PySpark and AWS
Delivered Online On Demand
£101.99

Building Batch Data Analytics Solutions on AWS

By Nexus Human

Duration: 1 Day (6 CPD hours)

This course is intended for:
* Data platform engineers
* Architects and operators who build and manage data analytics pipelines

Overview. In this course, you will learn to:
* Compare the features and benefits of data warehouses, data lakes, and modern data architectures
* Design and implement a batch data analytics solution
* Identify and apply appropriate techniques, including compression, to optimize data storage
* Select and deploy appropriate options to ingest, transform, and store data
* Choose the appropriate instance and node types, clusters, auto scaling, and network topology for a particular business use case
* Understand how data storage and processing affect the analysis and visualization mechanisms needed to gain actionable business insights
* Secure data at rest and in transit
* Monitor analytics workloads to identify and remediate problems
* Apply cost management best practices

In this course, you will learn to build batch data analytics solutions using Amazon EMR, an enterprise-grade Apache Spark and Apache Hadoop managed service. You will learn how Amazon EMR integrates with open-source projects such as Apache Hive, Hue, and HBase, and with AWS services such as AWS Glue and AWS Lake Formation. The course addresses data collection, ingestion, cataloging, storage, and processing components in the context of Spark and Hadoop. You will learn to use EMR Notebooks to support both analytics and machine learning workloads. You will also learn to apply security, performance, and cost management best practices to the operation of Amazon EMR.

MODULE A: OVERVIEW OF DATA ANALYTICS AND THE DATA PIPELINE
* Data analytics use cases
* Using the data pipeline for analytics

MODULE 1: INTRODUCTION TO AMAZON EMR
* Using Amazon EMR in analytics solutions
* Amazon EMR cluster architecture
* Interactive Demo 1: Launching an Amazon EMR cluster
* Cost management strategies

MODULE 2: DATA ANALYTICS PIPELINE USING AMAZON EMR: INGESTION AND STORAGE
* Storage optimization with Amazon EMR
* Data ingestion techniques

MODULE 3: HIGH-PERFORMANCE BATCH DATA ANALYTICS USING APACHE SPARK ON AMAZON EMR
* Apache Spark on Amazon EMR use cases
* Why Apache Spark on Amazon EMR
* Spark concepts
* Interactive Demo 2: Connect to an EMR cluster and perform Scala commands using the Spark shell
* Transformation, processing, and analytics
* Using notebooks with Amazon EMR
* Practice Lab 1: Low-latency data analytics using Apache Spark on Amazon EMR

MODULE 4: PROCESSING AND ANALYZING BATCH DATA WITH AMAZON EMR AND APACHE HIVE
* Using Amazon EMR with Hive to process batch data
* Transformation, processing, and analytics
* Practice Lab 2: Batch data processing using Amazon EMR with Hive
* Introduction to Apache HBase on Amazon EMR

MODULE 5: SERVERLESS DATA PROCESSING
* Serverless data processing, transformation, and analytics
* Using AWS Glue with Amazon EMR workloads
* Practice Lab 3: Orchestrate data processing in Spark using AWS Step Functions

MODULE 6: SECURITY AND MONITORING OF AMAZON EMR CLUSTERS
* Securing EMR clusters
* Interactive Demo 3: Client-side encryption with EMRFS
* Monitoring and troubleshooting Amazon EMR clusters
* Demo: Reviewing Apache Spark cluster history

MODULE 7: DESIGNING BATCH DATA ANALYTICS SOLUTIONS
* Batch data analytics use cases
* Activity: Designing a batch data analytics workflow

MODULE B: DEVELOPING MODERN DATA ARCHITECTURES ON AWS
* Modern data architectures
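To illustrate the storage-optimization theme of Module 2, here is a minimal sketch of a batch step you might submit to an EMR cluster: reading raw JSON from S3 and writing it back as compressed, partitioned Parquet. The bucket names and columns are placeholders, not course materials.

```python
from pyspark.sql import SparkSession

# Minimal sketch of an EMR batch step: read raw JSON from S3, then write
# compressed, partitioned Parquet back, a common storage optimization.
# Bucket names and the event_date column are placeholders.
spark = SparkSession.builder.appName("emr-batch-etl").getOrCreate()

events = spark.read.json("s3a://my-raw-bucket/events/")

(events
 .repartition("event_date")              # group rows by partition column
 .write
 .mode("overwrite")
 .partitionBy("event_date")              # one S3 prefix per date
 .option("compression", "snappy")        # compact, splittable compression
 .parquet("s3a://my-curated-bucket/events_parquet/"))
```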

Building Batch Data Analytics Solutions on AWS
Delivered Online, on request
Price on Enquiry

Cloudera Data Scientist Training

By Nexus Human

Duration: 4 Days (24 CPD hours)

This workshop is designed for data scientists who currently use Python or R to work with smaller datasets on a single machine and who need to scale up their analyses and machine learning models to large datasets on distributed clusters. Data engineers and developers with some knowledge of data science and machine learning may also find this workshop useful.

Overview:
* Overview of data science and machine learning at scale
* Overview of the Hadoop ecosystem
* Working with HDFS data and Hive tables using Hue
* Introduction to Cloudera Data Science Workbench (CDSW)
* Overview of Apache Spark 2
* Reading and writing data
* Inspecting data quality
* Cleansing and transforming data
* Summarizing and grouping data
* Combining, splitting, and reshaping data
* Exploring data
* Configuring, monitoring, and troubleshooting Spark applications
* Overview of machine learning in Spark MLlib
* Extracting, transforming, and selecting features
* Building and evaluating regression models
* Building and evaluating classification models
* Building and evaluating clustering models
* Cross-validating models and tuning hyperparameters
* Building machine learning pipelines
* Deploying machine learning models

Technologies used: Spark, Spark SQL, and Spark MLlib; PySpark and sparklyr; Cloudera Data Science Workbench (CDSW); Hue.

This workshop covers data science and machine learning workflows at scale using Apache Spark 2 and other key components of the Hadoop ecosystem. It emphasizes the use of data science and machine learning methods to address real-world business challenges. Using scenarios and datasets from a fictional technology company, students discover insights to support critical business decisions and develop data products to transform the business. The material is presented through a sequence of brief lectures, interactive demonstrations, extensive hands-on exercises, and discussions. The Apache Spark demonstrations and exercises are conducted in Python (with PySpark) and R (with sparklyr) using the Cloudera Data Science Workbench (CDSW) environment.

ADDITIONAL COURSE DETAILS: The Nexus Humans Cloudera Data Scientist Training program is a workshop that presents an invigorating mix of sessions, lessons, and masterclasses designed to propel your learning forward. This immersive, bootcamp-style experience features interactive lectures, hands-on labs, and collaborative hackathons, all designed to reinforce fundamental concepts. Guided by seasoned coaches, each session offers practical insights and skills crucial for honing your expertise. Whether you are just stepping into the field or are a seasoned professional, this comprehensive course equips you with the knowledge needed for success. While we feel this is the best course for Cloudera Data Scientist Training and one of our Top 10, we encourage you to read the course outline to make sure it is the right content for you. Additionally, private sessions, closed classes, or dedicated events are available both live online and at our training centres in Dublin and London, as well as at your offices anywhere in the UK, Ireland, or across EMEA.
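As a flavour of the MLlib workflow the workshop covers, here is a minimal PySpark sketch that assembles features, fits a regression model, and cross-validates it. The input path and column names are placeholders, not the workshop's actual datasets.

```python
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

# Minimal sketch of an MLlib workflow: feature assembly, a regression
# model, and cross-validation. Path and column names are placeholders.
spark = SparkSession.builder.appName("mllib-regression").getOrCreate()

data = spark.read.parquet("hdfs:///data/rides.parquet")

assembler = VectorAssembler(inputCols=["distance", "duration"],
                            outputCol="features")
lr = LinearRegression(featuresCol="features", labelCol="fare")
pipeline = Pipeline(stages=[assembler, lr])

grid = ParamGridBuilder().addGrid(lr.regParam, [0.01, 0.1]).build()
evaluator = RegressionEvaluator(labelCol="fare", metricName="rmse")

cv = CrossValidator(estimator=pipeline, estimatorParamMaps=grid,
                    evaluator=evaluator, numFolds=3)
model = cv.fit(data)
print("best RMSE:", min(model.avgMetrics))
```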

Cloudera Data Scientist Training
Delivered Online, on request
Price on Enquiry

Professional Certificate Course in Big Data Infrastructure in London 2024

Rated 4.9 (261 reviews)

By Metropolitan School of Business & Management UK

Dive into the heart of Big Data infrastructure, exploring storage systems, distributed file frameworks, and processing paradigms. This course provides a comprehensive understanding of key components such as HDFS, Apache Spark, and Cassandra, offering insights into their architecture, use cases, and real-world applications.

This course is a deep dive into the complex landscape of Big Data infrastructure. From unravelling the architecture of Apache Spark to dissecting the benefits of distributed file systems, participants gain expertise in assessing, comparing, and implementing various Big Data storage and processing systems. Scalability, fault tolerance, and industry-specific case studies add practical depth to theoretical knowledge.

After successful completion of this course, you will be able to:
* Understand the components of Big Data infrastructure, including storage systems, distributed file systems, and processing frameworks.
* Identify the characteristics and benefits of distributed file systems such as the Hadoop Distributed File System (HDFS).
* Describe the architecture and capabilities of Apache Spark and its role in Big Data processing.
* Recognise the use cases and benefits of Apache Cassandra as a distributed NoSQL database.
* Compare and contrast different Big Data storage and processing systems such as Hadoop, Spark, and Cassandra.
* Understand the scalability and fault-tolerance mechanisms used in Big Data infrastructure, such as sharding and replication.
* Appreciate the challenges associated with deploying and managing Big Data infrastructure, such as hardware and software configuration and security considerations.

Explore the intricacies of Big Data infrastructure, from understanding storage systems to unravelling the nuances of distributed file frameworks and processing engines. Gain a comprehensive view of scalability, fault-tolerance mechanisms, and industry-specific challenges through engaging case studies, and equip yourself to navigate the dynamic landscape of Big Data with confidence and expertise.

Course structure:
* VIDEO - COURSE STRUCTURE AND ASSESSMENT GUIDELINES: watch this video to gain further insight.
* NAVIGATING THE MSBM STUDY PORTAL: watch this video to gain further insight.
* INTERACTING WITH LECTURES/LEARNING COMPONENTS: watch this video to gain further insight.
* BIG DATA INFRASTRUCTURE: self-paced, pre-recorded learning content on this topic.
* BIG DATA INFRASTRUCTURE: put your knowledge to the test with this quiz. Read each question carefully and choose the response that you feel is correct.

All MSBM courses are accredited by the relevant partners and awarding bodies. Please refer to MSBM accreditation in "About us" for more details. There are no strict entry requirements for this course; work experience will be an added advantage in understanding the content. The certificate is designed to enhance the learner's knowledge in the field and is for everyone eager to learn more and stay current in their respective field. We recommend this certificate for the following audience:
* Big Data Infrastructure Engineer
* Hadoop Administrator
* Spark Developer
* Cassandra Database Administrator
* Big Data Solutions Architect
* Data Infrastructure Manager
* NoSQL Database Analyst
* Big Data Consultant

AVERAGE COMPLETION TIME: 2 Weeks
ACCREDITATION: 3 CPD Hours
LEVEL: Advanced
START TIME: Anytime
100% ONLINE: Study online with ease.
UNLIMITED ACCESS: 24/7 unlimited access with pre-recorded lectures.
LOW FEES: Our fees are low and easy to pay online.
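To make the course's three headline systems concrete, here is a minimal sketch that reads from HDFS, filters with Spark, and writes to Cassandra. It assumes the DataStax spark-cassandra-connector package is available and that the keyspace and table already exist; all names are placeholders.

```python
from pyspark.sql import SparkSession

# Minimal sketch tying the three systems together: HDFS for replicated
# storage, Spark for distributed processing, Cassandra as a distributed
# NoSQL sink. Host, keyspace, and table names are placeholders.
spark = (SparkSession.builder
         .appName("hdfs-spark-cassandra")
         .config("spark.cassandra.connection.host", "cassandra-host")
         .getOrCreate())

logs = spark.read.text("hdfs:///logs/app/")           # read from HDFS
errors = logs.filter(logs.value.contains("ERROR"))    # transform with Spark

(errors.selectExpr("value AS message")
 .write.format("org.apache.spark.sql.cassandra")      # write to Cassandra
 .options(table="error_log", keyspace="ops")
 .mode("append")
 .save())
```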

Professional Certificate Course in Big Data Infrastructure in London 2024
Delivered Online On Demand
£28

Building Big Data Pipelines with PySpark MongoDB and Bokeh

By Apex Learning

OVERVIEW
This comprehensive course on Building Big Data Pipelines with PySpark, MongoDB and Bokeh will deepen your understanding of the topic. After successfully completing it, you will have acquired the required skills in this sector. The course comes with accredited CPD certification, which will enhance your CV and make you stand out in the job market. So enrol today to fast-track your career.

HOW WILL I GET MY CERTIFICATE?
You may have to take a quiz or a written test online during or after the course. After successfully completing the course, you will be eligible for the certificate.

WHO IS THIS COURSE FOR?
No experience or previous qualifications are required for enrolment. The course is available to all students, of all academic backgrounds.

REQUIREMENTS
The course is fully compatible with PCs, Macs, laptops, tablets and smartphones, so you can access it on Wi-Fi, 3G or 4G. There is no time limit for completing the course; it can be studied in your own time at your own pace.

CAREER PATH
Learning this new skill will help you advance in your career. It will diversify your job options and help you develop new techniques to keep up with the fast-changing world. This skillset will help you to:
* Open doors of opportunity
* Increase your adaptability
* Keep yourself relevant
* Boost confidence
And much more!

COURSE CURRICULUM
7 sections, 25 lectures, 05:04:00 total length
* Introduction: 00:10:00
* Python Installation: 00:03:00
* Installing Third Party Libraries: 00:03:00
* Installing Apache Spark: 00:12:00
* Installing Java (Optional): 00:05:00
* Testing Apache Spark Installation: 00:06:00
* Installing MongoDB: 00:04:00
* Installing NoSQL Booster for MongoDB: 00:07:00
* Integrating PySpark with Jupyter Notebook: 00:05:00
* Data Extraction: 00:19:00
* Data Transformation: 00:15:00
* Loading Data into MongoDB: 00:13:00
* Data Pre-processing: 00:19:00
* Building the Predictive Model: 00:12:00
* Creating the Prediction Dataset: 00:08:00
* Loading the Data Sources from MongoDB: 00:17:00
* Creating a Map Plot: 00:33:00
* Creating a Bar Chart: 00:09:00
* Creating a Magnitude Plot: 00:15:00
* Creating a Grid Plot: 00:09:00
* Installing Visual Studio Code: 00:05:00
* Creating the PySpark ETL Script: 00:24:00
* Creating the Machine Learning Script: 00:30:00
* Creating the Dashboard Server: 00:21:00
* Source Code and Notebook: 00:00:00
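As an illustration of the pipeline's load step, here is a minimal sketch of writing a transformed DataFrame into MongoDB from PySpark. It assumes the MongoDB Spark connector (v10+) is on the classpath; the URI, sample file, database, collection, and column names are placeholders rather than the course's actual materials.

```python
from pyspark.sql import SparkSession

# Minimal sketch of an extract-transform-load step into MongoDB.
# The connection URI, input file, and names below are placeholders.
spark = (SparkSession.builder
         .appName("pyspark-mongo-etl")
         .config("spark.mongodb.write.connection.uri",
                 "mongodb://localhost:27017")
         .getOrCreate())

events = spark.read.csv("events.csv", header=True, inferSchema=True)
cleaned = events.dropna(subset=["latitude", "longitude", "magnitude"])

(cleaned.write.format("mongodb")     # MongoDB Spark connector v10+ format
 .option("database", "pipeline_db")
 .option("collection", "events")
 .mode("append")
 .save())
```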

Building Big Data Pipelines with PySpark MongoDB and Bokeh
Delivered Online On Demand
£12

Building Recommender Systems with Machine Learning and AI

By Packt

Are you fascinated with Netflix and YouTube recommendations and how they accurately recommend content that you would like to watch? Are you looking for a practical course that will teach you how to build intelligent recommendation systems? This course will show you how to build accurate recommendation systems in Python using real-world examples.
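For a sense of what such a system involves, here is a minimal sketch of collaborative filtering with ALS in Spark MLlib, one standard way to produce Netflix-style recommendations; the course's own examples may use different libraries. The ratings file and column names are placeholders.

```python
from pyspark.sql import SparkSession
from pyspark.ml.recommendation import ALS

# Minimal sketch: collaborative filtering with alternating least squares.
# The ratings file and column names are placeholders.
spark = SparkSession.builder.appName("als-recommender").getOrCreate()

ratings = spark.read.csv("ratings.csv", header=True, inferSchema=True)

als = ALS(userCol="user_id", itemCol="movie_id", ratingCol="rating",
          coldStartStrategy="drop")   # drop users/items unseen in training
model = als.fit(ratings)

# Top 5 recommendations for every user.
model.recommendForAllUsers(5).show(truncate=False)
```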

Building Recommender Systems with Machine Learning and AI
Delivered Online On Demand
£44.99

Develop Big Data Pipelines with R & Sparklyr & Tableau

By Apex Learning

OVERVIEW
This comprehensive course on Develop Big Data Pipelines with R & Sparklyr & Tableau will deepen your understanding of the topic. After successfully completing it, you will have acquired the required skills in this sector. The course comes with accredited CPD certification, which will enhance your CV and make you stand out in the job market. So enrol today to fast-track your career.

HOW WILL I GET MY CERTIFICATE?
You may have to take a quiz or a written test online during or after the course. After successfully completing the course, you will be eligible for the certificate.

WHO IS THIS COURSE FOR?
No experience or previous qualifications are required for enrolment. The course is available to all students, of all academic backgrounds.

REQUIREMENTS
The course is fully compatible with PCs, Macs, laptops, tablets and smartphones, so you can access it on Wi-Fi, 3G or 4G. There is no time limit for completing the course; it can be studied in your own time at your own pace.

CAREER PATH
Learning this new skill will help you advance in your career. It will diversify your job options and help you develop new techniques to keep up with the fast-changing world. This skillset will help you to:
* Open doors of opportunity
* Increase your adaptability
* Keep yourself relevant
* Boost confidence
And much more!

COURSE CURRICULUM
6 sections, 20 lectures, 02:59:00 total length
* Introduction: 00:12:00
* R Installation: 00:05:00
* Installing Apache Spark: 00:12:00
* Installing Java (Optional): 00:05:00
* Testing Apache Spark Installation: 00:03:00
* Installing Sparklyr: 00:07:00
* Data Extraction: 00:06:00
* Data Transformation: 00:18:00
* Data Exporting: 00:07:00
* Data Pre-processing: 00:18:00
* Building the Predictive Model: 00:10:00
* Creating the Prediction Dataset: 00:10:00
* Installing Tableau: 00:02:00
* Loading the Data Sources: 00:05:00
* Creating a Geo Map: 00:12:00
* Creating a Bar Chart: 00:08:00
* Creating a Donut Chart: 00:15:00
* Creating the Magnitude Chart: 00:09:00
* Creating the Dashboard: 00:15:00
* Source Code: 00:00:00

Develop Big Data Pipelines with R & Sparklyr & Tableau
Delivered Online On Demand
£12

AWS Certified Data Analytics Specialty (2023) Hands-on

By Packt

This course covers the important topics needed to pass the AWS Certified Data Analytics-Specialty exam (AWS DAS-C01). You will learn about Kinesis, EMR, DynamoDB, and Redshift, and get ready for the exam by working through quizzes, exercises, and practice exams, along with exploring essential tips and techniques.
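As a small taste of the Kinesis material, here is a minimal boto3 sketch that pushes a record into a Kinesis data stream. The region and stream name are placeholders, and the stream must already exist.

```python
import json
import boto3

# Minimal sketch: put one record into a Kinesis data stream with boto3.
# Region and stream name are placeholders; the stream must already exist.
kinesis = boto3.client("kinesis", region_name="us-east-1")

record = {"sensor_id": "s-42", "temperature": 21.7}
kinesis.put_record(
    StreamName="sensor-stream",
    Data=json.dumps(record).encode("utf-8"),
    PartitionKey=record["sensor_id"],  # determines the target shard
)
```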

AWS Certified Data Analytics Specialty (2023) Hands-on
Delivered Online On Demand
£68.99

Streaming Big Data with Spark Streaming, Scala, and Spark 3!

By Packt

In this course, we will process massive streams of real-time data using Spark Streaming and create Spark applications using the Scala programming language (v2.12). We will also get hands-on with real live Twitter data, simulated streams of Apache access logs, and even data used to train machine learning models.
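The course itself works in Scala; for consistency with the other sketches on this page, here is the equivalent minimal streaming word count in PySpark's Structured Streaming, reading lines from a local socket (for example, one opened with `nc -lk 9999`).

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split

# Minimal sketch: a streaming word count over lines from a local socket,
# printing running totals to the console.
spark = SparkSession.builder.appName("streaming-wordcount").getOrCreate()

lines = (spark.readStream.format("socket")
         .option("host", "localhost")
         .option("port", 9999)
         .load())

counts = (lines
          .select(explode(split(lines.value, " ")).alias("word"))
          .groupBy("word").count())

query = (counts.writeStream
         .outputMode("complete")   # re-emit full counts each trigger
         .format("console")
         .start())
query.awaitTermination()
```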

Streaming Big Data with Spark Streaming, Scala, and Spark 3!
Delivered Online On Demand
£74.99