
12 Apache Spark courses delivered Live Online


Mastering Scala with Apache Spark for the Modern Data Enterprise (TTSK7520)

By Nexus Human

Duration: 5 days, 30 CPD hours

Who this course is for: This intermediate-and-beyond-level course is geared for experienced technical professionals in various roles, such as developers, data analysts, data engineers, software engineers, and machine learning engineers who want to leverage Scala and Spark to tackle complex data challenges and develop scalable, high-performance applications across diverse domains. Practical programming experience is required to participate in the hands-on labs.

Overview
Working in a hands-on learning environment led by our expert instructor, you'll:
* Develop a basic understanding of Scala and Apache Spark fundamentals, enabling you to confidently create scalable and high-performance applications.
* Learn how to process large datasets efficiently, helping you handle complex data challenges and make data-driven decisions.
* Gain hands-on experience with real-time data streaming, allowing you to manage and analyze data as it flows into your applications.
* Acquire practical knowledge of machine learning algorithms using Spark MLlib, empowering you to create intelligent applications and uncover hidden insights.
* Master graph processing with GraphX, enabling you to analyze and visualize complex relationships in your data.
* Discover generative AI technologies using GPT with Spark and Scala, opening up new possibilities for automating content generation and enhancing data analysis.

Embark on a journey to master the world of big data with our immersive course on Scala and Spark! Mastering Scala with Apache Spark for the Modern Data Enterprise is a five-day, hands-on course designed to provide you with the essential skills and tools to tackle complex data projects using the Scala programming language and Apache Spark, a high-performance data processing engine. Mastering these technologies will enable you to perform a wide range of tasks, from data wrangling and analytics to machine learning and artificial intelligence, across various industries and applications.

Guided by our expert instructor, you'll explore the fundamentals of Scala programming and Apache Spark while gaining valuable hands-on experience with Spark programming, RDDs, DataFrames, Spark SQL, and data sources. You'll also explore Spark Streaming, performance optimization techniques, and the integration of popular external libraries, tools, and cloud platforms like AWS, Azure, and GCP. Machine learning enthusiasts will delve into Spark MLlib, covering the basics of machine learning algorithms, data preparation, feature extraction, and various techniques such as regression, classification, clustering, and recommendation systems. (A minimal code sketch of the core Spark concepts follows the outline below.)

INTRODUCTION TO SCALA
* Brief history and motivation
* Differences between Scala and Java
* Basic Scala syntax and constructs
* Scala's functional programming features

INTRODUCTION TO APACHE SPARK
* Overview and history
* Spark components and architecture
* Spark ecosystem
* Comparing Spark with other big data frameworks

BASICS OF SPARK PROGRAMMING: SPARKCONTEXT AND SPARKSESSION
* Resilient Distributed Datasets (RDDs)
* Transformations and Actions
* Working with DataFrames

SPARK SQL AND DATA SOURCES
* Spark SQL library and its advantages
* Structured and semi-structured data sources
* Reading and writing data in various formats (CSV, JSON, Parquet, Avro, etc.)
* Data manipulation using SQL queries

BASIC RDD OPERATIONS
* Creating and manipulating RDDs
* Common transformations and actions on RDDs
* Working with key-value data

BASIC DATAFRAME AND DATASET OPERATIONS
* Creating and manipulating DataFrames and Datasets
* Column operations and functions
* Filtering, sorting, and aggregating data

INTRODUCTION TO SPARK STREAMING
* Overview of Spark Streaming
* Discretized Stream (DStream) operations
* Windowed operations and stateful processing

PERFORMANCE OPTIMIZATION BASICS
* Best practices for efficient Spark code
* Broadcast variables and accumulators
* Monitoring Spark applications

INTEGRATING EXTERNAL LIBRARIES AND TOOLS
* Using popular external libraries, such as Hadoop and HBase
* Integrating with cloud platforms: AWS, Azure, GCP
* Connecting to data storage systems: HDFS, S3, Cassandra, etc.

INTRODUCTION TO MACHINE LEARNING BASICS
* Overview of machine learning
* Supervised and unsupervised learning
* Common algorithms and use cases

INTRODUCTION TO SPARK MLLIB
* Overview of Spark MLlib
* MLlib's algorithms and utilities
* Data preparation and feature extraction

LINEAR REGRESSION AND CLASSIFICATION
* Linear regression algorithm
* Logistic regression for classification
* Model evaluation and performance metrics

CLUSTERING ALGORITHMS
* Overview of clustering algorithms
* K-means clustering
* Model evaluation and performance metrics

COLLABORATIVE FILTERING AND RECOMMENDATION SYSTEMS
* Overview of recommendation systems
* Collaborative filtering techniques
* Implementing recommendations with Spark MLlib

INTRODUCTION TO GRAPH PROCESSING
* Overview of graph processing
* Use cases and applications of graph processing
* Graph representations and operations

INTRODUCTION TO SPARK GRAPHX
* Overview of GraphX
* Creating and transforming graphs
* Graph algorithms in GraphX

BIG DATA INNOVATION! USING GPT AND GENERATIVE AI TECHNOLOGIES WITH SPARK AND SCALA
* Overview of generative AI technologies
* Integrating GPT with Spark and Scala
* Practical applications and use cases

BONUS TOPICS / TIME PERMITTING

INTRODUCTION TO SPARK NLP
* Overview of Spark NLP
* Preprocessing text data
* Text classification and sentiment analysis

PUTTING IT ALL TOGETHER
* Work on a capstone project that integrates multiple aspects of the course, including data processing, machine learning, graph processing, and generative AI technologies.
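
To give a flavour of the concepts the first modules cover, here is a minimal, hedged sketch of SparkSession creation, lazy RDD transformations versus actions, DataFrames, and Spark SQL. The course itself teaches these in Scala; this illustration uses the equivalent PySpark API, and all data and names are placeholders rather than course materials.

```python
# Minimal sketch of core Spark concepts; data and names are illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spark-basics-sketch").getOrCreate()

# RDDs: transformations are lazy; actions trigger computation.
rdd = spark.sparkContext.parallelize([1, 2, 3, 4, 5])
squares = rdd.map(lambda x: x * x)         # transformation (lazy)
print(squares.reduce(lambda a, b: a + b))  # action: prints 55

# DataFrames: column operations, filtering, and aggregation.
df = spark.createDataFrame(
    [("alice", 34), ("bob", 29), ("carol", 41)], ["name", "age"]
)
df.filter(df.age > 30).groupBy().avg("age").show()

# Spark SQL over a temporary view of the same DataFrame.
df.createOrReplaceTempView("people")
spark.sql("SELECT name FROM people WHERE age > 30").show()

spark.stop()
```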

Mastering Scala with Apache Spark for the Modern Data Enterprise (TTSK7520)
Delivered on-request, online
Price on Enquiry

DP-601T00 Implementing a Lakehouse with Microsoft Fabric

By Nexus Human

Duration: 1 day, 6 CPD hours

Who this course is for: The primary audience for this course is data professionals who are familiar with data modeling, extraction, and analytics. It is designed for professionals who are interested in gaining knowledge about Lakehouse architecture, the Microsoft Fabric platform, and how to enable end-to-end analytics using these technologies. Job roles: Data Analyst, Data Engineer, Data Scientist.

Overview
* Describe end-to-end analytics in Microsoft Fabric
* Describe core features and capabilities of lakehouses in Microsoft Fabric
* Create a lakehouse
* Ingest data into files and tables in a lakehouse
* Query lakehouse tables with SQL
* Configure Spark in a Microsoft Fabric workspace
* Identify suitable scenarios for Spark notebooks and Spark jobs
* Use Spark dataframes to analyze and transform data
* Use Spark SQL to query data in tables and views
* Visualize data in a Spark notebook
* Understand Delta Lake and delta tables in Microsoft Fabric
* Create and manage delta tables using Spark
* Use Spark to query and transform data in delta tables
* Use delta tables with Spark structured streaming
* Describe Dataflow (Gen2) capabilities in Microsoft Fabric
* Create Dataflow (Gen2) solutions to ingest and transform data
* Include a Dataflow (Gen2) in a pipeline

This course is designed to build your foundational skills in data engineering on Microsoft Fabric, focusing on the Lakehouse concept. It explores the powerful capabilities of Apache Spark for distributed data processing and the essential techniques for efficient data management, versioning, and reliability that come from working with Delta Lake tables. It also explores data ingestion and orchestration using Dataflows Gen2 and Data Factory pipelines. The course includes a combination of lectures and hands-on exercises that will prepare you to work with lakehouses in Microsoft Fabric. (A short delta-table sketch follows the outline below.)

INTRODUCTION TO END-TO-END ANALYTICS USING MICROSOFT FABRIC
* Explore end-to-end analytics with Microsoft Fabric
* Data teams and Microsoft Fabric
* Enable and use Microsoft Fabric
* Knowledge check

GET STARTED WITH LAKEHOUSES IN MICROSOFT FABRIC
* Explore the Microsoft Fabric Lakehouse
* Work with Microsoft Fabric Lakehouses
* Exercise - Create and ingest data with a Microsoft Fabric Lakehouse

USE APACHE SPARK IN MICROSOFT FABRIC
* Prepare to use Apache Spark
* Run Spark code
* Work with data in a Spark dataframe
* Work with data using Spark SQL
* Visualize data in a Spark notebook
* Exercise - Analyze data with Apache Spark

WORK WITH DELTA LAKE TABLES IN MICROSOFT FABRIC
* Understand Delta Lake
* Create delta tables
* Work with delta tables in Spark
* Use delta tables with streaming data
* Exercise - Use delta tables in Apache Spark

INGEST DATA WITH DATAFLOWS GEN2 IN MICROSOFT FABRIC
* Understand Dataflows (Gen2) in Microsoft Fabric
* Explore Dataflows (Gen2) in Microsoft Fabric
* Integrate Dataflows (Gen2) and Pipelines in Microsoft Fabric
* Exercise - Create and use a Dataflow (Gen2) in Microsoft Fabric
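
As a hedged illustration of the "Work with Delta Lake tables" module, the PySpark sketch below creates and queries a managed delta table the way a Fabric notebook might. It assumes the notebook's built-in `spark` session and a lakehouse file that has already been uploaded; the path and table names are placeholders, not course materials.

```python
# Assumes a Fabric notebook, where `spark` is pre-defined, and an uploaded
# CSV at the placeholder lakehouse path "Files/sales.csv".
df = spark.read.format("csv").option("header", "true").load("Files/sales.csv")

# Save the dataframe as a managed delta table in the lakehouse.
df.write.format("delta").mode("overwrite").saveAsTable("sales")

# Delta tables are queryable with plain Spark SQL.
spark.sql("SELECT COUNT(*) AS row_count FROM sales").show()
```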

DP-601T00 Implementing a Lakehouse with Microsoft Fabric
Delivered Online. Two days, Aug 26th, 13:00 + 2 more
£595

DP-203T00 Data Engineering on Microsoft Azure

By Nexus Human

Duration: 4 days, 24 CPD hours

Who this course is for: The primary audience for this course is data professionals, data architects, and business intelligence professionals who want to learn about data engineering and building analytical solutions using data platform technologies that exist on Microsoft Azure. The secondary audience includes data analysts and data scientists who work with analytical solutions built on Microsoft Azure.

In this course, the student will learn how to implement and manage data engineering workloads on Microsoft Azure, using Azure services such as Azure Synapse Analytics, Azure Data Lake Storage Gen2, Azure Stream Analytics, Azure Databricks, and others. The course focuses on common data engineering tasks such as orchestrating data transfer and transformation pipelines, working with data files in a data lake, creating and loading relational data warehouses, capturing and aggregating streams of real-time data, and tracking data assets and lineage. (A short Spark transformation sketch follows the outline below.)

Prerequisites: Successful students start this course with knowledge of cloud computing and core data concepts and professional experience with data solutions.
* AZ-900T00 Microsoft Azure Fundamentals
* DP-900T00 Microsoft Azure Data Fundamentals

1 - INTRODUCTION TO DATA ENGINEERING ON AZURE
* What is data engineering
* Important data engineering concepts
* Data engineering in Microsoft Azure

2 - INTRODUCTION TO AZURE DATA LAKE STORAGE GEN2
* Understand Azure Data Lake Storage Gen2
* Enable Azure Data Lake Storage Gen2 in Azure Storage
* Compare Azure Data Lake Store to Azure Blob storage
* Understand the stages for processing big data
* Use Azure Data Lake Storage Gen2 in data analytics workloads

3 - INTRODUCTION TO AZURE SYNAPSE ANALYTICS
* What is Azure Synapse Analytics
* How Azure Synapse Analytics works
* When to use Azure Synapse Analytics

4 - USE AZURE SYNAPSE SERVERLESS SQL POOL TO QUERY FILES IN A DATA LAKE
* Understand Azure Synapse serverless SQL pool capabilities and use cases
* Query files using a serverless SQL pool
* Create external database objects

5 - USE AZURE SYNAPSE SERVERLESS SQL POOLS TO TRANSFORM DATA IN A DATA LAKE
* Transform data files with the CREATE EXTERNAL TABLE AS SELECT statement
* Encapsulate data transformations in a stored procedure
* Include a data transformation stored procedure in a pipeline

6 - CREATE A LAKE DATABASE IN AZURE SYNAPSE ANALYTICS
* Understand lake database concepts
* Explore database templates
* Create a lake database
* Use a lake database

7 - ANALYZE DATA WITH APACHE SPARK IN AZURE SYNAPSE ANALYTICS
* Get to know Apache Spark
* Use Spark in Azure Synapse Analytics
* Analyze data with Spark
* Visualize data with Spark

8 - TRANSFORM DATA WITH SPARK IN AZURE SYNAPSE ANALYTICS
* Modify and save dataframes
* Partition data files
* Transform data with SQL

9 - USE DELTA LAKE IN AZURE SYNAPSE ANALYTICS
* Understand Delta Lake
* Create Delta Lake tables
* Create catalog tables
* Use Delta Lake with streaming data
* Use Delta Lake in a SQL pool

10 - ANALYZE DATA IN A RELATIONAL DATA WAREHOUSE
* Design a data warehouse schema
* Create data warehouse tables
* Load data warehouse tables
* Query a data warehouse

11 - LOAD DATA INTO A RELATIONAL DATA WAREHOUSE
* Load staging tables
* Load dimension tables
* Load time dimension tables
* Load slowly changing dimensions
* Load fact tables
* Perform post-load optimization

12 - BUILD A DATA PIPELINE IN AZURE SYNAPSE ANALYTICS
* Understand pipelines in Azure Synapse Analytics
* Create a pipeline in Azure Synapse Studio
* Define data flows
* Run a pipeline

13 - USE SPARK NOTEBOOKS IN AN AZURE SYNAPSE PIPELINE
* Understand Synapse Notebooks and Pipelines
* Use a Synapse notebook activity in a pipeline
* Use parameters in a notebook

14 - PLAN HYBRID TRANSACTIONAL AND ANALYTICAL PROCESSING USING AZURE SYNAPSE ANALYTICS
* Understand hybrid transactional and analytical processing patterns
* Describe Azure Synapse Link

15 - IMPLEMENT AZURE SYNAPSE LINK WITH AZURE COSMOS DB
* Enable a Cosmos DB account to use Azure Synapse Link
* Create an analytical store enabled container
* Create a linked service for Cosmos DB
* Query Cosmos DB data with Spark
* Query Cosmos DB with Synapse SQL

16 - IMPLEMENT AZURE SYNAPSE LINK FOR SQL
* What is Azure Synapse Link for SQL?
* Configure Azure Synapse Link for Azure SQL Database
* Configure Azure Synapse Link for SQL Server 2022

17 - GET STARTED WITH AZURE STREAM ANALYTICS
* Understand data streams
* Understand event processing
* Understand window functions

18 - INGEST STREAMING DATA USING AZURE STREAM ANALYTICS AND AZURE SYNAPSE ANALYTICS
* Stream ingestion scenarios
* Configure inputs and outputs
* Define a query to select, filter, and aggregate data
* Run a job to ingest data

19 - VISUALIZE REAL-TIME DATA WITH AZURE STREAM ANALYTICS AND POWER BI
* Use a Power BI output in Azure Stream Analytics
* Create a query for real-time visualization
* Create real-time data visualizations in Power BI

20 - INTRODUCTION TO MICROSOFT PURVIEW
* What is Microsoft Purview?
* How Microsoft Purview works
* When to use Microsoft Purview

21 - INTEGRATE MICROSOFT PURVIEW AND AZURE SYNAPSE ANALYTICS
* Catalog Azure Synapse Analytics data assets in Microsoft Purview
* Connect Microsoft Purview to an Azure Synapse Analytics workspace
* Search a Purview catalog in Synapse Studio
* Track data lineage in pipelines

22 - EXPLORE AZURE DATABRICKS
* Get started with Azure Databricks
* Identify Azure Databricks workloads
* Understand key concepts

23 - USE APACHE SPARK IN AZURE DATABRICKS
* Get to know Spark
* Create a Spark cluster
* Use Spark in notebooks
* Use Spark to work with data files
* Visualize data

24 - RUN AZURE DATABRICKS NOTEBOOKS WITH AZURE DATA FACTORY
* Understand Azure Databricks notebooks and pipelines
* Create a linked service for Azure Databricks
* Use a Notebook activity in a pipeline
* Use parameters in a notebook

ADDITIONAL COURSE DETAILS
Nexus Humans' DP-203T00 Data Engineering on Microsoft Azure training program is a workshop that presents an invigorating mix of sessions, lessons, and masterclasses meticulously crafted to propel your learning expedition forward. This immersive bootcamp-style experience boasts interactive lectures, hands-on labs, and collaborative hackathons, all strategically designed to fortify fundamental concepts. Guided by seasoned coaches, each session offers priceless insights and practical skills crucial for honing your expertise. Whether you're new to the field or a seasoned professional, this comprehensive course ensures you're equipped with the knowledge and prowess necessary for success. While we feel this is the best course for DP-203T00 Data Engineering on Microsoft Azure and one of our Top 10, we encourage you to read the course outline to make sure it is the right content for you. Additionally, private sessions, closed classes or dedicated events are available both live online and at our training centres in Dublin and London, as well as at your offices anywhere in the UK, Ireland or across EMEA.
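
The hedged PySpark sketch below illustrates the kind of work module 8 ("Transform data with Spark in Azure Synapse Analytics") describes: load files from a data lake, transform them, and write partitioned output. It assumes a Synapse notebook where `spark` is pre-defined; the abfss:// URLs, column names, and filter value are illustrative placeholders.

```python
# Load CSV files from an Azure Data Lake Storage Gen2 container
# (placeholder account and container names).
df = spark.read.load(
    "abfss://files@mydatalake.dfs.core.windows.net/orders/*.csv",
    format="csv", header=True,
)

from pyspark.sql.functions import col, year

# Derive a year column and keep only completed orders (illustrative logic).
transformed = (
    df.withColumn("order_year", year(col("order_date")))
      .filter(col("status") == "Completed")
)

# Partition the output files by year, as covered under "Partition data files".
transformed.write.partitionBy("order_year").mode("overwrite").parquet(
    "abfss://files@mydatalake.dfs.core.windows.net/curated/orders"
)
```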

DP-203T00 Data Engineering on Microsoft Azure
Delivered Online. 5 days, Jun 24th, 13:00 + 4 more
£2380

DP-900T00 Microsoft Azure Data Fundamentals

By Nexus Human

Duration: 1 day, 6 CPD hours

Who this course is for: individuals who want to learn the fundamentals of database concepts in a cloud environment, get basic skilling in cloud data services, and build their foundational knowledge of cloud data services within Microsoft Azure.

Overview
* Describe core data concepts
* Identify considerations for relational data on Azure
* Describe considerations for working with non-relational data on Azure
* Describe an analytics workload on Azure

In this course, students will gain foundational knowledge of core data concepts and related Microsoft Azure data services. Students will learn about core data concepts such as relational, non-relational, big data, and analytics, and build their foundational knowledge of cloud data services within Microsoft Azure. They will explore fundamental relational data concepts and relational database services in Azure, Azure storage for non-relational data, and the fundamentals of Azure Cosmos DB, before moving on to large-scale data warehousing, real-time analytics, and data visualization.

1 - EXPLORE CORE DATA CONCEPTS
* Identify data formats
* Explore file storage
* Explore databases
* Explore transactional data processing
* Explore analytical data processing

2 - EXPLORE DATA ROLES AND SERVICES
* Explore job roles in the world of data
* Identify data services

3 - EXPLORE FUNDAMENTAL RELATIONAL DATA CONCEPTS
* Understand relational data
* Understand normalization
* Explore SQL
* Describe database objects

4 - EXPLORE RELATIONAL DATABASE SERVICES IN AZURE
* Describe Azure SQL services and capabilities
* Describe Azure services for open-source databases

5 - EXPLORE AZURE STORAGE FOR NON-RELATIONAL DATA
* Explore Azure Blob storage
* Explore Azure Data Lake Storage Gen2
* Explore Azure Files
* Explore Azure Tables

6 - EXPLORE FUNDAMENTALS OF AZURE COSMOS DB
* Describe Azure Cosmos DB
* Identify Azure Cosmos DB APIs

7 - EXPLORE FUNDAMENTALS OF LARGE-SCALE DATA WAREHOUSING
* Describe data warehousing architecture
* Explore data ingestion pipelines
* Explore analytical data stores

8 - EXPLORE FUNDAMENTALS OF REAL-TIME ANALYTICS
* Understand batch and stream processing
* Explore common elements of stream processing architecture
* Explore Azure Stream Analytics
* Explore Apache Spark on Microsoft Azure

9 - EXPLORE FUNDAMENTALS OF DATA VISUALIZATION
* Describe Power BI tools and workflow
* Describe core concepts of data modeling
* Describe considerations for data visualization

DP-900T00 Microsoft Azure Data Fundamentals
Delivered Online. Two days, Jun 24th, 13:00 + 3 more
£595

Building Batch Data Analytics Solutions on AWS

By Nexus Human

Duration: 1 day, 6 CPD hours

Who this course is for:
* Data platform engineers
* Architects and operators who build and manage data analytics pipelines

Overview
In this course, you will learn to:
* Compare the features and benefits of data warehouses, data lakes, and modern data architectures
* Design and implement a batch data analytics solution
* Identify and apply appropriate techniques, including compression, to optimize data storage
* Select and deploy appropriate options to ingest, transform, and store data
* Choose the appropriate instance and node types, clusters, auto scaling, and network topology for a particular business use case
* Understand how data storage and processing affect the analysis and visualization mechanisms needed to gain actionable business insights
* Secure data at rest and in transit
* Monitor analytics workloads to identify and remediate problems
* Apply cost management best practices

In this course, you will learn to build batch data analytics solutions using Amazon EMR, an enterprise-grade Apache Spark and Apache Hadoop managed service. You will learn how Amazon EMR integrates with open-source projects such as Apache Hive, Hue, and HBase, and with AWS services such as AWS Glue and AWS Lake Formation. The course addresses data collection, ingestion, cataloging, storage, and processing components in the context of Spark and Hadoop. You will learn to use EMR Notebooks to support both analytics and machine learning workloads. You will also learn to apply security, performance, and cost management best practices to the operation of Amazon EMR. (A minimal batch-job sketch follows the outline below.)

MODULE A: OVERVIEW OF DATA ANALYTICS AND THE DATA PIPELINE
* Data analytics use cases
* Using the data pipeline for analytics

MODULE 1: INTRODUCTION TO AMAZON EMR
* Using Amazon EMR in analytics solutions
* Amazon EMR cluster architecture
* Interactive Demo 1: Launching an Amazon EMR cluster
* Cost management strategies

MODULE 2: DATA ANALYTICS PIPELINE USING AMAZON EMR: INGESTION AND STORAGE
* Storage optimization with Amazon EMR
* Data ingestion techniques

MODULE 3: HIGH-PERFORMANCE BATCH DATA ANALYTICS USING APACHE SPARK ON AMAZON EMR
* Apache Spark on Amazon EMR use cases
* Why Apache Spark on Amazon EMR
* Spark concepts
* Interactive Demo 2: Connect to an EMR cluster and perform Scala commands using the Spark shell
* Transformation, processing, and analytics
* Using notebooks with Amazon EMR
* Practice Lab 1: Low-latency data analytics using Apache Spark on Amazon EMR

MODULE 4: PROCESSING AND ANALYZING BATCH DATA WITH AMAZON EMR AND APACHE HIVE
* Using Amazon EMR with Hive to process batch data
* Transformation, processing, and analytics
* Practice Lab 2: Batch data processing using Amazon EMR with Hive
* Introduction to Apache HBase on Amazon EMR

MODULE 5: SERVERLESS DATA PROCESSING
* Serverless data processing, transformation, and analytics
* Using AWS Glue with Amazon EMR workloads
* Practice Lab 3: Orchestrate data processing in Spark using AWS Step Functions

MODULE 6: SECURITY AND MONITORING OF AMAZON EMR CLUSTERS
* Securing EMR clusters
* Interactive Demo 3: Client-side encryption with EMRFS
* Monitoring and troubleshooting Amazon EMR clusters
* Demo: Reviewing Apache Spark cluster history

MODULE 7: DESIGNING BATCH DATA ANALYTICS SOLUTIONS
* Batch data analytics use cases
* Activity: Designing a batch data analytics workflow

MODULE B: DEVELOPING MODERN DATA ARCHITECTURES ON AWS
* Modern data architectures
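
As a hedged sketch of the kind of batch Spark job Modules 2 and 3 describe, the snippet below reads raw data from Amazon S3, aggregates it, and writes columnar output back; one might submit it to an EMR cluster with spark-submit. The bucket names, paths, and columns are illustrative placeholders, not course lab materials.

```python
# A batch aggregation job of the style run on Amazon EMR; names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("emr-batch-sketch").getOrCreate()

# Read raw JSON records from S3 (placeholder bucket and prefix).
sales = spark.read.json("s3://example-raw-bucket/sales/")

# Aggregate to daily totals.
daily = (
    sales.groupBy("order_date")
         .agg(F.sum("amount").alias("total_amount"),
              F.count("*").alias("orders"))
)

# Columnar Parquet output relates to the storage-optimization topics in Module 2.
daily.write.mode("overwrite").parquet("s3://example-curated-bucket/daily_sales/")

spark.stop()
```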

Building Batch Data Analytics Solutions on AWS
Delivered on-request, online
Price on Enquiry

Cloudera Data Scientist Training

By Nexus Human

Duration: 4 days, 24 CPD hours

Who this course is for: The workshop is designed for data scientists who currently use Python or R to work with smaller datasets on a single machine and who need to scale up their analyses and machine learning models to large datasets on distributed clusters. Data engineers and developers with some knowledge of data science and machine learning may also find this workshop useful.

Overview
* Overview of data science and machine learning at scale
* Overview of the Hadoop ecosystem
* Working with HDFS data and Hive tables using Hue
* Introduction to Cloudera Data Science Workbench
* Overview of Apache Spark 2
* Reading and writing data
* Inspecting data quality
* Cleansing and transforming data
* Summarizing and grouping data
* Combining, splitting, and reshaping data
* Exploring data
* Configuring, monitoring, and troubleshooting Spark applications
* Overview of machine learning in Spark MLlib
* Extracting, transforming, and selecting features
* Building and evaluating regression models
* Building and evaluating classification models
* Building and evaluating clustering models
* Cross-validating models and tuning hyperparameters
* Building machine learning pipelines
* Deploying machine learning models

Technologies covered: Spark, Spark SQL, and Spark MLlib; PySpark and sparklyr; Cloudera Data Science Workbench (CDSW); Hue.

This workshop covers data science and machine learning workflows at scale using Apache Spark 2 and other key components of the Hadoop ecosystem. The workshop emphasizes the use of data science and machine learning methods to address real-world business challenges. Using scenarios and datasets from a fictional technology company, students discover insights to support critical business decisions and develop data products to transform the business. The material is presented through a sequence of brief lectures, interactive demonstrations, extensive hands-on exercises, and discussions. The Apache Spark demonstrations and exercises are conducted in Python (with PySpark) and R (with sparklyr) using the Cloudera Data Science Workbench (CDSW) environment. (A minimal MLlib sketch follows the outline below.)

OVERVIEW OF DATA SCIENCE AND MACHINE LEARNING AT SCALE
OVERVIEW OF THE HADOOP ECOSYSTEM
WORKING WITH HDFS DATA AND HIVE TABLES USING HUE
INTRODUCTION TO CLOUDERA DATA SCIENCE WORKBENCH
OVERVIEW OF APACHE SPARK 2
READING AND WRITING DATA
INSPECTING DATA QUALITY
CLEANSING AND TRANSFORMING DATA
SUMMARIZING AND GROUPING DATA
COMBINING, SPLITTING, AND RESHAPING DATA
EXPLORING DATA
CONFIGURING, MONITORING, AND TROUBLESHOOTING SPARK APPLICATIONS
OVERVIEW OF MACHINE LEARNING IN SPARK MLLIB
EXTRACTING, TRANSFORMING, AND SELECTING FEATURES
BUILDING AND EVALUATING REGRESSION MODELS
BUILDING AND EVALUATING CLASSIFICATION MODELS
BUILDING AND EVALUATING CLUSTERING MODELS
CROSS-VALIDATING MODELS AND TUNING HYPERPARAMETERS
BUILDING MACHINE LEARNING PIPELINES
DEPLOYING MACHINE LEARNING MODELS

ADDITIONAL COURSE DETAILS
Nexus Humans' Cloudera Data Scientist Training program is a workshop that presents an invigorating mix of sessions, lessons, and masterclasses meticulously crafted to propel your learning expedition forward. This immersive bootcamp-style experience boasts interactive lectures, hands-on labs, and collaborative hackathons, all strategically designed to fortify fundamental concepts. Guided by seasoned coaches, each session offers priceless insights and practical skills crucial for honing your expertise. Whether you're new to the field or a seasoned professional, this comprehensive course ensures you're equipped with the knowledge and prowess necessary for success. While we feel this is the best course for Cloudera Data Scientist Training and one of our Top 10, we encourage you to read the course outline to make sure it is the right content for you. Additionally, private sessions, closed classes or dedicated events are available both live online and at our training centres in Dublin and London, as well as at your offices anywhere in the UK, Ireland or across EMEA.
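
Since the workshop's Spark exercises run in PySpark, here is a minimal, hedged sketch of the MLlib workflow it covers: assembling features, fitting a regression model, and evaluating it. The data, column names, and numbers are illustrative placeholders, not workshop datasets.

```python
# Minimal Spark MLlib regression sketch; data and names are illustrative.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression
from pyspark.ml.evaluation import RegressionEvaluator

spark = SparkSession.builder.appName("mllib-sketch").getOrCreate()

data = spark.createDataFrame(
    [(1.0, 2.0, 9.0), (2.0, 1.0, 12.0), (3.0, 5.0, 20.0), (4.0, 3.0, 24.0)],
    ["feature_a", "feature_b", "label"],
)

# MLlib estimators expect a single vector column of features.
assembler = VectorAssembler(inputCols=["feature_a", "feature_b"],
                            outputCol="features")
prepared = assembler.transform(data)

model = LinearRegression(featuresCol="features", labelCol="label").fit(prepared)
predictions = model.transform(prepared)

# Evaluate on the training data, purely for illustration.
rmse = RegressionEvaluator(labelCol="label", metricName="rmse").evaluate(predictions)
print(f"Training RMSE: {rmse:.3f}")

spark.stop()
```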

Cloudera Data Scientist Training
Delivered on-request, online
Price on Enquiry

Data Engineering on Google Cloud

By Nexus Human

Duration: 4 days, 24 CPD hours

Who this course is for: This class is intended for experienced developers who are responsible for managing big data transformations, including:
* Extracting, loading, transforming, cleaning, and validating data
* Designing pipelines and architectures for data processing
* Creating and maintaining machine learning and statistical models
* Querying datasets, visualizing query results, and creating reports

Overview
* Design and build data processing systems on Google Cloud Platform
* Leverage unstructured data using Spark and ML APIs on Cloud Dataproc
* Process batch and streaming data by implementing autoscaling data pipelines on Cloud Dataflow
* Derive business insights from extremely large datasets using Google BigQuery
* Train, evaluate and predict using machine learning models using TensorFlow and Cloud ML
* Enable instant insights from streaming data

Get hands-on experience with designing and building data processing systems on Google Cloud. This course uses lectures, demos, and hands-on labs to show you how to design data processing systems, build end-to-end data pipelines, analyze data, and implement machine learning. The course covers structured, unstructured, and streaming data. (A minimal BigQuery client sketch follows the outline below.)

INTRODUCTION TO DATA ENGINEERING
* Explore the role of a data engineer
* Analyze data engineering challenges
* Intro to BigQuery
* Data Lakes and Data Warehouses
* Demo: Federated Queries with BigQuery
* Transactional Databases vs Data Warehouses
* Website Demo: Finding PII in your dataset with DLP API
* Partner effectively with other data teams
* Manage data access and governance
* Build production-ready pipelines
* Review GCP customer case study
* Lab: Analyzing Data with BigQuery

BUILDING A DATA LAKE
* Introduction to Data Lakes
* Data Storage and ETL options on GCP
* Building a Data Lake using Cloud Storage
* Optional Demo: Optimizing cost with Google Cloud Storage classes and Cloud Functions
* Securing Cloud Storage
* Storing All Sorts of Data Types
* Video Demo: Running federated queries on Parquet and ORC files in BigQuery
* Cloud SQL as a relational Data Lake
* Lab: Loading Taxi Data into Cloud SQL

BUILDING A DATA WAREHOUSE
* The modern data warehouse
* Intro to BigQuery
* Demo: Query TB+ of data in seconds
* Getting Started
* Loading Data
* Video Demo: Querying Cloud SQL from BigQuery
* Lab: Loading Data into BigQuery
* Exploring Schemas
* Demo: Exploring BigQuery Public Datasets with SQL using INFORMATION_SCHEMA
* Schema Design
* Nested and Repeated Fields
* Demo: Nested and repeated fields in BigQuery
* Lab: Working with JSON and Array data in BigQuery
* Optimizing with Partitioning and Clustering
* Demo: Partitioned and Clustered Tables in BigQuery
* Preview: Transforming Batch and Streaming Data

INTRODUCTION TO BUILDING BATCH DATA PIPELINES
* EL, ELT, ETL
* Quality considerations
* How to carry out operations in BigQuery
* Demo: ELT to improve data quality in BigQuery
* Shortcomings
* ETL to solve data quality issues

EXECUTING SPARK ON CLOUD DATAPROC
* The Hadoop ecosystem
* Running Hadoop on Cloud Dataproc
* GCS instead of HDFS
* Optimizing Dataproc
* Lab: Running Apache Spark jobs on Cloud Dataproc

SERVERLESS DATA PROCESSING WITH CLOUD DATAFLOW
* Cloud Dataflow
* Why customers value Dataflow
* Dataflow Pipelines
* Lab: A Simple Dataflow Pipeline (Python/Java)
* Lab: MapReduce in Dataflow (Python/Java)
* Lab: Side Inputs (Python/Java)
* Dataflow Templates
* Dataflow SQL

MANAGE DATA PIPELINES WITH CLOUD DATA FUSION AND CLOUD COMPOSER
* Building Batch Data Pipelines visually with Cloud Data Fusion
* Components
* UI Overview
* Building a Pipeline
* Exploring Data using Wrangler
* Lab: Building and executing a pipeline graph in Cloud Data Fusion
* Orchestrating work between GCP services with Cloud Composer
* Apache Airflow Environment
* DAGs and Operators
* Workflow Scheduling
* Optional Long Demo: Event-triggered Loading of data with Cloud Composer, Cloud Functions, Cloud Storage, and BigQuery
* Monitoring and Logging
* Lab: An Introduction to Cloud Composer

INTRODUCTION TO PROCESSING STREAMING DATA
* Processing Streaming Data

SERVERLESS MESSAGING WITH CLOUD PUB/SUB
* Cloud Pub/Sub
* Lab: Publish Streaming Data into Pub/Sub

CLOUD DATAFLOW STREAMING FEATURES
* Cloud Dataflow Streaming Features
* Lab: Streaming Data Pipelines

HIGH-THROUGHPUT BIGQUERY AND BIGTABLE STREAMING FEATURES
* BigQuery Streaming Features
* Lab: Streaming Analytics and Dashboards
* Cloud Bigtable
* Lab: Streaming Data Pipelines into Bigtable

ADVANCED BIGQUERY FUNCTIONALITY AND PERFORMANCE
* Analytic Window Functions
* Using With Clauses
* GIS Functions
* Demo: Mapping Fastest Growing Zip Codes with BigQuery GeoViz
* Performance Considerations
* Lab: Optimizing your BigQuery Queries for Performance
* Optional Lab: Creating Date-Partitioned Tables in BigQuery

INTRODUCTION TO ANALYTICS AND AI
* What is AI?
* From Ad-hoc Data Analysis to Data-Driven Decisions
* Options for ML models on GCP

PREBUILT ML MODEL APIS FOR UNSTRUCTURED DATA
* Unstructured Data is Hard
* ML APIs for Enriching Data
* Lab: Using the Natural Language API to Classify Unstructured Text

BIG DATA ANALYTICS WITH CLOUD AI PLATFORM NOTEBOOKS
* What's a Notebook
* BigQuery Magic and Ties to Pandas
* Lab: BigQuery in Jupyter Labs on AI Platform

PRODUCTION ML PIPELINES WITH KUBEFLOW
* Ways to do ML on GCP
* Kubeflow
* AI Hub
* Lab: Running AI models on Kubeflow

CUSTOM MODEL BUILDING WITH SQL IN BIGQUERY ML
* BigQuery ML for Quick Model Building
* Demo: Train a model with BigQuery ML to predict NYC taxi fares
* Supported Models
* Lab Option 1: Predict Bike Trip Duration with a Regression Model in BQML
* Lab Option 2: Movie Recommendations in BigQuery ML

CUSTOM MODEL BUILDING WITH CLOUD AUTOML
* Why Auto ML?
* Auto ML Vision
* Auto ML NLP
* Auto ML Tables
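
To give a flavour of the BigQuery work that runs through the course, here is a small, hedged sketch using the google-cloud-bigquery Python client against a public sample dataset. It assumes the library is installed and GCP credentials and a project are configured in the environment.

```python
# Run a standard SQL query with the BigQuery Python client and print results.
# Requires: pip install google-cloud-bigquery, plus configured GCP credentials.
from google.cloud import bigquery

client = bigquery.Client()  # picks up project and credentials from the environment

query = """
    SELECT name, SUM(number) AS total
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    WHERE state = 'TX'
    GROUP BY name
    ORDER BY total DESC
    LIMIT 5
"""

# result() blocks until the query job finishes, then yields rows.
for row in client.query(query).result():
    print(row.name, row.total)
```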

Data Engineering on Google Cloud
Delivered on-request, online
Price on Enquiry

Python With Data Science

By Nexus Human

Duration: 2 days, 12 CPD hours

Who this course is for: Data scientists, software developers, IT architects, and technical managers. Participants should have general knowledge of statistics and programming, and be familiar with Python.

Overview
* NumPy, pandas, Matplotlib, scikit-learn
* Python REPLs
* Jupyter Notebooks
* Data analytics life-cycle phases
* Data repairing and normalizing
* Data aggregation and grouping
* Data visualization
* Data science algorithms for supervised and unsupervised machine learning

The course covers theoretical and technical aspects of using Python in Applied Data Science projects and Data Logistics use cases. (A short data-repair sketch follows the outline below.)

PYTHON FOR DATA SCIENCE
* Using Modules
* Listing Methods in a Module
* Creating Your Own Modules
* List Comprehension
* Dictionary Comprehension
* String Comprehension
* Python 2 vs Python 3
* Sets (Python 3+)
* Python Idioms
* Python Data Science "Ecosystem"
* NumPy
* NumPy Arrays
* NumPy Idioms
* pandas
* Data Wrangling with pandas' DataFrame
* SciPy
* Scikit-learn
* SciPy or scikit-learn?
* Matplotlib
* Python vs R
* Python on Apache Spark
* Python Dev Tools and REPLs
* Anaconda
* IPython
* Visual Studio Code
* Jupyter
* Jupyter Basic Commands
* Summary

APPLIED DATA SCIENCE
* What is Data Science?
* Data Science Ecosystem
* Data Mining vs. Data Science
* Business Analytics vs. Data Science
* Data Science, Machine Learning, AI?
* Who is a Data Scientist?
* Data Science Skill Sets Venn Diagram
* Data Scientists at Work
* Examples of Data Science Projects
* An Example of a Data Product
* Applied Data Science at Google
* Data Science Gotchas
* Summary

DATA ANALYTICS LIFE-CYCLE PHASES
* Big Data Analytics Pipeline
* Data Discovery Phase
* Data Harvesting Phase
* Data Priming Phase
* Data Logistics and Data Governance
* Exploratory Data Analysis
* Model Planning Phase
* Model Building Phase
* Communicating the Results
* Production Roll-out
* Summary

REPAIRING AND NORMALIZING DATA
* Repairing and Normalizing Data
* Dealing with the Missing Data
* Sample Data Set
* Getting Info on Null Data
* Dropping a Column
* Interpolating Missing Data in pandas
* Replacing the Missing Values with the Mean Value
* Scaling (Normalizing) the Data
* Data Preprocessing with scikit-learn
* Scaling with the scale() Function
* The MinMaxScaler Object
* Summary

DESCRIPTIVE STATISTICS COMPUTING FEATURES IN PYTHON
* Descriptive Statistics
* Non-uniformity of a Probability Distribution
* Using NumPy for Calculating Descriptive Statistics Measures
* Finding Min and Max in NumPy
* Using pandas for Calculating Descriptive Statistics Measures
* Correlation
* Regression and Correlation
* Covariance
* Getting Pairwise Correlation and Covariance Measures
* Finding Min and Max in pandas DataFrame
* Summary

DATA AGGREGATION AND GROUPING
* Data Aggregation and Grouping
* Sample Data Set
* The pandas.core.groupby.SeriesGroupBy Object
* Grouping by Two or More Columns
* Emulating the SQL's WHERE Clause
* The Pivot Tables
* Cross-Tabulation
* Summary

DATA VISUALIZATION WITH MATPLOTLIB
* Data Visualization
* What is matplotlib?
* Getting Started with matplotlib
* The Plotting Window
* The Figure Options
* The matplotlib.pyplot.plot() Function
* The matplotlib.pyplot.bar() Function
* The matplotlib.pyplot.pie() Function
* Subplots
* Using the matplotlib.gridspec.GridSpec Object
* The matplotlib.pyplot.subplot() Function
* Hands-on Exercise
* Figures
* Saving Figures to File
* Visualization with pandas
* Working with matplotlib in Jupyter Notebooks
* Summary

DATA SCIENCE AND ML ALGORITHMS IN SCIKIT-LEARN
* Data Science, Machine Learning, AI?
* Types of Machine Learning
* Terminology: Features and Observations
* Continuous and Categorical Features (Variables)
* Terminology: Axis
* The scikit-learn Package
* scikit-learn Estimators
* Models, Estimators, and Predictors
* Common Distance Metrics
* The Euclidean Metric
* The LIBSVM Format
* Scaling of the Features
* The Curse of Dimensionality
* Supervised vs Unsupervised Machine Learning
* Supervised Machine Learning Algorithms
* Unsupervised Machine Learning Algorithms
* Choose the Right Algorithm
* Life-cycles of Machine Learning Development
* Data Split for Training and Test Data Sets
* Data Splitting in scikit-learn
* Hands-on Exercise
* Classification Examples
* Classifying with k-Nearest Neighbors (SL)
* k-Nearest Neighbors Algorithm
* The Error Rate
* Hands-on Exercise
* Dimensionality Reduction
* The Advantages of Dimensionality Reduction
* Principal Component Analysis (PCA)
* Hands-on Exercise
* Data Blending
* Decision Trees (SL)
* Decision Tree Terminology
* Decision Tree Classification in Context of Information Theory
* Information Entropy Defined
* The Shannon Entropy Formula
* The Simplified Decision Tree Algorithm
* Using Decision Trees
* Random Forests
* SVM
* Naive Bayes Classifier (SL)
* Naive Bayesian Probabilistic Model in a Nutshell
* Bayes Formula
* Classification of Documents with Naive Bayes
* Unsupervised Learning Type: Clustering
* Clustering Examples
* k-Means Clustering (UL)
* k-Means Clustering in a Nutshell
* k-Means Characteristics
* Regression Analysis
* Simple Linear Regression Model
* Linear vs Non-Linear Regression
* Linear Regression Illustration
* Major Underlying Assumptions for Regression Analysis
* Least-Squares Method (LSM)
* Locally Weighted Linear Regression
* Regression Models in Excel
* Multiple Regression Analysis
* Logistic Regression
* Regression vs Classification
* Time-Series Analysis
* Decomposing Time-Series
* Summary

LAB EXERCISES
* Lab 1 - Learning the Lab Environment
* Lab 2 - Using Jupyter Notebook
* Lab 3 - Repairing and Normalizing Data
* Lab 4 - Computing Descriptive Statistics
* Lab 5 - Data Grouping and Aggregation
* Lab 6 - Data Visualization with matplotlib
* Lab 7 - Data Splitting
* Lab 8 - k-Nearest Neighbors Algorithm
* Lab 9 - The k-means Algorithm
* Lab 10 - The Random Forest Algorithm
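
As a hedged taste of the "Repairing and Normalizing Data" topics, the sketch below interpolates missing values with pandas and then scales the features with scikit-learn's MinMaxScaler, both of which appear in the outline. The toy data is an illustrative placeholder, not the course's sample dataset.

```python
# Repair missing values with pandas, then normalize with scikit-learn.
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({
    "temp":  [21.0, None, 23.5, 24.0],   # None marks missing readings
    "sales": [100,  150,  None, 200],
})

# Interpolate missing data, per "Interpolating Missing Data in pandas".
df = df.interpolate()

# Scale (normalize) each feature to [0, 1] with the MinMaxScaler object.
scaled = MinMaxScaler().fit_transform(df)
print(scaled)
```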

Python With Data Science
Delivered on-request, online
Price on Enquiry

Developer Training for Spark and Hadoop

By Nexus Human

Duration: 4 days, 24 CPD hours

Who this course is for: Hadoop developers.

Overview
Through instructor-led discussion and interactive, hands-on exercises, participants will navigate the Hadoop ecosystem, learning topics such as:
* How data is distributed, stored, and processed in a Hadoop cluster
* How to use Sqoop and Flume to ingest data
* How to process distributed data with Apache Spark
* How to model structured data as tables in Impala and Hive
* How to choose the best data storage format for different data usage patterns
* Best practices for data storage

This training course is the best preparation for the challenges faced by Hadoop developers. Participants will learn to identify which tool is the right one to use in a given situation, and will gain hands-on experience in developing using those tools. (A minimal RDD sketch follows the outline below.)

COURSE OUTLINE
* Introduction
* Introduction to Hadoop and the Hadoop Ecosystem
* Hadoop Architecture and HDFS
* Importing Relational Data with Apache Sqoop
* Introduction to Impala and Hive
* Modeling and Managing Data with Impala and Hive
* Data Formats
* Data Partitioning
* Capturing Data with Apache Flume
* Spark Basics
* Working with RDDs in Spark
* Writing and Deploying Spark Applications
* Parallel Programming with Spark
* Spark Caching and Persistence
* Common Patterns in Spark Data Processing
* Spark SQL and DataFrames
* Conclusion

ADDITIONAL COURSE DETAILS
Nexus Humans' Developer Training for Spark and Hadoop program is a workshop that presents an invigorating mix of sessions, lessons, and masterclasses meticulously crafted to propel your learning expedition forward. This immersive bootcamp-style experience boasts interactive lectures, hands-on labs, and collaborative hackathons, all strategically designed to fortify fundamental concepts. Guided by seasoned coaches, each session offers priceless insights and practical skills crucial for honing your expertise. Whether you're new to the field or a seasoned professional, this comprehensive course ensures you're equipped with the knowledge and prowess necessary for success. While we feel this is the best course for Developer Training for Spark and Hadoop and one of our Top 10, we encourage you to read the course outline to make sure it is the right content for you. Additionally, private sessions, closed classes or dedicated events are available both live online and at our training centres in Dublin and London, as well as at your offices anywhere in the UK, Ireland or across EMEA.
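
For readers new to the "Working with RDDs in Spark" and "Spark Caching and Persistence" topics, here is a minimal, hedged sketch: the classic word count over a file in HDFS, with the result cached for reuse. The hdfs:// path is an illustrative placeholder, not a course lab path.

```python
# Word count over HDFS data with RDDs; the path is a placeholder.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("wordcount-sketch").getOrCreate()

lines = spark.sparkContext.textFile("hdfs:///data/logs/app.log")

# flatMap/map/reduceByKey is the canonical RDD word-count pattern;
# cache() keeps the result in memory for repeated actions.
counts = (
    lines.flatMap(lambda line: line.split())
         .map(lambda word: (word, 1))
         .reduceByKey(lambda a, b: a + b)
         .cache()
)

# Print the ten most frequent words.
for word, n in counts.takeOrdered(10, key=lambda kv: -kv[1]):
    print(word, n)

spark.stop()
```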

Developer Training for Spark and Hadoop
Delivered on-request, online
Price on Enquiry

Cloudera Training for Apache HBase

By Nexus Human

Duration: 4 days, 24 CPD hours

Who this course is for: This course is appropriate for developers and administrators who intend to use HBase.

Overview
Skills learned on the course include:
* The use cases and usage occasions for HBase, Hadoop, and RDBMS
* Using the HBase shell to directly manipulate HBase tables
* Designing optimal HBase schemas for efficient data storage and recovery
* How to connect to HBase using the Java API, configure the HBase cluster, and administer an HBase cluster
* Best practices for identifying and resolving performance bottlenecks

Cloudera University's four-day training course for Apache HBase enables participants to store and access massive quantities of multi-structured data and perform hundreds of thousands of operations per second. (A minimal Python/Thrift sketch follows the outline below.)

INTRODUCTION TO HADOOP & HBASE
* What Is Big Data?
* Introducing Hadoop
* Hadoop Components
* What Is HBase?
* Why Use HBase?
* Strengths of HBase
* HBase in Production
* Weaknesses of HBase

HBASE TABLES
* HBase Concepts
* HBase Table Fundamentals
* Thinking About Table Design

THE HBASE SHELL
* Creating Tables with the HBase Shell
* Working with Tables
* Working with Table Data

HBASE ARCHITECTURE FUNDAMENTALS
* HBase Regions
* HBase Cluster Architecture
* HBase and HDFS Data Locality

HBASE SCHEMA DESIGN
* General Design Considerations
* Application-Centric Design
* Designing HBase Row Keys
* Other HBase Table Features

BASIC DATA ACCESS WITH THE HBASE API
* Options to Access HBase Data
* Creating and Deleting HBase Tables
* Retrieving Data with Get
* Retrieving Data with Scan
* Inserting and Updating Data
* Deleting Data

MORE ADVANCED HBASE API FEATURES
* Filtering Scans
* Best Practices
* HBase Coprocessors

HBASE ON THE CLUSTER
* How HBase Uses HDFS
* Compactions and Splits

HBASE READS & WRITES
* How HBase Writes Data
* How HBase Reads Data
* Block Caches for Reading

HBASE PERFORMANCE TUNING
* Column Family Considerations
* Schema Design Considerations
* Configuring for Caching
* Dealing with Time Series and Sequential Data
* Pre-Splitting Regions

HBASE ADMINISTRATION AND CLUSTER MANAGEMENT
* HBase Daemons
* ZooKeeper Considerations
* HBase High Availability
* Using the HBase Balancer
* Fixing Tables with hbck
* HBase Security

HBASE REPLICATION & BACKUP
* HBase Replication
* HBase Backup
* MapReduce and HBase Clusters

USING HIVE & IMPALA WITH HBASE
* Using Hive and Impala with HBase

APPENDIX A: ACCESSING DATA WITH PYTHON AND THRIFT
* Thrift Usage
* Working with Tables
* Getting and Putting Data
* Scanning Data
* Deleting Data
* Counters
* Filters

APPENDIX B: OPENTSDB
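
As a hedged illustration of Appendix A ("Accessing Data with Python and Thrift"), the sketch below uses the third-party happybase library against an HBase Thrift server. The host, table, row keys, and column family are illustrative placeholders; the course's own examples may use a different client or setup.

```python
# Put, get, and scan HBase rows over Thrift with happybase.
# Requires: pip install happybase, plus a running HBase Thrift server.
import happybase

connection = happybase.Connection("hbase-thrift-host", port=9090)  # placeholder host
table = connection.table("users")  # placeholder table with column family "info"

# Put: column names are qualified by their column family.
table.put(b"row-001", {b"info:name": b"Alice", b"info:city": b"Dublin"})

# Get a single row by key.
print(table.row(b"row-001"))

# Scan a range of rows sharing a key prefix.
for key, data in table.scan(row_prefix=b"row-"):
    print(key, data)

connection.close()
```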

Cloudera Training for Apache HBase
Delivered on-request, online
Price on Enquiry

Educators matching "Apache Spark"

NobleProg Pakistan

NobleProg is an international training and consultancy group, delivering high-quality courses to every sector, covering Artificial Intelligence, IT, Management, and Applied Statistics. Over the last 17 years, we have trained more than 50,000 people from over 6,000 companies and organisations. Our courses include classroom (both public and closed) and instructor-led online formats, giving you the choice and flexibility to suit your time, budget and level of expertise. We practice what we preach: we use a great deal of the technologies and methods that we teach, and continuously upgrade and improve our courses, keeping up to date with all the latest developments. Our trainers are hand-picked and have been through rigorous checks and interviews, and all courses are evaluated by delegates, ensuring continuous feedback and improvement.

NobleProg in numbers:
* 17+ years of experience
* 15+ offices all over the world
* 1,000+ trainers cooperating with NobleProg
* 1,400+ course outlines offered
* 6,100+ companies that entrusted us
* 58,000+ satisfied participants

NobleProg - The World's Local Training Provider. Our mission is to provide comprehensive training and consultancy solutions all over the world, in an effective and accessible way, tailored to consumers' needs. We offer practical, real-world knowledge supported by a full understanding of the theory. Our expert trainers are skilled in the latest knowledge transfer techniques, blending presentation, demonstration and hands-on learning. We understand that our learners are excited to be gaining new skills and we thrive off that energy to deliver exceptional training events.

Investing in upskilling or reskilling with NobleProg means you stay ahead. Our catalogue is constantly evolving and we offer the most in-demand courses: Java, JavaScript, SQL, Visual Basic for Applications (VBA), as well as Apache Spark, OpenStack, TensorFlow, Selenium, Artificial Intelligence, and Data Analysis. Our offer consists of more than 1,400 training outlines covering more than 120 technologies. At NobleProg we emphasise the need not only to follow the latest technological trends, but also to anticipate changes. We focus on delivering professional skills and certifications that will have a real impact.

NobleProg's history: NobleProg was established in 2005 in Krakow, Poland, and has gradually expanded its operations to other global markets since. In just two years the first international branch was opened in London. The overwhelming potential of NobleProg, combined with the rising need for self-development programs, especially in the field of technological skills, prompted the company to change its business model into a franchise. By doing so, in a short period of time the company allowed a number of people passionate about education and new technologies to join the NobleProg team. With each year the territorial reach of NobleProg expanded further, and we now have offices on every continent. NobleProg is the World's Local Training Provider.