
27 Big Data Analytics courses in Bradford delivered Live Online


Cisco Splunk for Cisco Integrated Infrastructure (SPLUNK)

By Nexus Human

Duration 2 Days 12 CPD hours. The primary audience for this course is as follows: System Engineers, System Administrators, Architects, Channel Partners, and Data Analysts. Overview: Upon completing this course, you will be able to meet these overall objectives: Describe how harnessing the power of your machine data enables you to make decisions based on facts, not intuition or best guesses. Reduce the time you spend investigating incidents by up to 90%. Find and fix problems faster by learning new technical skills for real-world scenarios. Get started with Splunk Enterprise, from installation and data onboarding to running search queries and creating simple reports and dashboards. Accelerate time to value with turnkey Splunk integrations for dozens of Cisco products and platforms. Ensure faster, more predictable Splunk deployments with a proven Cisco Validated Design and the latest Cisco UCS servers. This course covers how Splunk software scales to collect and index hundreds of terabytes of data per day across multi-geography, multi-datacenter, and cloud-based infrastructures. Cisco's Unified Computing System (UCS) Integrated Infrastructure for Big Data offers linear scalability along with operational simplification for single-rack and multiple-rack deployments. CISCO INTEGRATED INFRASTRUCTURE FOR BIG DATA AND SPLUNK * What is Cisco CPA? 
* Architecture benefits for Splunk * Components of IIBD and relationship to Splunk Architecture * Cisco UCS Integrated Infrastructure for Big Data with Splunk Enterprise * Splunk: Big Data Analytics * NFS Configurations for the Splunk Frozen Data Storage * NFS Client Configurations on the Indexers SPLUNK: START SEARCHING * Chargeback * Reporting * Building custom reports using the report builder APPLICATION CONTAINERS * Understanding Application Containers UNDERSTANDING ADVANCED TASKS * Task Library & Inputs * CLI & SSH Task * Understanding Compound Tasks * Custom Tasks OPEN AUTOMATION TROUBLESHOOTING * UCS Director Restart * Module Loading * Report Errors * Feature Loading * Report Registration REST API AUTOMATION * UCS Director Developer Tools * Accessing REST using a REST client * Accessing REST using the REST API browser OPEN AUTOMATION SDK * Overview * Open Automation vs. Custom Tasks * Use Cases UCS DIRECTOR POWERSHELL API * Cisco UCS Director PowerShell Console * Installing & Configuring * Working with Cmdlets CLOUPIA SCRIPT * Structure * Inputs & Outputs * Design * Examples ADDITIONAL COURSE DETAILS: Nexus Humans Cisco Splunk for Cisco Integrated Infrastructure (SPLUNK) training program is a workshop that presents an invigorating mix of sessions, lessons, and masterclasses meticulously crafted to propel your learning expedition forward. This immersive bootcamp-style experience boasts interactive lectures, hands-on labs, and collaborative hackathons, all strategically designed to fortify fundamental concepts. Guided by seasoned coaches, each session offers priceless insights and practical skills crucial for honing your expertise. Whether you're just beginning to develop your professional skills or are a seasoned professional, this comprehensive course ensures you're equipped with the knowledge and prowess necessary for success. 
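The outline above moves from onboarding machine data to running searches against Splunk. As a rough, stdlib-only sketch of what that looks like programmatically, the snippet below builds a request for Splunk's export search endpoint (`/services/search/jobs/export` with `output_mode=json`, per the Splunk REST API) and parses one streamed result row. The host, SPL query, and token are placeholders, not values from the course.

```python
import json
import urllib.parse
import urllib.request

def build_export_request(base_url, spl_query, token):
    """Build a request against Splunk's export search endpoint.

    Endpoint path and output_mode parameter follow the Splunk REST API;
    the base_url and token passed in below are placeholders."""
    if not spl_query.lstrip().startswith("search"):
        spl_query = "search " + spl_query          # SPL must begin with a command
    body = urllib.parse.urlencode({"search": spl_query,
                                   "output_mode": "json"}).encode()
    req = urllib.request.Request(f"{base_url}/services/search/jobs/export",
                                 data=body)
    req.add_header("Authorization", f"Bearer {token}")
    return req

def parse_export_line(line):
    """Each line the export endpoint streams is a JSON object whose
    'result' field holds one event/row."""
    return json.loads(line).get("result", {})

req = build_export_request("https://splunk.example.com:8089",   # placeholder host
                           "index=_internal | stats count by sourcetype",
                           "PLACEHOLDER-TOKEN")
print(req.full_url)
print(parse_export_line('{"result": {"sourcetype": "splunkd", "count": "42"}}'))
```

In a live deployment you would stream the response line by line; here only the request construction and row parsing are shown so the sketch stays self-contained.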
While we feel this is the best course for the Cisco Splunk for Cisco Integrated Infrastructure (SPLUNK) course and one of our Top 10, we encourage you to read the course outline to make sure it is the right content for you. Additionally, private sessions, closed classes or dedicated events are available both live online and at our training centres in Dublin and London, as well as at your offices anywhere in the UK, Ireland or across EMEA.

Cisco Splunk for Cisco Integrated Infrastructure (SPLUNK)
Delivered on-request, online
Price on Enquiry

Data Engineering on Google Cloud

By Nexus Human

Duration 4 Days 24 CPD hours. This class is intended for experienced developers who are responsible for managing big data transformations, including: extracting, loading, transforming, cleaning, and validating data; designing pipelines and architectures for data processing; creating and maintaining machine learning and statistical models; and querying datasets, visualizing query results, and creating reports. Overview: Design and build data processing systems on Google Cloud Platform. Leverage unstructured data using Spark and ML APIs on Cloud Dataproc. Process batch and streaming data by implementing autoscaling data pipelines on Cloud Dataflow. Derive business insights from extremely large datasets using Google BigQuery. Train, evaluate, and predict using machine learning models with TensorFlow and Cloud ML. Enable instant insights from streaming data. Get hands-on experience with designing and building data processing systems on Google Cloud. This course uses lectures, demos, and hands-on labs to show you how to design data processing systems, build end-to-end data pipelines, analyze data, and implement machine learning. This course covers structured, unstructured, and streaming data. INTRODUCTION TO DATA ENGINEERING * Explore the role of a data engineer. * Analyze data engineering challenges. * Intro to BigQuery. * Data Lakes and Data Warehouses. * Demo: Federated Queries with BigQuery. * Transactional Databases vs Data Warehouses. * Website Demo: Finding PII in your dataset with DLP API. * Partner effectively with other data teams. * Manage data access and governance. * Build production-ready pipelines. * Review GCP customer case study. * Lab: Analyzing Data with BigQuery. BUILDING A DATA LAKE * Introduction to Data Lakes. * Data Storage and ETL options on GCP. * Building a Data Lake using Cloud Storage. * Optional Demo: Optimizing cost with Google Cloud Storage classes and Cloud Functions. * Securing Cloud Storage. 
* Storing All Sorts of Data Types. * Video Demo: Running federated queries on Parquet and ORC files in BigQuery. * Cloud SQL as a relational Data Lake. * Lab: Loading Taxi Data into Cloud SQL. BUILDING A DATA WAREHOUSE * The modern data warehouse. * Intro to BigQuery. * Demo: Query TB+ of data in seconds. * Getting Started. * Loading Data. * Video Demo: Querying Cloud SQL from BigQuery. * Lab: Loading Data into BigQuery. * Exploring Schemas. * Demo: Exploring BigQuery Public Datasets with SQL using INFORMATION_SCHEMA. * Schema Design. * Nested and Repeated Fields. * Demo: Nested and repeated fields in BigQuery. * Lab: Working with JSON and Array data in BigQuery. * Optimizing with Partitioning and Clustering. * Demo: Partitioned and Clustered Tables in BigQuery. * Preview: Transforming Batch and Streaming Data. INTRODUCTION TO BUILDING BATCH DATA PIPELINES * EL, ELT, ETL. * Quality considerations. * How to carry out operations in BigQuery. * Demo: ELT to improve data quality in BigQuery. * Shortcomings. * ETL to solve data quality issues. EXECUTING SPARK ON CLOUD DATAPROC * The Hadoop ecosystem. * Running Hadoop on Cloud Dataproc. * GCS instead of HDFS. * Optimizing Dataproc. * Lab: Running Apache Spark jobs on Cloud Dataproc. SERVERLESS DATA PROCESSING WITH CLOUD DATAFLOW * Cloud Dataflow. * Why customers value Dataflow. * Dataflow Pipelines. * Lab: A Simple Dataflow Pipeline (Python/Java). * Lab: MapReduce in Dataflow (Python/Java). * Lab: Side Inputs (Python/Java). * Dataflow Templates. * Dataflow SQL. MANAGE DATA PIPELINES WITH CLOUD DATA FUSION AND CLOUD COMPOSER * Building Batch Data Pipelines visually with Cloud Data Fusion. * Components. * UI Overview. * Building a Pipeline. * Exploring Data using Wrangler. * Lab: Building and executing a pipeline graph in Cloud Data Fusion. * Orchestrating work between GCP services with Cloud Composer. * Apache Airflow Environment. * DAGs and Operators. * Workflow Scheduling. 
* Optional Long Demo: Event-triggered Loading of data with Cloud Composer, Cloud Functions, Cloud Storage, and BigQuery. * Monitoring and Logging. * Lab: An Introduction to Cloud Composer. INTRODUCTION TO PROCESSING STREAMING DATA * Processing Streaming Data. SERVERLESS MESSAGING WITH CLOUD PUB/SUB * Cloud Pub/Sub. * Lab: Publish Streaming Data into Pub/Sub. CLOUD DATAFLOW STREAMING FEATURES * Cloud Dataflow Streaming Features. * Lab: Streaming Data Pipelines. HIGH-THROUGHPUT BIGQUERY AND BIGTABLE STREAMING FEATURES * BigQuery Streaming Features. * Lab: Streaming Analytics and Dashboards. * Cloud Bigtable. * Lab: Streaming Data Pipelines into Bigtable. ADVANCED BIGQUERY FUNCTIONALITY AND PERFORMANCE * Analytic Window Functions. * Using With Clauses. * GIS Functions. * Demo: Mapping Fastest Growing Zip Codes with BigQuery GeoViz. * Performance Considerations. * Lab: Optimizing your BigQuery Queries for Performance. * Optional Lab: Creating Date-Partitioned Tables in BigQuery. INTRODUCTION TO ANALYTICS AND AI * What is AI?. * From Ad-hoc Data Analysis to Data Driven Decisions. * Options for ML models on GCP. PREBUILT ML MODEL APIS FOR UNSTRUCTURED DATA * Unstructured Data is Hard. * ML APIs for Enriching Data. * Lab: Using the Natural Language API to Classify Unstructured Text. BIG DATA ANALYTICS WITH CLOUD AI PLATFORM NOTEBOOKS * What's a Notebook. * BigQuery Magic and Ties to Pandas. * Lab: BigQuery in Jupyter Labs on AI Platform. PRODUCTION ML PIPELINES WITH KUBEFLOW * Ways to do ML on GCP. * Kubeflow. * AI Hub. * Lab: Running AI models on Kubeflow. CUSTOM MODEL BUILDING WITH SQL IN BIGQUERY ML * BigQuery ML for Quick Model Building. * Demo: Train a model with BigQuery ML to predict NYC taxi fares. * Supported Models. * Lab Option 1: Predict Bike Trip Duration with a Regression Model in BQML. * Lab Option 2: Movie Recommendations in BigQuery ML. CUSTOM MODEL BUILDING WITH CLOUD AUTOML * Why Auto ML? * Auto ML Vision. * Auto ML NLP. * Auto ML Tables.
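The data warehouse module above spends time on nested and repeated fields and working with JSON and array data in BigQuery. The dependency-free sketch below imitates what BigQuery's UNNEST does to a repeated (array-of-struct) field; the trips/payments schema is invented for the example, and the SQL in the comment shows the rough BigQuery equivalent.

```python
# Plain-Python illustration of BigQuery's UNNEST on a repeated field.
# In BigQuery Standard SQL the equivalent would be roughly:
#   SELECT t.trip_id, p.method, p.amount
#   FROM dataset.trips AS t, UNNEST(t.payments) AS p

trips = [
    {"trip_id": 1, "payments": [{"method": "card", "amount": 12.5},
                                {"method": "tip", "amount": 2.0}]},
    {"trip_id": 2, "payments": [{"method": "cash", "amount": 9.0}]},
]

def unnest(rows, repeated_field):
    """Yield one flat row per element of the repeated field, copying the
    parent row's scalar columns onto each child row."""
    for row in rows:
        parent = {k: v for k, v in row.items() if k != repeated_field}
        for child in row[repeated_field]:
            yield {**parent, **child}

flat = list(unnest(trips, "payments"))
print(len(flat))  # one output row per payment
```

Keeping the array nested avoids a join at query time; UNNEST flattens it only when a query needs per-element rows, which is the design trade-off the module explores.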

Data Engineering on Google Cloud
Delivered on-request, online
Price on Enquiry

Python With Data Science

By Nexus Human

Duration 2 Days 12 CPD hours. Audience: Data Scientists, Software Developers, IT Architects, and Technical Managers. Participants should have general knowledge of statistics and programming, and be familiar with Python. Overview: NumPy, pandas, Matplotlib, scikit-learn; Python REPLs; Jupyter Notebooks; data analytics life-cycle phases; data repairing and normalizing; data aggregation and grouping; data visualization; data science algorithms for supervised and unsupervised machine learning. Covers theoretical and technical aspects of using Python in Applied Data Science projects and Data Logistics use cases. PYTHON FOR DATA SCIENCE * Using Modules * Listing Methods in a Module * Creating Your Own Modules * List Comprehension * Dictionary Comprehension * String Comprehension * Python 2 vs Python 3 * Sets (Python 3+) * Python Idioms * Python Data Science "Ecosystem" * NumPy * NumPy Arrays * NumPy Idioms * pandas * Data Wrangling with pandas' DataFrame * SciPy * Scikit-learn * SciPy or scikit-learn? * Matplotlib * Python vs R * Python on Apache Spark * Python Dev Tools and REPLs * Anaconda * IPython * Visual Studio Code * Jupyter * Jupyter Basic Commands * Summary APPLIED DATA SCIENCE * What is Data Science? * Data Science Ecosystem * Data Mining vs. Data Science * Business Analytics vs. Data Science * Data Science, Machine Learning, AI? * Who is a Data Scientist? * Data Science Skill Sets Venn Diagram * Data Scientists at Work * Examples of Data Science Projects * An Example of a Data Product * Applied Data Science at Google * Data Science Gotchas * Summary DATA ANALYTICS LIFE-CYCLE PHASES * Big Data Analytics Pipeline * Data Discovery Phase * Data Harvesting Phase * Data Priming Phase * Data Logistics and Data Governance * Exploratory Data Analysis * Model Planning Phase * Model Building Phase * Communicating the Results * Production Roll-out * Summary REPAIRING AND NORMALIZING DATA * Repairing and Normalizing Data * Dealing with the Missing Data * Sample Data Set * Getting Info on Null Data * Dropping a Column * Interpolating Missing Data in pandas * Replacing the Missing Values with the Mean Value * Scaling (Normalizing) the Data * Data Preprocessing with scikit-learn * Scaling with the scale() Function * The MinMaxScaler Object * Summary DESCRIPTIVE STATISTICS COMPUTING FEATURES IN PYTHON * Descriptive Statistics * Non-uniformity of a Probability Distribution * Using NumPy for Calculating Descriptive Statistics Measures * Finding Min and Max in NumPy * Using pandas for Calculating Descriptive Statistics Measures * Correlation * Regression and Correlation * Covariance * Getting Pairwise Correlation and Covariance Measures * Finding Min and Max in pandas DataFrame * Summary DATA AGGREGATION AND GROUPING * Data Aggregation and Grouping * Sample Data Set * The pandas.core.groupby.SeriesGroupBy Object * Grouping by Two or More Columns * Emulating the SQL's WHERE Clause * The Pivot Tables * Cross-Tabulation * Summary DATA VISUALIZATION WITH MATPLOTLIB * Data Visualization * What is matplotlib? * Getting Started with matplotlib * The Plotting Window * The Figure Options * The matplotlib.pyplot.plot() Function * The matplotlib.pyplot.bar() Function * The matplotlib.pyplot.pie() Function * Subplots * Using the matplotlib.gridspec.GridSpec Object * The matplotlib.pyplot.subplot() Function * Hands-on Exercise * Figures * Saving Figures to File * Visualization with pandas * Working with matplotlib in Jupyter Notebooks * Summary DATA SCIENCE AND ML ALGORITHMS IN SCIKIT-LEARN * Data Science, Machine Learning, AI? * Types of Machine Learning * Terminology: Features and Observations * Continuous and Categorical Features (Variables) * Terminology: Axis * The scikit-learn Package * scikit-learn Estimators * Models, Estimators, and Predictors * Common Distance Metrics * The Euclidean Metric * The LIBSVM format * Scaling of the Features * The Curse of Dimensionality * Supervised vs Unsupervised Machine Learning * Supervised Machine Learning Algorithms * Unsupervised Machine Learning Algorithms * Choose the Right Algorithm * Life-cycles of Machine Learning Development * Data Split for Training and Test Data Sets * Data Splitting in scikit-learn * Hands-on Exercise * Classification Examples * Classifying with k-Nearest Neighbors (SL) * k-Nearest Neighbors Algorithm * The Error Rate * Hands-on Exercise * Dimensionality Reduction * The Advantages of Dimensionality Reduction * Principal Component Analysis (PCA) * Hands-on Exercise * Data Blending * Decision Trees (SL) * Decision Tree Terminology * Decision Tree Classification in Context of Information Theory * Information Entropy Defined * The Shannon Entropy Formula * The Simplified Decision Tree Algorithm * Using Decision Trees * Random Forests * SVM * Naive Bayes Classifier (SL) * Naive Bayesian Probabilistic Model in a Nutshell * Bayes Formula * Classification of Documents with Naive Bayes * Unsupervised Learning Type: Clustering * Clustering Examples * k-Means Clustering (UL) * k-Means Clustering in a Nutshell * k-Means Characteristics * Regression Analysis * Simple Linear Regression Model * Linear vs Non-Linear Regression * Linear Regression Illustration * Major Underlying Assumptions for Regression Analysis * Least-Squares Method (LSM) * Locally Weighted Linear Regression * Regression Models in Excel * Multiple Regression Analysis * Logistic Regression * Regression vs Classification * Time-Series Analysis * Decomposing Time-Series * 
Summary LAB EXERCISES * Lab 1 - Learning the Lab Environment * Lab 2 - Using Jupyter Notebook * Lab 3 - Repairing and Normalizing Data * Lab 4 - Computing Descriptive Statistics * Lab 5 - Data Grouping and Aggregation * Lab 6 - Data Visualization with matplotlib * Lab 7 - Data Splitting * Lab 8 - k-Nearest Neighbors Algorithm * Lab 9 - The k-means Algorithm * Lab 10 - The Random Forest Algorithm
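The repairing-and-normalizing module above covers scaling data with scikit-learn's scale() function and the MinMaxScaler object. As a dependency-free illustration of the min-max transform those tools apply per column (assuming the standard formula x' = (x - min) / (max - min), shifted into a target range; this is a sketch, not course code):

```python
def min_max_scale(values, feature_range=(0.0, 1.0)):
    """Rescale a single column of values into feature_range, mirroring
    the idea behind sklearn.preprocessing.MinMaxScaler:
    x' = a + (x - min) * (b - a) / (max - min)."""
    lo, hi = min(values), max(values)
    a, b = feature_range
    if hi == lo:                      # degenerate column: map everything to the low end
        return [a for _ in values]
    return [a + (x - lo) * (b - a) / (hi - lo) for x in values]

print(min_max_scale([10, 20, 40]))                       # values land in [0, 1]
print(min_max_scale([0, 1], feature_range=(-1.0, 1.0)))  # custom target range
```

Scaling matters for the distance-based algorithms later in the outline (k-NN, k-means), where an unscaled feature with a large range would dominate the Euclidean metric.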

Python With Data Science
Delivered on-request, online
Price on Enquiry

Cisco Implementing Cisco Tetration Analytics v1.0 (DCITET)

By Nexus Human

Duration 3 Days 18 CPD hours. This course is intended for: Network Security Operations, Workload Application Administrators, Security Operations, Field Engineers, Network Engineers, Systems Engineers, Technical Solutions Architects, and Cisco Integrators and Partners. Overview: After taking this course, you should be able to: Define the Cisco telemetry and analytics approach. Explore common scenarios that Cisco Tetration Analytics can solve. Describe how the Cisco Tetration Analytics platform collects telemetry and other context information. Discuss how the relevant agents are installed and configured. Explore the operational aspects of the Cisco Tetration Analytics platform. Describe the Cisco Tetration Analytics support for application visibility or application insight based on the Application Dependency Mapping (ADM) feature. List the concepts of the intent-based declarative network management automation model. Describe the Cisco Tetration policy enforcement pipeline, components, functions, and implementation of application policy. Describe how to use Cisco Tetration Analytics for workload protection in order to provide a secure infrastructure for business-critical applications and data. Describe Cisco Tetration Analytics platform use cases in the modern heterogeneous, multicloud data center. List the options for the Cisco Tetration Analytics platform enhancements. Explain how to perform Cisco Tetration Analytics administration. This course teaches how to deploy, use, and operate the Cisco Tetration Analytics platform for comprehensive workload protection and application and network insights across a multicloud infrastructure. You will learn how the Cisco Tetration Analytics platform uses streaming telemetry, behavioral analysis, unsupervised machine learning, analytical intelligence, and big data analytics to deliver pervasive visibility, automated intent-based policy, workload protection, and performance management. 
EXPLORING CISCO TETRATION * Data Center Challenges * Define and Position Cisco Tetration * Cisco Tetration Features * Cisco Tetration Architecture * Cisco Tetration Deployment Models * Cisco Tetration GUI Overview IMPLEMENTING AND OPERATING CISCO TETRATION * Explore Data Collection * Install the Software Agent * Install the Hardware Agent * Import Context Data * Describe Cisco Tetration Operational Concepts EXAMINING CISCO TETRATION ADM AND APPLICATION INSIGHT * Describe Cisco Tetration Application Insight * Perform ADM * Interpret ADM Results Application Visibility EXAMINING CISCO TETRATION INTENT-BASED NETWORKING * Describe Intent-Based Policy * Examine Policy Features * Implement Policies ENFORCING TETRATION POLICY PIPELINE AND COMPLIANCE * Examine Policy Enforcement * Implement Application Policy * Examine Policy Compliance Verification and Simulation EXAMINING TETRATION SECURITY USE CASES * Examine Workload Security * Attack Prevention * Attack Detection * Attack Remediation EXAMINING IT OPERATIONS USE CASES * Key Features and IT Operations Use Cases * Performing Operations in Neighborhood App-based Use Cases EXAMINING PLATFORM ENHANCEMENT USE CASES * Integrations and Advanced Features * Third-party Integration Examples * Explore Data Platform Capabilities EXPLORING CISCO TETRATION ANALYTICS ADMINISTRATION * Examine User Authentication and Authorization * Examine Cluster Management * Configure Alerts and Syslog ADDITIONAL COURSE DETAILS: Nexus Humans Cisco Implementing Cisco Tetration Analytics v1.0 (DCITET) training program is a workshop that presents an invigorating mix of sessions, lessons, and masterclasses meticulously crafted to propel your learning expedition forward. This immersive bootcamp-style experience boasts interactive lectures, hands-on labs, and collaborative hackathons, all strategically designed to fortify fundamental concepts. 
Guided by seasoned coaches, each session offers priceless insights and practical skills crucial for honing your expertise. Whether you're just beginning to develop your professional skills or are a seasoned professional, this comprehensive course ensures you're equipped with the knowledge and prowess necessary for success. While we feel this is the best course for the Cisco Implementing Cisco Tetration Analytics v1.0 (DCITET) course and one of our Top 10, we encourage you to read the course outline to make sure it is the right content for you. Additionally, private sessions, closed classes or dedicated events are available both live online and at our training centres in Dublin and London, as well as at your offices anywhere in the UK, Ireland or across EMEA.

Cisco Implementing Cisco Tetration Analytics v1.0 (DCITET)
Delivered on-request, online
Price on Enquiry

DP-601T00 Implementing a Lakehouse with Microsoft Fabric

By Nexus Human

Duration 1 Day 6 CPD hours. The primary audience for this course is data professionals who are familiar with data modeling, extraction, and analytics. It is designed for professionals who are interested in gaining knowledge about Lakehouse architecture, the Microsoft Fabric platform, and how to enable end-to-end analytics using these technologies. Job roles: Data Analyst, Data Engineer, Data Scientist. Overview: Describe end-to-end analytics in Microsoft Fabric. Describe core features and capabilities of lakehouses in Microsoft Fabric. Create a lakehouse. Ingest data into files and tables in a lakehouse. Query lakehouse tables with SQL. Configure Spark in a Microsoft Fabric workspace. Identify suitable scenarios for Spark notebooks and Spark jobs. Use Spark dataframes to analyze and transform data. Use Spark SQL to query data in tables and views. Visualize data in a Spark notebook. Understand Delta Lake and delta tables in Microsoft Fabric. Create and manage delta tables using Spark. Use Spark to query and transform data in delta tables. Use delta tables with Spark structured streaming. Describe Dataflow (Gen2) capabilities in Microsoft Fabric. Create Dataflow (Gen2) solutions to ingest and transform data. Include a Dataflow (Gen2) in a pipeline. This course builds your foundational skills in data engineering on Microsoft Fabric, focusing on the Lakehouse concept. It explores the powerful capabilities of Apache Spark for distributed data processing, along with the essential techniques for efficient data management, versioning, and reliability that come from working with Delta Lake tables. It also covers data ingestion and orchestration using Dataflows Gen2 and Data Factory pipelines, and combines lectures with hands-on exercises that prepare you to work with lakehouses in Microsoft Fabric. 
INTRODUCTION TO END-TO-END ANALYTICS USING MICROSOFT FABRIC * Explore end-to-end analytics with Microsoft Fabric * Data teams and Microsoft Fabric * Enable and use Microsoft Fabric * Knowledge Check GET STARTED WITH LAKEHOUSES IN MICROSOFT FABRIC * Explore the Microsoft Fabric Lakehouse * Work with Microsoft Fabric Lakehouses * Exercise - Create and ingest data with a Microsoft Fabric Lakehouse USE APACHE SPARK IN MICROSOFT FABRIC * Prepare to use Apache Spark * Run Spark code * Work with data in a Spark dataframe * Work with data using Spark SQL * Visualize data in a Spark notebook * Exercise - Analyze data with Apache Spark WORK WITH DELTA LAKE TABLES IN MICROSOFT FABRIC * Understand Delta Lake * Create delta tables * Work with delta tables in Spark * Use delta tables with streaming data * Exercise - Use delta tables in Apache Spark INGEST DATA WITH DATAFLOWS GEN2 IN MICROSOFT FABRIC * Understand Dataflows (Gen2) in Microsoft Fabric * Explore Dataflows (Gen2) in Microsoft Fabric * Integrate Dataflows (Gen2) and Pipelines in Microsoft Fabric * Exercise - Create and use a Dataflow (Gen2) in Microsoft Fabric
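One of the course objectives above is querying lakehouse tables with SQL. As a rough sketch of the query shape only, the snippet below uses the standard library's sqlite3 as a stand-in for the lakehouse SQL analytics endpoint (which in Fabric speaks T-SQL over Delta tables, so the dialect differs); the sales table and its rows are invented for the example.

```python
import sqlite3

# sqlite3 stands in for the Fabric SQL analytics endpoint purely to show
# the aggregation pattern; table name and data are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("EMEA", 100.0), ("EMEA", 50.0), ("APAC", 75.0)])

totals = list(conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"))
for region, total in totals:
    print(region, total)
```

Against a real lakehouse you would connect to the workspace's SQL endpoint with your usual SQL client or driver; the GROUP BY pattern carries over unchanged.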

DP-601T00 Implementing a Lakehouse with Microsoft Fabric
Delivered Online: Two days, Aug 26th, 13:00 + 2 more
£595

Advanced Data Analysis and Reconciliation

4.3(6)

By dbrownconsulting

Advanced Data Analysis and Reconciliation
Delivered Online: 3 weeks, Oct 22nd, 08:00
£900

Cloudera Training for Apache HBase

By Nexus Human

Duration 4 Days 24 CPD hours This course is intended for This course is appropriate for developers and administrators who intend to use HBase. Overview Skills learned on the course include:The use cases and usage occasions for HBase, Hadoop, and RDBMSUsing the HBase shell to directly manipulate HBase tablesDesigning optimal HBase schemas for efficient data storage and recoveryHow to connect to HBase using the Java API, configure the HBase cluster, and administer an HBase clusterBest practices for identifying and resolving performance bottlenecks Cloudera University?s four-day training course for Apache HBase enables participants to store and access massive quantities of multi-structured data and perform hundreds of thousands of operations per second. INTRODUCTION TO HADOOP & HBASE * What Is Big Data? * Introducing Hadoop * Hadoop Components * What Is HBase? * Why Use HBase? * Strengths of HBase * HBase in Production * Weaknesses of HBase HBASE TABLES * HBase Concepts * HBase Table Fundamentals * Thinking About Table Design THE HBASE SHELL * Creating Tables with the HBase Shell * Working with Tables * Working with Table Data HBASE ARCHITECTURE FUNDAMENTALS * HBase Regions * HBase Cluster Architecture * HBase and HDFS Data Locality HBASE SCHEMA DESIGN * General Design Considerations * Application-Centric Design * Designing HBase Row Keys * Other HBase Table Features BASIC DATA ACCESS WITH THE HBASE API * Options to Access HBase Data * Creating and Deleting HBase Tables * Retrieving Data with Get * Retrieving Data with Scan * Inserting and Updating Data * Deleting Data MORE ADVANCED HBASE API FEATURES * Filtering Scans * Best Practices * HBase Coprocessors HBASE ON THE CLUSTER * How HBase Uses HDFS * Compactions and Splits HBASE READS & WRITES * How HBase Writes Data * How HBase Reads Data * Block Caches for Reading HBASE PERFORMANCE TUNING * Column Family Considerations * Schema Design Considerations * Configuring for Caching * Dealing with Time Series and Sequential 
Data * Pre-Splitting Regions HBASE ADMINISTRATION AND CLUSTER MANAGEMENT * HBase Daemons * ZooKeeper Considerations * HBase High Availability * Using the HBase Balancer * Fixing Tables with hbck * HBase Security HBASE REPLICATION & BACKUP * HBase Replication * HBase Backup * MapReduce and HBase Clusters USING HIVE & IMPALA WITH HBASE * Using Hive and Impala with HBase APPENDIX A: ACCESSING DATA WITH PYTHON AND THRIFT * Thrift Usage * Working with Tables * Getting and Putting Data * Scanning Data * Deleting Data * Counters * Filters APPENDIX B: OPENTSDB
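The schema design and performance tuning modules above both touch on designing HBase row keys, including handling time-series and sequential data. Because HBase sorts rows lexicographically by key, monotonically increasing keys (plain timestamps) funnel all writes into one region; a common mitigation discussed in that context is reversing the timestamp so the newest row for an entity sorts first. The sketch below shows the idea in plain Python; the key layout (separator, padding width, the sensor-7 id) is an illustrative choice, not the course's.

```python
LONG_MAX = 2**63 - 1  # Java's Long.MAX_VALUE, the usual basis for this trick

def row_key(sensor_id, epoch_millis):
    """Compose an illustrative row key of the form <sensor>#<reversed-ts>.

    Reversing the timestamp (LONG_MAX - ts) makes the newest cell for
    each sensor sort first under HBase's lexicographic key order, so a
    prefix Scan returns recent data without reading the whole history.
    Zero-padding keeps the numeric part a fixed width so string order
    matches numeric order."""
    return f"{sensor_id}#{LONG_MAX - epoch_millis:019d}"

newer = row_key("sensor-7", 1_700_000_001_000)
older = row_key("sensor-7", 1_700_000_000_000)
print(newer < older)  # the newer row sorts first
```

For write-throughput hotspotting specifically, the related technique is salting the key with a hash prefix, which spreads sequential writes across pre-split regions at the cost of fan-out on reads.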

Cloudera Training for Apache HBase
Delivered on-request, online
Price on Enquiry

Mastering Scala with Apache Spark for the Modern Data Enterprise (TTSK7520)

By Nexus Human

Duration 5 Days 30 CPD hours. This intermediate and beyond level course is geared for experienced technical professionals in various roles, such as developers, data analysts, data engineers, software engineers, and machine learning engineers, who want to leverage Scala and Spark to tackle complex data challenges and develop scalable, high-performance applications across diverse domains. Practical programming experience is required to participate in the hands-on labs. Overview: Working in a hands-on learning environment led by our expert instructor, you'll: Develop a basic understanding of Scala and Apache Spark fundamentals, enabling you to confidently create scalable and high-performance applications. Learn how to process large datasets efficiently, helping you handle complex data challenges and make data-driven decisions. Gain hands-on experience with real-time data streaming, allowing you to manage and analyze data as it flows into your applications. Acquire practical knowledge of machine learning algorithms using Spark MLlib, empowering you to create intelligent applications and uncover hidden insights. Master graph processing with GraphX, enabling you to analyze and visualize complex relationships in your data. Discover generative AI technologies using GPT with Spark and Scala, opening up new possibilities for automating content generation and enhancing data analysis. Embark on a journey to master the world of big data with our immersive course on Scala and Spark! Mastering Scala with Apache Spark for the Modern Data Enterprise is a five-day, hands-on course designed to provide you with the essential skills and tools to tackle complex data projects using the Scala programming language and Apache Spark, a high-performance data processing engine. 
Mastering these technologies will enable you to perform a wide range of tasks, from data wrangling and analytics to machine learning and artificial intelligence, across various industries and applications. Guided by our expert instructor, you'll explore the fundamentals of Scala programming and Apache Spark while gaining valuable hands-on experience with Spark programming, RDDs, DataFrames, Spark SQL, and data sources. You'll also explore Spark Streaming, performance optimization techniques, and the integration of popular external libraries, tools, and cloud platforms like AWS, Azure, and GCP. Machine learning enthusiasts will delve into Spark MLlib, covering the basics of machine learning algorithms, data preparation, feature extraction, and various techniques such as regression, classification, clustering, and recommendation systems. INTRODUCTION TO SCALA * Brief history and motivation * Differences between Scala and Java * Basic Scala syntax and constructs * Scala's functional programming features INTRODUCTION TO APACHE SPARK * Overview and history * Spark components and architecture * Spark ecosystem * Comparing Spark with other big data frameworks BASICS OF SPARK PROGRAMMING SPARKCONTEXT AND SPARKSESSION * Resilient Distributed Datasets (RDDs) * Transformations and Actions * Working with DataFrames SPARK SQL AND DATA SOURCES * Spark SQL library and its advantages * Structured and semi-structured data sources * Reading and writing data in various formats (CSV, JSON, Parquet, Avro, etc.) 
* Data manipulation using SQL queries BASIC RDD OPERATIONS * Creating and manipulating RDDs * Common transformations and actions on RDDs * Working with key-value data BASIC DATAFRAME AND DATASET OPERATIONS * Creating and manipulating DataFrames and Datasets * Column operations and functions * Filtering, sorting, and aggregating data INTRODUCTION TO SPARK STREAMING * Overview of Spark Streaming * Discretized Stream (DStream) operations * Windowed operations and stateful processing PERFORMANCE OPTIMIZATION BASICS * Best practices for efficient Spark code * Broadcast variables and accumulators * Monitoring Spark applications INTEGRATING EXTERNAL LIBRARIES AND TOOLS, SPARK STREAMING * Using popular external libraries, such as Hadoop and HBase * Integrating with cloud platforms: AWS, Azure, GCP * Connecting to data storage systems: HDFS, S3, Cassandra, etc. INTRODUCTION TO MACHINE LEARNING BASICS * Overview of machine learning * Supervised and unsupervised learning * Common algorithms and use cases INTRODUCTION TO SPARK MLLIB * Overview of Spark MLlib * MLlib's algorithms and utilities * Data preparation and feature extraction LINEAR REGRESSION AND CLASSIFICATION * Linear regression algorithm * Logistic regression for classification * Model evaluation and performance metrics CLUSTERING ALGORITHMS * Overview of clustering algorithms * K-means clustering * Model evaluation and performance metrics COLLABORATIVE FILTERING AND RECOMMENDATION SYSTEMS * Overview of recommendation systems * Collaborative filtering techniques * Implementing recommendations with Spark MLlib INTRODUCTION TO GRAPH PROCESSING * Overview of graph processing * Use cases and applications of graph processing * Graph representations and operations * Introduction to Spark GraphX * Overview of GraphX * Creating and transforming graphs * Graph algorithms in GraphX BIG DATA INNOVATION! 
USING GPT AND GENERATIVE AI TECHNOLOGIES WITH SPARK AND SCALA * Overview of generative AI technologies * Integrating GPT with Spark and Scala * Practical applications and use cases BONUS TOPICS / TIME PERMITTING INTRODUCTION TO SPARK NLP * Overview of Spark NLP * Preprocessing text data * Text classification and sentiment analysis PUTTING IT ALL TOGETHER * Work on a capstone project that integrates multiple aspects of the course, including data processing, machine learning, graph processing, and generative AI technologies.
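The RDD programming model covered above (transformations such as flatMap and map, actions such as reduceByKey) has a direct analogue in ordinary collection processing. As a rough, Spark-free sketch of the classic word-count dataflow in plain Python (illustrative only; no cluster, laziness, or partitioning, and not the course's Scala code):

```python
# A local stand-in for an RDD: a plain Python list. Spark's transformations
# are lazy and distributed; these are eager and local, but the dataflow
# shape (flatMap -> map -> reduceByKey) is the same.
lines = ["spark makes big data simple", "big data needs spark"]

# flatMap: split each line into words
words = [w for line in lines for w in line.split()]

# map: pair each word with a count of 1 (the classic word-count shape)
pairs = [(w, 1) for w in words]

# reduceByKey: fold the counts together per key
def reduce_by_key(pairs):
    counts = {}
    for key, value in pairs:
        counts[key] = counts.get(key, 0) + value
    return counts

counts = reduce_by_key(pairs)
print(counts["spark"])  # 2
print(counts["big"])    # 2
```

In Spark the same pipeline distributes each stage across partitions; the per-key fold is what makes reduceByKey shuffle-efficient compared with collecting everything to the driver.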

Mastering Scala with Apache Spark for the Modern Data Enterprise (TTSK7520)
Delivered on-request, online
Price on Enquiry

Cloudera Data Scientist Training

By Nexus Human

Duration 4 Days 24 CPD hours This course is intended for The workshop is designed for data scientists who currently use Python or R to work with smaller datasets on a single machine and who need to scale up their analyses and machine learning models to large datasets on distributed clusters. Data engineers and developers with some knowledge of data science and machine learning may also find this workshop useful. Overview Overview of data science and machine learning at scale Overview of the Hadoop ecosystem Working with HDFS data and Hive tables using Hue Introduction to Cloudera Data Science Workbench Overview of Apache Spark 2 Reading and writing data Inspecting data quality Cleansing and transforming data Summarizing and grouping data Combining, splitting, and reshaping data Exploring data Configuring, monitoring, and troubleshooting Spark applications Overview of machine learning in Spark MLlib Extracting, transforming, and selecting features Building and evaluating regression models Building and evaluating classification models Building and evaluating clustering models Cross-validating models and tuning hyperparameters Building machine learning pipelines Deploying machine learning models Spark, Spark SQL, and Spark MLlib PySpark and sparklyr Cloudera Data Science Workbench (CDSW) Hue This workshop covers data science and machine learning workflows at scale using Apache Spark 2 and other key components of the Hadoop ecosystem. The workshop emphasizes the use of data science and machine learning methods to address real-world business challenges. Using scenarios and datasets from a fictional technology company, students discover insights to support critical business decisions and develop data products to transform the business. The material is presented through a sequence of brief lectures, interactive demonstrations, extensive hands-on exercises, and discussions. 
The Apache Spark demonstrations and exercises are conducted in Python (with PySpark) and R (with sparklyr) using the Cloudera Data Science Workbench (CDSW) environment. The workshop is designed for data scientists who currently use Python or R to work with smaller datasets on a single machine and who need to scale up their analyses and machine learning models to large datasets on distributed clusters. Data engineers and developers with some knowledge of data science and machine learning may also find this workshop useful. OVERVIEW OF DATA SCIENCE AND MACHINE LEARNING AT SCALE OVERVIEW OF THE HADOOP ECOSYSTEM WORKING WITH HDFS DATA AND HIVE TABLES USING HUE INTRODUCTION TO CLOUDERA DATA SCIENCE WORKBENCH OVERVIEW OF APACHE SPARK 2 READING AND WRITING DATA INSPECTING DATA QUALITY CLEANSING AND TRANSFORMING DATA SUMMARIZING AND GROUPING DATA COMBINING, SPLITTING, AND RESHAPING DATA EXPLORING DATA CONFIGURING, MONITORING, AND TROUBLESHOOTING SPARK APPLICATIONS OVERVIEW OF MACHINE LEARNING IN SPARK MLLIB EXTRACTING, TRANSFORMING, AND SELECTING FEATURES BUILDING AND EVALUATING REGRESSION MODELS BUILDING AND EVALUATING CLASSIFICATION MODELS BUILDING AND EVALUATING CLUSTERING MODELS CROSS-VALIDATING MODELS AND TUNING HYPERPARAMETERS BUILDING MACHINE LEARNING PIPELINES DEPLOYING MACHINE LEARNING MODELS ADDITIONAL COURSE DETAILS: Nexus Humans Cloudera Data Scientist Training training program is a workshop that presents an invigorating mix of sessions, lessons, and masterclasses meticulously crafted to propel your learning expedition forward. This immersive bootcamp-style experience boasts interactive lectures, hands-on labs, and collaborative hackathons, all strategically designed to fortify fundamental concepts. Guided by seasoned coaches, each session offers priceless insights and practical skills crucial for honing your expertise.
Whether you're stepping into the realm of professional skills or a seasoned professional, this comprehensive course ensures you're equipped with the knowledge and prowess necessary for success. While we feel this is the best course for the Cloudera Data Scientist Training course and one of our Top 10 we encourage you to read the course outline to make sure it is the right content for you. Additionally, private sessions, closed classes or dedicated events are available both live online and at our training centres in Dublin and London, as well as at your offices anywhere in the UK, Ireland or across EMEA.
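The fit-then-evaluate-on-held-out-data workflow that this workshop applies at scale with Spark MLlib can be sketched locally. A minimal Python illustration, assuming a single feature and ordinary least squares (the toy data and function names are invented for illustration, not taken from the course materials):

```python
import math

# Illustrative train/test split: (feature, label) pairs.
train = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 8.1)]
test = [(5.0, 10.0), (6.0, 12.1)]

def fit(points):
    """Closed-form ordinary least squares for one feature."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in points)
             / sum((x - mean_x) ** 2 for x, _ in points))
    return slope, mean_y - slope * mean_x

def rmse(model, points):
    """Root-mean-squared error on held-out points."""
    slope, intercept = model
    return math.sqrt(sum((slope * x + intercept - y) ** 2
                         for x, y in points) / len(points))

model = fit(train)        # "training" on the train split
error = rmse(model, test) # evaluating on data the model never saw
```

In MLlib the same two steps become an estimator's `fit` on a distributed DataFrame and an evaluator scoring the predictions, with cross-validation layered on top for hyperparameter tuning.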

Cloudera Data Scientist Training
Delivered on-request, online
Price on Enquiry

Designing and Building Big Data Applications

By Nexus Human

Duration 4 Days 24 CPD hours This course is intended for developers, engineers, and architects who want to use Hadoop and related tools to solve real-world problems. Overview Skills learned in this course include: creating a data set with the Kite SDK; developing custom Flume components for data ingestion; managing a multi-stage workflow with Oozie; analyzing data with Crunch; writing user-defined functions for Hive and Impala; and indexing data with Cloudera Search. Cloudera University's four-day course for designing and building Big Data applications prepares you to analyze and solve real-world problems using Apache Hadoop and associated tools in the enterprise data hub (EDH). INTRODUCTION APPLICATION ARCHITECTURE * Scenario Explanation * Understanding the Development Environment * Identifying and Collecting Input Data * Selecting Tools for Data Processing and Analysis * Presenting Results to the User DEFINING & USING DATASETS * Metadata Management * What is Apache Avro? * Avro Schemas * Avro Schema Evolution * Selecting a File Format * Performance Considerations USING THE KITE SDK DATA MODULE * What is the Kite SDK? * Fundamental Data Module Concepts * Creating New Data Sets Using the Kite SDK * Loading, Accessing, and Deleting a Data Set IMPORTING RELATIONAL DATA WITH APACHE SQOOP * What is Apache Sqoop? * Basic Imports * Limiting Results * Improving Sqoop's Performance * Sqoop 2 CAPTURING DATA WITH APACHE FLUME * What is Apache Flume?
* Basic Flume Architecture * Flume Sources * Flume Sinks * Flume Configuration * Logging Application Events to Hadoop DEVELOPING CUSTOM FLUME COMPONENTS * Flume Data Flow and Common Extension Points * Custom Flume Sources * Developing a Flume Pollable Source * Developing a Flume Event-Driven Source * Custom Flume Interceptors * Developing a Header-Modifying Flume Interceptor * Developing a Filtering Flume Interceptor * Writing Avro Objects with a Custom Flume Interceptor MANAGING WORKFLOWS WITH APACHE OOZIE * The Need for Workflow Management * What is Apache Oozie? * Defining an Oozie Workflow * Validation, Packaging, and Deployment * Running and Tracking Workflows Using the CLI * Hue UI for Oozie PROCESSING DATA PIPELINES WITH APACHE CRUNCH * What is Apache Crunch? * Understanding the Crunch Pipeline * Comparing Crunch to Java MapReduce * Working with Crunch Projects * Reading and Writing Data in Crunch * Data Collection API Functions * Utility Classes in the Crunch API WORKING WITH TABLES IN APACHE HIVE * What is Apache Hive? * Accessing Hive * Basic Query Syntax * Creating and Populating Hive Tables * How Hive Reads Data * Using the RegexSerDe in Hive DEVELOPING USER-DEFINED FUNCTIONS * What are User-Defined Functions? * Implementing a User-Defined Function * Deploying Custom Libraries in Hive * Registering a User-Defined Function in Hive EXECUTING INTERACTIVE QUERIES WITH IMPALA * What is Impala? * Comparing Hive to Impala * Running Queries in Impala * Support for User-Defined Functions * Data and Metadata Management UNDERSTANDING CLOUDERA SEARCH * What is Cloudera Search? * Search Architecture * Supported Document Formats INDEXING DATA WITH CLOUDERA SEARCH * Collection and Schema Management * Morphlines * Indexing Data in Batch Mode * Indexing Data in Near Real Time PRESENTING RESULTS TO USERS * Solr Query Syntax * Building a Search UI with Hue * Accessing Impala through JDBC * Powering a Custom Web Application with Impala and Search
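The user-defined function modules above center on one idea: a UDF is a named, pure function that the query engine applies to one row's column value at a time. Real Hive UDFs are Java classes (classically extending `org.apache.hadoop.hive.ql.exec.UDF`) registered with `CREATE FUNCTION`; the registry and row shapes below are a plain-Python analogue invented purely to illustrate the concept:

```python
# Hypothetical registry mapping a function name to a Python callable,
# standing in for Hive's CREATE FUNCTION / function registry.
udf_registry = {}

def register_udf(name, fn):
    udf_registry[name] = fn

def apply_udf(name, rows, column):
    """Apply a registered UDF to one column of every row,
    like SELECT name(column) FROM table."""
    fn = udf_registry[name]
    return [{**row, column: fn(row[column])} for row in rows]

# A toy UDF that normalises product codes.
register_udf("normalise", lambda s: s.strip().upper())

rows = [{"code": " ab-1 "}, {"code": "cd-2"}]
result = apply_udf("normalise", rows, "code")
```

The same shape explains why UDFs must be deployed as libraries to every node (each worker applies the function to its own rows) and why Impala's UDF support is a separate registration step from Hive's.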

Designing and Building Big Data Applications
Delivered on-request, online
Price on Enquiry

Educators matching "Big Data Analytics"

Empower UK Employment Training


Bradford

Welcome to Empower UK Employment Training, where your professional growth is our mission. We are a leading provider of bespoke education, offering tailored courses designed to meet the unique needs of every learner. Our team of skilled counsellors are dedicated to providing expert career guidance, helping you navigate your career path with confidence. WE WORK WITH LEADING INTERNATIONAL BRANDS AND BUSINESSES At Empower UK, we understand the importance of continuous professional development. That’s why our courses are designed to not only equip you with the skills you need today but also to foster your long-term career progression. Join us at Empower UK Employment Training and take the next step in your professional journey. WHY CHOOSE US? INTERACTIVE LEARNING SESSIONS AND COURSE PLANS One of the greatest advantages of joining Empower UK’s courses is the opportunity for knowledge acquisition and skill enhancement. Our courses are rich sources of industry-relevant information, perfect for those seeking to upskill. As your career guides, we understand your needs. CPD PROGRESSION Our courses are designed to aid you in your continuous professional development. EFFORTLESS ELEARNING EXPERIENCE Empower UK offers an engaging and informative platform for all learners, new and seasoned, delivering valuable content that will refine your skills. It’s an excellent way to build relationships with your peers, increase your knowledge, and create awareness of your professional potential. OPTIMISED FOR ANY DEVICE Our innovative learning platform is designed with your convenience in mind. Whether you're using a mobile, laptop, or tablet, you can access our courses anytime, anywhere. It's a strategic approach to learning, tailored to your goals and flexible to your lifestyle. AUDIENCE ENGAGEMENT We believe in nurturing our learners, helping them to continually develop their skills and knowledge. 
ASK US ANY QUESTION This might seem straightforward, but we encourage our learners to ask questions anytime. This not only enriches your learning experience but also fosters a deeper connection with the Empower UK community. Dive into our engaging courses and start your journey towards career advancement today.