Srbija
Posted October 8, 2022

Spark SQL and PySpark 3 Using Python 3 Hands-On With Labs
Last updated 8/2022
MP4 | Video: h264, 1280x720 | Audio: AAC, 44.1 KHz
Language: English | Size: 10.03 GB | Duration: 32h 12m

A Comprehensive Course on Spark SQL as well as Data Frame APIs using Python 3 with complimentary lab access

What you'll learn
Set up a Single Node Hadoop and Spark cluster using Docker, locally or on AWS Cloud9
Review ITVersity Labs (exclusively for ITVersity Lab Customers)
All the HDFS Commands that are relevant to validate files and folders in HDFS
Quick recap of the Python that is relevant to learning Spark
Ability to use Spark SQL to solve problems using SQL-style syntax
PySpark Data Frame APIs to solve problems using Data Frame style APIs
Relevance of Spark Metastore to convert Data Frames into Temporary Views so that one can process data in Data Frames using Spark SQL
Apache Spark Application Development Life Cycle
Apache Spark Application Execution Life Cycle and Spark UI
Set up an SSH Proxy to access Spark Application logs
Deployment Modes of Spark Applications (Cluster and Client)
Passing Application Properties Files and External Dependencies while running Spark Applications

Requirements
Basic programming skills using any programming language
Self-support lab (instructions provided) or ITVersity lab at additional cost for an appropriate environment
A 64-bit operating system, with the minimum memory depending on the environment you are using
4 GB RAM with access to proper clusters, or 16 GB RAM to set up the environment using Docker

Description
As part of this course, you will learn all the key skills to build Data Engineering Pipelines using Spark SQL and Spark Data Frame APIs with Python as the programming language. This course used to be a CCA 175 Spark and Hadoop Developer course for preparation for the Certification Exam.
As of 10/31/2021, the exam has been sunset, and we have renamed the course to Apache Spark 2 and 3 using Python 3, as it covers industry-relevant topics beyond the scope of the certification.

About Data Engineering
Data Engineering is nothing but processing data according to our downstream needs. We need to build different pipelines, such as Batch Pipelines and Streaming Pipelines, as part of Data Engineering. All roles related to Data Processing are consolidated under Data Engineering. Conventionally, they are known as ETL Development, Data Warehouse Development, etc. Apache Spark has evolved into a leading technology for Data Engineering at scale.

I have prepared this course for anyone who would like to transition into a Data Engineer role using PySpark (Python + Spark). I am a Data Engineering Solution Architect with proven experience in designing solutions using Apache Spark.

Let us go through the details of what you will be learning in this course. Keep in mind that the course is created with a lot of hands-on tasks, which will give you enough practice using the right tools. There are also plenty of tasks and exercises to evaluate yourself. We will provide details about Resources or Environments to learn Spark SQL and PySpark 3 using Python 3, as well as Reference Material on GitHub to practice Spark SQL and PySpark 3 using Python 3. You can use the cluster at your workplace, set up the environment using the provided instructions, or use ITVersity Lab to take this course.

Setup of Single Node Big Data Cluster
Many of you would like to transition to Big Data from Conventional Technologies such as Mainframes, Oracle PL/SQL, etc., and you might not have access to Big Data Clusters. It is very important for you to set up the environment in the right manner.
Don't worry if you do not have a cluster handy; we will guide you with support via Udemy Q&A.
Set up an Ubuntu-based AWS Cloud9 Instance with the right configuration
Ensure Docker is set up
Set up Jupyter Lab and other key components
Set up and validate Hadoop, Hive, YARN, and Spark

Are you feeling a bit overwhelmed about setting up the environment? Don't worry! We will provide complimentary lab access for up to 2 months. Here are the details.
Training using an interactive environment. You will get 2 weeks of lab access to begin with. If you like the environment and acknowledge it by providing a 5* rating and feedback, the lab access will be extended by an additional 6 weeks (2 months in total). Feel free to send an email to [email protected] to get complimentary lab access. Also, if your employer provides a multi-node environment, we will help you set up the material for practice as part of a live session. On top of Q&A Support, we also provide required support via live sessions.

A quick recap of Python
This course requires a decent knowledge of Python. To make sure you understand Spark from a Data Engineering perspective, we added a module to quickly warm up with Python. If you are not familiar with Python, then we suggest you go through our other course, Data Engineering Essentials - Python, SQL, and Spark.

Master required Hadoop Skills to build Data Engineering Applications
As part of this section, you will primarily focus on HDFS commands so that we can copy files into HDFS.
The data copied into HDFS will be used as part of building data engineering pipelines using Spark and Hadoop with Python as the Programming Language.
Overview of HDFS Commands
Copy files into HDFS using the put or copyFromLocal command
Review whether the files are copied properly to HDFS using HDFS Commands
Get the size of the files using HDFS commands such as du, df, etc.
Some fundamental concepts related to HDFS such as block size, replication factor, etc.

Data Engineering using Spark SQL
Let us deep-dive into Spark SQL to understand how it can be used to build Data Engineering Pipelines. Spark with SQL gives us the ability to leverage the distributed computing capabilities of Spark coupled with easy-to-use, developer-friendly SQL-style syntax.
Getting Started with Spark SQL
Basic Transformations using Spark SQL
Managing Tables - Basic DDL and DML in Spark SQL
Managing Tables - DML and Create Partitioned Tables using Spark SQL
Overview of Spark SQL Functions to manipulate strings, dates, null values, etc.
Windowing Functions using Spark SQL for ranking, advanced aggregations, etc.

Data Engineering using Spark Data Frame APIs
Spark Data Frame APIs are an alternative way of building Data Engineering applications at scale, leveraging the distributed computing capabilities of Spark.
Data Engineers from application development backgrounds might prefer Data Frame APIs over Spark SQL to build Data Engineering applications.
Data Processing Overview using Spark or PySpark Data Frame APIs
Projecting or selecting data from Spark Data Frames, renaming columns, providing aliases, dropping columns from Data Frames, etc. using PySpark Data Frame APIs
Processing Column Data using Spark or PySpark Data Frame APIs - you will be learning functions to manipulate strings, dates, null values, etc.
Basic Transformations on Spark Data Frames using PySpark Data Frame APIs, such as Filtering, Aggregations, and Sorting, using functions such as filter/where, groupBy with agg, sort or orderBy, etc.
Joining Data Sets on Spark Data Frames using PySpark Data Frame APIs such as join. You will learn inner joins, outer joins, etc. using the right examples.
Windowing Functions on Spark Data Frames using PySpark Data Frame APIs to perform advanced Aggregations, Ranking, and Analytic Functions
Spark Metastore Databases and Tables and integration between Spark SQL and Data Frame APIs

Apache Spark Application Development and Deployment Life Cycle
Once you go through the content related to Spark using the Jupyter-based environment, we will also walk you through how Spark applications are typically developed using Python, deployed, and reviewed.
Set up a Python Virtual Environment and Project for Spark Application Development using Pycharm
Understand the complete Spark Application Development Lifecycle using Pycharm and Python
Build a zip file for the Spark Application, copy it to the environment where it is supposed to run, and run it
Understand how to review the Spark Application Execution Life Cycle

All the demos are given on our state-of-the-art Big Data cluster. You can avail of one-month complimentary lab access by reaching out to [email protected] with a Udemy receipt.
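The build-and-deploy steps described above (zip the application, copy it over, run it in client or cluster mode) can be sketched in plain Python. This is only an illustration, not the course's own tooling: the helper names build_app_zip and spark_submit_command, and the file names app.py and app.zip, are hypothetical. The only Spark-specific pieces are the spark-submit options --master, --deploy-mode, and --py-files. The script assembles the command and prints it; it does not invoke Spark.

```python
import shlex
import tempfile
import zipfile
from pathlib import Path
from typing import List


def build_app_zip(src_dir: str, zip_path: str) -> List[str]:
    """Zip every .py file under src_dir, keeping paths relative to src_dir."""
    src = Path(src_dir)
    # Sort by archive name so the zip contents are deterministic.
    entries = sorted(
        (path.relative_to(src).as_posix(), path) for path in src.rglob("*.py")
    )
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for arcname, path in entries:
            zf.write(path, arcname)
    return [arcname for arcname, _ in entries]


def spark_submit_command(entry_point: str, py_files: str,
                         deploy_mode: str = "client",
                         master: str = "yarn") -> List[str]:
    """Assemble (but do not run) a spark-submit argument list."""
    if deploy_mode not in ("client", "cluster"):
        raise ValueError("deploy_mode must be 'client' or 'cluster'")
    return [
        "spark-submit",
        "--master", master,
        "--deploy-mode", deploy_mode,
        "--py-files", py_files,  # zip of supporting modules
        entry_point,             # hypothetical driver script
    ]


if __name__ == "__main__":
    # Build a throwaway project layout purely for demonstration.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "src"
        (src / "util").mkdir(parents=True)
        (src / "util" / "__init__.py").write_text("")
        (src / "util" / "helpers.py").write_text("GREETING = 'hello'\n")
        zip_path = str(Path(tmp) / "app.zip")
        print(build_app_zip(str(src), zip_path))
        cmd = spark_submit_command("app.py", zip_path, deploy_mode="cluster")
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Returning the command as a list rather than a single string keeps each argument intact if it is later passed to subprocess.run, and the deploy_mode check mirrors the two deployment modes (client and cluster) that the course covers.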

Overview

Section 1: Introduction about Spark SQL and PySpark 3 using Python 3
Lecture 1 Introduction to Spark SQL and PySpark 3 using Python 3
Lecture 2 Curriculum for Spark SQL and Pyspark 3 using Python 3
Lecture 3 Purchasing the Spark SQL and PySpark using Python 3 Course
Lecture 4 Introduction to Udemy Course Landing Page
Lecture 5 Overview of Udemy Course or Video Player
Lecture 6 Adding Notes to Course Lectures
Lecture 7 Using Course Sidebar to move between lectures
Lecture 8 Overview of Support to ITVersity courses on Udemy
Lecture 9 Best Practices to get ITVersity Support using Udemy
Lecture 10 Resources for Spark SQL and Pyspark 3 using Python 3
Lecture 11 Material for Spark SQL and PySpark 3 using Python 3
Lecture 12 Become Part of ITVersity Data Engineering Community
Lecture 13 Rate and Leave Feedback - Spark SQL and PySpark 3 using Python 3
Lecture 14 Udemy for Business Customers - Important Information about labs for practice

Section 2: Using ITVersity Labs for hands-on practice (for ITVersity Lab Customers only)
Lecture 15 Setup Development Environment using VS Code Remote Development Extension Pack
Lecture 16 Review Data Sets Provided as part of Gateway Nodes of Hadoop and Spark Cluster
Lecture 17 Validate HDFS on Multi Node Hadoop and Spark Cluster from Gateway Node
Lecture 18 Validate Hive on Hadoop and Spark Multinode Cluster
Lecture 19 Review Hadoop HDFS and YARN Property Files on Hadoop and Spark Cluster
Lecture 20 Review Hadoop HDFS and YARN Property Files using Visual Studio Code Editor
Lecture 21 Review Hive Property Files on Multinode Hadoop and Spark Cluster
Lecture 22 Review Spark 2 Property Files and Important Properties
Lecture 23 Validate Spark Shell CLI using Spark 2
Lecture 24 Validate Pyspark CLI using Spark 2
Lecture 25 Validate Spark SQL CLI using Spark 2
Lecture 26 Review Spark 3 Property Files and Important Properties
Lecture 27 Validate Spark Shell CLI using Spark 3
Lecture 28 Validate Pyspark CLI using Spark 3
Lecture 29 Validate Spark SQL CLI using Spark 3

Section 3: Setup Hadoop and Spark Single Node Cluster on Windows 11 using Docker
Lecture 30 Prerequisites for Single Node Hadoop and Spark Cluster on Windows
Lecture 31 Overview of Windows System Configuration
Lecture 32 Setup Ubuntu on Windows 11 using wsl
Lecture 33 Setup and Validate Ubuntu VM on Windows using wsl
Lecture 34 Install Docker Desktop on Windows 11 using wsl2
Lecture 35 Overview of Docker Desktop on Windows 11
Lecture 36 Validate Docker Commands using Windows Powershell as well as wsl Ubuntu
Lecture 37 Setup Visual Studio Code IDE on Windows
Lecture 38 Install Visual Studio Code Extension for Remote Development
Lecture 39 Clone GitHub Repository for Pyspark Course using Visual Studio Code
Lecture 40 Launching Terminal using Visual Studio Code and WSL
Lecture 41 Review Docker Compose File to setup Hadoop and Spark Lab
Lecture 42 Start Hadoop and Spark Lab along with Jupyter Lab on Windows 11
Lecture 43 Review the resource utilization of Windows for Hadoop and Spark Lab
Lecture 44 Review Docker Desktop for Hadoop and Spark Lab using Docker
Lecture 45 Overview of Docker Compose Commands to manage Hadoop and Spark Lab
Lecture 46 Validate Hadoop and Spark setup using Docker on Windows

Section 4: Setup Hadoop and Spark Single Node Cluster on AWS Cloud9 using Docker
Lecture 47 Getting Started with AWS Cloud9
Lecture 48 Creating AWS Cloud9 Environment
Lecture 49 Warming up with AWS Cloud9 IDE
Lecture 50 Review Operating System Details on AWS Cloud9
Lecture 51 Overview of EC2 Instance related to AWS Cloud9
Lecture 52 Opening ports for AWS Cloud9 Instance
Lecture 53 Associating Elastic IPs to AWS Cloud9 Instance
Lecture 54 Increase EBS Volume Size of AWS Cloud9 Instance
Lecture 55 Setup Docker Compose on AWS Cloud9 Instance
Lecture 56 Clone GitHub Repository on AWS Cloud9 for the Course Material
Lecture 57 Review Docker Compose File to setup Hadoop and Spark Lab
Lecture 58 Start Hadoop and Spark Lab along with Jupyter Lab on Windows 11
Lecture 59 Overview of Docker Compose Commands to manage Hadoop and Spark Lab
Lecture 60 Validate Hadoop and Spark setup using Docker

Section 5: Python Fundamentals
Lecture 61 Introduction and Setting up Python
Lecture 62 Basic Programming Constructs
Lecture 63 Functions in Python
Lecture 64 Python Collections
Lecture 65 Map Reduce operations on Python Collections
Lecture 66 Setting up Data Sets for Basic I/O Operations
Lecture 67 Basic I/O operations and processing data using Collections

Section 6: Overview of Hadoop HDFS Commands
Lecture 68 Getting help or usage
Lecture 69 Listing HDFS Files
Lecture 70 Managing HDFS Directories
Lecture 71 Copying files from local to HDFS
Lecture 72 Copying files from HDFS to local
Lecture 73 Getting File Metadata
Lecture 74 Previewing Data in HDFS File
Lecture 75 HDFS Block Size
Lecture 76 HDFS Replication Factor
Lecture 77 Getting HDFS Storage Usage
Lecture 78 Using HDFS Stat Commands
Lecture 79 HDFS File Permissions
Lecture 80 Overriding Properties

Section 7: Apache Spark 2.x - Data processing - Getting Started
Lecture 81 Introduction
Lecture 82 Review of Setup Steps for Spark Environment
Lecture 83 Using ITVersity labs
Lecture 84 Apache Spark Official Documentation (Very Important)
Lecture 85 Quick Review of Spark APIs
Lecture 86 Spark Modules
Lecture 87 Spark Data Structures - RDDs and Data Frames
Lecture 88 Develop Simple Application
Lecture 89 Apache Spark - Framework
Lecture 90 Create Data Frames from Text Files
Lecture 91 Create Data Frames from Hive Tables

Section 8: Apache Spark using SQL - Getting Started
Lecture 92 Getting Started - Overview
Lecture 93 Overview of Spark Documentation
Lecture 94 Launching and using Spark SQL CLI
Lecture 95 Overview of Spark SQL Properties
Lecture 96 Running OS Commands using Spark SQL
Lecture 97 Understanding Spark Metastore Warehouse Directory
Lecture 98 Managing Spark Metastore Databases using Spark SQL
Lecture 99 Managing Spark Metastore Tables using Spark SQL
Lecture 100 Retrieve Metadata of Spark Metastore Tables using Spark SQL Describe Command
Lecture 101 Role of Spark Metastore or Hive Metastore
Lecture 102 Exercise - Getting Started with Spark SQL

Section 9: Apache Spark using SQL - Basic Transformations using Spark SQL
Lecture 103 Basic Transformations using Spark SQL - Introduction
Lecture 104 Spark SQL - Overview
Lecture 105 Define Problem Statement
Lecture 106 Prepare Spark Metastore Tables for Basic Transformations using Spark SQL
Lecture 107 Projecting Data using Spark SQL Select Clause
Lecture 108 Filtering Data using Spark SQL Where Clause
Lecture 109 Joining Tables using Spark SQL - Inner
Lecture 110 Joining Tables using Spark SQL - Outer
Lecture 111 Aggregating Data using Group By in Spark SQL
Lecture 112 Sorting Data using Order By in Spark SQL
Lecture 113 Conclusion - Final Solution for the problem statement using Spark SQL

Section 10: Apache Spark using SQL - Basic DDL and DML
Lecture 114 Introduction to Basic DDL and DML in Spark SQL
Lecture 115 Create Spark Metastore Tables using Spark SQL Create Statement
Lecture 116 Overview of Data Types used in Spark Metastore Tables
Lecture 117 Adding Comments to Spark Metastore Tables using Spark SQL
Lecture 118 Loading Data from Local File System Into Tables using Spark SQL Load Statement
Lecture 119 Loading Data from HDFS Folders Into Tables using Spark SQL Load Statement
Lecture 120 Difference between Load with Append and Overwrite using Spark SQL Load Statement
Lecture 121 Creating External Spark Metastore Tables using Spark SQL
Lecture 122 Difference between Managed and External Spark Metastore Tables
Lecture 123 Overview of File Formats used in Spark Metastore Tables
Lecture 124 Drop Spark Metastore Tables and Databases using Spark SQL
Lecture 125 Truncating Spark Metastore Tables
Lecture 126 Exercise - Managed Spark Metastore Tables

Section 11: Apache Spark using SQL - DML and Partitioning
Lecture 127 Introduction to DML and Partitioning using Spark SQL on Spark Metastore Tables
Lecture 128 Introduction to Partitioning of Spark Metastore Tables using Spark SQL
Lecture 129 Creating Spark Metastore Tables using Parquet File Format
Lecture 130 Difference between Load and Insert to get data into Spark Metastore Tables
Lecture 131 Inserting Data using Stage Table leveraging Spark SQL
Lecture 132 Creating Spark Metastore Partitioned Tables using Spark SQL
Lecture 133 Adding Partitions to Spark Metastore Tables using Spark SQL
Lecture 134 Loading Data into Spark Metastore Partitioned Tables using Spark SQL
Lecture 135 Inserting Data into Spark Metastore Partitions using Spark SQL Insert Statement
Lecture 136 Using Dynamic Partition Mode while inserting into Spark Partitioned Tables
Lecture 137 Exercise - Partitioned Tables using Spark SQL

Section 12: Apache Spark using SQL - Pre-defined Functions
Lecture 138 Introduction - Overview of Spark SQL Pre-defined Functions
Lecture 139 Overview of Spark SQL Pre-defined Functions
Lecture 140 Validating Spark SQL Functions
Lecture 141 String Manipulation using Spark SQL Functions
Lecture 142 Date Manipulation using Spark SQL Functions
Lecture 143 Overview of Numeric Functions in Spark SQL
Lecture 144 Data Type Conversion using Spark SQL
Lecture 145 Dealing with Nulls using Spark SQL
Lecture 146 Using CASE and WHEN in Spark SQL Queries
Lecture 147 Query Example - Word Count using Spark SQL

Section 13: Apache Spark SQL - Windowing Functions
Lecture 148 Introduction to Windowing Functions in Spark SQL
Lecture 149 Prepare HR Database for Windowing Functions in Spark SQL
Lecture 150 Overview of Windowing Functions using Spark SQL
Lecture 151 Aggregations using Spark SQL Windowing Functions
Lecture 152 Using LEAD or LAG in Spark SQL Windowing Functions
Lecture 153 Getting first and last values using Spark SQL Windowing Functions
Lecture 154 Ranking using Spark SQL Windowing Functions - rank, dense_rank and row_number
Lecture 155 Order of execution of Spark SQL Queries
Lecture 156 Overview of Subqueries in Spark SQL
Lecture 157 Filtering Window Function Results using Spark SQL

Section 14: Apache Spark using Python - Data Processing Overview
Lecture 158 Starting Spark Context - pyspark
Lecture 159 Overview of Spark Read APIs
Lecture 160 Understanding airlines data
Lecture 161 Inferring Schema using Spark Data Frame APIs
Lecture 162 Previewing Airlines Data using Spark Data Frame APIs
Lecture 163 Overview of Data Frame APIs
Lecture 164 Overview of Functions on Spark Data Frames
Lecture 165 Overview of Spark Write APIs

Section 15: Apache Spark using Python - Processing Column Data
Lecture 166 Overview of Predefined Functions on Spark Data Frame Columns
Lecture 167 Create Dummy Data Frame to explore Functions on Data Frame Columns
Lecture 168 Categories of Predefined Functions used on Spark Data Frame Columns
Lecture 169 Special Functions for Spark Data Frame Columns - col and lit
Lecture 170 Common String Manipulation Functions for Spark Data Frame Columns
Lecture 171 Extracting Strings using substring from Spark Data Frame Columns
Lecture 172 Extracting Strings using split from Spark Data Frame Columns
Lecture 173 Padding Characters around Strings in Spark Data Frame Columns
Lecture 174 Trimming Characters from Strings in Spark Data Frame Columns
Lecture 175 Date and Time Manipulation Functions for Spark Data Frame Columns
Lecture 176 Date and Time Arithmetic on Spark Data Frame Columns
Lecture 177 Using Date and Time Trunc Functions on Spark Data Frame Columns
Lecture 178 Date and Time Extract Functions for Spark Data Frame Columns
Lecture 179 Using to_date and to_timestamp on Spark Data Frame Columns
Lecture 180 Using date_format Function on Spark Data Frame Columns
Lecture 181 Dealing with Unix Timestamp in Spark Data Frame Columns
Lecture 182 Dealing with Nulls in Spark Data Frame Columns
Lecture 183 Using CASE and WHEN on Spark Data Frame Columns

Section 16: Apache Spark using Python - Basic Transformations
Lecture 184 Overview of Basic Transformations on Spark Data Frames
Lecture 185 Spark Data Frames for basic transformations
Lecture 186 Basic Filtering of Data or rows using where from Spark Data Frames
Lecture 187 Filtering Example using dates on Spark Data Frames
Lecture 188 Boolean Operators while filtering from Spark Data Frames
Lecture 189 Using IN Operator or isin Function while filtering from Spark Data Frames
Lecture 190 Using LIKE Operator or like Function while filtering from Spark Data Frames
Lecture 191 Using BETWEEN Operator while filtering from Spark Data Frames
Lecture 192 Dealing with Nulls while Filtering from Spark Data Frames
Lecture 193 Total Aggregations on Spark Data Frames
Lecture 194 Aggregate data using groupBy from Spark Data Frames
Lecture 195 Aggregate data using rollup on Spark Data Frames
Lecture 196 Aggregate data using cube on Spark Data Frames
Lecture 197 Overview of Sorting Spark Data Frames
Lecture 198 Solution - Problem 1 - Get Total Aggregations
Lecture 199 Solution - Problem 2 - Get Total Aggregations By FlightDate

Section 17: Apache Spark using Python - Joining Data Sets
Lecture 200 Prepare Datasets for Joining Spark Data Frames
Lecture 201 Analyze Datasets for Joining Spark Data Frames
Lecture 202 Problem Statements for Joining Spark Data Frames
Lecture 203 Overview of Joins on Spark Data Frames
Lecture 204 Using Inner Joins on Spark Data Frames
Lecture 205 Left or Right Outer Join on Spark Data Frames
Lecture 206 Solution - Get Flight Count Per US Airport using Spark Data Frame APIs
Lecture 207 Solution - Get Flight Count Per US State using Spark Data Frame APIs
Lecture 208 Solution - Get Dormant US Airports using Spark Data Frame APIs
Lecture 209 Solution - Get Origins without master data using Spark Data Frame APIs
Lecture 210 Solution - Get Count of Flights without master data using Spark Data Frame APIs
Lecture 211 Solution - Get Count of Flights per Airport without master data
Lecture 212 Solution - Get Daily Revenue using Spark Data Frame APIs
Lecture 213 Solution - Get Daily Revenue rolled up till Yearly using Spark Data Frame APIs

Section 18: Apache Spark using Python - Spark Metastore
Lecture 214 Overview of APIs to deal with Spark Metastore
Lecture 215 Exploring Spark Catalog
Lecture 216 Creating Spark Metastore Tables using catalog
Lecture 217 Inferring Schema while creating Spark Metastore Tables using Spark Catalog
Lecture 218 Define Schema for Spark Metastore Tables using StructType
Lecture 219 Inserting into Existing Spark Metastore Tables using Spark Data Frame APIs
Lecture 220 Read and Process data from Spark Metastore Tables using Data Frame APIs
Lecture 221 Create Spark Metastore Partitioned Tables using Data Frame APIs
Lecture 222 Saving as Spark Metastore Partitioned Table using Data Frame APIs
Lecture 223 Creating Temporary Views on top of Spark Data Frames
Lecture 224 Using Spark SQL against Temporary Views on Spark Data Frames

Section 19: Getting Started with Semi Structured Data using Spark
Lecture 225 Introduction to Getting Started with Semi Structured Data using Spark
Lecture 226 Create Spark Metastore Table with Special Data Types
Lecture 227 Overview of ARRAY Type in Spark Metastore Table
Lecture 228 Overview of MAP and STRUCT Type in Spark Metastore Table
Lecture 229 Insert Data into Spark Metastore Table with Special Type Columns
Lecture 230 Create Spark Data Frame with Special Data Types
Lecture 231 Create Spark Data Frame with Special Types using Python List
Lecture 232 Insert Spark Data Frame with Special Types into Spark Metastore Table
Lecture 233 Review Data in the JSON File with Special Data Types
Lecture 234 Setup JSON Data Set to explore Spark APIs on Special Data Type Columns
Lecture 235 Read JSON Data with Special Types into Spark Data Frame
Lecture 236 Flatten Array Fields in Spark Data Frames using explode and explode_outer
Lecture 237 Get Size or Length of Array Type Columns in Spark Data Frame
Lecture 238 Concatenate Array Values into Delimited String using Spark APIs
Lecture 239 Convert Delimited Strings from Spark Data Frame Columns to Arrays
Lecture 240 Setup Data Sets to Build Arrays using Spark
Lecture 241 Read JSON Data into Spark Data Frame and Review Aggregate Operations
Lecture 242 Build Arrays from Flattened Rows of Spark Data Frame
Lecture 243 Getting Started with Spark Data Frames with Struct Columns
Lecture 244 Concatenate Struct Column Values in Spark Data Frame
Lecture 245 Filter Data on Struct Column Attributes in Spark Data Frame
Lecture 246 Create Spark Data Frame using Map Type Column
Lecture 247 Project Map Values as Columns using Spark Data Frame APIs
Lecture 248 Conclusion of Getting Started with Semi Structured Data using Spark

Section 20: Process Semi Structured Data using Spark Data Frame APIs
Lecture 249 Introduction to Process Semi Structured Data using Spark Data Frame APIs
Lecture 250 Review the Data Sets to generate denormalized JSON Data using Spark
Lecture 251 Setup JSON Data Sets in HDFS using HDFS Command
Lecture 252 Create Spark Data Frames using Data Frame APIs
Lecture 253 Join Orders and Order Items using Spark Data Frame APIs
Lecture 254 Generate Struct Field for Order Details using Spark
Lecture 255 Generate Array of Struct Field for Order Details using Spark
Lecture 256 Join Data Sets to generate denormalized JSON Data using Spark
Lecture 257 Denormalize Join Results using Spark Data Frame APIs
Lecture 258 Write Denormalized Customer Details to JSON Files using Spark
Lecture 259 Publish JSON Files for downstream applications
Lecture 260 Read Denormalized Data into Spark Data Frame
Lecture 261 Filter Denormalized Data Frame using Spark APIs
Lecture 262 Perform Aggregations on Denormalized Data Frame using Spark
Lecture 263 Flatten Semi Structured Data or Denormalized Data using Spark
Lecture 264 Compute Monthly Customer Revenue using Spark on Denormalized Data
Lecture 265 Conclusion of Processing Semi Structured Data using Spark Data Frame APIs

Section 21: Apache Spark - Application Development Life Cycle
Lecture 266 Setup Virtual Environment and Install Pyspark
Lecture 267 Getting Started with Pycharm
Lecture 268 Passing Run Time Arguments
Lecture 269 Accessing OS Environment Variables
Lecture 270 Getting Started with Spark
Lecture 271 Create Function for Spark Session
Lecture 272 Setup Sample Data
Lecture 273 Read data from files
Lecture 274 Process data using Spark APIs
Lecture 275 Write data to files
Lecture 276 Validating Writing Data to Files
Lecture 277 Productionizing the Code
Lecture 278 Setting up Data for Production Validation
Lecture 279 Running the application using YARN
Lecture 280 Detailed Validation of the Application

Section 22: Spark Application Execution Life Cycle and Spark UI
Lecture 281 Deploying and Monitoring Spark Applications - Introduction
Lecture 282 Overview of Types of Spark Cluster Managers
Lecture 283 Setup EMR Cluster with Hadoop and Spark
Lecture 284 Overall Capacity of Big Data Cluster with Hadoop and Spark
Lecture 285 Understanding YARN Capacity of an Enterprise Cluster
Lecture 286 Overview of Hadoop HDFS and YARN Setup on Multi-node Cluster
Lecture 287 Overview of Spark Setup on top of Hadoop
Lecture 288 Setup Data Set for Word Count application
Lecture 289 Develop Word Count Application
Lecture 290 Review Deployment Process of Spark Application
Lecture 291 Overview of Spark Submit Command
Lecture 292 Switch between Python Versions to run Spark Applications or launch Pyspark CLI
Lecture 293 Switch between Pyspark Versions to run Spark Applications or launch Pyspark CLI
Lecture 294 Review Spark Configuration Properties at Run Time
Lecture 295 Develop Shell Script to run Spark Application
Lecture 296 Run Spark Application and review default executors
Lecture 297 Overview of Spark History Server UI

Section 23: Setup SSH Proxy to access Spark Application logs
Lecture 298 Setup SSH Proxy to access Spark Application logs - Introduction
Lecture 299 Overview of Private and Public ips of servers in the cluster
Lecture 300 Overview of SSH Proxy
Lecture 301 Setup sshuttle on Mac or Linux
Lecture 302 Proxy using sshuttle on Mac or Linux
Lecture 303 Accessing Spark Application logs via SSH Proxy using sshuttle on Mac or Linux
Lecture 304 Side effects of using SSH Proxy to access Spark Application Logs
Lecture 305 Steps to setup SSH Proxy on Windows to access Spark Application Logs
Lecture 306 Setup PuTTY and PuTTYgen on Windows
Lecture 307 Quick Tour of PuTTY on Windows
Lecture 308 Configure Passwordless Login using PuTTYGen Keys on Windows
Lecture 309 Run Spark Application on Gateway Node using PuTTY
Lecture 310 Configure Tunnel to Gateway Node using PuTTY on Windows for SSH Proxy
Lecture 311 Setup Proxy on Windows and validate using Microsoft Edge browser
Lecture 312 Understanding Proxying Network Traffic overcoming Windows Caveats
Lecture 313 Update Hosts file for worker nodes using private ips
Lecture 314 Access Spark Application logs using SSH Proxy
Lecture 315 Overview of performing tasks related to Spark Applications using Mac

Section 24: Deployment Modes of Spark Applications
Lecture 316 Deployment Modes of Spark Applications - Introduction
Lecture 317 Default Execution Master Type for Spark Applications
Lecture 318 Launch Pyspark using local mode
Lecture 319 Running Spark Applications using Local Mode
Lecture 320 Overview of Spark CLI Commands such as Pyspark
Lecture 321 Accessing Local Files using Spark CLI or Spark Applications
Lecture 322 Overview of submitting spark application using client deployment mode
Lecture 323 Overview of submitting spark application using cluster deployment mode
Lecture 324 Review the default logging while submitting Spark Applications
Lecture 325 Changing Spark Application Log Level using custom log4j properties
Lecture 326 Submit Spark Application using client mode with log level info
Lecture 327 Submit Spark Application using cluster mode with log level info
Lecture 328 Submit Spark Applications using SPARK_CONF_DIR with custom properties files
Lecture 329 Submit Spark Applications using Properties File

Section 25: Passing Application Properties Files and External Dependencies
Lecture 330 Passing Application Properties Files and External Dependencies - Introduction
Lecture 331 Steps to pass application properties using JSON
Lecture 332 Setup Working Directory to pass application properties using JSON
Lecture 333 Build the JSON with Application Properties
Lecture 334 Explore APIs to process JSON Data using Pyspark
Lecture 335 Refactor the Spark Application Code to use properties from JSON
Lecture 336 Pass Application Properties to Spark Application using local files in client mode
Lecture 337 Pass Application Properties to Spark Application using local files in cluster mode
Lecture 338 Pass Application Properties to Spark Application using HDFS files
Lecture 339 Steps to pass external Python Libraries using pyfiles
Lecture 340 Create required YAML File to externalize application properties
Lecture 341 Install PyYAML into specific folder and build zip
Lecture 342 Explore APIs to process YAML Data using Pyspark
Lecture 343 Refactor the Spark Application Code to use properties from YAML
Lecture 344 Pass External Dependencies to Spark Application using local files in client mode
Lecture 345 Pass External Dependencies to Spark Apps using local files in cluster mode
Lecture 346 Pass External Dependencies to Spark Application using HDFS files

Who this course is for
Any IT aspirant/professional willing to learn Data Engineering using Apache Spark
Python Developers who want to learn Spark to add the key skill to be a Data Engineer
Scala based Data Engineers who would like to learn Spark using Python as Programming Language