Build Data Engineering Pipelines using SQL, Python and Spark


  • Laptop with decent configuration (Minimum 4 GB RAM and Dual Core)
  • Free GCP sign-up with the available credit
  • CS or IT degree or prior IT experience is highly desired


As part of this course, you will learn the Data Engineering Essentials for building Data Pipelines using SQL, Python, and Spark.

About Data Engineering

Data Engineering is the processing of data according to downstream needs. As part of Data Engineering, we build different kinds of pipelines, such as Batch Pipelines and Streaming Pipelines. All roles related to Data Processing are consolidated under Data Engineering. Conventionally, these roles have been known as ETL Development, Data Warehouse Development, etc.
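To make the idea of a Batch Pipeline concrete, here is a minimal sketch in Python. It is only an illustration, not material taken from the course: it uses the built-in sqlite3 module as a stand-in for Postgres, and the table name, column names, and sample rows are all hypothetical.

```python
import sqlite3


def run_batch_pipeline(rows):
    """Load raw order rows, then aggregate completed revenue per day."""
    # An in-memory SQLite database stands in for a real database server (assumption).
    conn = sqlite3.connect(":memory:")
    # Hypothetical table layout, used for illustration only.
    conn.execute(
        "CREATE TABLE orders (order_date TEXT, status TEXT, amount REAL)"
    )
    # Load step: insert the whole batch in one call.
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    # Transform step: filter, aggregate, and sort for the downstream consumer.
    result = conn.execute(
        "SELECT order_date, ROUND(SUM(amount), 2) AS revenue "
        "FROM orders WHERE status = 'COMPLETE' "
        "GROUP BY order_date ORDER BY order_date"
    ).fetchall()
    conn.close()
    return result


# Made-up sample data, not from any real data set.
sample_rows = [
    ("2024-01-01", "COMPLETE", 100.0),
    ("2024-01-01", "CANCELED", 50.0),
    ("2024-01-01", "COMPLETE", 25.5),
    ("2024-01-02", "COMPLETE", 10.0),
]
daily_revenue = run_batch_pipeline(sample_rows)
print(daily_revenue)
```

The same load, filter, and aggregate steps carry over to Postgres or Spark SQL; only the connection and execution layer would change.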

Course Details

As part of this course, you will be learning Data Engineering Essentials such as SQL, Programming using Python and Spark. Here is the detailed agenda for the course.

  • Database Essentials – SQL using Postgres
    • Getting Started with Postgres
    • Basic Database Operations (CRUD or Insert, Update, Delete)
    • Writing Basic SQL Queries (Filtering, Joins and Aggregations)
    • Creating Tables and Indexes
    • Partitioning Tables and Indexes
    • Predefined Functions (String Manipulation, Date Manipulation and other functions)
    • Writing Advanced SQL Queries
  • Programming Essentials using Python
    • Perform Database Operations
    • Getting Started with Python
    • Basic Programming Constructs
    • Predefined Functions
    • Overview of Collections – list and set
    • Overview of Collections – dict and tuple
    • Manipulating Collections using loops
    • Understanding Map Reduce Libraries
    • Overview of Pandas Libraries
    • Database Programming – CRUD Operations
    • Database Programming – Batch Operations
  • Setting up Single Node Cluster for Practice
    • Setup Single Node Hadoop Cluster
    • Setup Hive and Spark on Single Node Cluster
  • Introduction to Hadoop ecosystem
    • Overview of HDFS Commands
  • Data Engineering using Spark SQL
    • Getting Started with Spark SQL
    • Basic Transformations
    • Managing Tables – Basic DDL and DML
    • Managing Tables – DML and Partitioning
    • Overview of Spark SQL Functions
    • Windowing Functions
  • Data Engineering using Spark Data Frame APIs
    • Data Processing Overview
    • Processing Column Data
    • Basic Transformations – Filtering, Aggregations and Sorting
    • Joining Data Sets
    • Windowing Functions – Aggregations, Ranking and Analytic Functions
    • Spark Metastore Databases and Tables

Who this course is for:

  • Computer Science or IT Students or other graduates with a passion for getting into IT
  • Data Warehouse Developers who want to transition to Data Engineering roles
  • ETL Developers who want to transition to Data Engineering roles
  • Database or PL/SQL Developers who want to transition to Data Engineering roles
  • BI Developers who want to transition to Data Engineering roles
  • QA Engineers who want to learn about Data Engineering
  • Application Developers who want to gain Data Engineering Skills
