DATA ENGINEERING

Categories: Management

About Course

Languages – SQL, Python/R
Cloud Platforms – AWS/Azure/GCP
Databases – Relational and non-relational data sources, and Big Data
ETL/ELT Pipelines
Tools – Power BI/Snowflake/Spark
Operating Systems – UNIX/Linux with shell scripting

Faculty with 18 years of experience at top MNCs in Banking, Insurance, Sales, and other domains
In-depth training with real-time use cases
Two projects
Practical Lab for every concept
Internship facility
Interview Preparation
Work Experience
100% Placement Support
Job Support after placement

Topics of Course

DATA ENGINEERING

Big Data and Distributed File Systems
What is Big Data?
How to Store and Process Big Data
Introduction to Distributed File Systems
Concepts in Distributed File Systems
How a Distributed File System gives Scalability
Other Distributed File Systems
Hadoop
What is Hadoop?
How Hadoop scales in storage and processing
What is MapReduce?
Example of MapReduce
Problems with MapReduce
What is HDFS?
HDFS Commands
How Spark resolves the problems of Hadoop
Hadoop 1.0 Vs Hadoop 2.0 Vs Hadoop 3.0
What is YARN?
How the Hadoop Ecosystem is Transformed using Spark
Cluster Managers for Spark
Changes in architecture across Hadoop 1.0, Hadoop 2.0 and Hadoop 3.0
Spark
What is Spark?
How Spark Was Developed
How Spark Processes the Data
How Spark unifies the different Big Data Processing Systems
MapReduce Vs Spark
MPP Vs Distributed Processing
Other Parallel processing frameworks
Python
Basics of Python
Functions in Python
Control Structures in Python
Object Oriented Programming
File IO
Container Classes in Python (Dict, Tuple, Set, List)
Decorators in Python
Core Spark with PySpark
Installation of Spark on Windows
MapReduce Word Count Vs Spark Word Count
What is Spark Context?
What is Spark Session?
What are RDDs?
Operations on RDDs
What are Transformations?
What are Actions?
What is a DAG?
What is a Job?
What are Stages?
What are Tasks?
What are Containers?
Different Transformations
Wide Transformations
Narrow Transformations
What are Shared Variables?
Broadcast Variables
Accumulators
Different types of persistence in Spark
Persistence vs Cache
Examples of Different Wide and Narrow Transformations
Exercises on Spark RDDs
Spark SQL – DataFrames
What is a DataFrame?
RDD vs DataFrame
What is the Catalyst Optimizer?
How to create DataFrames
Creating DataFrames with and without a schema
Creating DataFrames from different file formats
Creating Hive Tables from a DataFrame
What are Managed Hive Tables?
What are Unmanaged Hive Tables?
What is Bucketing in Hive?
What is partitioning in Hive?
Query plans in SQL
How to Optimize the SQL Query
How to change the configurations for a Spark Job
Spark Bucketing vs Hive Bucketing
New Features for Spark SQL in Spark 3.0
Adaptive Query Execution
Dynamic partition pruning
Creating complex DataFrames
Performing different types of SQL queries on DataFrames
SQL operators on DataFrames
Exercises on Spark DataFrames
Window Operations on DataFrames
Row-wise ordering and ranking functions
Simple aggregation functions
Creating lagged columns
Cumulative Calculations (Running totals and averages)
Combining Windows and Calling Different Columns
Spark SQL – Datasets
What is a Dataset?
DataFrame vs Dataset
Why Python Does Not Have Datasets
Creation of a Dataset in Scala
RDD vs DataFrame vs Dataset
Demonstration of Datasets in Scala
Legacy Spark Streaming
What is Streaming?
Batch vs Streaming
Examples of Streaming Data
Reading Streaming data in Spark
What is the Micro-Batch Interval?
What are DStreams?
What are Stateful and Stateless Operations?
How to apply an external RDD to DStreams
What are Window Operations?
What is the Sliding Interval?
What is the Window Interval?
What is the Tumbling Interval?
Exercises on Spark Streaming
Structured Streaming
DStream Vs Structured Streaming
Sources of Structured Streaming
Sinks of Structured Streaming
Unbounded Tables
Output Modes
Structured Streaming with Window Processing
Handling of Late Data in Structured Streaming
Basics of Kafka
What is the real difference between existing message queues and Kafka?
What is a Topic?
What is a Broker?
What is a Partition?
What are Producers and Consumers?
How is a Kafka cluster formed?
How a topic is stored across multiple brokers with partitions
What is a Leader, and how the leader of a partition works
Sample programs with the console producer and consumer
How to configure Kafka on AWS and read data in Databricks Community Edition
Job Optimization
What is a Partition?
How to change the number of partitions
How to enable the different joins in Spark using configurations
Sort Merge Join
Broadcast Join
Shuffle Hash Join
Real-time project explanation

Your Instructor

netaadmin
