Interested in joining the program? Contact me.
Languages – SQL, Python/R
Cloud Platforms – AWS/Azure/GCP
Databases – Relational and non-relational data sources, and Big Data
ETL/ELT Pipelines
Tools – Power BI/Snowflake/Spark
Operating Systems – UNIX/Linux with shell scripting
Faculty with 18 years of experience at top MNCs in domains such as Banking, Insurance, and Sales
In-depth training with real-time use cases
Two projects
Practical Lab for every concept
Internship facility
Interview Preparation
Work Experience
100% Placement Support
Job Support after placement
What is Big Data?
How to Store and Process Big Data
Introduction to Distributed File System
Concepts in Distributed File System
How Distributed File System gives Scalability
Other distributed File Systems
What is Hadoop?
How Hadoop scales in storage and processing
What is MapReduce?
Example of MapReduce
Problems with MapReduce
What is HDFS?
HDFS Commands
How Spark Resolves the Problems of Hadoop
Hadoop 1.0 Vs Hadoop 2.0 Vs Hadoop 3.0
What is YARN?
How the Hadoop Ecosystem is Transformed using Spark
Cluster Managers for Spark
Changes in Architecture across Hadoop 1.0, Hadoop 2.0, and Hadoop 3.0
What is Spark?
How Spark Was Developed
How Spark Processes the Data
How Spark unifies the different Big Data Processing Systems
MapReduce Vs Spark
MPP Vs Distributed Processing
Other Parallel processing frameworks
Basics of Python
Functions in Python
Control Structures in Python
Object Oriented Programming
File IO
Container Classes in Python – (Dict, Tuple, Set, List)
Decorators in Python
Installation of Spark on Windows
MapReduce Word Count Vs Spark Word Count
What is Spark Context?
What is Spark Session?
What are RDDs?
Operations on RDDs
What are Transformations?
What are Actions?
What is a DAG?
What is a Job?
What are Stages?
What are Tasks?
What are Containers?
Different Transformations
Wide Transformations
Narrow Transformations
What are Shared Variables?
Broadcast variables
Accumulators
Different types of persistence in Spark
Persistence vs Cache
Examples of Different Wide and Narrow Transformations
Exercises on Spark RDDs
What is DataFrame?
RDD vs DataFrame
What is the Catalyst engine?
How to Create DataFrames
Creating DataFrames with Schema and without schema
Creating DataFrames from different file formats
Creating Hive Tables from DataFrame
What are managed Hive Tables?
What are unmanaged Hive Tables?
What is Bucketing in Hive?
What is partitioning in Hive?
Query plans in SQL
How to Optimize the SQL Query
How to change the different configurations for Spark Job
Spark Bucketing vs Hive Bucketing
New Features for Spark SQL in Spark 3.0
Adaptive Query Execution
Dynamic partition pruning
Creating Complex DataFrames
Performing different types of SQL queries on DataFrames
SQL operators on DataFrames
Exercises on Spark DataFrames
Window Operations on DataFrames
Row-wise ordering and ranking functions
Simple aggregation functions
Creating lagged columns
Cumulative Calculations (Running totals and averages)
Combining Windows and Calling Different Columns
What is a Dataset?
DataFrame vs Dataset
Why Python Does Not Have Datasets
Creation of a Dataset in Scala
RDD vs DataFrame vs Dataset
Demonstration of Dataset in Scala
What is Streaming?
Batch vs Streaming
Examples of Streaming Data
Reading Streaming data in Spark
What is the Micro-Batch Interval?
What are DStreams?
What are Stateful and Stateless Operations?
How to Apply an External RDD to DStreams
What are Window Operations?
What is the Sliding Interval?
What is the Window Interval?
What is the Tumbling Interval?
Exercises on Spark Streaming
DStream vs Structured Streaming
Sources in Structured Streaming
Sinks in Structured Streaming
Unbounded Tables
Output Modes
Structured Streaming with Window Processing
Handling of Late Data in Structured Streaming
What is the real difference between existing message queues and Kafka?
What is a Topic?
What is a Broker?
What is a Partition?
What are Producers and Consumers?
How is a Kafka cluster formed?
How a topic is stored across multiple brokers with partitions
What is a Leader, and how the leader of a partition works
Sample programs with console producer and consumer
How to configure Kafka on AWS and Reading Data in Databricks Community Edition
What is a Partition?
How to change the number of partitions
How to enable the different joins in Spark using configurations
Sort Merge Join
Broadcast Join
Shuffle Hash Join
Real-time project explanation.