Interested in joining the program? Contact me.

DATA ENGINEERING

Categories: Management

About Course

Languages – SQL, Python/R
Cloud Platforms – AWS/Azure/GCP
Databases – Relational and non-relational data sources, and Big Data
ETL/ELT Pipelines
Tools – Power BI/Snowflake/Spark
Operating Systems – UNIX/Linux with shell scripting

Faculty with 18 years of experience at top MNCs in domains such as banking, insurance, and sales
In-depth training with real-time use cases
Two projects
Practical Lab for every concept
Internship facility
Interview Preparation
Work Experience
100% Placement Support
Job Support after placement


Topics of Course

DATA ENGINEERING

  • Big Data and Distributed File Systems

    What is Big Data?
    How to Store and Process Big Data
    Introduction to Distributed File Systems
    Concepts in Distributed File Systems
    How a Distributed File System Provides Scalability
    Other Distributed File Systems

  • Hadoop

    What is Hadoop?
    How Hadoop scales in storage and processing
    What is MapReduce?
    Example of MapReduce
    Problems with MapReduce
    What is HDFS?
    HDFS Commands
    How Spark Resolves the Problems of Hadoop
    Hadoop 1.0 vs Hadoop 2.0 vs Hadoop 3.0
    What is YARN?
    How the Hadoop Ecosystem is Transformed using Spark
    Cluster Managers for Spark
    Changes in Architecture across Hadoop 1.0, 2.0, and 3.0
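As a taste of the "Example of MapReduce" topic above, here is a minimal single-machine sketch of the classic word count. It only imitates the map, shuffle, and reduce phases in plain Python; the function names are illustrative and are not part of the Hadoop API.

```python
from collections import defaultdict

def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in the line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reducer: sum the counts for one word."""
    return key, sum(values)

def word_count(lines):
    """Run the three phases in order over a list of text lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

counts = word_count(["big data big cluster", "data pipeline"])
```

On a real cluster the mappers and reducers run on different machines and the shuffle moves data over the network, which is where much of the cost discussed under "Problems with MapReduce" comes from.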

     

  • Spark

    What is Spark?
    How Spark Was Developed
    How Spark Processes the Data
    How Spark unifies the different Big Data Processing Systems
    MapReduce Vs Spark
    MPP Vs Distributed Processing
    Other Parallel Processing Frameworks

     

  • Python

    Basics of Python
    Functions in Python
    Control Structures in Python
    Object Oriented Programming
    File IO
    Container Classes in Python – ( Dict, Tuple, Set, List)
    Decorators in Python
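To illustrate the "Decorators in Python" topic, the sketch below wraps a function so that each call's duration is recorded. The `timed` decorator and the `clean_row` step are hypothetical names chosen for this example, not course material.

```python
import functools
import time

def timed(func):
    """Decorator that records how long the most recent call to func took."""
    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        wrapper.last_elapsed = time.perf_counter() - start
        return result
    wrapper.last_elapsed = None
    return wrapper

@timed
def clean_row(row):
    """Hypothetical ETL step: strip whitespace from every string field."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}

cleaned = clean_row({"name": " Asha ", "age": 30})
```

After the call, `clean_row.last_elapsed` holds the elapsed time in seconds, while `clean_row.__name__` is still `"clean_row"` thanks to `functools.wraps`.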

  • Core Spark with Pyspark

    Installation of Spark on Windows
    MapReduce Word Count Vs Spark Word Count
    What is Spark Context?
    What is Spark Session?
    What are RDDs?
    Operations on RDDs
    What are Transformations?
    What are Actions?
    What is a DAG?
    What is a Job?
    What are Stages?
    What are Tasks?
    What are Containers?
    Different Transformations
    Wide Transformations
    Narrow Transformations
    What are Shared Variables?
    Broadcast variables
    Accumulators
    Different Types of Persistence in Spark
    Persist vs Cache
    Examples of Different Wide and Narrow Transformations
    Exercises on Spark RDDs
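The difference between narrow and wide transformations listed above can be sketched without Spark. In this toy model a dataset is a list of partitions; the helper names and the `ord`-based bucketing are assumptions made for illustration and are not how Spark itself assigns keys to partitions.

```python
from collections import defaultdict

# Two "partitions" of (key, value) records, standing in for an RDD's partitions.
partitions = [[("a", 1), ("b", 2)], [("a", 3), ("c", 4)]]

def narrow_map_values(parts, fn):
    """Narrow: each output partition depends on exactly one input partition."""
    return [[(k, fn(v)) for k, v in part] for part in parts]

def wide_reduce_by_key(parts, fn, num_partitions=2):
    """Wide: records are shuffled so that equal keys meet in one partition."""
    shuffled = [defaultdict(list) for _ in range(num_partitions)]
    for part in parts:
        for k, v in part:
            # Deterministic toy bucketing by the key's first character
            shuffled[ord(k[0]) % num_partitions][k].append(v)
    out = []
    for bucket in shuffled:
        reduced = []
        for k, vs in sorted(bucket.items()):
            acc = vs[0]
            for v in vs[1:]:
                acc = fn(acc, v)
            reduced.append((k, acc))
        out.append(reduced)
    return out

doubled = narrow_map_values(partitions, lambda v: v * 2)
totals = wide_reduce_by_key(doubled, lambda x, y: x + y)
```

The narrow step never looks outside its own partition, while the wide step has to regroup every record by key first. In Spark that regrouping is the shuffle, and it marks a stage boundary.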

  • Spark SQL – DataFrames

    What is DataFrame?
    RDD vs DataFrame
    What is the Catalyst Optimizer?
    How to Create DataFrames
    Creating DataFrames with Schema and without schema
    Creating DataFrames from different file formats
    Creating Hive Tables from DataFrame
    What are managed Hive Tables
    What are unmanaged Hive Tables
    What is Bucketing in Hive?
    What is partitioning in Hive?
    Query plans in SQL
    How to Optimize the SQL Query
    How to change the different configurations for Spark Job
    Spark Bucketing vs Hive Bucketing
    New Features for Spark SQL in Spark 3.0
    Adaptive Query Execution
    Dynamic partition pruning
    Creating Complex DataFrames
    Performing different types of SQL queries on DataFrames
    SQL operators on DataFrames
    Exercises on Spark DataFrames
    Window Operations on DataFrames
    Row-wise Ordering and Ranking Functions
    Simple aggregation functions
    Creating lagged columns
    Cumulative Calculations (Running totals and averages)
    Combining Windows and Calling Different Columns
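The window topics above (ranking, lagged columns, running totals) share one idea: partition the rows, order them, then compute per-row values from neighbouring rows. The sketch below shows that idea in plain Python rather than Spark; the `sales` rows and the `windowed` helper are made up for illustration.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sales rows: (region, day, amount)
sales = [
    ("east", 2, 150), ("west", 1, 200), ("east", 1, 100),
    ("east", 3, 120), ("west", 2, 180),
]

def windowed(rows):
    """Per region, ordered by day: row number, lagged amount, running total."""
    out = []
    # Equivalent in spirit to PARTITION BY region ORDER BY day
    ordered = sorted(rows, key=itemgetter(0, 1))
    for region, group in groupby(ordered, key=itemgetter(0)):
        running, previous = 0, None
        for row_number, (_, day, amount) in enumerate(group, start=1):
            running += amount
            out.append((region, day, amount, row_number, previous, running))
            previous = amount  # becomes the lag value for the next row
    return out

result = windowed(sales)
```

Each output tuple carries the original columns plus a row number, the previous day's amount (`None` for the first row of a region), and the cumulative total within that region.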

     

  • Spark SQL – Datasets

    What is a Dataset?
    DataFrame vs Dataset
    Why Python Does Not Have Datasets
    Creation of a Dataset in Scala
    RDD vs DataFrame vs Dataset
    Demonstration of Datasets in Scala

  • Legacy Spark Streaming

    What is Streaming?
    Batch vs Streaming
    Examples of Streaming Data
    Reading Streaming data in Spark
    What is the Micro-Batch Interval?
    What are DStreams?
    What are Stateful and Stateless Operations?
    How to apply external RDD on DStreams
    What are Window Operations
    What is Sliding Interval?
    What is Window interval?
    What is Tumbling Interval?
    Exercises on Spark Streaming
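Tumbling and sliding windows, listed above, differ only in whether consecutive windows overlap. The following plain-Python sketch assigns timestamped events to windows under both schemes. It is a simplification: timestamps are integers, and windows starting before time zero are dropped.

```python
def tumbling_windows(events, size):
    """Assign each (timestamp, value) event to one non-overlapping window."""
    windows = {}
    for ts, value in events:
        start = (ts // size) * size
        windows.setdefault((start, start + size), []).append(value)
    return windows

def sliding_windows(events, size, slide):
    """Assign each event to every window [start, start + size) that covers it."""
    windows = {}
    for ts, value in events:
        # Latest window start at or before ts, aligned to the slide interval
        start = (ts // slide) * slide
        while start > ts - size and start >= 0:
            windows.setdefault((start, start + size), []).append(value)
            start -= slide
    return windows

events = [(1, "a"), (4, "b"), (6, "c"), (11, "d")]
tumbling = tumbling_windows(events, size=5)
sliding = sliding_windows(events, size=10, slide=5)
```

With a window of 10 and a slide of 5, every event lands in up to two windows. A tumbling window is the special case where the slide equals the window size.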

  • Structured Streaming

    DStream vs Structured Streaming
    Sources of Structured Stream
    Sinks of Structured Stream
    Unbounded Tables
    Output Modes
    Structured Streaming with Window Processing
    Handling of Late Data in Structured Streaming

  • Basics of Kafka

    What is the real difference between existing message queues and Kafka?
    What is a Topic?
    What is a Broker?
    What is a Partition?
    What are Producers and Consumers?
    How is a Kafka cluster formed?
    How a topic is stored across multiple brokers with partitions
    What is a Leader, and how the leader of a partition works
    Sample programs with console producer and consumer
    How to configure Kafka on AWS and read data in Databricks Community Edition
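The relationship between topics, partitions, keys, and offsets can be sketched with a toy in-memory model. This is not the Kafka client API: the `Topic` class is hypothetical, and CRC32 is used as a stand-in hash (Kafka's Java client uses murmur2 for keyed records by default).

```python
import zlib

class Topic:
    """Toy in-memory topic: a list of append-only partition logs."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Stand-in hash (CRC32); an assumption for illustration only.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def produce(self, key, value):
        """Append the record to its partition and return (partition, offset)."""
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1

    def consume(self, partition, offset):
        """Read all records in one partition from the given offset onward."""
        return self.partitions[partition][offset:]

topic = Topic("orders", num_partitions=3)
first = topic.produce("user-1", "created")
second = topic.produce("user-1", "paid")
```

Records with the same key always land in the same partition, so their relative order is preserved there. Ordering across different partitions is not guaranteed.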

  • Job Optimization

    What is Partition?
    How to change the number of partitions
    How to enable the different joins in Spark using configurations
    Sort-Merge Join
    Broadcast Join
    Shuffle Hash Join
    Real-time project explanation
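To make the join strategies above concrete, here is a single-machine sketch of the two core algorithms. A broadcast join builds a hash table from the small side and streams the large side past it, while a sort-merge join sorts both sides and walks them together. The data and function names are illustrative; Spark's distributed versions add shuffling and partitioning on top.

```python
def broadcast_hash_join(large, small):
    """Build a hash table from the small side, then stream the large side."""
    lookup = {}
    for key, value in small:
        lookup.setdefault(key, []).append(value)
    return [(k, lv, sv) for k, lv in large for sv in lookup.get(k, [])]

def sort_merge_join(left, right):
    """Sort both sides by key, then merge them with two cursors."""
    left, right = sorted(left), sorted(right)
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of matching keys on the right side
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Pair every left row with this key against that run
            while i < len(left) and left[i][0] == lk:
                out.extend((lk, left[i][1], right[r][1]) for r in range(j, j_end))
                i += 1
            j = j_end
    return out

orders = [(2, "book"), (1, "pen"), (2, "lamp"), (3, "desk")]
customers = [(1, "Asha"), (2, "Ravi")]
hash_result = broadcast_hash_join(orders, customers)
merge_result = sort_merge_join(orders, customers)
```

Both functions return the same inner-join rows, possibly in a different order. The hash variant avoids sorting but needs the small side to fit in memory, which is the trade-off that Spark's join configuration settings control.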

     

Your Instructor

netaadmin