Big Data, Hadoop, Spark

Trainer: Vamsi Krishna (15 years of experience)

Level: Advanced | Rating: 0 (0 ratings) | 0 students enrolled
Created by Guttula Vamsi Krishna | Last updated Thu, 12-Dec-2019 | Language: English

About the Instructor
  • 0 Reviews
  • 0 Students
  • 1 Course

Vamsi Krishna is an IT professional with more than 15 years of experience across a range of technologies.

He has more than 11 years of experience in Big Data, Hadoop, and Spark, and is an enthusiastic trainer committed to giving his students the best possible service.

Course Schedules
  • Batch: Weekday Batch
  • Mode: Online
  • Start date and days: 23 Nov 2019, Monday to Friday (48 days)
  • Class timings: 07:00 AM to 08:30 AM IST (1 hour 30 minutes)

Curriculum For This Course
378 Lessons
  • What is Big data?
  • Sources of Big data
  • Categories of Big data
  • Characteristics of Big data
  • Use-cases of Big data
  • Traditional RDBMS vs Hadoop
  • History of Hadoop
  • Understanding Hadoop Architecture
  • Fundamentals of HDFS (Blocks, NameNode, DataNode, Secondary NameNode)
  • Block Placement & Rack Awareness
  • HDFS Read/Write
  • Drawbacks of Hadoop 1.x
  • Introduction to Hadoop 2.x
  • High Availability
  • Making/creating directories
  • Removing/deleting directories
  • Print working directory
  • Change directory
  • Manual pages
  • Help
  • vi editor
  • Creating empty files
  • Creating file contents
  • Copying files
  • Renaming files
  • Removing files
  • Moving files
  • Listing files and directories
  • Displaying file contents
  • Understanding Hadoop configuration files
  • Hadoop Components: HDFS, MapReduce
  • Overview of Hadoop Processes
  • Overview of Hadoop Distributed File System
  • The building blocks of Hadoop
  • Hands-On Exercise: Using HDFS commands
  • MapReduce 1 (MRv1)
  • MapReduce Introduction
  • How MapReduce Works
  • Communication between JobTracker and TaskTracker
  • Anatomy of a MapReduce Job Submission
  • MapReduce 2 (YARN)
  • Limitations of the MRv1 Architecture
  • YARN Architecture
  • NodeManager & ResourceManager
  • DDL Commands
  • Create database
  • Create table
  • Alter table
  • Drop table
  • Truncate table
  • DML Commands
  • Insert command
  • Update command
  • Delete command
  • SQL Constraints
  • NOT NULL
  • UNIQUE
  • PRIMARY KEY
  • FOREIGN KEY
  • CHECK
  • Aggregate functions
  • AVG()
  • COUNT()
  • FIRST()
  • LAST()
  • MAX()
  • MIN()
  • SUM()
  • Scalar functions
  • LOWER() / LCASE()
  • UPPER() / UCASE()
  • MID()
  • Joins
  • Cross join
  • Inner join
  • Outer join
  • Left Outer join
  • Right Outer join
  • Views
  • Indexes
  • Set up Java and the JDK
  • Install Scala with IntelliJ IDE
  • Develop Hello World Program using Scala
  • Introduction to Scala
  • REPL Overview
  • Declaring Variables
  • Programming Constructs
  • Code Blocks
  • Scala Functions - Getting Started
  • Scala Functions - Higher Order and Anonymous Functions
  • Scala Functions - Operators
  • Object Oriented Constructs - Getting Started
  • Object Oriented Constructs - Classes
  • Object Oriented Constructs - Companion Objects and Case Class
  • Operators and Functions on Classes
  • External Dependencies and Import
  • Scala Collections - Getting Started
  • Mutable and Immutable Collections
  • Sequence (Seq) - Getting Started
  • Linear Seq vs. Indexed Seq
  • Scala Collections - Primitive Operations
  • Scala Collections - Sorting Data
  • Scala Collections - Grouping Data
  • Scala Collections - Set
  • Scala Collections - Map
  • Tuples in Scala
  • Development Cycle - Developing Source code
  • Development Cycle - Compile source code to jar using SBT
  • Development Cycle - Set up SBT on Windows
  • Development Cycle - Compile changes and run jar with arguments
  • Development Cycle - Set up IntelliJ with Scala
  • Development Cycle - Develop Scala application using SBT in IntelliJ
  • What is Apache Spark & Why Spark?
  • Spark History
  • Unification in Spark
  • Spark ecosystem Vs Hadoop
  • Spark with Hadoop
  • Introduction to Spark’s Python and Scala Shells
  • Spark Standalone Cluster Architecture and its application flow
  • RDD Basics and Characteristics; Creating RDDs
  • RDD Operations
  • Transformations
  • Actions
  • RDD Types
  • Lazy Evaluation
  • Persistence (Caching)
  • Module: Advanced Spark Programming
  • Accumulators and Fault Tolerance
  • Broadcast Variables
  • Custom Partitioning
  • Dealing with different file formats
  • Hadoop Input and Output Formats
  • Connecting to diverse Data Sources
  • Module: Spark SQL
  • Linking with Spark SQL
  • Initializing Spark SQL
  • DataFrames & Caching
  • Case Classes, Inferred Schema
  • Loading and Saving Data
  • Apache Hive
  • Data Sources/Parquet
  • JSON
  • Spark SQL User Defined Functions (UDFs)
  • Getting started with Kafka
  • Understanding Kafka Producer and Consumer APIs
  • Deep dive into producer and consumer APIs
  • Ingesting Web Server logs into Kafka
  • Getting started with Spark Streaming
  • Getting started with HBase
  • Integrating Kafka, Spark Streaming, and HBase
  • Introduction to AWS
  • Sign up for AWS account
  • Set up Cygwin on Windows
  • Quick Preview of Cygwin
  • Understand Pricing
  • Create your first EC2 instance
  • Connecting to EC2 Instance
  • Understanding EC2 dashboard left menu
  • Different EC2 Instance states
  • Describing EC2 Instance
  • Using Elastic IPs to connect to an EC2 instance
  • Using security groups to provide security to EC2 Instance
  • Understanding the concept of bastion server
  • Terminating the EC2 instance and releasing all resources
  • Create security credentials for AWS account
  • Setting up the AWS CLI on Windows
  • Creating an S3 bucket
  • Deleting root access keys
  • Enable MFA for root account
  • Introduction to IAM users and customizing the sign-in link
  • Create your first IAM user
  • Create group and add user
  • Configure IAM password policy
  • Understanding IAM best practices
  • AWS managed policies and creating custom policies
  • Assign policy to entities (user and/or group)
  • Creating a role for the EC2 trusted entity with permissions on S3
  • Assigning role to EC2 instance
  • Introduction to EMR
  • EMR concepts
  • Prerequisites before setting up an EMR cluster
  • Setting up data sets
  • Set up an EMR cluster with Spark using quick options
  • Connecting to EMR cluster
  • Submitting a Spark job on the EMR cluster
  • Validating the results
  • Terminating EMR Cluster
Price: Rs. 11,000 (regular price: Rs. 15,000)
Includes:
  • On-Demand Videos
  • 378 Lessons
  • Full Lifetime Access
  • Access on Mobile and TV