• Self Paced: 14 Classes
  • CloudxLab™: 90 days - 24x7, Global
  • Project: 2 wks
  • Self Paced Classes
  • Full Access To CloudxLab™ - 90x24 Hrs
  • Training By Industry Experts
  • Real Time Project
  • Earn Certificate In Big Data and Hadoop
  • 24x7 Online Expert Help
Self Paced Classes
  • Starts: Self Paced - Anytime
  • Details: Self Paced Classes, Total 14 Classes
  • Trainer: Sandeep Giri
  • Price: (25% off - Early Bird) (Incl. Taxes)
Upcoming Live Courses

  Big Data with Hadoop and Spark - Starts on 14th January, 2017

What?

Our Big Data and Hadoop course is designed to impart the knowledge, skills and hands-on experience required to become a successful Hadoop Developer, Administrator or Tester.

The various tools and technologies that come under the Hadoop Eco-System will be covered in depth, with a focus on hands-on experience.

Concepts Covered: Big Data, NoSQL, Streaming, Analytics
Tools Covered: HDFS, MapReduce, Pig, Hive, HBASE, Zookeeper, Flume, Sqoop, Oozie, Spark, Mahout

Why learn Big Data and Hadoop?

Big Data is a collection of massive and complex data sets that are very difficult to manage and process with the existing tools intended for that purpose. Data generation grows as our everyday devices become cheaper, more powerful, more compact, and more connected. We generate data all the time: tweeting, sending emails, using Facebook, uploading photos, and so on. Our devices, too, are connected and generate data of their own. The result is a gargantuan mass of data that must be analysed for informed decision making.

The only way ahead for organisations is to be able to store and process such large amounts of data, and for that they use Big Data platforms like Hadoop. This is what drives the high demand for Hadoop Developers, Administrators, Testers and Data Scientists.

Another way to measure the demand for Big Data and Hadoop technologies is to look at the number of jobs posted around the world for these technologies.

Also, Big Data features among the top three technology trends in organisations, as per Forbes and Gartner.

How?

Our classes are conducted live online by our instructors via webinar or Hangout; they are not pre-recorded. The instructor delivers the class using presentations, collaborative drawing tools and screen shares. All attendees are usually muted during the class, but they can ask questions in the webinar or Hangout chat window, and the instructor answers any questions immediately after explaining a concept. The instructor also asks questions during the session to ensure maximum student engagement.

Every class is recorded, complete with the screen and the audio, and uploaded to the Learning Management System which is accessible to our attendees for life.

At the end of each session, assignments are provided which the attendees have to submit in the LMS (Learning Management System). The assignments are continuously reviewed by our instructors and teaching assistants. If we conclude that an attendee needs extra attention, we schedule additional one-on-one sessions with that attendee.

What makes Big Data and Hadoop course unique?

  • Interactive Classes: More Questions, Fewer Lectures
  • Simple explanations of complex topics by industry experts
  • Hands-on workshops and real-time projects
  • Quizzes & Assignments
  • Course certificate at the end of the course
  • A real-time project involving Hadoop
  • Lifetime access to course content
  • CloudxLab™ - access to cloud infrastructure for learners who don't wish to install Hadoop on their own computers

What are the prerequisites to join Big Data and Hadoop course?

To get the maximum benefit out of this course, you should know the following:

  1. Basics of SQL. You should know the basics of SQL and databases.
  2. Basics of programming. We will be providing video classes covering the basics of Java. What is expected of the attendee is the ability to create a directory and see what's inside a file from the command line, and an understanding of 'loops' in any programming language (see the short sketch at the end of this section).

In addition, the attendee should have the following hardware infrastructure:

  • A good internet connection. An internet speed of 2 Mbps is good enough.
  • Access to a computer. Since it is an online course, you will have to install the webinar or Hangout software on your computer.
  • Nice To Have: a power backup for your router as well as your computer.
  • Nice To Have: a good quality headset.
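
To make the expected level concrete, here is a minimal, hypothetical sketch (the data and names are invented for illustration): if you can read the SQL filter in the comment and follow the loop below, you already meet the SQL and programming prerequisites.

```python
# Hypothetical illustration of the prerequisite level -- not course material.
#
# SQL prerequisite: you can read a simple filter such as
#   SELECT name, city FROM customers WHERE age > 30;
#
# Programming prerequisite: you can follow a basic loop in any language.

orders = [120, 95, 310, 48, 220]   # some made-up order amounts

total = 0
for amount in orders:              # a plain 'for' loop over a list
    if amount > 100:               # keep only large orders, much like a SQL WHERE clause
        total += amount

print("Total of orders above 100:", total)   # prints 650

# Command-line prerequisite: you can create a directory (mkdir data)
# and look inside a file (cat data/report.txt) from a terminal.
```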

What kind of project / real time experience?

After all sessions are over, we ask for each student's project preference. We form teams of 3-4 members and, based on their interests, assign a project to each team. A project usually runs for three weeks. If a team has an idea it wants to work on as a project, we screen the idea and the team can work on it; otherwise, we assign a project from the industry. Since it is not possible to provide real data from the industry, we provide anonymised data for projects. We continuously support and guide the teams during projects by conducting regular scheduled meetings and also provide individual assistance.

The assigned projects can also be based on public datasets; there are various datasets available for free on public dataset websites.

A few examples of projects are as follows:

  • Understanding the trends and patterns in BitCoin transaction graphs by qualitative analysis. BitCoin is a virtual currency. The way a coin is mined is based on transaction logs. BitCoin transaction logs keep growing almost every millisecond, and therefore processing these transaction logs is a real challenge.
  • Understanding the correlation between the temperature of various cities and the stock market.
  • Processing Apache logs for ERRORs and preparing web analytics based on Apache weblogs (see the sketch after this list):
    • Which services are slow
    • Which services have a high number of users
    • What is the failure rate of each service
  • Preparing recommendations based on the Apache logs.
  • Using social media to compare a brand's marketing campaigns, with the comparison essentially done using sentiment analysis.
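
As an illustration of the Apache-log project above, here is a minimal sketch in plain Python; the log format and service names are assumptions, and a real project would run the same kind of computation on the full logs with MapReduce or Spark on the cluster. It counts requests and 5xx errors per service and reports each service's failure rate.

```python
from collections import defaultdict

# Hypothetical, simplified access-log lines: "<ip> <service> <status> <response_ms>"
log_lines = [
    "10.0.0.1 /search   200 120",
    "10.0.0.2 /search   500 900",
    "10.0.0.3 /checkout 200 300",
    "10.0.0.4 /checkout 404  80",
]

requests = defaultdict(int)   # total requests per service
errors = defaultdict(int)     # 5xx responses per service

for line in log_lines:
    ip, service, status, response_ms = line.split()
    requests[service] += 1
    if int(status) >= 500:    # treat 5xx responses as ERRORs
        errors[service] += 1

for service, count in requests.items():
    rate = errors[service] / count
    print(f"{service}: {count} requests, failure rate {rate:.0%}")
# /search: 2 requests, failure rate 50%
# /checkout: 2 requests, failure rate 0%
```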

Feedback from our alumni

Mohammad Abunaser

Global Solutions Architect
LinkedIn Profile
Ratings: (5.0/5.0)
Review: The interaction with the instructor and the hands-on demonstration of how the underlying components (Flume, Sqoop, Pig, HDFS, HBase, etc.) work make this course unique. You have the opportunity to work on a real cluster where Hadoop and the other components are installed. Also, the fact that the course spans multiple weeks makes it possible and easy to study the material and practice at your own pace.
Mohan Das

Quora Profile
Ratings: (5.0/5.0)
Review: KnowBigData's Big Data and Hadoop course provides great training in several Big Data technologies. On content and instruction (particularly the Hadoop course, which I have taken), Mr. Sandeep was really good and delivers very well. He has very good theoretical and practical knowledge of the Hadoop system. Great job! I highly recommend this as the go-to place for anybody wanting to get into Big Data.
Sukhwinder Singh

Quora Profile
Ratings: (5.0/5.0)
Review: Words cannot express my gratitude to the KnowBigData team, especially Sandeep Giri. The course was online but it never felt impersonal. The course was interactive, and all concepts and details were described so clearly that it was accessible even without a background in computers. They made it easy to understand, implement and relate to your job. It was a nice group of people in the class. I wish the KnowBigData team all success in the future!
Vivek Agarwal

Quora Profile
Ratings: (5.0/5.0)
Review: KnowBigData provides a great platform for people who want good theoretical as well as practical experience in using the latest stack of tools and technologies related to Hadoop and Big Data. On top of that, the way its instructor, Sandeep Giri, interacts and explains the technologies throughout the course makes it a lot easier and more fun to understand. I would definitely recommend everyone interested in learning Big Data and Hadoop to make use of all the knowledge and experience Sandeep has to share. I had a really good learning experience with KnowBigData, and look forward to interacting again if the chance arises! All the best!
Parveen Kumar

VP - Engineering at CommonFloor
LinkedIn Profile
Ratings: (5.0/5.0)
Review: KnowBigData's "Big Data & Hadoop" is one of the best courses I have attended online. Not only does the instructor know the concepts extremely well, he is also very passionate about explaining difficult concepts in a simple way. The quizzes after some sessions are also very useful for revising the learnings.
Gunjan Narulkar

Data Scientist at Data Semantics
LinkedIn Profile
Ratings: (5.0/5.0)
Review: Just finished a course on Hadoop basics offered by KnowBigData. An awesome experience in terms of exposure and learning across the full technology stack. But more importantly, being in discussion with someone as experienced and brilliant, and yet so grounded, as Sandeep was far more enriching than anything else! I highly recommend this course for a holistic learning experience.
Soma Pandey

Consultant - Smart Grid Communications at Essel Vidyut Vitaran Nigam
PhD, Wireless Mesh Networks
LinkedIn Profile
Ratings:
Review: I attended the Big Data classes and, although I am from the wireless communication area, I was not only able to 'know big data' but also became well versed in it. The course is meticulously designed so as not to leave out any major topic on Big Data and its tools. Sandeep's method of teaching is excellent; he answers every question with great patience and respect. I strongly recommend this course to anyone who wishes to take up a career in this field, and people from any other field who wish to diversify into this area should definitely take it up as well.
Savita Singh

Director Engineering, Target Technology Services
LinkedIn Profile
Ratings: (5.0/5.0)
Review: I joined the Hadoop class from KnowBigData 5 weeks back and it has been a motivating experience. The last time I coded was 20 years back, but thanks to the instructor-led training I am executing Pig Latin and Hive commands to solve data problems and look forward to soon completing small projects all by myself. Sandeep has been a great instructor: very, very patient, always ready to put in extra time to clarify doubts and work at your pace and schedule.
Dr. Makhan Virdi

Researcher, NASA - DAAC
LinkedIn Profile
Ratings: (5.0/5.0)
Review: Big Data with Spark: This is not a typical (online) classroom course. It is not just a series of videos with a one-way flow of information. Instead, it is a highly interactive setting where the instructor shares insightful details whenever a question or doubt is raised during the lecture. Sandeep passionately teaches complicated concepts in easy-to-understand language, supported with good analogies and effective examples. The course is well structured, covering the concepts of Big Data in breadth and depth. I am currently half-way through the course and am already working on translating the concepts learned in class to real-world problems.

See more reviews on our Facebook page.


Big Data and Hadoop Introduction Session

Big Data and Hadoop Course Curriculum

  • In the first class, we understand the what, why, and how of Big Data. We study the value proposition of Big Data to the industry and briefly discuss all the components that are part of the Hadoop ecosystem. Then we dive deep into the architecture of the Hadoop Distributed File System (HDFS) and the compute engine, MapReduce.
  • The second session focuses on how to use CloudxLab. We study HDFS and MapReduce using CloudxLab's online viewer, and learn how to install Hadoop on a single machine and on a cluster. This is accompanied by a hands-on live demonstration.
  • In the third session, we discuss MapReduce in detail. We discuss streaming jobs and demonstrate how to use them (a minimal streaming example follows this list). We then solve five problems of increasing difficulty using MapReduce.
  • In the fourth session, we discuss MapReduce in Java and how to perform advanced operations using Java in Hadoop MapReduce.
  • The agenda of the fifth session is Apache Spark. We discuss in detail what Apache Spark is and learn Spark Streaming. In this session, the trainer switches between hands-on live demonstration and presentation slides very frequently.
  • In the sixth session, we discuss Pig basics and then dive deep into advanced Pig commands, including an example of a UDF.
  • We continue our discussion of Pig and then begin with Hive basics.
  • We dive into the details of Hive, discuss the what, why and how of NoSQL databases, and then continue to HBase.
  • We complete our discussion of HBase and Zookeeper. Zookeeper is a very important component of the Hadoop ecosystem, just as threads are important in Java. We discuss and demonstrate a few examples around Zookeeper.
  • We discuss and demonstrate how to import structured and unstructured data using Apache Flume and Apache Sqoop. Then we get into the details of Oozie workflows.
  • The primary focus of session 10 is Mahout. We get into the details of every aspect of Mahout.
  • We give a live demo of how to use Mahout. Then we compare the popular NoSQL solutions and the various components that look similar; the sole idea is to clarify when to use which tool. We then form teams of 3 members and decide on the projects.
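
As mentioned for the third session, streaming jobs let you write the mapper and reducer in any language that reads standard input and writes standard output. Below is a minimal word-count sketch in Python; the file names are illustrative, and on a cluster the pair would be submitted with the Hadoop Streaming jar, roughly `hadoop jar hadoop-streaming.jar -mapper mapper.py -reducer reducer.py -input <input dir> -output <output dir>`, with the exact jar path depending on the installation.

```python
#!/usr/bin/env python3
# mapper.py -- illustrative Hadoop Streaming mapper for word count.
# Reads text lines from stdin and emits "word<TAB>1" for every word.
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print(f"{word}\t1")
```

```python
#!/usr/bin/env python3
# reducer.py -- illustrative Hadoop Streaming reducer for word count.
# Hadoop sorts mapper output by key, so all counts for a word arrive together.
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").rsplit("\t", 1)
    if word != current_word:
        if current_word is not None:
            print(f"{current_word}\t{current_count}")
        current_word, current_count = word, 0
    current_count += int(count)

if current_word is not None:
    print(f"{current_word}\t{current_count}")
```

The pair can also be tested locally, without Hadoop, with a shell pipeline such as `cat input.txt | ./mapper.py | sort | ./reducer.py`.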

What Certificate do we provide?

Based on your performance in Quizzes, Assignments and Projects, we provide the certificate in the following forms:

1. Hard Copy

We send a hard copy of the certificate to your address.



2. Digitally Signed Copy

We provide the PDF of the certificate that is digitally signed by KnowBigData.com.



3. Share Your Success

Share your course record with employers and educational institutions through a secure permanent URL.



4. LinkedIn Recommendation & Endorsements

We will provide a LinkedIn recommendation based on your performance, and we will endorse you for skills such as Hadoop and Big Data.



5. Verifiable Certificate

We provide an online form here to validate whether a certificate is genuine. This helps recruiters verify the certificates we issue.




About the Team

Sandeep Giri

Founder & Chief Instructor

Past: Amazon.com, InMobi.com, Founder @ tBits Global, D. E. Shaw

Education: Indian Institute of Technology, Roorkee

For the last 12 years, Sandeep has been building products and churning through large amounts of data for various product firms. He has all-round experience in software development and big data analysis.

Apart from digging data and technologies, Sandeep enjoys conducting interviews and explaining difficult concepts in simple ways.


Big Data and Hadoop - Frequently Asked Questions

Yes. Java is generally required for understanding MapReduce. MapReduce is a programming paradigm for writing your logic in the form of mapper and reducer functions. However, we have worked hard to make it possible to understand MapReduce without knowledge of Java; Java is only required for one class of 90 minutes where we discuss advanced configuration of MapReduce. So, if you meet the following three criteria, you are good to go (a short plain-Python sketch of the mapper/reducer idea follows the list):

  1. Basics of SQL. You should know the basics of SQL and databases. If you know about filters in SQL, you can expect to follow the course.
  2. Basics of programming. If you understand 'loops' in any programming language, and if you are able to create a directory and see what's inside a file from the command line, you will be able to grasp the concepts of this course even if you have not really touched programming for the last 10 years. In addition, we will be providing video classes on the basics of Java.
  3. Nevertheless, we provide a self-paced 8-hour course on Java for free. As soon as you sign up, it will be available in your "My Courses" section.
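
To make the mapper/reducer idea concrete without Java, here is a tiny sketch in plain Python of the same three steps (map, shuffle, reduce) that a MapReduce job performs; the data is invented for illustration, and Hadoop's contribution is to run these steps in parallel across many machines.

```python
from collections import defaultdict

# Made-up temperature readings as (city, temperature) records.
readings = [("Delhi", 31), ("Mumbai", 29), ("Delhi", 35), ("Mumbai", 33)]

# Map: turn each input record into (key, value) pairs.
# Here the records are already (city, temperature), so the mapper is the identity.
pairs = [(city, temp) for city, temp in readings]

# Shuffle: group values by key. Hadoop does this automatically between map and reduce.
grouped = defaultdict(list)
for city, temp in pairs:
    grouped[city].append(temp)

# Reduce: combine each key's values -- here, take the maximum temperature per city.
max_temp = {city: max(temps) for city, temps in grouped.items()}

print(max_temp)   # {'Delhi': 35, 'Mumbai': 33}
```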

There are two ways to do the practicals:
  1. Using our CloudxLab. To give our candidates real experience of big data computing, we provide a cluster of computers with all the big data technologies running on them, since most big data technologies make sense only when run on multiple machines. You only have to use an SSH client (PuTTY on Windows) to connect to our cluster. Whether you are at home or at the office, and whether you are using a laptop or a tablet, you will be able to use Hadoop. See more details about CloudxLab here.
  2. Using Virtual Machines. The second, traditional way to experiment with Hadoop is to install a Virtual Machine. We will assist you in setting up a Virtual Machine. However, most of our students are so happy with CloudxLab that they hardly ever install a Virtual Machine.

Yes, we provide our own certification. At the end of the course, you will work on a real-time project: you will receive a problem statement along with a dataset to work on in our CloudxLab. Once you are successfully through the project (reviewed by an expert), you will be awarded a certificate with performance-based grading. If your project is not approved in the first attempt, you can take extra assistance to understand the concepts better and reattempt the project free of cost.

Hadoop is one of the hottest career options available today for software engineers. There are around 12,000 Hadoop developer jobs currently in the U.S. alone, and the demand for Hadoop developers far exceeds the supply. Learn more about career prospects in Hadoop at naukri.com and indeed.com.

Our cluster has all the software required for the course, plus some additional components such as Git and R. If you require a particular piece of software on the cluster which is not already there, please let us know.

