How Can I Learn Hadoop On My Own?

Sept 1, 2013 by Fari Payandeh

1- Open an account with Amazon Web Services

Click here to go to the site

2- Register for Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3)

Watch this video for the instructions:

Click here to watch the video

3- You will need the book Hadoop: The Definitive Guide. Follow the instructions in “Hadoop in the cloud”.


Other Hadoop Installations:

Hortonworks Hadoop on Windows Tutorial

Cloudera QuickStart VM

Cloudera Hadoop Demo VM on VirtualBox – Installation

MapR Academy Videos

Hadoop-based data analytics on IBM SmartCloud Tutorial

Install Ubuntu in Oracle VM Virtual Box

Running Hadoop on Ubuntu Linux (Single-Node Cluster)

Writing an Hadoop MapReduce Program in Python

Openstack Training Videos

Developing Big-Data Applications with Apache Hadoop

How to get started with Hadoop – Hello World
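Before working through the tutorials above, it can help to see the shape of a MapReduce job. The following is a minimal sketch of the classic word-count "hello world", written in plain Python so it runs on a laptop without a cluster. The names mapper, reducer, and word_count are illustrative assumptions rather than part of any Hadoop API, and the in-memory sort only stands in for the shuffle-and-sort step that Hadoop performs between the two phases.

```python
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Map phase: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def reducer(pairs):
    """Reduce phase: sum the counts for each word.

    Sorting by key stands in for Hadoop's shuffle-and-sort step, which
    groups all values for one key before they reach a reducer.
    """
    ordered = sorted(pairs, key=itemgetter(0))
    for word, group in groupby(ordered, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def word_count(lines):
    """Run the map and reduce phases in memory (hypothetical helper)."""
    return dict(reducer(mapper(lines)))


if __name__ == "__main__":
    sample = ["big data is about how", "big data is not about what"]
    for word, total in sorted(word_count(sample).items()):
        print(f"{word}\t{total}")
```

On a real cluster the two phases would typically run as separate processes over data split across many machines; the sketch only shows how the work divides into a map step and a reduce step.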

Big Data Is Not About WHAT; It’s About HOW

Big Data Studio

16 December 2012 — By Fari Payandeh

I think the person who coined the term “Big Data” should receive an award for creating “the most confusing signature in IT history”. The term implies that we are grappling with so much data that we need a new technology called Big Data. Suppose I expect my database to grow to 300 TB. Is that large enough to qualify as Big Data? By historical standards, by all means yes. Well, in that case I am going to go with a Big Data solution, perhaps Hadoop-NoSQL. But why? Teradata, a tried-and-true relational database, is more than capable of handling 300 TB of data. Why would we want to adopt an emerging technology in place of a product with a proven track record? In short, the data being big does not, by itself, mean that we need a “Big Data solution”.

With the exception of columnar databases such as Sybase IQ, traditional databases stored data in row-oriented tables, and that worked for a wide variety of applications. As the Web 2.0 model gained popularity and more and more customers joined the online market, some Web 2.0 companies had to abandon the RDBMS because it could not meet the demands of the new era. That said, because of problems inherent in NoSQL, a new generation of database systems led by VoltDB and RainStor may tilt the balance back in favor of the RDBMS, but the jury is still out on that.

Summary: At this point in time, Big Data means adopting non-RDBMS methods for processing large data sets. As such, it is about HOW, not WHAT.