Simple description of Big Data

Big Data: Suppose we wanted to process the data captured by satellites. A few years ago that was impossible, because traditional databases could not handle the volume, velocity, and variety (text, images, etc.) of the data. Today we can, and this leap forward is what the term “Big Data” refers to. Big Data is therefore not just about the size of the data; it has more to do with how the data is processed.

Hadoop: Hadoop has become the de facto platform for Big Data. Hadoop is the “operating system” that lays the foundation for capturing, storing, and processing data. Among the companies that helped lift “modern Big Data” off the ground (Google, Yahoo, Amazon, Facebook), Google is the only one that doesn’t use Hadoop. Google’s fingerprints are all over Hadoop, whose design follows Google’s published papers on the Google File System and MapReduce, but Google currently uses its own analytics platform, BigQuery, which is a cloud-based service.

Hadoop Distributions: Apache, Hortonworks, MapR, Cloudera, Pivotal HD, Amazon, IBM, and Intel. Hortonworks is a 100% open source Apache Hadoop distribution.

NoSQL Databases: They store data without a fixed schema. Some, such as HBase, sit on top of Hadoop; most of the others run on their own clusters. Why no schemas? Because rigid schemas have contributed to the sluggish performance of traditional databases when processing large volumes of data. The frontrunners are MongoDB, Cassandra, Google Cloud Datastore, Amazon DynamoDB, Redis, CouchDB, HBase, Neo4j, MarkLogic, Riak, and Couchbase.
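
To make "storing data without schemas" concrete, here is a minimal sketch in Python. It is a toy in-memory store, not the API of any of the databases listed above; the class name DocumentStore and its put, get, and find methods are made up for illustration. The point it shows is that two records with completely different fields can live side by side, with no table definition declared up front.

```python
import json


class DocumentStore:
    """Toy in-memory document store (hypothetical, for illustration only)."""

    def __init__(self):
        self._docs = {}

    def put(self, key, doc):
        # Store the document as a JSON string, the way a document
        # database keeps records without a predefined column layout.
        self._docs[key] = json.dumps(doc)

    def get(self, key):
        return json.loads(self._docs[key])

    def find(self, field, value):
        # Scan every document; documents that lack the field are skipped.
        return [
            key
            for key, raw in self._docs.items()
            if json.loads(raw).get(field) == value
        ]


store = DocumentStore()
# Two records with different shapes, no schema declared anywhere.
store.put("img-1", {"type": "image", "satellite": "A", "width": 1024})
store.put("note-1", {"type": "text", "body": "cloud cover over region 7"})

images = store.find("type", "image")
print(images)
```

The trade-off is that the application, not the database, has to cope with fields that may be missing, which is why find uses .get() rather than assuming every document has the same keys.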

Data Analytics: Now that we can process the data gushing forth, we can look for patterns, correlations, events, anomalies, and hidden nuggets of intelligence that help us make better decisions. The following companies are strong players in that space: IBM, Oracle, Google, Microsoft, SAP, Amazon, Teradata, SAS, TIBCO, StatSoft, KXEN, and Angoss Software. Pentaho and Jaspersoft are open source. Alteryx has recently made its software free. Tableau has a free edition, Tableau Public.
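
As a small illustration of what "looking for anomalies" can mean, the sketch below flags values that sit far from the average of a series. It is one simple technique (a z-score check), not how any of the products above work; the function name find_anomalies, the threshold of 2.0, and the sample readings are assumptions chosen for the example.

```python
from statistics import mean, stdev


def find_anomalies(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        # All values are identical, so nothing stands out.
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Made-up sensor readings with one obvious outlier.
readings = [10, 11, 9, 10, 12, 10, 11, 95, 10, 9]
anomalies = find_anomalies(readings)
print(anomalies)
```

Real analytics tools apply far more robust methods at much larger scale, but the idea is the same: define what "normal" looks like, then surface what deviates from it.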

