Hadoop vs. NoSql vs. Sql vs. NewSql By Example



Fari Payandeh

Sept 8, 2013

Fari Payandeh

Although Mainframe Hierarchical Databases are very much alive today, the Relational Databases (RDBMS) (SQL) have dominated the Database market, and they have done a lot of good. The reason the money we deposit doesn’t go to someone else’s account, our airline reservation ensures that we have a seat on the plane, or we are not blamed for something we didn’t do, etc… RDBMS’ data integrity is due to its adherence to ACID (atomicity, consistency, isolation, and durability) principles. RDBMS technology dates back to the 70’s.

So what changed? Web technology started the revolution. Today, many people shop on Amazon. RDBMS was not designed to handle the number of transactions that take place on Amazon every second. The primary constraining factor was RDBMS’ schema.

NoSql Databases offered an alternative by eliminating schemas at the expense of relaxing ACID principles. Some NoSql vendors have made great strides towards resolving the issue; the solution is called eventual consistency. As for NewSql, why not create a new RDBMS minus RDBMS’ shortcomings utilizing modern programming languages and technology. That is how some of the NewSql vendors came to life.  Other NewSql companies created augmented solutions for MySql.

Hadoop is a different animal altogether. It’s a file system and not a database. Hadoop’s roots are in  internet search engines. Although Hadoop and associates (HBase, Mapreduce, Hive, Pig, Zookeeper) have turned it into a mighty database, Hadoop is an inexpensive, scalable,  distributed filesystem with fault tolerance. Hadoop’s specialty at this point in time is in batch processing, hence suitable for Data Analytics.

Now let’s start with our example: My imaginary video game company recently put our most popular game online after ten years of being in business, shipping our games to retailers around the globe. Our customer information is currently stored in a Sql Server Database  and we have been happy with it. However, since the players started playing the game online, the database is not able to keep up and the users are experiencing delays. As our user base grows rapidly, we spend money buying more and more Hardware/Software but to no avail. Losing customers is our primary concern. Where do we go from here?

We decide to run our online game application in NoSql and NewSql simultaneously by segmenting our online user base. Our objective is to find the optimal solution. The IT department selects NoSql Couchbase (document oriented like MongoDB) and NewSql VoltDB.

Couchbase is open source, has an integrated caching mechanism, and it can automatically spread data across multiple nodes. VoltDB is an ACID compliant RDBMS, fault tolerant, scales horizontally, and possesses a shared-nothing & in-memory architecture. At the end, both systems are able to deliver. I won’t go into the intricacies of each solution because this is an example and comparing these technologies in the real-world will require testing, benchmarking, and in-depth analyses.

Now that the online operations are running smoothly, we want to analyze our data to find out where we should expand our territory. Which are the most suitable countries for marketing our products?  In doing so, we need to merge the Sql Server customer Data Warehouse with the data from the online gaming database and run analytical reports. That’s where Hadoop comes in. We configure a Hadoop system and merge the data from the two data sources. Next, we use Hadoop’s  Mapreduce in conjunction with the open source R  programming language to generate the analytics reports.

Recent Acquisitions In Big Data

Big Data Companies

Fari Payandeh





Fari Payande

Aug 18, 2013

I am no expert in market analysis of any kind, but my personal Big Data Analytics embedded in my small brains which I inherited from my cave-dwelling ancestors tells me that the industry is anticipating huge profits from Big Data.

Big Data vendors and consulting companies are filling their gaps amid growing competition to provide complete solutions. What is interesting is the global reach  for finding good fits:  Yahoo Ztelic-China, Cloudera Myrrix-UK, EMC ScaleIO-Israel, SAP Hybris-Switzerland. Those were the recent accusations by Big Data Mega Stars.

CSC grabbed Infochimps for its analytics capabilities. Cisco brought home Sourcefire to beef up its Cyber Security. Yahoo acquired Zteli  specializing in social-network data. Cloudera took Myrrix to augment its arsenal with machine learning capacity. Raytheon purchased Visual Analytics Inc for its visualization tools. EMC nabbed ScaleIO to bolster its high performance storage offerings. Tibco bought StreamBase to provide real-time analytics. SAP acquired the ecommerce software maker hybris. Cisco landed Composite Software to boost its “Master Data Management”.