Hadoop vs. NoSql vs. Sql vs. NewSql By Example


Fari Payandeh

Sept 8, 2013


Although mainframe hierarchical databases are still very much alive today, relational databases (RDBMS, queried with SQL) have dominated the database market, and they have done a lot of good. They are the reason the money we deposit doesn’t go to someone else’s account, our airline reservation ensures that we have a seat on the plane, and we are not blamed for something we didn’t do. An RDBMS owes its data integrity to its adherence to the ACID (atomicity, consistency, isolation, and durability) principles. RDBMS technology dates back to the 1970s.
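To make the bank deposit example concrete, here is a minimal sketch of atomicity using Python's built-in sqlite3 module. The `accounts` table, the account names, and the `transfer` helper are assumptions made up for this illustration; a production system would look different, but the principle is the same: both updates are applied, or neither is.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                # Abort: the debit above is undone by the rollback.
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False

def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))

# Hypothetical sample data, held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()
```

Calling `transfer(conn, "alice", "bob", 30)` moves the money and returns `True`. Calling `transfer(conn, "alice", "bob", 500)` returns `False` and leaves both balances untouched, because the failed debit is rolled back.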

So what changed? Web technology started the revolution. Today, many people shop on Amazon. The traditional RDBMS was not designed to handle the number of transactions that take place on Amazon every second. The primary constraining factor was the RDBMS’ rigid schema.

NoSql databases offered an alternative by eliminating schemas at the expense of relaxing ACID principles. Some NoSql vendors have made great strides towards resolving the issue; the solution is called eventual consistency, in which copies of the data may briefly disagree but converge to the same value once updates stop. As for NewSql, why not create a new RDBMS, minus the RDBMS’ shortcomings, using modern programming languages and technology? That is how some of the NewSql vendors came to life. Other NewSql companies created augmented solutions for MySql.
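The two ideas in this paragraph, schema-free records and eventual consistency, can be sketched with a toy model. Everything below is an assumption made for illustration: the `Replica` class, the integer version numbers, and the "highest version wins" rule are a simplification, not how any particular NoSql product resolves conflicts.

```python
class Replica:
    """A toy copy of the data: key -> (version, document)."""

    def __init__(self):
        self.data = {}

    def write(self, key, document, version):
        # Keep the document only if it is newer than what we already hold.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, document)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None

def sync(a, b):
    """Exchange entries in both directions so the two replicas converge."""
    for key, (version, document) in list(a.data.items()):
        b.write(key, document, version)
    for key, (version, document) in list(b.data.items()):
        a.write(key, document, version)

# Documents need no shared schema: each one carries its own fields.
r1, r2 = Replica(), Replica()
r1.write("player:1", {"name": "Ann", "level": 3}, version=1)
r2.write("player:2", {"name": "Raj", "guild": "Owls", "badges": ["beta"]}, version=1)
```

Before `sync(r1, r2)` runs, `r2.read("player:1")` returns `None`: the replica is stale. After the sync, both replicas return the same documents. That window of disagreement is exactly what an ACID database refuses to expose and what an eventually consistent store tolerates.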

Hadoop is a different animal altogether. It’s a file system and not a database. Hadoop’s roots are in internet search engines. Although Hadoop and its associates (HBase, MapReduce, Hive, Pig, ZooKeeper) have turned it into a mighty database, Hadoop itself is an inexpensive, scalable, distributed file system with fault tolerance. Hadoop’s specialty at this point in time is batch processing, which makes it suitable for data analytics.

Now let’s start with our example. My imaginary video game company recently put its most popular game online, after ten years in business shipping games to retailers around the globe. Our customer information is currently stored in a Sql Server database, and we have been happy with it. However, since the players started playing the game online, the database has not been able to keep up and the users are experiencing delays. As our user base grows rapidly, we spend money buying more and more hardware and software, but to no avail. Losing customers is our primary concern. Where do we go from here?

We decide to run our online game application on NoSql and NewSql simultaneously by segmenting our online user base. Our objective is to find the optimal solution. The IT department selects Couchbase for NoSql (document-oriented, like MongoDB) and VoltDB for NewSql.

Couchbase is open source, has an integrated caching mechanism, and can automatically spread data across multiple nodes. VoltDB is an ACID-compliant RDBMS that is fault tolerant, scales horizontally, and has a shared-nothing, in-memory architecture. In the end, both systems are able to deliver. I won’t go into the intricacies of each solution because this is an example, and comparing these technologies in the real world will require testing, benchmarking, and in-depth analysis.
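Spreading data across nodes usually comes down to some form of hash partitioning. The sketch below shows the general idea only; the node names and the `node_for` and `partition` helpers are hypothetical, and real products add layers this example ignores (replication, rebalancing when nodes join or leave).

```python
import zlib

def node_for(key, nodes):
    """Pick a node for a key by hashing the key (deterministic for a fixed node list)."""
    return nodes[zlib.crc32(key.encode("utf-8")) % len(nodes)]

def partition(keys, nodes):
    """Group keys by the node that would store them."""
    placement = {node: [] for node in nodes}
    for key in keys:
        placement[node_for(key, nodes)].append(key)
    return placement

# Hypothetical cluster and player keys.
nodes = ["node-a", "node-b", "node-c"]
keys = ["player:%d" % i for i in range(1000)]
placement = partition(keys, nodes)
```

Because the placement depends only on the key and the node list, any client can compute where a record lives without asking a central coordinator. That property is what lets a shared-nothing system scale horizontally.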

Now that the online operations are running smoothly, we want to analyze our data to find out where we should expand our territory. Which countries are the most suitable for marketing our products? To answer that, we need to merge the Sql Server customer data warehouse with the data from the online gaming database and run analytical reports. That’s where Hadoop comes in. We configure a Hadoop system and merge the data from the two data sources. Next, we use Hadoop’s MapReduce in conjunction with the open source R programming language to generate the analytics reports.
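The shape of such a job can be sketched in a few lines. This is plain single-process Python rather than Hadoop or R, and the record layout (a `country` field on every record) and the sample data are assumptions for illustration. It shows only the three MapReduce stages: map each record to a key/value pair, group the pairs by key, then reduce each group.

```python
from collections import defaultdict

def map_phase(records):
    """Emit a (country, 1) pair for every customer or player record."""
    for record in records:
        yield record["country"], 1

def shuffle(pairs):
    """Group the emitted values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Sum the values for each country."""
    return {country: sum(values) for country, values in groups.items()}

def count_by_country(*sources):
    """Merge any number of data sources and count records per country."""
    merged = (record for source in sources for record in source)
    return reduce_phase(shuffle(map_phase(merged)))

# Hypothetical extracts from the two systems in the example.
warehouse = [{"customer": "c1", "country": "US"}, {"customer": "c2", "country": "DE"}]
online = [
    {"player": "p1", "country": "US"},
    {"player": "p2", "country": "BR"},
    {"player": "p3", "country": "US"},
]
report = count_by_country(warehouse, online)
```

Here `report` is `{"US": 3, "DE": 1, "BR": 1}`. On a real cluster the map and reduce functions run in parallel on many machines over files stored in the distributed file system, which is why this style suits batch analytics rather than live game traffic.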

8 thoughts on “Hadoop vs. NoSql vs. Sql vs. NewSql By Example”

    1. Leon, thank you for your input, but Couchbase has a proven record in social gaming, which requires the database to process thousands of transactions per second. “Couchbase Server is proven in many of the most popular social and mobile games. With consistent high performance, easy scalability, and “always-on” capabilities, Couchbase helps you ensure optimal player experience and retention, even when a game goes viral. It’s got a flexible data model that lets you easily add new game features without taking your game offline”. Nonetheless, you are correct in that it may be confusing if it’s not taken within the context of NoSql consistency issues. I removed it.
      http://www.couchbase.com/social-gaming

  1. Hi Fari, nice article, thanks for posting. It seems like each kind of SQL, if I can say that, has its own use case. It is very interesting to see the differences, especially on ACID: some can support it, some can’t, and some support it ‘eventually’, thus not in real time.

    RDBMS systems have been around since the 1970s and are built on solid mathematical theories and principles such as set theory. These are indisputable models based on numbers and facts which tally up and which are ACID, thus ensuring integrity. What do you think of the other systems which are not RDBMS and not ACID compliant? Do you think they are less reliable since they are not ACID and not transactional? Is integrity worth sacrificing for speed?

    1. Thank you Kubliay,
      I have been waiting for more real-world use cases to be published so that I know more about the consequences of not being 100% ACID compliant. Apparently NoSql is doing well in the market. Netflix is using Cassandra and HBase, and MongoDB’s clients include Cisco, Craigslist, Disney, eBay, Forbes, Foursquare, Goldman Sachs, Intuit, LexisNexis, Met Life, MTV, Salesforce.com, Shutterfly and Telefonica.
      Amazon has been using its own NoSql. Based on what I have read, it is possible to implement ACID in MongoDB by writing the code for it. Cassandra applies eventual consistency. We will know more as more and more companies attempt to use NoSql.

  2. Sachin

    This is my question.
    I am a mainframe senior system analyst with 9 years of experience.
    After reading some of your articles I got motivated about Big Data, learned Hadoop, Pig, Hive, etc., and practiced by installing the Cloudera VM on my PC. Can you please let me know whether there is any way I can get more exposure to Big Data? I mean, is there any chance for me to move into the Big Data world? Is there any recruitment going on for people with mainframe plus Big Data knowledge?

    1. Sachin,

      Your mainframe background will be an asset. That said, I would keep working on Hadoop. You can experiment with Hadoop on Amazon AWS; please see the Hadoop Tutorial menu option.

