Big Data Is Not About WHAT; It’s About HOW

Big Data Studio

16 December 2012 — By Fari Payandeh

I think the person who coined the term “Big Data” should receive an award for creating “the most confusing signature in IT history”. The name implies that we are grappling with so much data that we needed a new technology called Big Data. Suppose I anticipate that my database will grow to 300 TB. Is that large enough to qualify as Big Data? By historical standards, certainly. In that case, I should go with a Big Data solution, perhaps Hadoop-NoSQL. But why? Teradata, a tried-and-true relational database, is more than capable of handling 300 TB of data. Why would we adopt an emerging technology in place of a product with a proven track record? In short, the mere fact that the data is big doesn’t mean that we need a “Big Data solution”.

With the exception of columnar databases such as Sybase IQ, traditional databases stored data in tables, and that served a wide variety of applications. As the Web 2.0 model gained popularity and more and more customers moved online, some Web 2.0 companies had to abandon the RDBMS because it could not meet the demands of the new era. That said, because of problems inherent in NoSQL, a new generation of database systems led by VoltDB and RainStor may tilt the balance back in favor of the RDBMS, but the jury is still out on that.

Summary: At this point in time, Big Data means adopting NON-RDBMS methods for processing large data sets. As such, it is about HOW.

The Curious Case Of Big Data Definition

As Featured On EzineArticles

15 December 2012 — By Fari Payandeh

Most technical people I have talked to think that Big Data is nothing new. They seem to proceed on the premise that Big Data’s sole purpose in life is to serve business intelligence. As someone said to me the other day, “Walmart has been enjoying the fruit of their investment in data warehousing/business intelligence for years; way before there was a Hadoop or NoSQL in existence”. True, but Big Data is not about “What”. It’s about “How”. How long do Walmart’s nightly jobs run to transform the raw data into meaningful business data that its BI tools can use? Moreover, is Walmart currently processing its unstructured data to add value to its BI strategy?

Back in 2006, I watched Werner Vogels, the CTO of Amazon, elaborate on what is today called “Big Data”. He was talking about how Amazon had made a radical shift from relational databases to flat files to store its customer data. He said that relational databases weren’t able to meet Amazon’s requirements. What is interesting is that Vogels was referring to the difficulties Amazon faced in the OLTP (online transaction processing) portion of its business, not in DSS (decision support systems). Today, however, Big Data encompasses OLTP, DSS, and real-time BI.

Let’s weigh the myth against the facts, starting with what Big Data is not. Big Data is not tied to a particular set of technologies, nor is it applicable to every company that sits on top of huge amounts of data. It is true that the IT industry has made great strides in data caching, I/O throughput, scalability, availability, consistency, real-time data processing, and working with unstructured data. However, those enhancements could have come about organically, through the invisible hand of market dynamics, to support the evolution of business intelligence. Where the facts and the myth diverge is that the myth fails to account for the likelihood that we would be where we are today even if companies like Amazon had never existed.

In conclusion, the term “Big Data” is legitimate in that it refers to new ways of processing large amounts of data, but it is misleading because “size” is part of the name. Size categories (small, medium, large) are not constants; they change over time. What was considered a large data set twenty years ago may fall into the small category today. I personally would rather call it “Net Data”, alluding to the way the data is spread across many servers in disk files as opposed to databases.