Big Data Cloud Analytics By Example

Cloud Analytics

Fari Payandeh

Sept 21, 2013

I just created a cloud at home. Suppose I start on a project aiming to create a computer game. I purchase 4 servers and some software. After a couple of weeks I realize that in order to complete the project I'll need 6 more servers, but I have run out of money. I decide to write an operating system that connects the 4 servers and creates a single virtual platform, a simplified version of virtualization software such as VMware. I won't get into the nuts and bolts; suffice it to say that I can now create as many as 12 virtual servers, be they Windows, Linux, or UNIX. Note that the underlying hardware hasn't changed; rather, I am making more efficient use of it. The cloud masks the infrastructure beneath it. I am going to proceed with my example on the premise that I'll remain on a private cloud (not sharing resources with other systems in a data center).

I complete my video game project and subsequently form a small business offering the game online. A year has gone by and my business is booming but I no longer have the time or the bandwidth to do all the work myself. I weigh the pros and cons, and I decide to move my application to Amazon and let them run it (Software as a Service). Although I have to pay for the service, it is less expensive than running it in-house. Besides, since I don’t have to worry about my application, I can focus on marketing. I’ll be able to cast a wide net and get more subscribers. My investment might pay dividends later.

A couple of years down the road my online game is phenomenally successful. I have over 1000 new subscribers each day, and the daily data generated by my application is over 100 TB. This confirms that I made the right decision by choosing cloud services, because my internal IT infrastructure couldn't scale to handle the volume of data. Moreover, the costs could have spiraled out of control amid the need for rapid expansion and the expenses that accompany the chaos and confusion inherent in attempting to grow IT in a short period of time. Accommodating 100 TB of data each day will require steady and sustainable resource allocation as well as an elastic infrastructure. The cloud is a suitable solution.

Fast forward to the present time. Bad news! My subscribers are leaving and my business is taking a nosedive. I hire a game guru who knows all the tricks of the trade to find out which way the wind is blowing. He tells me that my subscribers are switching to a competitor. My competitor has borrowed ideas from my game and has produced a smash hit. I go online and take a look at their game. I am at a loss and cannot figure out the underlying cause.

Shortly thereafter, I am told that the competitor has a way of predicting when players are getting too flustered with the game and are about to call it quits. Their application is designed to use nuggets of intelligence to automatically ease the player's frustration by subtly providing clues at the right time, when the player is in a tight corner. Although these cases may be few and far between within each session, providing more opportunities to make headway helps the player turn the corner and keeps them from throwing in the towel.

How is that possible? Well, that's the job of Data Analytics. The competitor is running SAP HANA's in-memory Predictive Analytics software on IBM SmartCloud. HANA invokes a function in the game's application before the player's frustration reaches the saturation point. We know that, all else being equal, the longer a game can keep a player engaged, the less likely he/she will switch to a different game. Engagement is the key. HANA communicates with the game by sending it signals that modify the game's behavior in real time. In short, it strikes the right balance between hardship and sailing through.
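To make the idea concrete, here is a minimal sketch in Python of how such a feedback loop might look. This is my own illustration, not SAP HANA's or the competitor's actual API: the predictor, the threshold, and the hint text are all made up.

```python
# Toy sketch (illustrative only, not HANA's API): track a simple
# "frustration score" and send a hint when the predicted score
# crosses a threshold, before the player reaches the saturation point.
from typing import Optional

FRUSTRATION_THRESHOLD = 0.8  # the "saturation point", chosen arbitrarily

def frustration_score(failed_attempts: int, seconds_stuck: float) -> float:
    """Toy predictor: more failures and more time stuck mean a higher score."""
    return min(0.1 * failed_attempts + 0.01 * seconds_stuck, 1.0)

def maybe_send_hint(failed_attempts: int, seconds_stuck: float) -> Optional[str]:
    """Return a hint if the player looks close to quitting, else None."""
    if frustration_score(failed_attempts, seconds_stuck) >= FRUSTRATION_THRESHOLD:
        return "hint: look for the hidden door"
    return None
```

A real system would replace the toy scoring function with a model trained on player behavior, but the shape of the loop (score, threshold, timed intervention) is the same.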

The following is a list of the companies that, in my view, will in time dominate the Big Data Cloud Analytics market.

Amazon, Google, Microsoft, IBM, Oracle, Pivotal (VMware/EMC)

Google F1 Database: One Step Closer To Discovering The DB Holy Grail


Fari Payandeh

Sept 15, 2013

Google recently replaced its AdWords MySql Database with a Database that it built in-house, namely the F1 Database. AdWords serves thousands of users, "which all share a database over 100TB serving up hundreds of thousands of requests per second, and runs SQL queries that scan tens of trillions of data rows per day," Google said.

After reading Google's paper on its F1 Database (not open source), I started thinking about its ramifications for Databases in general and Big Data in particular. The paper might trigger new initiatives that eventually materialize the Holy Grail described in the next paragraph. It mentions a few challenges with F1 that still need to be addressed, and I came away with two lingering issues. First, there is no mention of security. Second, it states, "Hide RPC latency, Buffer writes in client, send as one RPC." What will happen if the network connection between the client and the Database goes down? Will the buffered data be lost? That would be a serious problem for operations that need to commit as fast as possible; airline reservation is one. I may have misunderstood.
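To see why client-side write buffering raises that question, here is a short sketch in Python of the "buffer writes in client, send as one RPC" pattern. The class and method names are my own and illustrative, not F1's actual interface; the point is that anything still sitting in the buffer when the connection drops was never sent to the Database.

```python
# Illustrative sketch (not F1's API): writes land in local memory first
# and are sent to the server as a single batch, hiding per-write RPC
# latency. If the process or network dies before flush(), the pending
# writes are lost -- the trade-off discussed above.
class WriteBuffer:
    def __init__(self, send_batch):
        self.send_batch = send_batch  # callable that performs the one RPC
        self.pending = []

    def write(self, row):
        # No network call here: the write only exists in client memory.
        self.pending.append(row)

    def flush(self):
        # One RPC for the whole batch instead of one RPC per write.
        if self.pending:
            self.send_batch(self.pending)
            self.pending = []
```

Between `write()` and `flush()` the data lives only on the client, which is exactly the window an airline-reservation style workload cannot afford.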

The system resembles a hybrid between Relational and Hierarchical (think mainframe) Databases. What is the Holy Grail in the Database world? Relational Databases (RDBMS) are like high-rises comprising many apartments. What if there are no vacancies and people have lined up to rent from us? The way RDBMS has handled the demand is by adding more floors on top of the high-rise. It is expensive and slows down the day-to-day operations. A new technology (NoSql) emerged a few years ago and solved the space allocation problem. Instead of building new floors, we place the tenants in inexpensive houses; once we run out of vacant houses, we simply build new ones. The downside? It makes managing the place more difficult, and we might unwittingly reserve the same house for two different individuals. There are ways to prevent that, but it's a perplexing task and it places a lot of pressure on the engineers who design the housing complex. The Holy Grail is to discover a method by which we can combine the best of both worlds and remove the negatives.
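The "houses" side of the analogy can be sketched in a few lines of Python: horizontal scaling assigns each tenant (key) to one of several inexpensive nodes, and a deterministic hash is what keeps two clients from reserving the same house for different individuals. The node names below are, of course, made up.

```python
# Minimal sharding sketch: hash each key to one of several nodes.
# Because the hash is deterministic, every client independently
# computes the same node for the same key -- no double-booking.
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical inexpensive "houses"

def node_for(key: str) -> str:
    """Map a tenant key to exactly one node, the same one every time."""
    digest = hashlib.sha256(key.encode()).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]
```

Real systems refine this (consistent hashing so adding a node does not reshuffle every key, replication for fault tolerance), which is part of the pressure on the engineers mentioned above.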

No doubt some engineers, following Google's invaluable tips in the paper, are working hard to figure out how to build an F1++ Database. What if they succeed? What will happen to NoSql and NewSql if they produce an open source Database System? The confluence of several forces that are currently shaping open source, Big Data, Mobile, and Cloud technologies might in time make NoSql and the existing NewSql irrelevant: flash-aware applications, shared-nothing architecture, MapReduce methods, software-defined storage, in-memory computing, shared virtual storage array networks, new compression algorithms, atomic writes, horizontal scalability, software-defined networking, columnar technology, progress in fault tolerance, database sharding, and solid state drives.

There is one very powerful force that in my view will keep NoSql alive and well for years to come, and that is the power of developers. The genie is out of the bottle, and all the nuclear fusion in the world combined cannot put it back. Speaking from personal experience as a Developer/DBA, I know that developers hate roadblocks. Once they start on something, they like to keep working; pulling them away from what they are deeply involved in is like taking a pacifier from a baby. For the first time in history, they can get on their generally free and open source bikes and ride without the hassle of calling the DBAs to open the gates for them every 40 miles. NoSql pushed the Database inside the developers' world, and they love it! Is it good for the industry? Perhaps not, but it might just create millions of programming jobs. After all, somebody has to untangle the convoluted code (through no fault of the developers) left behind. Separation of Database and code, as painful as it might be for developers, is a necessity; it establishes checks and balances. According to Google's paper, they have taken those factors into account. Google F1 is a developer-friendly Database. Hopefully the trend will continue.

From Google:

F1 is a distributed relational database system built at Google to support the AdWords business. F1 is a hybrid database that combines high availability, the scalability of NoSQL systems like Bigtable, and the consistency and usability of traditional SQL databases. F1 is built on Spanner, which provides synchronous cross-datacenter replication and strong consistency. Synchronous replication implies higher commit latency, but we mitigate that latency by using a hierarchical schema model with structured data types and through smart application design. F1 also includes a fully functional distributed SQL query engine and automatic change tracking and publishing.