Google F1 Database: One Step Closer To Discovering The DB Holy Grail

Fari Payandeh

Sept 15, 2013

Google recently replaced the MySQL database behind AdWords with F1, a database it built in-house. AdWords serves thousands of users, “which all share a database over 100TB serving up hundreds of thousands of requests per second, and runs SQL queries that scan tens of trillions of data rows per day,” Google said.

After reading Google’s paper on its F1 database (not open source), I started thinking about its ramifications for databases in general and Big Data in particular. The paper might trigger new initiatives that eventually materialize the phantom Holy Grail described in the next paragraph. It mentions a few challenges with F1 that still need to be addressed, and I came away with two lingering issues. First, there is no mention of security. Second, it states, “Hide RPC latency, Buffer writes in client, send as one RPC”. What happens if the network connection between the client and the database goes down while writes are still buffered? Will the data be lost? That would be a serious problem for operations that need to commit as fast as possible; airline reservations are one example. I may have misunderstood.
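To make the question concrete, here is a minimal sketch of what “buffer writes in client, send as one RPC” could look like. It is not F1’s client: the `BufferedClient` class, the `send_batch` callback, and the key format are all hypothetical names made up for illustration. Under this sketch’s assumptions, nothing is durable until `commit()` returns, so a dropped connection before that point means the client still holds the writes and has to retry them.

```python
from typing import Callable, List, Tuple

Write = Tuple[str, str]  # (key, value), a hypothetical write format


class BufferedClient:
    """Collects writes locally and sends them in one batch at commit time."""

    def __init__(self, send_batch: Callable[[List[Write]], None]) -> None:
        self._send_batch = send_batch  # stands in for a single RPC to the server
        self._pending: List[Write] = []

    def write(self, key: str, value: str) -> None:
        # No network traffic here: the write only lands in client memory.
        self._pending.append((key, value))

    def commit(self) -> int:
        # One round trip for the whole batch. If this call raises, nothing
        # was acknowledged, and the buffer is kept so the caller can retry.
        batch = list(self._pending)
        self._send_batch(batch)
        self._pending.clear()
        return len(batch)


# Usage: a fake "server" that just records each batch it receives.
received: List[List[Write]] = []
client = BufferedClient(received.append)
client.write("seat:12A", "alice")
client.write("seat:12B", "bob")
sent = client.commit()
print(sent, len(received))  # 2 writes, 1 round trip
```

The trade-off is visible in the usage lines: two writes cost one round trip, but neither is acknowledged until the commit goes through.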

The system resembles a hybrid between relational and hierarchical (think mainframe) databases. What is the Holy Grail in the database world? Relational databases (RDBMS) are like high-rises comprising many apartments. What if there are no vacancies and people have lined up to rent from us? The way RDBMS has handled the demand is by adding more floors on top of the high-rise, which is expensive and slows down day-to-day operations. A new technology (NoSQL) emerged a few years ago and solved the space allocation problem. Instead of building new floors, we place the tenants in inexpensive houses. Once we run out of vacant houses, we simply add more houses. The downside? It makes managing the place more difficult, and we might unwittingly reserve the same house for two different individuals. There are ways to prevent that, but it’s a perplexing task and it places a lot of pressure on the engineers who design the housing complex. The Holy Grail is to discover a method by which we can combine the best of both worlds and remove the negatives.

Following Google’s invaluable tips in the paper, some engineers are no doubt working hard to figure out how to build an F1++ database. What if they succeed? What will happen to NoSQL and NewSQL if they produce an open source database system? The confluence of several forces currently shaping open source, Big Data, mobile, and cloud technologies might in time make NoSQL and the existing NewSQL irrelevant: flash-aware applications, shared-nothing architecture, MapReduce methods, software-defined storage, in-memory computing, shared virtual storage array networks, new compression algorithms, atomic writes, horizontal scalability, software-defined networking, columnar technology, progress in fault tolerance, database sharding, and solid state drives.

There is one very powerful force that in my view will keep NoSQL alive and well for years to come, and that is the power of developers. The genie is out of the bottle, and all the nuclear fusion in the world combined cannot put it back in there. Speaking from personal experience as a developer/DBA, I know that developers hate roadblocks. Once they start on something, they like to continue working. Getting them away from what they are deeply involved in is like taking a pacifier from a baby. For the first time in history, they can get on their generally free and open source bikes and ride without the hassle of calling the DBAs to open the gates for them every 40 miles. NoSQL pushed the database inside the developers’ world, and they love it! Is it good for the industry? Perhaps not, but it might just create millions of programming jobs. After all, somebody has to untangle the convoluted code (through no fault of the developers) left behind. Separation of database and code, as painful as it might be for developers, is a necessity. It establishes checks and balances. According to Google’s paper, they have taken those factors into account. Google F1 is a developer-friendly database. Hopefully the trend will continue.

From Google:

ABSTRACT
F1 is a distributed relational database system built at Google to support the AdWords business. F1 is a hybrid database that combines high availability, the scalability of NoSQL systems like Bigtable, and the consistency and usability of traditional SQL databases. F1 is built on Spanner, which provides synchronous cross-datacenter replication and strong consistency. Synchronous replication implies higher commit latency, but we mitigate that latency by using a hierarchical schema model with structured data types and through smart application design. F1 also includes a fully functional distributed SQL query engine and automatic change tracking and publishing.
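
One way to picture the “hierarchical schema model” from the abstract is a key layout where each child row’s key starts with its parent’s key, so a parent and all its descendants sit next to each other in sorted order. The sketch below assumes that layout; the integer IDs, the three levels, and the `subtree` helper are hypothetical and only illustrate the clustering idea, not F1’s actual storage format.

```python
from bisect import bisect_left
from typing import List, Tuple

Key = Tuple[int, ...]

# Hypothetical rows. Key length gives the level:
# 1 = customer, 2 = campaign, 3 = ad group.
rows: List[Key] = sorted([
    (2,), (1,), (1, 10), (2, 30), (1, 10, 100), (1, 20), (1, 10, 101),
])


def subtree(rows: List[Key], root: Key) -> List[Key]:
    """Return the root row and all its descendants with one contiguous scan."""
    start = bisect_left(rows, root)
    end = start
    # Descendants share the root's key as a prefix and follow it in sort order.
    while end < len(rows) and rows[end][:len(root)] == root:
        end += 1
    return rows[start:end]


customer_1 = subtree(rows, (1,))
print(customer_1)
```

Because everything under customer 1 is adjacent, reading or updating that customer’s whole hierarchy touches one contiguous range instead of several scattered tables.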

Hadoop vs. NoSQL vs. SQL vs. NewSQL By Example

Fari Payandeh

Sept 8, 2013

Although mainframe hierarchical databases are very much alive today, relational databases (RDBMS, SQL) have dominated the database market, and they have done a lot of good. They are the reason the money we deposit doesn’t go to someone else’s account, our airline reservation ensures that we have a seat on the plane, and we are not blamed for something we didn’t do. RDBMS’ data integrity is due to its adherence to the ACID (atomicity, consistency, isolation, and durability) principles. RDBMS technology dates back to the 1970s.
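
The bank deposit case can be sketched with Python’s built-in `sqlite3` module. The `accounts` table, the names, and the `transfer` helper are made up for illustration; the point is only to show atomicity, where a transfer either applies both updates or neither.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(src: str, dst: str, amount: int) -> bool:
    """Move money between accounts; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer("alice", "bob", 30)       # succeeds
failed = transfer("alice", "bob", 500)  # would overdraw alice, so it is rolled back
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

In the failed transfer, bob’s credit is applied first and then undone when alice’s debit violates the `CHECK` constraint, so no money appears out of thin air.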

So what changed? Web technology started the revolution. Today, many people shop on Amazon. RDBMS was not designed to handle the number of transactions that take place on Amazon every second. The primary constraining factor was RDBMS’ schema.

NoSQL databases offered an alternative by eliminating schemas at the expense of relaxing ACID principles. Some NoSQL vendors have made great strides towards resolving the issue; the solution is called eventual consistency, in which replicas may briefly disagree but converge once updates stop. As for NewSQL, why not create a new RDBMS, minus RDBMS’ shortcomings, using modern programming languages and technology? That is how some of the NewSQL vendors came to life. Other NewSQL companies created augmented solutions for MySQL.
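
Eventual consistency is easier to see in a toy model. The sketch below assumes a last-write-wins rule with logical timestamps, which is only one of several conflict-resolution strategies and is not taken from any particular vendor; the `Replica` class and the keys are hypothetical.

```python
from typing import Dict, Tuple

Versioned = Tuple[int, str]  # (logical timestamp, value)


class Replica:
    def __init__(self) -> None:
        self.data: Dict[str, Versioned] = {}

    def put(self, key: str, value: str, ts: int) -> None:
        # Keep the write only if it is newer than what this replica holds.
        if key not in self.data or ts > self.data[key][0]:
            self.data[key] = (ts, value)

    def get(self, key: str) -> str:
        return self.data[key][1]

    def sync_from(self, other: "Replica") -> None:
        # Anti-entropy pass: offer this replica every entry the other one has.
        for key, (ts, value) in other.data.items():
            self.put(key, value, ts)


a, b = Replica(), Replica()
a.put("player:42", "level-3", ts=1)  # one client writes to replica a
b.put("player:42", "level-4", ts=2)  # a later write lands on replica b
before = (a.get("player:42"), b.get("player:42"))  # replicas disagree
a.sync_from(b)
b.sync_from(a)
after = (a.get("player:42"), b.get("player:42"))   # replicas agree
print(before, after)
```

Between the two writes and the sync, a reader can see either value depending on which replica answers; that window is the ACID guarantee being relaxed.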

Hadoop is a different animal altogether. It’s a file system and not a database. Hadoop’s roots are in internet search engines. Although Hadoop’s associates (HBase, MapReduce, Hive, Pig, ZooKeeper) have turned it into a mighty database, Hadoop itself is an inexpensive, scalable, distributed file system with fault tolerance. Hadoop’s specialty at this point in time is batch processing, which makes it suitable for data analytics.

Now let’s start with our example. My imaginary video game company recently put its most popular game online after ten years in business shipping games to retailers around the globe. Our customer information is currently stored in a SQL Server database, and we have been happy with it. However, since players started playing the game online, the database has not been able to keep up and users are experiencing delays. As our user base grows rapidly, we spend money buying more and more hardware and software, but to no avail. Losing customers is our primary concern. Where do we go from here?

We decide to run our online game application on NoSQL and NewSQL simultaneously by segmenting our online user base. Our objective is to find the optimal solution. The IT department selects Couchbase for NoSQL (document-oriented, like MongoDB) and VoltDB for NewSQL.

Couchbase is open source, has an integrated caching mechanism, and can automatically spread data across multiple nodes. VoltDB is an ACID-compliant RDBMS that is fault tolerant, scales horizontally, and has a shared-nothing, in-memory architecture. In the end, both systems are able to deliver. I won’t go into the intricacies of each solution because this is an example, and comparing these technologies in the real world would require testing, benchmarking, and in-depth analysis.
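
Spreading data across nodes usually comes down to hashing each key to decide where it lives. The sketch below shows that idea with a plain hash-and-modulo rule; it is not the algorithm of Couchbase or VoltDB, and the node names and `node_for` helper are hypothetical.

```python
import zlib
from collections import Counter
from typing import List


def node_for(key: str, nodes: List[str]) -> str:
    """Pick a node by hashing the key; the same key always maps to the same node."""
    digest = zlib.crc32(key.encode("utf-8"))
    return nodes[digest % len(nodes)]


nodes = ["node-a", "node-b", "node-c"]  # hypothetical cluster members
placement = Counter(node_for(f"player:{i}", nodes) for i in range(9000))
print(placement)
```

A plain modulo like this moves most keys whenever a node is added or removed, which is why production systems tend to put a layer of indirection (such as a fixed set of virtual buckets or a consistent-hash ring) between keys and machines.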

Now that the online operations are running smoothly, we want to analyze our data to find out where we should expand our territory. Which countries are the most suitable for marketing our products? To answer that, we need to merge the SQL Server customer data warehouse with the data from the online gaming database and run analytical reports. That’s where Hadoop comes in. We configure a Hadoop system and merge the data from the two data sources. Next, we use Hadoop’s MapReduce in conjunction with the open source R programming language to generate the analytics reports.
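
The shape of such a MapReduce job can be sketched in a few lines. This is plain single-process Python rather than Hadoop or R, and the records, field layout, and function names are invented for illustration; it only shows the map, shuffle, and reduce steps applied to the two merged sources.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical records: (customer_id, country) from each source.
warehouse_rows = [(1, "DE"), (2, "US"), (3, "US")]
online_rows = [(4, "JP"), (5, "US"), (6, "DE")]


def map_phase(rows: Iterable[Tuple[int, str]]) -> Iterator[Tuple[str, int]]:
    # Emit one (country, 1) pair per customer record.
    for _customer_id, country in rows:
        yield country, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group all emitted values by key, as the framework would between phases.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    # Collapse each group to a single total.
    return {key: sum(values) for key, values in groups.items()}


pairs = list(map_phase(warehouse_rows)) + list(map_phase(online_rows))
customers_by_country = reduce_phase(shuffle(pairs))
print(customers_by_country)
```

On a real cluster the map calls would run in parallel on the nodes holding each chunk of data, and the per-country totals would feed the reports that decide where to market next.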