November 17, 2011

My Quest to Define “NoSQL”

“NoSQL”. For the last couple of years it’s been an increasingly hot buzzword in Cloud application discussions. This new trend in databases brought with it unprecedented data processing speeds and flexibility to traffic-intensive websites and services.

But what are the specifics? What is NoSQL, really? The term as it is most commonly used has only been in the common lexicon since 2009, after all. I knew it was a label for technologies offering an alternative to the relational database approach that has dominated for decades, the latter championed by well-known systems like Oracle, SQL Server, and MySQL. I knew that NoSQL encompasses exciting new database technologies like MongoDB, HBase, and Cassandra. I also knew that some of the concepts and technologies under the NoSQL umbrella have been around for ages, niche approaches whose time in the mainstream simply hadn’t come. Beyond that, it was a nebulous term existing in an industry overflowing with products, technologies, and concepts.

However, not being able to concisely define an oft-used term is like having a nagging splinter in my head.

Thus my quest for an end-all definition began. How hard could it be? Light digging betrayed a common tendency to view NoSQL as a saint-like savior from the “oppressive nature” of SQL and relational databases. Freedom from third normal form! Freedom from the tyrannical chains of database schemas! Freedom from associative entities, unintended Cartesian products, foreign key constraints, gox box socks and yill-iga-yaks and everything else “the man” imposed on us all! A veritable revolution!

But there had to be more to it. Surely NoSQL cannot be defined simply in terms of what it’s not? Could a restaurant’s entire menu be defined as “steak and not-steak”? How exactly does one order a “not-steak”? What am I truly implementing with a NoSQL database? What is the concrete, set-in-stone definition of NoSQL?

After much contextual hand-wringing, the answer is…still surprisingly vague. Wikipedia, the world’s leading bastion of irrefutable wisdom (at least considered as such for as long as it is convenient for the writing of this post), currently defines NoSQL as:

“…a broad class of database management systems that differ from the classic model of the relational database management system (RDBMS) in some significant ways. These data stores may not require fixed table schemas, usually avoid join operations, and typically scale horizontally.”

In other words, NoSQL refers to a slew of disparate DBMS types that are somehow different from the classic, mainstream relational varieties we all know. Beyond that, it’s a swampy tangle of “some” and “may” and “usually.” As if these waters weren’t muddied enough, there are those who insist that NoSQL databases cannot even be considered to be truly non-relational.
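To put something tangible behind “may not require fixed table schemas” and “usually avoid join operations,” here is a minimal sketch in Python. It is only an illustration under my own assumptions: SQLite (via the standard `sqlite3` module) stands in for the relational side, and a plain nested dictionary stands in for a document-style record. The table names, field names, and sample values are all made up, and the dictionary is not the API of any actual NoSQL product.

```python
import sqlite3

# Relational style: two fixed-schema tables, stitched together with a join.
# (Hypothetical tables and data, purely for illustration.)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE posts ("
    "id INTEGER PRIMARY KEY, "
    "author_id INTEGER NOT NULL REFERENCES authors(id), "
    "title TEXT NOT NULL)"
)
conn.execute("INSERT INTO authors VALUES (1, 'Ada')")
conn.execute("INSERT INTO posts VALUES (1, 1, 'Defining NoSQL')")

joined = conn.execute(
    "SELECT a.name, p.title FROM posts p JOIN authors a ON a.id = p.author_id"
).fetchall()

# Document style: the author is embedded inside the post, so reading it back
# needs no join, and extra fields (like "tags") need no schema change.
document = {
    "title": "Defining NoSQL",
    "author": {"name": "Ada"},
    "tags": ["databases"],
}
embedded = (document["author"]["name"], document["title"])

print(joined)
print(embedded)
```

Both approaches hand back the same author and title. The difference is where the structure lives: in declared tables and a join on one side, in the shape of the record itself on the other.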

Oh, and there’s more. Let’s take this to Hudson River levels of murk: consider that NoSQL doesn’t even truly mean “no SQL”…since NoSQL has no unifying language standards, SQL syntax could be a perfectly valid interface to a NoSQL structure. And in fact, a language called “UnQL” is being developed as a standard query language for NoSQL environments. Ironically, and perhaps to the chagrin of the anti-SQL crowd, UnQL’s syntax is largely based upon…

…wait for it…

…SQL. It should then come as no surprise that “NoSQL” is generally now considered to stand for “Not Only SQL”, as opposed to the more intuitive (and perhaps more popular amongst more militant circles) interpretation.

This was not working. By now my simple search was so clouded and polluted that anyone smoking near this toxic confluence of obfuscation would be in danger of triggering the world’s largest three-eyed fish fry.

By this point it’s pretty clear that the name “NoSQL”, while catchy and providing quite the hook with its controversial implications, is simply not very accurate or meaningful. It’s not descriptive of any particular database technology or concept. Rather, the name is representative of a group of database technologies and concepts truly bound in sharing only that which they are not. The “not-steaks”. In other words, my quest ended up right where it began.

So for the sake of my own sanity, with the highest certainty of disagreement from some contingent or another, I am defining NoSQL as any database system that by design does not encourage the traditional concepts of Edgar F. Codd’s relational model or ACID properties. Or, in layman’s terms, any database system that lets one lay out and extend data with high flexibility and without being overly concerned with the data integrity and consistency considerations of classic relational database design.
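The trade-off in that definition, flexibility on one side and integrity on the other, can be sketched in a few lines of Python. Again, this is a toy under stated assumptions: SQLite plays the relational database, and an ordinary dictionary plays a schemaless key-value store. The `users` table, the keys, and the e-mail addresses are hypothetical, and real NoSQL systems vary widely in what checks they do or do not offer.

```python
import sqlite3

# Relational side: the schema itself guards integrity (hypothetical table).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE)"
)
conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")

try:
    # A second row with the same e-mail violates the UNIQUE constraint.
    conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")
    relational_accepted_duplicate = True
except sqlite3.IntegrityError:
    relational_accepted_duplicate = False

# Schemaless side: a plain dict standing in for a key-value store.
# Any shape of record is accepted, including a new "nickname" field,
# but nothing stops the duplicate e-mail from getting in.
store = {}
store["u1"] = {"email": "a@example.com"}
store["u2"] = {"email": "a@example.com", "nickname": "dup"}

duplicate_emails = len(store) - len({doc["email"] for doc in store.values()})

print(relational_accepted_duplicate)
print(duplicate_emails)
```

The relational table refuses the duplicate outright, while the dictionary happily takes both the new field and the duplicate. Any integrity rule on that side has to be enforced by the application.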

I’ll just stick with this. Vague? Sure. Debatable? Absolutely. At least it’s concise. It’s also for the most part the same ambiguous definition that is typically offered up for the term. I must concede defeat to the extent of being unable to put a smaller box around the concept of NoSQL. Now if I can just embrace the vagueness, perhaps that splinter in my head will go away.