SAPB1 Guy / Friday, May 22, 2015 / Categories: SAP Business One

NoSQL, OldSQL, NewSQL, In-Memory & SAP HANA

Article by Chris Hallenbeck

We in the database industry have spent a lot of effort trying to differentiate our systems from each other by creating new classifications of database management systems. Until a few years ago, life was easy: we had relational databases and multi-dimensional databases. That was it, and these two types of database management systems have served us well for going on 30 years. There is little overlap in their capabilities, and the pros and cons are well understood. Some have called these OldSQL, but that seems mean-spirited given that these databases have basically run the world for several decades. Let's call them legacy databases.

But now we have NoSQL, Hadoop, In-Memory, and NewSQL. Developers, enterprise architects, and CTOs are faced with making sense of this new playing field, discerning marketing from meat, and figuring out where to invest.

The best place to begin this discussion is with NoSQL. This moniker was brilliant from a marketing perspective, as it brought attention to the very real work being done to reinvigorate the database world. Taxonomically, however, there is no defined set of attributes for NoSQL. Most have given up ACID compliance, but not all. Most are optimized for a single data type, but not all. Most have open source versions, but not all. Most make life easier for the application developer, but not all. What really differentiates these systems is that they aren't from the old school… and this is great.

Developers, database modelers, and DBAs have faced massive problems that weren't handled well in legacy databases. These technicians had to fit square pegs into round holes to solve their problems. This was acceptable historically: data sizes were relatively small and data types few, so having only one or two database systems to learn, manage, and license was a fair trade-off.
The problem started with the rise of the web. The web gave birth to massive applications whose demands were well beyond the capabilities of even the fastest, most sophisticated relational databases. A modern web application needs to scale near-instantaneously, handle any transaction load, give instantaneous page-to-page response for desktop and mobile users alike, handle text as easily as numbers, provide analytics in-line with transactional data, and of course be completely dynamic and personalized for the user.

To keep this simple, the square pegs being hammered by developers into legacy databases typically fit into a few categories: scaling for transactions, data type support, text analysis, speed, elasticity, and latency. The only way to solve these has been best-of-breed technology, optimized to solve one piece of the puzzle. This has allowed modern web and mobile applications to thrive, but it demands huge administrative effort, since database administrators and system administrators must know several different systems, and it requires a huge amount of code at the application layer to pull all these best-of-breed NoSQL and legacy databases together into one complete application. This is fine if you are a single-application web start-up, but it won't fly for enterprises that are struggling to deliver hundreds of these applications, both to internal customers demanding consumer-grade experiences and as external applications for their customers, partners, suppliers, and prospects.

The next group to part from the legacy crowd are the NewSQL folks. They are all about high-speed transactions and, to a lesser extent, real-time analytics on the same database. What distinguishes them from their high-speed transaction brethren in the NoSQL space is that these databases are ACID compliant, so consistency is handled by the database instead of being forced on the application developer.
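A minimal sketch of what "consistency handled by the database" means in practice, using Python's built-in sqlite3 module as a stand-in for any ACID-compliant store. The table, account names, and amounts are hypothetical, chosen only to illustrate atomic commit-or-rollback:

```python
import sqlite3

# Hypothetical schema: a CHECK constraint lets the database, not the
# application, enforce the "no overdraft" business rule.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(db, src, dst, amount):
    """Move funds atomically; both updates apply or neither does."""
    try:
        with db:  # opens a transaction; commits on success, rolls back on error
            db.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                       (amount, src))
            db.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                       (amount, dst))
        return True
    except sqlite3.IntegrityError:
        return False  # constraint violated; the database undid the partial update

transfer(conn, "alice", "bob", 30)    # succeeds
transfer(conn, "alice", "bob", 500)   # rejected: would overdraw alice
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
# balances == {"alice": 70, "bob": 80}
```

With a non-ACID store, the failed second transfer could leave the debit applied without the credit; here the rollback is the database's job, which is exactly the burden NewSQL systems lift from the application developer.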
The trade-off here is that the databases need to be very small, because their distributed nature, a necessity of their high-speed scaling, starts to fall afoul of Brewer's Theorem (i.e., the CAP theorem). Scaling limitations aside, NewSQL databases fill a highly useful niche.

So where does Hadoop fit into all of this? Well, it depends. Hadoop is a framework rather than a product, so it can be many different things depending on which pieces of the framework you choose to leverage and which external components you choose to plug in. It can be a high-speed NewSQL transactional system with Cassandra, a read-only warehouse with Hive, a computational system with MapReduce, a sparse-matrix (i.e., column-family) database with HBase, or an in-memory SQL query engine with Spark. All these products are referred to as Projects. There are tons of others, and they work with each other to varying degrees. All are developed by separate groups, which can be thought of as separate companies. This is what is so great, and so dangerous, about Hadoop. It is a data processing framework on which you can do almost anything, except ACID transactions (yet), and on which constant innovation is happening. That said, it requires combining several Projects developed by different companies, managing each Project with separate tools, and only some of them have enterprise support models.

So what about in-memory? In-memory is orthogonal to the classification of databases; any one of the databases in any of these classes could be in-memory. So what is it? A purist view is that it requires (re)designing the software to run 100% on an in-memory-first approach, in which disks are seen only as a place to initially load data from and to store new data to, providing durability in the case of hardware failure or power loss (i.e., the D in ACID).
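The MapReduce computational model mentioned above can be sketched in a few lines. This is a toy, single-process illustration of the programming model, not Hadoop itself: a map phase emits (key, value) pairs, a shuffle groups them by key, and a reduce phase collapses each group. The word-count job is the model's classic example:

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: collapse each key's values into a single result."""
    return {key: sum(values) for key, values in groups.items()}

lines = ["NoSQL NewSQL Hadoop", "Hadoop HANA hadoop"]
counts = reduce_phase(shuffle(chain.from_iterable(map_phase(l) for l in lines)))
# counts == {"nosql": 1, "newsql": 1, "hadoop": 3, "hana": 1}
```

In real Hadoop the map and reduce functions run on many machines, with the framework handling the shuffle, scheduling, and failure recovery; only the two user-supplied functions look like the ones above.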
It also means designing the system to use the latest in massively parallel processing and in moving data through the memory caches of modern CPUs. Said another way, it is the commercialization of supercomputing techniques. To be clear, my first reaction to in-memory was, "What's the big deal? All software runs in-memory, and all databases already have an in-memory cache." I then found out that it was really different. Unfortunately, marketers have seen how easy it is to co-opt the term in-memory for legacy systems and thus make them look at least a little shinier. To be clear, "true" in-memory is groundbreaking in ways that are difficult to comprehend if you don't see it in use at a customer.

What if a database didn't need batch? You could remove all but the tiniest latency from your enterprise architectural constraints.

What if it didn't need indexes? You wouldn't need to tune the database constantly and trade off write speed for faster reads.

What if it didn't need denormalization or normalization or aggregates for performance tuning? You could have developers focus on truly understanding business objects rather than the quirks of a given database.

What if data didn't need to be ETL'd? You could build virtual data marts in minutes to hours with no data copies, no server provisioning, etc.

What if streaming data could be analyzed in real time along with decades of history? You could perform correlation analysis on real-time events.

What if data of any type could be mashed up in the database and not in the application tier? Application developers wouldn't be faced with trying to join text, graph, and tabular data in application code.

What if the database could do analytics at the same time as transactions? Users wouldn't be alt-tabbing between screens, wondering what idiot designed software that made them go to different systems to complete one business process.

What if the database could do advanced predictive analysis?
Data scientists could perform their analysis right on the source transaction system to better understand consumers, markets, and users, and deploy their algorithms right on the transaction system to provide real-time insights for application users.

In a larger sense, true in-memory would allow:

- The largest companies in the world to close their books three weeks earlier.
- The largest retailers in the world to run real-time point-of-sale data marts supporting thousands of concurrent users.
- The largest manufacturers in the world to reduce their Work in Process by 60% and their spare-parts inventories by 40%.
- The largest purchasers in the world to save hundreds of millions of dollars just in their Goods Receipt and Invoice Reconciliation processes.
- The major cities in the world to reduce crime and improve civic services.
- The 1200+ start-ups working with SAP to get their products to market faster, with agile platforms to enable necessary pivots.
- The oil & gas and power distribution companies around the world to make their pipelines safer by correlating telemetry data with historical failures, identifying statistically weak points in their systems, and fixing them before they break.
- The medical research institutes around the world to improve patient outcomes and speed research by analyzing genome, lifestyle, and clinical data together.

The real question would then be: is a true in-memory database really a database at all, or have we developed something different? Not a new classification of database, or a better old/no/NewSQL database, but a new type of technology that allows different, simpler architectures. Not an optimization or a characterization of legacy databases, but something seminal. Let's call it a data platform that has database services: a platform that has square slots for all the square pegs, but does it in a way that is simpler. Let's call it HANA.
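As one concrete illustration of the earlier "what if it didn't need indexes or aggregates" point, here is a back-of-the-envelope sketch (mine, not from the article) of the columnar in-memory layout that makes such claims plausible. The column names and values are hypothetical; the point is that an aggregate becomes a plain scan over one contiguous in-memory column, with no index or pre-built rollup required:

```python
# Row store: each record is a unit; aggregating drags every field through memory.
rows = [
    {"region": "EMEA", "amount": 120},
    {"region": "APJ",  "amount": 75},
    {"region": "EMEA", "amount": 40},
]

# Column store: one contiguous sequence per attribute, scanned independently.
columns = {
    "region": ["EMEA", "APJ", "EMEA"],
    "amount": [120, 75, 40],
}

def total_for_region(cols, region):
    """Filter one column, sum another: no index, no pre-aggregation."""
    return sum(
        amount
        for reg, amount in zip(cols["region"], cols["amount"])
        if reg == region
    )

# total_for_region(columns, "EMEA") == 160
```

An in-memory columnar engine does the same thing with compressed columns and SIMD scans across many cores, which is why it can serve ad-hoc aggregates at speeds that legacy row stores need indexes and pre-computed aggregates to approach.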