Relational Database Management Systems (RDBMS) have been the default in database management for the past three decades at least.
If you work in data, you have probably grown up with programmes like Oracle (the first commercially available RDBMS), Microsoft Access and IBM’s DB2. The foundation for all the skills you have acquired in your role is most likely to be Structured Query Language (SQL), by far the most common domain-specific language used to query and manage relational databases.
But times are changing in the world of database management. As data volumes continue to increase at a remarkable rate, limitations in the original relational approach have been exposed. In particular, in highly fluid modern web and cloud-based environments where many different types of data are being aggregated at speed, RDBMS lacks the scalability and efficiency to adapt to rapidly changing requirements.
This has given rise to innovative new database management approaches: first the non-relational NoSQL, and then so-called NewSQL, which might be understood as a hybrid adaptation of the relational approach that incorporates some of the benefits of NoSQL.
But what exactly are the skills implications of this shift, both for businesses and for professionals working in the field?
Knowing the right NoSQL
The first thing to say is that NoSQL is not a single, unified approach: you can’t go out and ‘learn NoSQL’ the way you can learn SQL, for example. Rather, NoSQL describes a family of databases and database management systems which diverge from the mainstream relational approach. There are four main types of NoSQL database:
- Key-value databases
- Column-family stores
- Graph databases
- Document-oriented databases
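To make the four categories a little more concrete, here is a toy sketch of how the same kind of information might be shaped in each model. It uses plain Python dictionaries and lists only; every key, name and value is invented for illustration, and no real database product or API is being modelled.

```python
# Key-value: an opaque value looked up by a single key.
key_value = {"user:42": '{"name": "Ada"}'}

# Column family: row key -> column family -> individual columns.
column_family = {
    "user:42": {
        "profile": {"name": "Ada"},
        "activity": {"last_login": "2024-01-01"},
    }
}

# Graph: nodes plus explicit, labelled relationships between them.
graph_nodes = {"ada": {"label": "Person"}, "bob": {"label": "Person"}}
graph_edges = [("ada", "FOLLOWS", "bob")]

# Document: one nested, self-contained record.
document = {"_id": 42, "name": "Ada", "addresses": [{"city": "London"}]}

# Walking the graph edges answers a relationship question directly.
followed_by_ada = [
    dst for src, rel, dst in graph_edges if src == "ada" and rel == "FOLLOWS"
]
```

The shapes differ because each model is optimised for a different kind of question: lookup by key, wide rows grouped by column family, traversing relationships, or reading a whole record in one go.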
All of these types have their own particular strengths when compared to RDBMS. Document databases, for example, are very well suited to situations where the type and format of the data being processed have to be changed regularly and rapidly. RDBMS relies on a very rigid structure, known as a schema, which defines the type and structure of the data in every field, what every column and row can record and even how many columns there can be. This allows relational databases to build complex webs of relationships between different records, while still allowing information to be found quickly and efficiently. SQL queries are famously user-friendly and efficient, and rely completely on the uniformity of the schema.
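As a rough illustration of what a schema gives you, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The customers and orders tables, their columns and the sample values are all assumptions made up for this example. It shows a query that follows a relationship between two tables, and the database rejecting a record that does not fit the declared structure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The schema fixes which columns exist and what each one may hold.
conn.executescript("""
CREATE TABLE customers (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    total       REAL NOT NULL
);
""")
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")
conn.execute("INSERT INTO orders (customer_id, total) VALUES (1, 19.99)")

# Because every row has the same shape, a join across tables is simple.
rows = conn.execute(
    "SELECT c.name, o.total "
    "FROM customers c JOIN orders o ON o.customer_id = c.id"
).fetchall()

# A record with a field the schema does not define is refused outright.
try:
    conn.execute(
        "INSERT INTO customers (id, name, nickname) VALUES (2, 'Bob', 'Bobby')"
    )
    rejected = False
except sqlite3.OperationalError:
    rejected = True
```

The same rigidity that makes the join straightforward is what blocks the second insert: the table simply has no place for a nickname.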
However, with an RDBMS, if you want to change the format or the type of data held in one table or field, you also have to change every record that shares that schema. When dealing with very big and very complex data sets involving large numbers of records, you might end up having to make changes and run checks on hundreds or even thousands of fields. Not only is this labour-intensive, but the risk of errors and inconsistencies is also high.
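A minimal sketch of that kind of change, again using Python's sqlite3 module, is shown here. The contacts table and its data are hypothetical, and the rebuild-and-copy pattern is just one common way of changing a column type in SQLite; other database systems offer their own migration commands. The point is that every existing row has to be converted, not just the ones you care about.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT, phone INTEGER)"
)
conn.executemany(
    "INSERT INTO contacts (name, phone) VALUES (?, ?)",
    [("Ada", 5550101), ("Bob", 5550102), ("Cy", 5550103)],
)

# Changing phone from a number to text: build a table with the new
# schema, then copy and convert every existing row into it.
conn.execute(
    "CREATE TABLE contacts_new (id INTEGER PRIMARY KEY, name TEXT, phone TEXT)"
)
conn.execute(
    "INSERT INTO contacts_new (id, name, phone) "
    "SELECT id, name, CAST(phone AS TEXT) FROM contacts"
)
rows_rewritten = conn.execute("SELECT COUNT(*) FROM contacts_new").fetchone()[0]

# Swap the new table in under the old name.
conn.execute("DROP TABLE contacts")
conn.execute("ALTER TABLE contacts_new RENAME TO contacts")
conn.commit()

phones = [row[0] for row in conn.execute("SELECT phone FROM contacts ORDER BY id")]
```

With three rows this is trivial; with millions of rows spread across many related tables, the same steps become the labour-intensive and error-prone exercise described above.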
Document-oriented databases overcome this problem by avoiding the use of a fixed schema. In this case, the term ‘document’ refers to the fact that each data record is intrinsically complete and self-describing. In other words, it doesn’t need an external schema to define what it contains or what it does. This independence makes it easy to work with lots of different types of data at scale. Document-oriented databases are also a natural fit for markup and data formats like HTML, XML and JSON, and are well suited to applications running in web environments.
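The idea of a self-describing record can be sketched with nothing more than Python's json module. The records below are invented for illustration and a plain list stands in for the database, so this is a picture of the data model rather than of any particular document store. Each record carries its own field names, records in the same collection need not share a shape, and adding a field to one record leaves the others untouched.

```python
import json

# Three records with three different shapes, stored side by side.
documents = [
    {"type": "customer", "name": "Ada", "email": "ada@example.com"},
    {"type": "customer", "name": "Bob", "phones": ["555-0102", "555-0199"]},
    {"type": "order", "customer": "Ada", "items": [{"sku": "A1", "qty": 2}]},
]

# Each document serialises to JSON and back without any external schema.
stored = [json.dumps(doc) for doc in documents]
loaded = [json.loads(text) for text in stored]

# Adding a new field to one record requires no change to the others.
loaded[0]["loyalty_tier"] = "gold"

# Queries have to cope with fields that may or may not be present.
customers_with_email = [
    doc["name"]
    for doc in loaded
    if doc["type"] == "customer" and "email" in doc
]
```

The trade-off is visible in the last step: because nothing guarantees that a field exists, the checking a schema would have done for you moves into the application code.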
Document-oriented databases do not resolve all the drawbacks you might come across with an RDBMS, however. In other circumstances, you would be much better off choosing another NoSQL approach, or perhaps opting for a combination of NoSQL and SQL (hence the evolution of NewSQL).
In terms of how this changes the skills requirements of data management professionals, it seems certain that for the foreseeable future at least, their work will remain grounded in SQL/RDBMS. These disciplines remain the bedrock of data management even in the Big Data era, in no small part because so many of our data assets remain stored in relational databases.
But for reasons of speed, agility and scalability, non-relational approaches add an important extra dimension to modern database management. Going forward, professionals will need a sound working knowledge of the different database management systems available and their relative strengths and weaknesses. They will also need to be able to make judgement calls, based on an understanding of broader operational priorities, in order to pick the right tools for their needs.