What is a Database, by the way?

This article may contain affiliate links. If you buy some products using those links, I may receive monetary benefits. See affiliate disclosure here

During the initial days of my journey in web development, I had got a mentor who taught me how to create dynamic websites with PHP and MySQL. I initially went there asking how to setup a PHP site on an IIS server, because like everyone else, I was using Windows at that time.

After showing me how to setup PHP with IIS, he also introduced me to XAMPP, which I was hearing for the first time. Then he asked something like "now we need a database, right?", and that's when I first came across that term "database".

During the school days, I had heard about MS Access, databases, DBMS, etc, but those things mostly flied above my head.

Now, things like "database", "localhost", etc may seem second nature. But that was not the case initially. These terms didn't make much sense. I somewhat knew what "data" is, but database was something new.

Table of Contents

Why not just use files to store data

We store data on a computer, and that's what files are for. We can also organise files into folders. Files come in various forms - text files, binary files, image files, video files, the list goes on. So what the heck is a database? That was my thought.

If you are also a beginner, you might also have the same thought. And I will try to demystify it in layman terms.

It's is indeed okay to store data manually in text files (like .txt, .xml, .json etc) as long as your application is small enough. For instance, a blog with a few tens or even hundreds of blog posts can store all post content in markdown files, and then read them from the application whenever needed.

However, as your data grows in complexity and size, this becomes a difficult task.

For instance, imagine an eCommerce app with 1000+ products, 10k+ customers, and 100k+ orders so far. That's a moderately large application. If you were to store all these products, customers, orders, and other associated data directly in files, it will be cumbersome.

The size of data will be at least a few gigabytes, or even terabytes. So you can never store them in a single file for performance reasons. Therefore you need to split them into multiple files, which arises many questions:

are you going to have one file per product? one file per order? and so on..?
if yes, how would you name the files?
how would your application know which file to read to display a particular product or customer detail?
what will be the format of the data in each file? plaintext? json? csv? xml?

You have to design the file structure yourself, which may not be an easy task. Even if you could, you have to figure out optimisation techniques to improve performance as the data grows.

So, designing a custom database is not a walk in the park. That's where a database management system (DBMS) proves useful.

What does a database management system offer

First, let me tell you the difference between a database and a database management system. A "database" just refers to any organised collection of data, like the files and folders I mentioned above. So even if you are storing some simple files organised into folders in you PC, that's technically a 'database' - a rudimentary database.

Whereas database management system (DBMS) is a software that allows you to efficiently manage a database.

MySQL, PostreSQL, MariaDB, Cassandra, MongoDB, MS SQL, etc all are examples of database management systems.

A DBMS is not some magic way to store data without files and folders. Behind the scenes, a database is also storing all the data in files and folders.

However, the DBMS software has abstracted away all the implementation details so that you don't need to worry about it.

Not just that, these files may not always be like the simple files we manually create. Instead, DBMSs use custom binary files and indexes optimised for performance to handle large data.

In its essence, a database is a software that gives you a data query language (DQL) and a data manipulation language (DML) with which you can manage all the data, how huge it be. The actual file creation, deletion, and modification happen behind the scenes. You don't need to touch it directly.

So, when you install a database software like MySQL or MariaDB on your system, you are actually installing a program that gives you SQL as the language to interact with the data.

Also, since you are not directly touching the data files on the file system, there is less room for human errors or corrupted data files. Hence there is better data integrity.

Different types of databases

As I said above, a database management system gives you a query language to query the database. It can be SQL, or something else.

In order to query the data, you need to know how the data looks like. Right? Otherwise how can you know what to query?

For instance, I have a library database that stores the list of books, members, and borrowings. And now I want to fetch the details of books from the genre "travelogue".

SQL Databases

If it is an SQL database like MariaDB or MS SQL, it gives you a mental model to imagine the database as tables, each table having its own rows and columns.

Then I know that there is a table named "books" in my database, each book being a row in that table. So, the query looks somewhat like this, which can be run from the command line or from an application:

SELECT id, title, genre FROM books WHERE genre = 'travelogue';

And the database responds back with an array, which can again be presented like a row from a table. If there is only one product, the result will be a one-dimensional array, if there are multiple products, the result will be two-dimensional array, or array of arrays, like in the following result:

+----+---------------------------------+------------+  
| id | title                           | genre      |  
+----+---------------------------------+------------+  
|  7 | On Ancient Central Asian Tracks | Travelogue |  
| 11 | Balidweep                       | Travelogue |  
| 16 | Kappirikalude Nattil            | Travelogue |  
+----+---------------------------------+------------+

Behind the scenes, the database's file structure implements a kind of tabular form and hierarchies, which can be abstracted to a table when presented to the user. There are no HTML-like tables though.

SQL databases are also inherently relational databases, as they implement something called relational model. The data in one table can be connected to another using foreign keys. However, not all SQL databases are relational, and vice-versa.

The columns in a table are predefined while creating the table. That is, all rows have the same set of columns. Or in other words, SQL databases follow a strict schema.

NoSQL Databases

On the other hand, in the case of NoSQL databases like MongoDB, the mental model is different.

There are no tables, rows, and columns. Instead, each record is called a Document, and multiple Documents constitute a Collection, and there can be multiple Collections in a database.

Each document looks like a JSON object. Unlike the fixed number of columns in SQL database, the no. of keys in a NoSQL document can vary. So it is more flexible. With that being said, there are a also NoSQL databases that implement strict schema.

For instance, if the above data was in a MongoDB database, a book record will look like:

{
    id: 7,
    title: "On Ancient Central Asian Tracks",
    genre: "Travelogue"
}

Instead of SQL, NoSQL gives a query syntax. For MongoDB, the query syntax looks like JavaScript. So, if I want to fetch the same details from MongoDB, the query looks like:

db.books.find({ genre: "Travelogue" });

Final Thoughts

Although most database systems can be classified into SQL and NoSQL, based on whether they offer SQL as the primary query language, this division is not rigid.

For instance, there are various types of NoSQL databases:

Redis: a type of key-value store database, used widely for caching
CockroachDB, which is often treated as a NewSQL database, which offers both the scalability of NoSQL and consistency of SQL
Graph databases like Neo4j

Even within the SQL databases, the implementation slightly differs. The SQL standard is maintained by ISO and IEC. And so far, many versions have been released, like SQL:2008, SQL:2011, SQL:2016, etc.

But vendors may extend them. For instance, PostreSQL has FETCH, while in MySQL it is LIMIT - to query a certain number of results.