Relational Databases and Information Systems
What are databases good for? Most people will say they store data. Not true.
Databases are terrible at storing data. If you just want to store data, use files. Compress the files and make redundant backups. A database will easily take up 10x the amount of storage.
Databases are good for answering questions. In order to do this, they use extra RAM and drive space to improve speed. Between speed and space, speed always wins until you run out of space.
In this tutorial, I'll cover basic database concepts so you can understand the databases you use and hopefully communicate problems and feature requests more effectively. Let's get started.
Five Types of Data
The smallest part of a database is called a field. Data in a field is one of five basic types.
- String types are implemented as CHAR, VARCHAR, VARCHAR2. The difference between these types is not important unless you're a DBA. Strings are all used for storing anything you can type on a keyboard. Depending on the software you're using, a string can be up to about 1000 characters. Strings are the most common and important type of data, and you could use them for everything but sometimes there's a better suited, more specialized type available.
- Integers, also called INT are used for counting, expressing which or how many. They are also often used as indexes, which are explained later.
- Floating point numbers, also called NUMBER are used for measurements, expressing how much, or for general math.
- Dates, represented with the DATE or DATETIME types are used for calendar or schedule information. Birthdays, weddings, class schedules, work schedules, appointments, and reservations all use DATEs to indicate a point in time when something happened or is supposed to happen.
- Binary Large OBjects, also known as BLOBs in database lingo are used for storing anything that's too large for a string, or is in a special format not compatible with text. BLOBs can contain compressed data, pictures, sounds, or movies and can store hundreds of MB.
Three Levels of Organization
Relational databases are made up of tables, which are made up of records, which are made up of fields.
- Fields are what we just covered with the five types of data. Each field has a name and a type.
- Records exist in tables. A table may have 0 or more records, but no two records can be exactly the same.
- Tables each have a name and a list of fields. Each record in the table will be a set of these fields.
NULL and UNIQUE
By default, any field can be NULL which means the data is missing. If the data in the field is required, you can mark it as NOT NULL. If a field is NOT NULL, then you may also mark it as UNIQUE which means that every record in that table must have a different value for that field.
UNIQUE is highly prized in large databases because it means you can guarantee that you'll get back either one or zero answers, never two or more. In a table with millions of records, lacking this guarantee can easily result in pages and pages of data when you're searching for just one field.
Structured Query Language (SQL) is the means for asking questions to your database. The most important SQL is a SELECT statement which as the name implies allows you to choose a subset of your data and include it in a report.
A SELECT statement has three parts. The first part is the keyword SELECT followed by a list of fields. These fields are what you want your answer to look like. The second part is the keyword FROM followed by a list of tables. These tables tell SQL where to look for data, among the potentially hundreds of tables in your database. The third and optional part is the keyword WHERE which you can think of as meaning "if". If the WHERE clause is missing, then all possible answers are returned. If present, the WHERE clause limits the answers to those that evaluate as true.
Add to this a few wrinkles for convenience. If the list of fields after SELECT is replaced by * then all applicable fields will be part of your answer. If the list of fields is replaced by COUNT(*) then your answer will be an integer count of how many answers passed the logical test in the WHERE clause.
SELECT COUNT(*) FROM LIBRARY_BOOKS;
This example will tell you how many records are in the LIBRARY_BOOKS table. Once you understand SELECT, UPDATE works just the same but chooses which records to modify.
A join is simply when you have a where clause that tests for equality of two fields. Usually these are key fields in different tables, but they may also be the same field in the same table. The graphic above shows different kinds of joins based on whether the test found a match or not. This is a heavy subject and you'll study it more if you write a lot of complex SELECT statements.
Keys and Indexes
A key for a table is something you search for in order to find the records you want. A key is often one field, but may be a set of several fields together.
To make this searching fast, add an index to the table. Indexes make searching faster, but adding or changing the table's data is slower because you have to update the table and also update the index. Usually this comes out in favor of having the index, but for efficiency an index should have as few a number and as small (in bytes) fields as possible. INT indexes are the best.
A primary key is a special kind of key where each key value has at most one record in the table. This is a good thing for efficiency because once the database finds a record that matches a primary key, it can stop searching.
- One-to-one (data model) - Wikipedia, the free encyclopedia
A one-to-one relationship is the simplest kind, and means that for each side at most one match exists on the other side. One to one relationships include a book with its cover, left shoe to its right, and a spider in her web.
- One-to-many (data model) - Wikipedia, the free encyclopedia
One-to-Many relationships exist when one side can have many of the other, such as a wallet containing many credit cards, a classroom full of students, and keys on a keyboard.
- Many-to-many (data model) - Wikipedia, the free encyclopedia
Many-to-Many relationships exist when One-to-Many goes in both directions. Students signed up for classes, pictures on a webpage, and friends having friends are all good examples.
Database normalization is very dry, and you don't need to worry about it that much unless you're a DBA. Even most DBAs for most databases don't need to understand much more normalization than the following.
There are different levels of normalization, but the gist of it is this. You only want to store each needed bit of data in your database once, not zero times, and not twice.
Extending the LIBRARY_BOOKS table to illustrate, suppose every book in the library has a binding material that's either leather or paper, and a binding color that is always black for leather and brown for paper. If there are lots of books of each type, material and color are redundant fields and only one of them should be stored for each book. Putting the associations of black leather and brown paper into a separate table prevents you from storing this data more than once.
What happens if you store it more than once? Obviously you waste space, but that's not the most important thing. You might also introduce errors or omissions into your data. What if for some reason the library lost all leather bound books? Then the association of black leather would be nowhere in the database.
This is a trivial example for simplicity, but imagine a database stores a person's name or phone number multiple times. One location might be updated and another left with wrong information.
To solve this problem generally, normalization asks the question: Are there any fields in this table that are completely determined by another set of non-primary-key fields? (Is the book binding color completely determined by the binding material?) If there are, then you might want to normalize those fields out into a separate table.
You made it to the end. Are you a DBA now?
Yes you are! But only if you were a DBA when you started reading. Setting up a small relational database in MySQL or SQLite is very easy, and can be done without even the minimal grounding from this introductory article. Knowing what you do now might just make you less dangerous.
The path from here may lead you to a formal database course, or textbook, or video lecture series, or at the very least a better appreciation of the data in the databases you use and the work done by your friendly neighborhood DBA. Safe travels!