Tuesday, 26 March 2013

Relational databases: some simple rules (part 1)


It doesn't matter what we prefer to use, either Ms Access or SQL Server or Oracle, or even MySql, but when we want to create a new database, we really need to consider its structure, plan it and finally create all the database elements.
To do such a thing might seem easy – and in general it is – but a good development plan is what we need to avoid pitfalls and future issues with the structure of the database. The complexity of a database is surely one of the things to be considered, in fact it is clear that a more complex database will surely be more complex to develop.
Some simple rules can be taken into account and in this two parts article we are going to see some important points we need to keep in mind when planning a new database.

When building a new database, the core of it will be a set of tables. Tables are made of rows and columns: each row consists in a record, each column defines the information contained in a row.
For example a list of persons could have columns like: name, surname, address, post code, city, state etc.
In each row we will have the actual information like: John, Smith, 22 Acacia Avenue, 12345, London, United Kingdom. “John” is a value of the column “name” and stays in a field. All the information in a row consist in a record.
Think it as grid with column names: the actual records stay just below the column names.
Now let’s take a big step ahead.

Relational database
Allow me, please, to stress out a bit the relational database concept, because it is very important.
In order to simplify the concept we can make a simple example. Consider the persons table above. It’s a list of persons with the respective home address. We know that, for example, we are going to repeat the city or the state people live in for each record. Let’s say we have a list of 10000 names, then we are going to have 10000 cities and states. Just as an exercise, we can create that list and see how many times the very same city or state appear in the main table. That means, for example,  we can create a second table with just the list of all the cities and relate it to the first table. That will decrease the number of information in the first table... but how?
Ok, we need to understand that more deeply.
In the first example we need to create another column which will be always the first column we create in a new table: the ID column.
The ID column contains a unique identifier for the record: every single record will a unique number which will identify it without any doubt. IDs help us a lot when we need to query tables and if used with indexes can speed up queries and make them more efficient. But in our scenario and the purpose of this article, we need IDs for other reasons.
Our persons table will have 10000 records and columns like: ID, name, surname, address, post code, city, state.
As said, because we have 10000 records, we will surely repeat at least cities and states quite often, especially if our list is related to a specific and not too wide area like Europe. Consider that in the world there are – more or less – 196 countries and that there are 193 countries which are members of the United Nations. As you can imagine, in our persons table we are going to repeat the country a lot.
Another step: the size (in terms of disk usage) of our table is obviously related to the amount of data we have in it. The more record we have, the bigger the size will be. More information (in terms of characters in words) we store, bigger the table will be.
In large databases, table size is important for different reasons and not only in relation with disk usage: the bigger the table, the slower the query. IDs and indexes can help us, but relations can be just the perfect choice.
In fact, we can create a new states table with just two columns: ID and state. If we want to consider all the 193 states, we can put them all. In the persons table, the state column will not contain the state anymore, but the ID of the state table.
I don’t want to get into a long discussion, but if you stop for a moment and think of that new architecture, you will surely understand that it will save a lot of space and the size of our persons table will decrease by a significant amount in terms of size and disk usage.
AH! We have created our first relation between the persons table and the states table.
The above can be done in different situations, and tables can have multiple relations. In the above scenario, we could create a cities table (if it makes sense). However, if you imagine a complex database, you can definitely imagine many situations where a relation is needed.
That is the solid idea on which we must build our databases. That should always be our first rule, written in stone on our desk. Every time we create a new table we must think about how we can split it, how we can create relations and how we can save CPU and disk usage with relations.

Next time the second part of the article.

0 thoughts:

Post a Comment

Comments are moderated. I apologize if I don't publish comments immediately.

However, I do answer to all the comments.