An Introduction to Database Variable Types

Why do we need to worry about the data types we’re storing in a database? Well mainly because it helps you with database design. Did you know that the database itself can help to keep bad data out (for example you can’t put ‘text’ into a number only field) and can speed things up especially searching and sorting?

The easiest way to look at different data types is to look at some examples.

For a product catalogue, you’ll obviously need to keep a list of products (product categories, discounts, sales and so on are left out to keep this nice and simple).

A products table will have:

product id (our primary key)
product name
product description
product price

Product id is our “made up primary key” (see the previous tutorial for further details) since there’s nothing else we can use to uniquely identify a database row.

It’s an integer (whole number) so the database can automatically increment the value when we add a new product to our list.

Product Name will contain only a name so it will be relatively short.
The name is a variable length (we don’t know at this stage what sort of name you’ll give a product or how long it will be). We can use a ‘varchar’ field. Varchar means “variable size character” and it can contain any data (‘char’acters – numbers, spaces, letters, punctuation etc).

With a ‘varchar’ field you have to tell the database the maximum size it will store. In this case we’re going to limit it to 200 characters, so it becomes:

varchar(200)

Varchar fields will hold up to 255 characters, if you want to store more than that you have to use a different field type, such as blob or text.

The product description might contain a lot of data (eg size of the box it comes in, bonus features for a dvd, minimum running specifications for a PC game etc etc).

It probably won’t be less than 255 characters so we can’t use a varchar field.

We can use a ‘text’ field however, which will hold up to 3-4 pages of text in mysql (in postgresql the ‘text’ field type doesn’t have a size restriction). In mysql, if you need to store more than 4 pages, use the ‘longtext’ field type which will hold up to 32 pages in a single field.

A product’s price will be a number. However it might not be a whole number, so it can’t be an integer. Both database systems (MySQL and PostgreSQL) support the ‘float’ data type but I recommend using something else. The ‘float’ type is not useful for data where significant numbers are essential. The float type could round numbers up or down to fit into it’s setup, so read documentation carefully before using this particular type. Precision numbers (such as a product’s price) will either need to use the ‘decimal’ type in mysql or ‘numeric’ type in postgresql, as these types do not round any numbers and will store the values exactly as you enter them.

Our product table looks something like this:

product id            integer (primary key)
product name            varchar(200)
product description        text (up to 4 pages)
product price            decimal/numeric

An orders table looks like this:

order id            integer (primary key)
order first name        varchar(200)
order last name            varchar(200)
delivery instructions        text (in case there are extra delivery instructions)
order delivered            char(1)
order paid            datetime

We’ve added two new data types here.

A ‘char’ field is exactly the same as a ‘varchar’ field – it can hold any type of character.

The difference is that a ‘char’ field is *always* the size you specify. If you don’t provide enough data, then the database will pad it out. This can become a problem when searching and matching.

If you used a ‘char’ field for a phone number you might enter:

’555-555-555′ (length 11)

The database table was created with a ‘char’ field of 12, so when it’s stored in the database it becomes:

’555-555-555 ‘ (with an extra space on the end).

A varchar field will not pad out the data like this, it will leave it exactly as you enter it.

Use ‘char’ fields with caution. Unless you know that the data entered is always going to be the same length (eg an isbn or social security number) then try to use ‘varchar’ field instead.

In this particular case, we’re using the ‘order delivered’ field to see whether the order has been successfully delivered or not (’1′ means delivered, ’0′ is not delivered), so this isn’t an issue.

The datetime field will differ between databases. In MySQL, this stores a date and a time, in PostgreSQL there is no such ‘datetime’ field – there are other field types to store this information (I use an ‘int’ to store the php ‘time’ variable and then convert it back in the application using the php ‘date’ function).

Each database system will have slightly different options available.

MySQL lets you specify whether a number (integer or float) column will be ‘signed’ or ‘unsigned’ – a signed column will let you store negative and positive numbers, an unsigned column will only let you store positive numbers. PostgreSQL doesn’t let you make this distinction.

Some of the data types may have limitations built into them. A ‘small integer’ (smallint) column will only store values between -32,768 and +32,767 (the ‘integer’ data type goes from -2,147,483,648 to 2,147,483,647), so be aware of any data type limitations before you start.

MySQL has a special field type called ‘enum’ (or enumeration) which only allows certain data to be entered into a field, for example 0/false/no, or 1/true/yes (we could have used this for the ‘order delivered’ field). Most database systems will require you to set up a ‘constraint’ to emulate this particular behaviour, but that’s getting a bit beyond this article’s scope.

PostgreSQL has a built in data type for isbn’s, IP addresses, all sorts of geometric data and much more. It also allows you to create your own data types if necessary.

If you need to work with multiple databases, check each databases documentation before you start to make sure each one can handle the different data types. Some types such as ‘enum’ might appear, or you may need to work around differences such as ‘datetime’ showing differently depending on which system you are using.

The default types (varchar, int, smallint, bigint, text, char) will not significantly change between databases.

A lot of thought and research needs to go into designing a database, and we still haven’t created a database or any tables yet! Choosing the wrong datatype can make your database slow (and speed problems can be hard to track down!), so it’s best to do a little research, write down some ideas and go from there.

Until next time!

This entry was posted in Programming. Bookmark the permalink.

Leave a Reply