A recent debugging session on a web application surfaced a recurrent issue in database design – that of assigning a wrong data type to a database field.
Storing numeric data in a numeric field, when is it not recommended?
Many programmers, when designing a database schema, default to using a numeric data type for a numeric field – like zip codes, telephone numbers, age, price etc. However, not every numeric value needs to be stored as a numeric type. Many times it is prudent to store numeric values as a ‘text’ type – ‘varchar’ to be specific.
Take the case of a zip code – it usually gets assigned a numeric type. However, if you check zip codes in the US you can see that many of them start with a zero. A few random examples are given below.
07097 JERSEY CITY 07403 BLOOMINGDALE 00610 ANASCO 01541 PRINCETON
I’ve a database of around 43,000 US zip codes, out of which 3,000 (~ 7%) zip codes start with a zero. Saving this zip codes in a numeric field will truncate the initial zeros, losing information in the process.
This actually happened with a recent client project. I was debugging a legacy web application which was used by ground service teams to assign service personnel for home TV installations. Upon arrival of a new request the technician closest to the client location was assigned the job. This required calculating distances using the client zip code and the technicians location. However the application was sometimes giving wrong distances, giving the location distance as thousands of miles when actually it was just 12 miles. Digging around it was noticed that the corresponding zip codes were truncated and hence the wrong distances.
The data loss probably occurred when importing a ‘zip codes’ CSV into the database. As the database ‘zip codes’ field was set to a numeric type, while importing the initial zeros got truncated.
So when should numeric data be stored in a numeric type field and when in a ‘text’ type. The rule of thumb is that if you are going to do calculations on the numeric data – addition, subtraction, averaging etc. than use a numeric field type to store the data. You do not, or rather cannot, make any meaningful calculations on a zip code or a telephone number, so it is wise to store it as a text ‘char’ type.
Contrast zip codes with a ‘age’ field. One can do various calculations on it – average, min, max, standard deviation etc. So one can, or should, use a numeric field type for ‘age’.