Cherry picking SQL query fields to increase performance


The first SQL query most people learn is the SELECT statement with a query some thing like the following.

SELECT * FROM users WHERE id > 100

Nothing wrong with the syntax itself; but more often than not, what the user really wants is just a few columns from the table as shown in the PHP code below. Here we are only using three columns from the table. The usual practice of developers is to use the * modifier to get all the columns from the table as it is easier then specifying individual column names. It is quicker to select all columns from a table and worry later which to actually use in the code.

$sql = "SELECT * FROM users WHERE id > 100";
$result = mysql_query($sql);
 
if (!$result) {
    echo "Could not successfully run query ($sql) from DB: " . mysql_error();
    exit;
}
 
while ($row = mysql_fetch_assoc($result)) {
    echo $row["id"];
    echo $row["first_name"];
    echo $row["last_name"];
}

The problem with the above scenario is that MySQL has to move all that data over the network for each query, and the more redundant data there is the more bandwidth is wasted, impacting the performance of the website. If the number of columns in the table are small with a few thousand records, then the performance hit is barely noticeable. But increase the table columns and also the total records in the table and the delay gets noticeable.

The best way to mitigate this problem is to select the individual columns. This way we are not unnecessarily transferring data we do not require. For the above PHP code, the query would be as below.

SELECT id, first_name, last_name FROM users WHERE id > 100

In an example test on my local machine, using HeidiSQL as the SQL tool, I queried a 250 column table with around 33,000 records. Using SELECT * , HeidiSQL reported the following status.

/* 0 rows affected, 33,378 rows found. (+ 2.625 sec. network) */

While with the selected fields ( 3 in total ) it returned the following status.

/* 0 rows affected, 33,378 rows found. (+ 0.312 sec. network) */

As you may notice the network time is considerably larger for queries with the * modifier. The actual values do not matter but the relative difference between the timings. If you are running these kind of queries on your web server with high traffic, this can dramatically decrease the performance of your web site.

These type of problems can specially affect WordPress sites which use dozens of plugins. Any one of them using such sub-optimal queries can be a bottleneck of your complete site.

This site is a digital habitat of Sameer Borate, a freelance web developer working in PHP, MySQL and WordPress. I also provide web scraping services, website design and development and integration of various Open Source API's. Contact me at metapix[at]gmail.com for any new project requirements and price quotes.

4 Responses

1

علیرضا معظمی

January 29th, 2013 at 11:47 pm

the note is simple but many programmers dont pay attention.

when a query uses * after SELECT, then it needed to first fetch all column name from INFORMATION_SCHEMA table.

so if a programmer wants to get ALL fields in result, even using name of all fileds has higher performance from using * after SELECT.

2

vivek kumar

February 1st, 2013 at 11:33 pm

Its true……your post is relly helpful to programmers……….

3

dai

February 14th, 2013 at 10:50 pm

The lazy mans selection! It’ll bite the programmer in the butt in if you’re expecting the traffic to roll in

4

Tyler

February 19th, 2013 at 9:12 am

Great point. Thanks for including the network times. I am going to start doing this instead of *!

Your thoughts