Sequence gaps in MySQL


Auto-increment sequences are a common way to define a primary key in MySQL. This types of artificial keys, known as surrogate keys are commonly used by developers to quickly construct a database table. A common reason developers use artificial keys is due to the fact that most do not take the time to search for a natural key in their model, especially SQL newbies. The question of whether we should use a surrogate key or a natural key is a debate we should leave to the database experts. The purpose of this post is to present some queries to find if a auto-increment sequence contains gaps.

Gaps in auto-increment columns

In a table with a auto-increment column gaps gradually develop over time, due to repeated deletion and insertion of rows. An example of such a table is shown below. Note the sequence numbers of the ‘id’ column which is missing a few numbers. Not that the gaps are harmful in any way (at-least for small schemas), just that a cursory glance at a solution is satisfying and sometimes useful.

Table 1.

+----+-------+------------+
| id | name  | birthdate  |
+----+-------+------------+
|  1 | sam   | 1986-09-10 |
|  3 | grey  | 1979-01-19 |
|  5 | jenny | 1979-12-12 |
|  6 | bill  | 1991-04-27 |
|  7 | tom   | 1995-09-15 |
| 10 | jack  | 1964-02-09 |
+----+-------+------------+

To find out if a table with a defined auto-increment column, like the above, has gaps in it, we can use the following query. The query returns a single row with the value ‘gap’ if there are any gaps in the sequence, or a empty result for a non-gaped sequence.

SELECT DISTINCT 'gap' FROM `testtable` 
   HAVING COUNT(*) <> MAX(id)

This will return the following result for Table 1.

+-----+
| gap |
+-----+
| gap |
+-----+

Note that the above will only work if the sequence starts from the number ’1′. If you have a auto-increment column that starts with some other number, say 1000, than the following query can be used.

SELECT DISTINCT 'gap' FROM `testtable` 
    HAVING COUNT(*) <> MAX(id) - 999 #Start sequence number - 1

How the query works

The query counts the total number of rows in ‘testtable’ and if the total count is different from the maximum ‘id’ (auto-increment column) value than it returns a column with the value ‘gap’. Say the table has 20 rows and the auto-increment column number starts with ‘1’. So if there are no gaps in the sequence than the MAX(id) should be ‘20’, which is equal to COUNT(*). Now suppose the 20th row is deleted and a new row inserted, then the ‘id’ value of the new row will be ‘21’ which will imply that there is a gap in the sequence as the row value sequence jumps from ‘19’ to ‘21’.

The above query works for auto-increment columns that start with ’1′, for rows that start with some other number use the following query.

SELECT DISTINCT 'gap' FROM `testtable` 
    HAVING COUNT(*) <> (MAX(id) - (MIN(id) - 1))

Finding missing numbers in a sequence

Finding if a sequence has gaps is easy, finding the exact list of missing numbers is a little involved. The following query lists the missing numbers from a given auto-increment column.

SELECT a.id+1 AS start, MIN(b.id) - 1 AS end
    FROM testtable AS a, testtable AS b
    WHERE a.id < b.id
    GROUP BY a.id
    HAVING start < MIN(b.id)

For example running the above query on Table 1. gives the following result. Note: For a table containing a large number of rows, the above query can take a substantial amount of time.

+-------+------+
| start | end  |
+-------+------+
|     2 |    2 |
|     4 |    4 |
|     8 |    9 |
+-------+------+

Visualizing sequence gaps

A nice way to represent the missing values in a sequence is through the visual medium. The following is a 1-dimensional graph of the sequence numbers in a auto-increment column. The black areas represent the missing numbers. As you can easily see there are large sections of unused numbers in the sequence. To create the following graph I used the ‘wp_comments’ table from a WordPress database and the wonderful Processing language for drawing the graphs.

For the curious the code is shown below. As the Processing code needs to connect to the MySQL database, you will also need to download the MySQL library for Processing from here.

import de.bezier.data.sql.*;
 
void setup()
{
  String user     = "your_db_username";
  String pass     = "your_db_password";
  String database = "database";
 
  size(500, 50);
  background(0,206,0);
  stroke(0);
 
  MySQL msql;
  String query;
 
  msql = new MySQL( this, "localhost", database, user, pass );
 
  if (msql.connect()) {
 
      /* Get the MAX and MIN values from of the column */
      msql.query("SELECT MAX(comment_ID) as max_n, " +
                 "MIN(comment_ID) AS min_n FROM wp_comments");
      msql.next();
      int max_n = msql.getInt("max_n");
      int min_n = msql.getInt("min_n");
 
      /* Get the ranges of missing numbers. 
         I've used a Wordpress 'wp_comments' table here.
       */
      query =  "SELECT a.comment_ID+1 AS start, " +
               "MIN(b.comment_ID) - 1 AS end " +
               "FROM wp_comments AS a, wp_comments AS b " +
               "WHERE a.comment_ID < b.comment_ID " +
               "GROUP BY a.comment_ID "+
               "HAVING start < MIN(b.comment_ID)";
 
      msql.query( query );
 
      /* Plot the missing numbers as vertical lines */
      while (msql.next()) {
          int from = msql.getInt("start");
          int to = msql.getInt("end");
 
          for(int i=from; i<=to; i++) {
              /* Map the numbers in the column to the sketch window */
              float m = map(i, min_n, max_n, 0, width);
              line(m, 0, m, 50);
          }
      }
 
  }
  else {
      print("connection failed !");
  }
 
}

This site is a digital habitat of Sameer Borate, a freelance web developer working in PHP, MySQL and WordPress. I also provide web scraping services, website design and development and integration of various Open Source API's. Contact me at metapix[at]gmail.com for any new project requirements and price quotes.

8 Responses

1

AndyfromTucson

September 28th, 2010 at 6:07 am

These queries work great, but I cannot figure out how they work. Would you please explain how they work?

sameer

September 30th, 2010 at 11:15 pm

I’ve added the details of how the queries work in the post.

3

Russell

January 31st, 2011 at 4:35 am

This is an alternative:
PROS: It’s very quick.
CONS: It really on spots the first gap, and you have to check the result by eye.

SELECT @m:=0;
SELECT testtable, @m:=@m+1 as r, IF (id@m, ‘GAP’ , ” ) FROM testtable;

This gives you something like:
+—–+—–+——–+
| id | r | IF(…. |
+—–+—–+——–+
| 2 | 2 | |
| 3 | 3 | |
| 4 | 5 | |
| 5 | 5 | |
| 7 | 6 | GAP |
| 8 | 7 | GAP |
| 9 | 8 | GAP |
+—–+—–+——–+

4

Russell

January 31st, 2011 at 4:37 am

My code got mangled!

the if statement should be checking that id is NOT EQUAL to @m… so you’ll need to poke <> in there!

5

hugo alvarado

February 11th, 2011 at 9:05 am

A small adjustment to Russell’s suggestion, it allows to find all gaps, not just the first one..but it incorrectly thinks there is a gap in the first row! stil have to check the result by eye.

SELECT @m:=0;

SELECT
transactionId,
@m:=@m+1 as r,
IF (transactionId@m, if(@m:=transactionId,’GAP’,'GAP’) , ” )
FROM transaction;

6

hugo alvarado

February 11th, 2011 at 9:09 am

Doh, my code got mangled also

7

adam

May 27th, 2011 at 2:42 pm

There is another alternative if you’re looking for the places where the gaps exist. Not the values that don’t exist, but the places where gaps occur. Its called a self join.

SELECT t1.*, t2.id
FROM table t1
LEFT OUTER JOIN table t2
ON t1.id – 1 = t2.id
WHERE t2.id is NULL

I’ve found this to be performant on tables with 1000′s of rows (~5000 rows completed in ~0.05s) as long as an index is present on id.

8

trilok

September 28th, 2012 at 4:30 am

The following can be another alternative:

select case when c.a=0 then (select max(a)+1 from test.t1) else c.a-1 end frn from test.t1
c where c.a not in ( select b.a from test.t1 a, test.t1 b where b.a> 0 and a.a=b.a-1 ) order by
frn limit 1;

table test.t1 ( a tinyint unsigned );

Your thoughts

Sign up for fresh content in your email