Sequence gaps in MySQL

Auto-increment sequences are a common way to define a primary key in MySQL. This types of artificial keys, known as surrogate keys are commonly used by developers to quickly construct a database table. A common reason developers use artificial keys is due to the fact that most do not take the time to search for a natural key in their model, especially SQL newbies. The question of whether we should use a surrogate key or a natural key is a debate we should leave to the database experts. The purpose of this post is to present some queries to find if a auto-increment sequence contains gaps.

Gaps in auto-increment columns

In a table with a auto-increment column gaps gradually develop over time, due to repeated deletion and insertion of rows. An example of such a table is shown below. Note the sequence numbers of the ‘id’ column which is missing a few numbers. Not that the gaps are harmful in any way (at-least for small schemas), just that a cursory glance at a solution is satisfying and sometimes useful.

Table 1.

+----+-------+------------+
| id | name  | birthdate  |
+----+-------+------------+
|  1 | sam   | 1986-09-10 |
|  3 | grey  | 1979-01-19 |
|  5 | jenny | 1979-12-12 |
|  6 | bill  | 1991-04-27 |
|  7 | tom   | 1995-09-15 |
| 10 | jack  | 1964-02-09 |
+----+-------+------------+

To find out if a table with a defined auto-increment column, like the above, has gaps in it, we can use the following query. The query returns a single row with the value ‘gap’ if there are any gaps in the sequence, or a empty result for a non-gaped sequence.

SELECT DISTINCT 'gap' FROM `testtable` 
   HAVING COUNT(*) <> MAX(id)

This will return the following result for Table 1.

+-----+
| gap |
+-----+
| gap |
+-----+

Note that the above will only work if the sequence starts from the number ‘1’. If you have a auto-increment column that starts with some other number, say 1000, than the following query can be used.

SELECT DISTINCT 'gap' FROM `testtable` 
    HAVING COUNT(*) <> MAX(id) - 999 #Start sequence number - 1

How the query works

The query counts the total number of rows in ‘testtable’ and if the total count is different from the maximum ‘id’ (auto-increment column) value than it returns a column with the value ‘gap’. Say the table has 20 rows and the auto-increment column number starts with ‘1’. So if there are no gaps in the sequence than the MAX(id) should be ‘20’, which is equal to COUNT(*). Now suppose the 20th row is deleted and a new row inserted, then the ‘id’ value of the new row will be ‘21’ which will imply that there is a gap in the sequence as the row value sequence jumps from ‘19’ to ‘21’.

The above query works for auto-increment columns that start with ‘1’, for rows that start with some other number use the following query.

SELECT DISTINCT 'gap' FROM `testtable` 
    HAVING COUNT(*) <> (MAX(id) - (MIN(id) - 1))

Finding missing numbers in a sequence

Finding if a sequence has gaps is easy, finding the exact list of missing numbers is a little involved. The following query lists the missing numbers from a given auto-increment column.

SELECT a.id+1 AS start, MIN(b.id) - 1 AS end
    FROM testtable AS a, testtable AS b
    WHERE a.id < b.id
    GROUP BY a.id
    HAVING start < MIN(b.id)

For example running the above query on Table 1. gives the following result. Note: For a table containing a large number of rows, the above query can take a substantial amount of time.

+-------+------+
| start | end  |
+-------+------+
|     2 |    2 |
|     4 |    4 |
|     8 |    9 |
+-------+------+

Visualizing sequence gaps

A nice way to represent the missing values in a sequence is through the visual medium. The following is a 1-dimensional graph of the sequence numbers in a auto-increment column. The black areas represent the missing numbers. As you can easily see there are large sections of unused numbers in the sequence. To create the following graph I used the ‘wp_comments’ table from a WordPress database and the wonderful Processing language for drawing the graphs.

For the curious the code is shown below. As the Processing code needs to connect to the MySQL database, you will also need to download the MySQL library for Processing from here.

import de.bezier.data.sql.*;
 
void setup()
{
  String user     = "your_db_username";
  String pass     = "your_db_password";
  String database = "database";
 
  size(500, 50);
  background(0,206,0);
  stroke(0);
 
  MySQL msql;
  String query;
 
  msql = new MySQL( this, "localhost", database, user, pass );
 
  if (msql.connect()) {
 
      /* Get the MAX and MIN values from of the column */
      msql.query("SELECT MAX(comment_ID) as max_n, " +
                 "MIN(comment_ID) AS min_n FROM wp_comments");
      msql.next();
      int max_n = msql.getInt("max_n");
      int min_n = msql.getInt("min_n");
 
      /* Get the ranges of missing numbers. 
         I've used a Wordpress 'wp_comments' table here.
       */
      query =  "SELECT a.comment_ID+1 AS start, " +
               "MIN(b.comment_ID) - 1 AS end " +
               "FROM wp_comments AS a, wp_comments AS b " +
               "WHERE a.comment_ID < b.comment_ID " +
               "GROUP BY a.comment_ID "+
               "HAVING start < MIN(b.comment_ID)";
 
      msql.query( query );
 
      /* Plot the missing numbers as vertical lines */
      while (msql.next()) {
          int from = msql.getInt("start");
          int to = msql.getInt("end");
 
          for(int i=from; i<=to; i++) {
              /* Map the numbers in the column to the sketch window */
              float m = map(i, min_n, max_n, 0, width);
              line(m, 0, m, 50);
          }
      }
 
  }
  else {
      print("connection failed !");
  }
 
}


8 thoughts on “Sequence gaps in MySQL

  1. These queries work great, but I cannot figure out how they work. Would you please explain how they work?

  2. This is an alternative:
    PROS: It’s very quick.
    CONS: It really on spots the first gap, and you have to check the result by eye.

    SELECT @m:=0;
    SELECT testtable, @m:=@m+1 as r, IF (id@m, ‘GAP’ , ” ) FROM testtable;

    This gives you something like:
    +—–+—–+——–+
    | id | r | IF(…. |
    +—–+—–+——–+
    | 2 | 2 | |
    | 3 | 3 | |
    | 4 | 5 | |
    | 5 | 5 | |
    | 7 | 6 | GAP |
    | 8 | 7 | GAP |
    | 9 | 8 | GAP |
    +—–+—–+——–+

  3. A small adjustment to Russell’s suggestion, it allows to find all gaps, not just the first one..but it incorrectly thinks there is a gap in the first row! stil have to check the result by eye.

    SELECT @m:=0;

    SELECT
    transactionId,
    @m:=@m+1 as r,
    IF (transactionId@m, if(@m:=transactionId,’GAP’,’GAP’) , ” )
    FROM transaction;

  4. There is another alternative if you’re looking for the places where the gaps exist. Not the values that don’t exist, but the places where gaps occur. Its called a self join.

    SELECT t1.*, t2.id
    FROM table t1
    LEFT OUTER JOIN table t2
    ON t1.id – 1 = t2.id
    WHERE t2.id is NULL

    I’ve found this to be performant on tables with 1000’s of rows (~5000 rows completed in ~0.05s) as long as an index is present on id.

  5. The following can be another alternative:

    select case when c.a=0 then (select max(a)+1 from test.t1) else c.a-1 end frn from test.t1
    c where c.a not in ( select b.a from test.t1 a, test.t1 b where b.a> 0 and a.a=b.a-1 ) order by
    frn limit 1;

    table test.t1 ( a tinyint unsigned );

Leave a Reply

Your email address will not be published. Required fields are marked *

*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>