Statistical distribution of column values in MySQL

Many times we need to get a statistical distribution of values in a database table. Say you have a e-commerce shoe store having a product table with the following fields and values. As this is only an example I’ve limited the table to a few items; there will hundreds of rows in a real-life table.

;'product' table
;
+----+----------+-------+------+-------+
| id | type     | color | size | price |
+----+----------+-------+------+-------+
|  1 | sandals  | black |    7 | 45.00 |
|  2 | boots    | black |    8 | 36.00 |
|  3 | loafers  | brown |    8 | 45.00 |
|  4 | sandals  | brown |    7 | 45.00 |
|  5 | sneakers | blue  |    6 | 25.00 |
|  6 | sneakers | black |    6 | 25.00 |
|  7 | sandals  | blue  |    7 | 45.00 |
|  8 | loafers  | black |    7 | 35.00 |
|  9 | boots    | brown |    8 | 42.00 |
+----+----------+-------+------+-------+

What we want is to get a distribution which lists how many of each value we have got, like: sandals = 3, boots = 2, size 7 shoes = 4 etc.

The following query will accomplish that.

SELECT 'type', product.type, COUNT(*) AS total_rows 
    FROM product GROUP BY product.type 
UNION
SELECT 'color', product.color, COUNT(*) AS total_rows 
    FROM product GROUP BY product.color  
UNION
SELECT 'size', product.size, COUNT(*) AS total_rows 
    FROM product GROUP BY product.size  
UNION
SELECT 'price', product.price, COUNT(*) AS total_rows 
    FROM product GROUP BY product.price ;

This will return the following results.

+-------+----------+------------+
| type  | type     | total_rows |
+-------+----------+------------+
| type  | boots    |          2 |
| type  | loafers  |          2 |
| type  | sandals  |          3 |
| type  | sneakers |          2 |
| color | black    |          4 |
| color | blue     |          2 |
| color | brown    |          3 |
| size  | 6        |          2 |
| size  | 7        |          4 |
| size  | 8        |          3 |
| price | 25.00    |          2 |
| price | 35.00    |          1 |
| price | 36.00    |          1 |
| price | 42.00    |          1 |
| price | 45.00    |          4 |
+-------+----------+------------+

We now know how many of each type we have. This will be helpful to carryout any statistical analysis on the products table. We can create a new table (product_stats) with the above query by adding a ‘INSERT’ statement.

INSERT INTO product_stats (column_name, value, total_rows)
 
SELECT 'type', product.type, COUNT(*) AS total_rows 
    FROM product GROUP BY product.type 
UNION
SELECT 'color', product.color, COUNT(*) AS total_rows 
    FROM product GROUP BY product.color  
UNION
SELECT 'size', product.size, COUNT(*) AS total_rows 
    FROM product GROUP BY product.size  
UNION
SELECT 'price', product.price, COUNT(*) AS total_rows 
    FROM product GROUP BY product.price ;