# Calculating descriptive statistics in MySQL

Descriptive statistics can be quite useful for simple analysis of records in a database. For example, to calculate average numbers of sales or products for a particular duration, or the Variance of sales for a month etc.

We can easily calculate standard descriptive statistic measures in MySQL such as MEAN, SUM, STANDARD DEVIATION, VARIANCE, MIN and MAX using built-in functions. Below is a sample student table with student id and scores which we will use for calculating descriptive statistical measures.

```+------------+-------+
| student_id | score |
+------------+-------+
|          1 |    91 |
|          2 |    72 |
|          3 |    98 |
|          4 |    62 |
|          5 |    62 |
|          6 |    95 |
|          7 |    83 |
|          8 |    86 |
|          9 |    56 |
|         10 |    97 |
|         11 |    58 |
|         12 |    71 |
|         13 |    87 |
|         14 |    83 |
|         15 |    98 |
+------------+-------+
```

The following query calculates some standard statistic measures for the above data.

```SELECT COUNT(score) AS N
SUM(score) AS sum,
AVG(score) AS mean,
STDDEV_SAMP(score) AS 'std. dev.',
VAR_SAMP(score) AS 'variance',
MIN(score) AS minimum,
MAX(score) AS maximum
FROM student;
```

This is what we get.

```+----+------+---------+-----------+----------+---------+---------+
| N  | sum  | mean    | std. dev. | variance | minimum | maximum |
+----+------+---------+-----------+----------+---------+---------+
| 15 | 1199 | 79.9333 |   15.2103 | 231.3524 |      56 |      98 |
+----+------+---------+-----------+----------+---------+---------+
```

Generating a MEDIAN however can get a little tricky and we have to resort to some PHP code. For the above data the MEDIAN is calculated as 83. This is not a optimal solution and you could use an array instead.

```/* Get total number of records */
\$query = "SELECT count(*) as total FROM student";
\$result = mysqli_query(\$con, \$query);
\$row = \$result->fetch_assoc();
\$total = \$row['total'];
\$median = 0;

/* If odd number of rows */
if(\$total % 2 == 1) {
\$median_location = ceil((\$total - 1)/2);
\$query = "SELECT *
FROM student
ORDER BY score ASC
LIMIT {\$median_location},1";
\$result = mysqli_query(\$con, \$query);
\$row = \$result->fetch_assoc();
\$median = \$row['score'];
} else {
/* Even number of rows */
\$median_location = (\$total/2)-1;
\$query = "SELECT *
FROM student
ORDER BY score ASC
LIMIT {\$median_location}, 2";
\$result = mysqli_query(\$con, \$query);
\$temp = 0;
while(\$row = \$result->fetch_assoc()) {
\$temp += \$row['score'];
}
\$median = \$temp / 2;
}
```

Calculating MODE also requires some extra code. MODE refers to the most frequently occurring item in our data set; in our example it is the most commonly occurring ‘score’. First we need to get the frequency of all the scores, and then retrieve the top result.

```SELECT score, COUNT(score) AS frequency
FROM student
GROUP BY score
ORDER BY frequency DESC;
```

In our example it will return the following, where the most commonly occurring items are 98, 62 and 83. There can be multiple MODE values for a dataset.

```+-------+-----------+
| score | frequency |
+-------+-----------+
|    98 |         2 |
|    62 |         2 |
|    83 |         2 |
|    72 |         1 |
|    95 |         1 |
|    86 |         1 |
|    56 |         1 |
|    97 |         1 |
|    58 |         1 |
|    71 |         1 |
|    87 |         1 |
|    91 |         1 |
+-------+-----------+
```

Although not comprehensive data tools, these are some simple statistic measures to make a quick sense of your data.