# Calculating descriptive statistics in MySQL

Descriptive statistics can be quite useful for simple analysis of records in a database. For example, to calculate average numbers of sales or products for a particular duration, or the Variance of sales for a month etc.

We can easily calculate standard descriptive statistic measures in MySQL such as MEAN, SUM, STANDARD DEVIATION, VARIANCE, MIN and MAX using built-in functions. Below is a sample student table with student id and scores which we will use for calculating descriptive statistical measures.

 ```+------------+-------+ | student_id | score | +------------+-------+ | 1 | 91 | | 2 | 72 | | 3 | 98 | | 4 | 62 | | 5 | 62 | | 6 | 95 | | 7 | 83 | | 8 | 86 | | 9 | 56 | | 10 | 97 | | 11 | 58 | | 12 | 71 | | 13 | 87 | | 14 | 83 | | 15 | 98 | +------------+-------+```

The following query calculates some standard statistic measures for the above data.

 ```SELECT COUNT(score) AS N SUM(score) AS sum, AVG(score) AS mean, STDDEV_SAMP(score) AS 'std. dev.', VAR_SAMP(score) AS 'variance', MIN(score) AS minimum, MAX(score) AS maximum FROM student;```

This is what we get.

 ```+----+------+---------+-----------+----------+---------+---------+ | N | sum | mean | std. dev. | variance | minimum | maximum | +----+------+---------+-----------+----------+---------+---------+ | 15 | 1199 | 79.9333 | 15.2103 | 231.3524 | 56 | 98 | +----+------+---------+-----------+----------+---------+---------+```

Generating a MEDIAN however can get a little tricky and we have to resort to some PHP code. For the above data the MEDIAN is calculated as 83. This is not a optimal solution and you could use an array instead.

 ```/* Get total number of records */ \$query = "SELECT count(*) as total FROM student"; \$result = mysqli_query(\$con, \$query); \$row = \$result->fetch_assoc(); \$total = \$row['total']; \$median = 0;   /* If odd number of rows */ if(\$total % 2 == 1) { \$median_location = ceil((\$total - 1)/2); \$query = "SELECT * FROM student ORDER BY score ASC LIMIT {\$median_location},1"; \$result = mysqli_query(\$con, \$query); \$row = \$result->fetch_assoc(); \$median = \$row['score']; } else { /* Even number of rows */ \$median_location = (\$total/2)-1; \$query = "SELECT * FROM student ORDER BY score ASC LIMIT {\$median_location}, 2"; \$result = mysqli_query(\$con, \$query); \$temp = 0; while(\$row = \$result->fetch_assoc()) { \$temp += \$row['score']; } \$median = \$temp / 2; }```

Calculating MODE also requires some extra code. MODE refers to the most frequently occurring item in our data set; in our example it is the most commonly occurring ‘score’. First we need to get the frequency of all the scores, and then retrieve the top result.

 ```SELECT score, COUNT(score) AS frequency FROM student GROUP BY score ORDER BY frequency DESC;```

In our example it will return the following, where the most commonly occurring items are 98, 62 and 83. There can be multiple MODE values for a dataset.

 ```+-------+-----------+ | score | frequency | +-------+-----------+ | 98 | 2 | | 62 | 2 | | 83 | 2 | | 72 | 1 | | 95 | 1 | | 86 | 1 | | 56 | 1 | | 97 | 1 | | 58 | 1 | | 71 | 1 | | 87 | 1 | | 91 | 1 | +-------+-----------+```

Although not comprehensive data tools, these are some simple statistic measures to make a quick sense of your data.