Descriptive statistics can be quite useful for simple analysis of records in a database. For example, to calculate average numbers of sales or products for a particular duration, or the Variance of sales for a month etc.

We can easily calculate standard descriptive statistic measures in MySQL such as MEAN, SUM, STANDARD DEVIATION, VARIANCE, MIN and MAX using built-in functions. Below is a sample student table with student id and scores which we will use for calculating descriptive statistical measures.

+------------+-------+ | student_id | score | +------------+-------+ | 1 | 91 | | 2 | 72 | | 3 | 98 | | 4 | 62 | | 5 | 62 | | 6 | 95 | | 7 | 83 | | 8 | 86 | | 9 | 56 | | 10 | 97 | | 11 | 58 | | 12 | 71 | | 13 | 87 | | 14 | 83 | | 15 | 98 | +------------+-------+ |

The following query calculates some standard statistic measures for the above data.

SELECT COUNT(score) AS N SUM(score) AS sum, AVG(score) AS mean, STDDEV_SAMP(score) AS 'std. dev.', VAR_SAMP(score) AS 'variance', MIN(score) AS minimum, MAX(score) AS maximum FROM student; |

This is what we get.

+----+------+---------+-----------+----------+---------+---------+ | N | sum | mean | std. dev. | variance | minimum | maximum | +----+------+---------+-----------+----------+---------+---------+ | 15 | 1199 | 79.9333 | 15.2103 | 231.3524 | 56 | 98 | +----+------+---------+-----------+----------+---------+---------+ |

Generating a MEDIAN however can get a little tricky and we have to resort to some PHP code. For the above data the MEDIAN is calculated as 83. This is not a optimal solution and you could use an array instead.

/* Get total number of records */ $query = "SELECT count(*) as total FROM student"; $result = mysqli_query($con, $query); $row = $result->fetch_assoc(); $total = $row['total']; $median = 0; /* If odd number of rows */ if($total % 2 == 1) { $median_location = ceil(($total - 1)/2); $query = "SELECT * FROM student ORDER BY score ASC LIMIT {$median_location},1"; $result = mysqli_query($con, $query); $row = $result->fetch_assoc(); $median = $row['score']; } else { /* Even number of rows */ $median_location = ($total/2)-1; $query = "SELECT * FROM student ORDER BY score ASC LIMIT {$median_location}, 2"; $result = mysqli_query($con, $query); $temp = 0; while($row = $result->fetch_assoc()) { $temp += $row['score']; } $median = $temp / 2; } |

Calculating MODE also requires some extra code. MODE refers to the most frequently occurring item in our data set; in our example it is the most commonly occurring ‘score’. First we need to get the frequency of all the scores, and then retrieve the top result.

SELECT score, COUNT(score) AS frequency FROM student GROUP BY score ORDER BY frequency DESC; |

In our example it will return the following, where the most commonly occurring items are 98, 62 and 83. There can be multiple MODE values for a dataset.

+-------+-----------+ | score | frequency | +-------+-----------+ | 98 | 2 | | 62 | 2 | | 83 | 2 | | 72 | 1 | | 95 | 1 | | 86 | 1 | | 56 | 1 | | 97 | 1 | | 58 | 1 | | 71 | 1 | | 87 | 1 | | 91 | 1 | +-------+-----------+ |

Although not comprehensive data tools, these are some simple statistic measures to make a quick sense of your data.

This site is a digital habitat of Sameer Borate, a freelance web developer working in PHP, MySQL and WordPress. I also provide web scraping services, website design and development and integration of various Open Source API's. Contact me at metapix[at]gmail.com for any new project requirements and price quotes.