Free datasets for testing database engines

Below is a short list of free datasets that you can use to test your database queries, or to learn and practice SQL query optimization.

I recently started reading some database books, especially those by Joe Celko, and needed some medium-sized datasets on which I could run the SQL queries from the books. After searching, I found a few that I could quickly download and install on my local machine. The datasets below are small when downloaded as zip files, but the larger ones contain millions of rows, so they are enough to put some load on a database engine.

Sakila dataset

The first one is the standard ‘Sakila’ database. Development of the Sakila sample database began in early 2005, and early designs were based on the database used in the Dell whitepaper ‘Three Approaches to MySQL Applications on Dell PowerEdge Servers’.
The Sakila sample database is designed to represent a DVD rental store and borrows film and actor names from the Dell sample database. It is relatively small, so for testing queries on large datasets I would prefer the ‘Employees’ dataset. However, for learning examples featuring JOINs this database works well, as it contains a good number of tables.

It is a fairly complex database, with 16 tables and other features such as views, stored procedures and triggers. In my opinion, this is the best sample available for studying MySQL databases.
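
To give an idea of the kind of JOIN practice Sakila is good for, here is a minimal sketch in Python. It does not use the real download: it builds a tiny in-memory SQLite stand-in for three Sakila tables (actor, film and film_actor) with only a few of their columns, and the rows are made up for illustration. The helper name films_per_actor is my own. Against a real Sakila install you would run the same SELECT through your MySQL client instead.

```python
import sqlite3

# In-memory stand-in for three Sakila tables. Only a few columns are
# reproduced, and the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE film_actor (
        actor_id INTEGER REFERENCES actor(actor_id),
        film_id INTEGER REFERENCES film(film_id),
        PRIMARY KEY (actor_id, film_id)
    );
    INSERT INTO actor VALUES (1, 'ANNA', 'EXAMPLE'), (2, 'BEN', 'SAMPLE');
    INSERT INTO film VALUES (10, 'FIRST FILM'), (11, 'SECOND FILM');
    INSERT INTO film_actor VALUES (1, 10), (1, 11), (2, 11);
""")


def films_per_actor(conn):
    """Return (first_name, last_name, film_count) rows, busiest actor first."""
    query = """
        SELECT a.first_name, a.last_name, COUNT(f.film_id) AS film_count
        FROM actor AS a
        JOIN film_actor AS fa ON fa.actor_id = a.actor_id
        JOIN film AS f ON f.film_id = fa.film_id
        GROUP BY a.actor_id, a.first_name, a.last_name
        ORDER BY film_count DESC, a.last_name
    """
    return conn.execute(query).fetchall()


for row in films_per_actor(conn):
    print(row)
```

The film_actor table is the many-to-many link between actors and films, which is why the query needs two JOINs to get from an actor to a film title.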

Employees dataset

The second one is the ‘Employees’ test dataset, which contains fake data: about 300,000 employee records with 2.8 million salary entries. The zip file is around 35 MB, while the uncompressed exported data is 167 MB, which is not huge but heavy enough to be non-trivial for testing.
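
A table of this size is where indexes start to matter. The sketch below is an assumption-laden illustration rather than something run against the real dataset: it uses SQLite in memory, a synthetic salaries table with randomly generated rows (far fewer than 2.8 million), and helper names (build_salaries, query_plan) that I made up. It compares the query plan for a lookup by employee number before and after adding an index.

```python
import random
import sqlite3


def build_salaries(conn, n_employees=1000, rows_per_employee=5, seed=0):
    """Fill a synthetic salaries table; the values are random, not real data."""
    rng = random.Random(seed)
    conn.execute(
        "CREATE TABLE salaries (emp_no INTEGER, salary INTEGER, from_date TEXT)"
    )
    rows = [
        (emp_no, rng.randint(40000, 120000), f"{1990 + k}-01-01")
        for emp_no in range(1, n_employees + 1)
        for k in range(rows_per_employee)
    ]
    conn.executemany("INSERT INTO salaries VALUES (?, ?, ?)", rows)


def query_plan(conn, sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


conn = sqlite3.connect(":memory:")
build_salaries(conn)
sql = "SELECT AVG(salary) FROM salaries WHERE emp_no = ?"

# Without an index SQLite has to scan the whole table.
plan_before = query_plan(conn, sql, (42,))

# With an index on emp_no it can jump straight to the matching rows.
conn.execute("CREATE INDEX idx_salaries_emp_no ON salaries (emp_no)")
plan_after = query_plan(conn, sql, (42,))

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan varies between SQLite versions, but the first plan should report a scan of the table and the second should mention the new index. On MySQL the equivalent check is to put EXPLAIN in front of the query.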

Transportation Statistics dataset

The third is the Bureau of Transportation Statistics dataset, which lists airline on-time data and can be downloaded in customizable ways. The data files are in CSV format, so you will first need to import them into your database.
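
As a rough idea of what that import step can look like, here is a minimal sketch using Python's csv and sqlite3 modules. It assumes the first row of the file is a header, stores every column as text, and runs on a three-row inline sample. The function name import_csv, the table name ontime and the column names are all hypothetical, not the actual field names of the downloaded files.

```python
import csv
import io
import sqlite3


def import_csv(conn, table, csv_file):
    """Create `table` from the CSV header row and bulk-insert the data rows.

    Every column is created as TEXT; cast in your queries as needed.
    Returns the number of rows inserted.
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    # Quote identifiers so header names containing spaces still work.
    columns = ", ".join(
        '"{}" TEXT'.format(name.replace('"', '""')) for name in header
    )
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    cursor = conn.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})', reader
    )
    conn.commit()
    return cursor.rowcount


# Made-up sample standing in for a downloaded on-time CSV file.
sample = io.StringIO(
    "flight_date,carrier,arr_delay_minutes\n"
    "2020-01-01,AA,5\n"
    "2020-01-01,BB,-3\n"
    "2020-01-02,AA,17\n"
)

conn = sqlite3.connect(":memory:")
inserted = import_csv(conn, "ontime", sample)
late = conn.execute(
    "SELECT COUNT(*) FROM ontime WHERE CAST(arr_delay_minutes AS INTEGER) > 0"
).fetchone()[0]
print(inserted, late)
```

For a real file you would pass an open file object instead of the inline sample, opened with newline="" as the csv module recommends. For large files on MySQL, the server's own bulk loader (LOAD DATA INFILE) is likely to be much faster than row-by-row inserts.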
