Unlike many other fields, data-science projects should start with focused questions. A table of data is not of much use in itself unless we analyze and understand it with a specific goal in mind. Without a predefined goal we would not know where to start – what to look for in the data, how to analyze it, which extraneous elements to remove, and so on.
For the past several years I’ve regularly taken Pantoprazole – a proton pump inhibitor (PPI) – to control my acid reflux. However, as is well known, regular PPI use can cause various health problems. I’ll not enumerate those here, but trust me when I say that they are numerous. Last year I decided to wean myself off PPIs slowly, by limiting myself to a single 20 mg dose on any given day, and by alternating days or skipping some days completely. I recorded my dosage information in Google Calendar on my mobile, so that I could process it later. Now that I have more than a year’s data, I thought of visualizing it to get an understanding of my dosage habits.
Missing data in databases can cause bugs in applications or incorrect calculations. Recently, while working on a RETS application, I needed to ensure that one of the MySQL tables did not contain too many missing values. Although one could easily write a SQL query to find the percentage of missing values, I often find it easier to first get a visual representation of how much data is missing in the table, and then drill down further if required. One library that makes it easy to visualize missing data in your database tables is missingno – a Python library.
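As a rough illustration of the SQL-query approach mentioned above, here is a minimal sketch that computes the percentage of missing (NULL) values per column. To keep it self-contained it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL; the `listings` table, its columns, and the `missing_percentages` helper are all hypothetical names made up for this example.

```python
import sqlite3


def missing_percentages(conn, table, columns):
    """Return {column: percentage of rows in which the column is NULL}.

    Table and column names are interpolated into the SQL text, so they
    must come from trusted code, never from user input.
    """
    total = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    if total == 0:
        return {col: 0.0 for col in columns}
    result = {}
    for col in columns:
        # COUNT(col) ignores NULLs, so total - COUNT(col) is the number missing.
        present = conn.execute(
            f'SELECT COUNT("{col}") FROM "{table}"'
        ).fetchone()[0]
        result[col] = 100.0 * (total - present) / total
    return result


# Hypothetical sample data: an in-memory table standing in for a MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE listings (id INTEGER, price REAL, city TEXT)")
conn.executemany(
    "INSERT INTO listings VALUES (?, ?, ?)",
    [
        (1, 250000.0, "Austin"),
        (2, None, "Dallas"),
        (3, 310000.0, None),
        (4, None, None),
    ],
)

report = missing_percentages(conn, "listings", ["id", "price", "city"])
for column, pct in report.items():
    print(f"{column}: {pct:.1f}% missing")
```

The same `COUNT(*)` versus `COUNT(column)` idea should carry over to MySQL, since both are standard SQL aggregates; only the connection code would differ.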
We often need to get a statistical distribution of the values in a database table. Say you have an e-commerce shoe store with a product table containing the following fields and values. As this is only an example I’ve limited the table to a few items; there will be hundreds of rows in a real-life table.
Anyone who has programmed knows about configuration files. Configuration files are mostly text files used to configure the parameters and initial settings of computer programs – mostly user applications and operating system settings. Below is a small list of frequently used file formats.
– Windows INI