Learning the basics of statistics is my first step towards the world of data science. To be honest, I’m interested in picking up some basics of the demigod ruling our lives in 2021.
This bite-sized series is going to be a regular one, as I’ll post my thoughts on every chapter of Think Stats 2.1.0 by Allen B. Downey.
Chapter 1: Exploratory data analysis
Like a little knowledge, a small set of observations or data is a dangerous thing. It can lead to different types of errors.
A statistical decision may be an outcome of chance or anecdotal evidence laced with different biases and inaccuracies.
We have to adopt some statistical approaches (e.g., data collection, descriptive statistics, exploratory data analysis, estimation, and hypothesis testing) to avoid these pitfalls.
Tools and Frameworks
An artiste knows her tools and medium well to express an idea.
A raw data set needs some preprocessing to avoid errors and to work with our tools (a programming language) and medium (framework packages such as pandas and matplotlib) before it can produce practical results.
Downey writes, ‘When data is exported from one software environment and imported into another, errors might be introduced. And when you are getting familiar with a new dataset, you might interpret data incorrectly or introduce other misunderstandings. If you take the time to validate the data, you can save time later and avoid errors. One way to validate data is to compute basic statistics and compare them with published results.’ (p. 10)
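To make the validation idea concrete, here is a minimal sketch in pandas. It is not code from the book: the helper name `compare_counts`, the column name `outcome`, and all the numbers are invented for illustration. It counts how often each value occurs in a column and reports every value whose count differs from a "published" table.

```python
import pandas as pd


def compare_counts(series, published):
    """Return {value: (observed, expected)} for each value whose count differs.

    `published` is a plain dict of expected counts, e.g. typed in by hand
    from a codebook. Hypothetical helper, for illustration only.
    """
    observed = series.value_counts().to_dict()
    mismatches = {}
    for value in sorted(set(observed) | set(published)):
        obs = int(observed.get(value, 0))
        exp = int(published.get(value, 0))
        if obs != exp:
            mismatches[value] = (obs, exp)
    return mismatches


# Toy data: these values are made up, not taken from any real survey.
df = pd.DataFrame({"outcome": [1, 1, 1, 2, 2, 4]})
published = {1: 3, 2: 2, 4: 1}

mismatches = compare_counts(df["outcome"], published)
print(mismatches)  # an empty dict means the counts agree
```

An empty result only says the counts match; it does not prove that every value was imported correctly, so it is a first check rather than a guarantee.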
Out of literally thousands of variables, we focus on the few that interest us, and we must have a good understanding of the data types of the framework or library in use.
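As a small sketch of what that looks like in pandas (the column names and values below are hypothetical, chosen only for illustration), we can keep just the variables of interest and then inspect their data types and missing values before doing any analysis.

```python
import pandas as pd

# Toy table standing in for a much wider data set; all values are invented.
df = pd.DataFrame({
    "case_id": [1, 2, 3],
    "pregnancy_weeks": [39, 40, 38],
    "birth_weight_lb": [8.8, 7.9, None],  # None becomes NaN in a float column
    "notes": ["a", "b", "c"],
})

# Keep only the variables we care about.
columns_of_interest = ["pregnancy_weeks", "birth_weight_lb"]
subset = df[columns_of_interest]

# Inspect the data types pandas inferred for each column.
print(subset.dtypes)

# Count missing values per column.
missing = subset.isna().sum()
print(missing)
```

Knowing that a column is stored as an integer, a float, or a generic object tells us which operations are safe on it, and the missing-value count shows where summary statistics might quietly skip rows.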
I’m picking up only the basics of the latter, what is needed rather than everything. I think this is the practical approach.
Chapter-wise solutions are available at my GitHub repo.