What is data?
Data is recorded information. It comes in many forms, such as text, music, video, images, and more. Data can improve the way we live by helping us predict future outcomes.
Recently, data has been called the new oil. Many corporations are starting to leverage the massive amounts of data they have collected over the years to build predictive models.
What are the types of data?
There are two types of data: quantitative and categorical. Quantitative data is numerical, so mathematical operations can be applied to it.
Categorical data contains non-numerical values that can be used to group observations.
Categorical data has two subcategories:
- Ordinal: values have a natural order, e.g. school letter grades (A, B, C, ...)
- Nominal: values have no order, e.g. dog breed (Shiba Inu, German Shepherd, ...)
Quantitative data also has two subcategories:
- Continuous: values can fall anywhere in a range, e.g. temperature, stock prices, ...
- Discrete: values are countable, e.g. the number of pets a family has at home
Knowing the type of data we have helps determine which visualization to use and which summary statistics to apply.
Descriptive Statistics
Let's first define the difference between descriptive and inferential statistics. Descriptive statistics describes the data that was collected, while inferential statistics draws conclusions about a larger population from that data.
How to analyze quantitative data?
To analyze quantitative data, we can focus on:
- measures of center (mean, median, mode),
- measures of spread (range, interquartile range, standard deviation, variance),
- outliers,
- shape (right-skewed, left-skewed, symmetric).
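The measures listed above can be sketched with Python's built-in statistics module. The pet counts are made-up values chosen for illustration, and the 1.5 * IQR cutoff for outliers is a common rule of thumb rather than the only option.

```python
import statistics

# Hypothetical sample: number of pets per household (made-up values).
pets = [0, 1, 1, 2, 2, 2, 3, 4, 10]

# Measures of center
mean = statistics.mean(pets)
median = statistics.median(pets)
mode = statistics.mode(pets)

# Measures of spread
data_range = max(pets) - min(pets)
q1, q2, q3 = statistics.quantiles(pets, n=4)  # quartile cut points
iqr = q3 - q1                                 # interquartile range
variance = statistics.variance(pets)          # sample variance
std_dev = statistics.stdev(pets)              # sample standard deviation

# Outliers: rule of thumb, values more than 1.5 * IQR outside the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in pets if x < lower_fence or x > upper_fence]

# Shape: a mean clearly above the median hints at a right-skewed sample.
shape_hint = "right-skewed" if mean > median else "left-skewed or symmetric"

print(mean, median, mode)
print(data_range, iqr, variance, std_dev)
print(outliers, shape_hint)
```

Here the single household with 10 pets pulls the mean above the median, which is why the median is often preferred for skewed data.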
What type of plot to use?
For quantitative data
A histogram is commonly used, but some alternative plots are listed below:
- Normal Quantile Plot
- Stem and Leaf Plot
- Box and Whisker Plot
For categorical data
A bar chart is often used, though other plot types can also work.
Binomial Distribution
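The binomial distribution models repeated independent trials that each have exactly two possible outcomes: success or failure. It gives the probability of seeing a given number of successes in a fixed number of trials. For instance, each time we gamble, we either win (success) or lose (failure).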
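As a small illustration, the probability of exactly k successes in n trials is C(n, k) * p^k * (1 - p)^(n - k). The sketch below assumes a hypothetical game of 10 rounds, each won with probability 0.5; the function name binomial_pmf is our own.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical game: 10 rounds, each won with probability 0.5.
n_rounds, p_win = 10, 0.5

prob_exactly_5 = binomial_pmf(5, n_rounds, p_win)
prob_at_least_8 = sum(binomial_pmf(k, n_rounds, p_win) for k in range(8, n_rounds + 1))

print(prob_exactly_5)   # 252 / 1024
print(prob_at_least_8)  # 56 / 1024
```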
Conditional Probability
Investopedia has a great explanation, which I refer you to: take a peek at its Conditional Probability section.
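For a quick feel of the idea, the conditional probability of A given B is P(A | B) = P(A and B) / P(B). The sketch below works this out by counting outcomes for two fair dice; the events chosen (the sum is 8, the first die is even) are just an illustrative assumption.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event, given as a predicate over outcomes."""
    favorable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favorable, len(outcomes))

def sum_is_8(outcome):
    return outcome[0] + outcome[1] == 8

def first_is_even(outcome):
    return outcome[0] % 2 == 0

p_a = prob(sum_is_8)
p_b = prob(first_is_even)
p_a_and_b = prob(lambda o: sum_is_8(o) and first_is_even(o))

# P(A | B) = P(A and B) / P(B)
p_a_given_b = p_a_and_b / p_b

print(p_a, p_a_given_b)
```

Knowing that the first die is even raises the chance of a sum of 8 from 5/36 to 1/6.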
Bayes Rule
Bayes' rule is one of the most important rules used in machine learning. Once again, I leave a link to freeCodeCamp because the explanation is clear and simple.
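In short, Bayes' rule says P(H | E) = P(E | H) * P(H) / P(E). A minimal sketch with a medical test follows; the prevalence, sensitivity, and false positive rate are made-up numbers, and the function name posterior is our own.

```python
def posterior(prior: float, likelihood: float, false_positive_rate: float) -> float:
    """P(H | E) via Bayes' rule, with P(E) expanded by total probability."""
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers: 1% of people have the disease, the test detects it
# 95% of the time, and 5% of healthy people still test positive.
p_disease_given_positive = posterior(
    prior=0.01,
    likelihood=0.95,
    false_positive_rate=0.05,
)

print(round(p_disease_given_positive, 3))
```

With these assumed numbers, a positive result still leaves only about a 16% chance of having the disease, because healthy people vastly outnumber sick ones.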
Sampling distributions and the Central Limit Theorem
In this section we will look at the law of large numbers and the central limit theorem. The law of large numbers states that as the sample size increases, the sample mean gets closer to the population mean.
The central limit theorem states that, with a large enough sample size, the sampling distribution of the sample mean approaches a Gaussian (normal) distribution, even when the population itself is not normally distributed.
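Both statements can be checked with a small simulation. The sketch below assumes an exponential population with mean 1 (clearly right-skewed); the seed, sample sizes, and repetition count are arbitrary choices.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

def draw(n):
    """Draw n values from an exponential population (mean 1, right-skewed)."""
    return [rng.expovariate(1.0) for _ in range(n)]

# Law of large numbers: the sample mean approaches the population mean (1.0).
means_by_size = {n: statistics.fmean(draw(n)) for n in (10, 1_000, 100_000)}

# Central limit theorem: look at the means of many samples of size 50.
sample_size, repetitions = 50, 2_000
sample_means = [statistics.fmean(draw(sample_size)) for _ in range(repetitions)]

center = statistics.fmean(sample_means)
spread = statistics.stdev(sample_means)
expected_spread = 1.0 / sample_size ** 0.5  # sigma / sqrt(n), with sigma = 1

# For a normal distribution, roughly 68% of values lie within one
# standard deviation of the mean.
share_within_one_sd = (
    sum(abs(m - 1.0) <= expected_spread for m in sample_means) / repetitions
)

print(means_by_size)
print(center, spread, expected_spread, share_within_one_sd)
```

Even though individual draws are skewed, the sample means cluster around 1 with a spread close to sigma / sqrt(n), which is what the central limit theorem leads us to expect.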
Confidence Intervals
Wikipedia defines a confidence interval as "a type of estimate computed from the statistics of the observed data. This gives a range of values for an unknown parameter (for example, a population mean)."
SciPy and NumPy are great libraries that can be used for sampling and for computing confidence intervals.
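To keep things dependency-free, here is a rough sketch of a 95% confidence interval for a mean using only the standard library. It assumes simulated data (not real measurements) and a normal approximation; for small samples a t-distribution, as available in SciPy's scipy.stats module, is usually the better choice.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical sample: 40 simulated measurements (not real data).
sample = [rng.gauss(50, 10) for _ in range(40)]

n = len(sample)
sample_mean = statistics.fmean(sample)
std_error = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean

confidence = 0.95
# Critical value of the standard normal distribution (about 1.96 for 95%).
z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)

lower = sample_mean - z * std_error
upper = sample_mean + z * std_error

print(f"{confidence:.0%} CI for the mean: ({lower:.2f}, {upper:.2f})")
```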
Hypothesis Testing
Upcoming