Standard deviation is a measure of the amount of variation or dispersion in a set of values. It quantifies how much the values in a dataset deviate from the mean (average) of the dataset. A low standard deviation indicates that the values tend to be close to the mean, while a high standard deviation indicates that the values are spread out over a wider range.
Formula for Standard Deviation
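For a dataset $x_1, x_2, \ldots, x_N$:
- Mean ($\mu$): $\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$
- Variance ($\sigma^2$): $\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$
- Standard Deviation ($\sigma$): $\sigma = \sqrt{\sigma^2}$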
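The definition can be traced step by step in plain Python: compute the mean, then the variance as the average squared deviation from the mean, then take the square root. This is a minimal sketch that assumes the population form (dividing by the number of values); the function name `population_std` is a hypothetical helper chosen for illustration.

```python
import math
import statistics

def population_std(values):
    """Population standard deviation: square root of the mean squared deviation."""
    n = len(values)
    mean = sum(values) / n                               # mean of the values
    variance = sum((x - mean) ** 2 for x in values) / n  # average squared deviation
    return math.sqrt(variance)                           # standard deviation

data = [2, 4, 4, 4, 5, 5, 7, 9]
result = population_std(data)
print(result)  # 2.0

# Cross-check against the standard library's population version
print(math.isclose(result, statistics.pstdev(data)))  # True
```

Dividing by n - 1 instead of n in the variance step would give the sample standard deviation.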
Calculating Standard Deviation in Python
You can calculate the standard deviation in Python using the statistics module or using numpy for larger datasets.
Using the statistics Module
import statistics
# Example dataset
data = [2, 4, 4, 4, 5, 5, 7, 9]
# Calculate standard deviation
std_dev = statistics.stdev(data)
print("Standard Deviation:", std_dev)
Using numpy
import numpy as np
# Example dataset
data = [2, 4, 4, 4, 5, 5, 7, 9]
# Calculate standard deviation
std_dev = np.std(data, ddof=1) # ddof=1 provides sample standard deviation
print("Standard Deviation:", std_dev)
Explanation
- Using statistics.stdev: This function calculates the sample standard deviation.
- Using numpy.std: The np.std function calculates the standard deviation. By default (ddof=0) it uses the population formula (n in the denominator). The ddof=1 parameter specifies that the calculation should use the sample standard deviation formula (n-1 in the denominator).
Both methods return the same sample standard deviation for this dataset, because both divide by n-1. You can choose either based on your preference or the size of your dataset; numpy is usually the better fit for larger datasets.
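To see how much the choice of denominator matters, the sample and population versions can be compared side by side. This sketch uses only the standard library's `statistics.stdev` and `statistics.pstdev` on the same example dataset; the variable names are illustrative.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

sample_sd = statistics.stdev(data)       # divides by n - 1
population_sd = statistics.pstdev(data)  # divides by n

print(round(sample_sd, 4))  # 2.1381
print(population_sd)        # 2.0
```

The sample value is slightly larger because dividing by n - 1 compensates for estimating the mean from the same data.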
Mean and Variance
In statistics, mean and variance are fundamental concepts used to describe the characteristics of a dataset.
Mean
The mean (often referred to as the average) is a measure of central tendency. It represents the average value of a dataset and provides a summary of the data’s center.
Formula for Mean
For a dataset $x_1, x_2, \ldots, x_N$:
$$\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$$
Where:
- $\mu$ is the mean.
- $N$ is the number of observations in the dataset.
- $x_i$ represents each individual observation.
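In code, the mean is the sum of the observations divided by their count. The sketch below reuses the dataset from the Python examples earlier (an assumption made for illustration) and checks the manual result against `statistics.mean`.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

manual_mean = sum(data) / len(data)   # sum of observations / number of observations
library_mean = statistics.mean(data)  # same quantity from the standard library

print(manual_mean)                  # 5.0
print(manual_mean == library_mean)  # True
```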
Example
For the dataset :
Variance
The variance is a measure of how much the values in a dataset vary or spread out from the mean. It quantifies the degree of dispersion or spread in the data.
Formula for Variance
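For a dataset $x_1, x_2, \ldots, x_N$:
$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$$
Where:
- $\sigma^2$ is the variance.
- $\mu$ is the mean of the dataset.
- $x_i$ represents each individual observation.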
Example
For the dataset with mean :
-
Calculate the squared differences from the mean:
-
Calculate the average of these squared differences:
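The same two steps, squaring each difference from the mean and then averaging, can be sketched in Python. The dataset here is the one from the Python examples earlier, which is an assumption made for illustration and not necessarily the dataset of this worked example.

```python
data = [2, 4, 4, 4, 5, 5, 7, 9]
mean = sum(data) / len(data)  # 5.0

# Step 1: squared difference of each value from the mean
squared_diffs = [(x - mean) ** 2 for x in data]
print(squared_diffs)  # [9.0, 1.0, 1.0, 1.0, 0.0, 0.0, 4.0, 16.0]

# Step 2: average of the squared differences (population variance)
variance = sum(squared_diffs) / len(squared_diffs)
print(variance)  # 4.0
```

Taking the square root of this variance gives a standard deviation of 2.0 for the same data.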
Relationship Between Mean and Variance
- Mean provides a central value around