Standard Deviation Calculation with Code Example in Python

Standard deviation is a measure of the amount of variation or dispersion in a set of values. It quantifies how much the values in a dataset deviate from the mean (average) of the dataset. A low standard deviation indicates that the values tend to be close to the mean, while a high standard deviation indicates that the values are spread out over a wider range.

Formula for Standard Deviation

For a dataset $X = \{x_1, x_2, \ldots, x_n\}$:

  1. Mean ($\mu$):
    $$\mu = \frac{1}{n} \sum_{i=1}^{n} x_i$$

  2. Variance ($\sigma^2$):
    $$\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2$$

  3. Standard Deviation ($\sigma$):
    $$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2}$$
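
The three steps above can be sketched directly in plain Python. This is a minimal illustration, not a library implementation: the function name population_std_dev is a hypothetical helper chosen for this example, and it divides by n exactly as the formulas above do.

```python
import math


def population_std_dev(values):
    """Population standard deviation, following the three steps above.

    population_std_dev is a hypothetical helper name used for illustration.
    """
    n = len(values)
    if n == 0:
        raise ValueError("values must contain at least one number")

    mean = sum(values) / n                                # step 1: mean
    variance = sum((x - mean) ** 2 for x in values) / n   # step 2: variance
    return math.sqrt(variance)                            # step 3: square root


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_std_dev(data))  # 2.0
```

For this dataset the mean is 5, the squared deviations sum to 32, the variance is 32 / 8 = 4, and the standard deviation is 2.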

Calculating Standard Deviation in Python

You can calculate the standard deviation in Python with the built-in statistics module, or with numpy, which is better suited to larger datasets.

Using the statistics Module

import statistics

# Example dataset
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Calculate the sample standard deviation (n - 1 in the denominator)
std_dev = statistics.stdev(data)

print("Standard Deviation:", std_dev)

Using numpy

import numpy as np

# Example dataset
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Calculate standard deviation
std_dev = np.std(data, ddof=1)  # ddof=1 provides sample standard deviation

print("Standard Deviation:", std_dev)

Explanation

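  • Using statistics.stdev: This function calculates the sample standard deviation (n - 1 in the denominator). For the population standard deviation, the same module provides statistics.pstdev.
  • Using numpy.std: By default, np.std uses ddof=0 and returns the population standard deviation. Passing ddof=1 switches the calculation to the sample standard deviation formula (n - 1 in the denominator).

As written, both examples return the sample standard deviation, about 2.138 for this dataset. The formulas in the previous section divide by n (the population form), which gives exactly 2 for the same data. Use the population form when the dataset is the complete population and the sample form when it is drawn from a larger one. Beyond that, you can choose either library based on your preference or the size of your dataset.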
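
To make the difference between the n and n - 1 denominators concrete, the following sketch computes both variants with the statistics module on the same dataset. It uses only statistics.pstdev and statistics.stdev from the standard library; the variable names are illustrative.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

# Population standard deviation: divides the sum of squared deviations by n
population_sd = statistics.pstdev(data)

# Sample standard deviation: divides by n - 1 instead
sample_sd = statistics.stdev(data)

print("Population:", population_sd)
print("Sample:", round(sample_sd, 4))
```

The sample value is always at least as large as the population value, because the same sum of squared deviations is divided by a smaller number.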

Mean and Variance

In statistics, mean and variance are fundamental concepts used to describe the characteristics of a dataset.

Mean

The mean (often referred to as the average) is a measure of central tendency. It represents the average value of a dataset and provides a summary of the data’s center.

Formula for Mean

For a dataset $X = \{x_1, x_2, \ldots, x_n\}$:

$$\mu = \frac{1}{n} \sum_{i=1}^{n} x_i$$

Where:

  • $\mu$ is the mean.
  • $n$ is the number of observations in the dataset.
  • $x_i$ represents each individual observation.

Example

For the dataset $\{2, 4, 6, 8, 10\}$:

$$\mu = \frac{2 + 4 + 6 + 8 + 10}{5} = \frac{30}{5} = 6$$
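
The same calculation can be checked in Python. This short sketch computes the mean by hand and compares it with statistics.fmean from the standard library (available in Python 3.8 and later); the variable names are illustrative.

```python
import statistics

data = [2, 4, 6, 8, 10]

# Mean by the formula: sum of the observations divided by their count
mean_manual = sum(data) / len(data)

# Same result from the standard library (fmean always returns a float)
mean_library = statistics.fmean(data)

print(mean_manual)   # 6.0
print(mean_library)  # 6.0
```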

Variance

The variance is a measure of how much the values in a dataset vary or spread out from the mean. It quantifies the degree of dispersion or spread in the data.

Formula for Variance

For a dataset $X = \{x_1, x_2, \ldots, x_n\}$:

$$\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2$$

Where:

  • $\sigma^2$ is the variance.
  • $\mu$ is the mean of the dataset.
  • $x_i$ represents each individual observation.

Example

For the dataset $\{2, 4, 6, 8, 10\}$ with mean $\mu = 6$:

  1. Calculate the squared differences from the mean:

    • $(2 - 6)^2 = 16$
    • $(4 - 6)^2 = 4$
    • $(6 - 6)^2 = 0$
    • $(8 - 6)^2 = 4$
    • $(10 - 6)^2 = 16$
  2. Calculate the average of these squared differences:

    • $\sigma^2 = \frac{16 + 4 + 0 + 4 + 16}{5} = \frac{40}{5} = 8$
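
The two steps of this worked example can be reproduced in a few lines of Python. This is a sketch that divides by n, matching the formula above; statistics.pvariance is the standard-library function for the same population variance, and the variable names are illustrative.

```python
import statistics

data = [2, 4, 6, 8, 10]
mean = sum(data) / len(data)

# Step 1: squared differences from the mean
squared_diffs = [(x - mean) ** 2 for x in data]

# Step 2: average of the squared differences (population variance)
variance = sum(squared_diffs) / len(data)

print(squared_diffs)  # [16.0, 4.0, 0.0, 4.0, 16.0]
print(variance)       # 8.0

# Cross-check against the standard library's population variance
print(statistics.pvariance(data) == variance)  # True
```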

Relationship Between Mean and Variance

  • Mean provides a central value around