Statistical hypothesis testing is a fundamental concept in inferential statistics, allowing researchers and analysts to draw conclusions about a population based on sample data. It involves formulating hypotheses, collecting data, and using statistical methods to evaluate the plausibility of the hypotheses given the observed data.
In hypothesis testing, we typically formulate two mutually exclusive hypotheses: the null hypothesis (H0) and the alternative hypothesis (H1). The null hypothesis is a statement about the population parameter that we aim to test, often representing the status quo or the assumption of no effect. The alternative hypothesis is the opposite of the null hypothesis and represents the claim or effect that we are interested in detecting.
The process of hypothesis testing involves the following steps:
1. Formulate the null and alternative hypotheses.
2. Specify a significance level (α), which represents the maximum probability of rejecting the null hypothesis when it’s true (Type I error).
3. Calculate a test statistic from the sample data.
4. Determine the critical region or the p-value based on the test statistic and the chosen significance level.
5. Compare the test statistic to the critical region, or the p-value to the significance level.
6. Make a decision to reject or fail to reject the null hypothesis based on the comparison.
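To make the meaning of the significance level concrete, here is a minimal simulation sketch. It assumes normally distributed data whose true mean equals the hypothesized mean, so every rejection is a Type I error. The seed, sample size, and number of simulated experiments are arbitrary illustrative choices.

```python
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(seed=0)  # fixed seed so the run is repeatable

alpha = 0.05
true_mean = 50.0       # the null hypothesis is true by construction
n_experiments = 5000   # illustrative number of repeated experiments
sample_size = 30

# Each row is one simulated experiment drawn under the null hypothesis
samples = rng.normal(loc=true_mean, scale=3.5, size=(n_experiments, sample_size))

# One-sample t-test per row
p_values = ttest_1samp(samples, popmean=true_mean, axis=1).pvalue

# Fraction of experiments that (wrongly) reject the true null hypothesis
type_i_rate = np.mean(p_values < alpha)
print(f"Observed Type I error rate: {type_i_rate:.3f} (alpha = {alpha})")
```

Over many repetitions, the observed rejection rate should land close to α, which is what "maximum probability of a Type I error" means in practice.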
Hypothesis testing is widely used in various fields, such as science, engineering, economics, and social sciences, to make data-driven decisions and validate or refute claims about populations or processes.
```python
import numpy as np
from scipy.stats import norm

# Define the null and alternative hypotheses
mu_null = 50  # Hypothesized population mean
mu_alt = 52   # Alternative population mean (not used by the two-sided test below)

# Sample data
sample_mean = 51.2
sample_std = 3.5
sample_size = 30

# Calculate the test statistic
z_stat = (sample_mean - mu_null) / (sample_std / np.sqrt(sample_size))

# Compute the two-sided p-value
p_value = 2 * (1 - norm.cdf(abs(z_stat)))

# Set the significance level
alpha = 0.05

# Make a decision based on the p-value
if p_value < alpha:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")
```
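In the above example, we perform a one-sample z-test to determine whether the population mean differs from a hypothesized value (50). The test statistic (z_stat) is calculated from the sample data, and the two-sided p-value is computed using the standard normal distribution. Because the sample standard deviation stands in for the unknown population standard deviation, the z-test here is a large-sample approximation; a t-test is the usual choice for small samples. By comparing the p-value to the chosen significance level (α = 0.05), we can make a decision to reject or fail to reject the null hypothesis.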
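The same decision can also be reached through the critical-region route mentioned in the steps. The sketch below reuses the sample summary from the example above and assumes a two-sided test; the name z_crit is introduced here for illustration.

```python
import numpy as np
from scipy.stats import norm

mu_null = 50
sample_mean = 51.2
sample_std = 3.5
sample_size = 30
alpha = 0.05

# Test statistic, as in the p-value approach
z_stat = (sample_mean - mu_null) / (sample_std / np.sqrt(sample_size))

# Two-sided critical value: reject when |z| exceeds this threshold
z_crit = norm.ppf(1 - alpha / 2)

reject = abs(z_stat) > z_crit
print(f"z = {z_stat:.3f}, critical value = {z_crit:.3f}")
print("Reject the null hypothesis" if reject else "Fail to reject the null hypothesis")
```

Both routes are equivalent for a given test: |z| exceeds the critical value exactly when the two-sided p-value falls below α.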
Types of Hypothesis Tests in scipy.stats
The scipy.stats module in Python provides a wide range of hypothesis tests to analyze different types of data and scenarios. Here are some of the common hypothesis tests available in scipy.stats:
- One-sample tests: These tests are used to determine if a sample comes from a population with a specific mean or median value.
  - One-sample t-test (ttest_1samp) for the mean of a normally distributed population.
  - One-sample Wilcoxon signed-rank test (wilcoxon) for the median of a non-normal population.
- Two-sample tests: These tests are used to compare the means or medians of two independent samples.
  - Two-sample t-test (ttest_ind) for the means of two independent normally distributed populations.
  - Mann-Whitney U test (mannwhitneyu) for comparing two independent non-normal populations, often described as a comparison of medians.
- Paired tests: These tests are used to compare two related or paired samples, such as before and after measurements.
  - Paired t-test (ttest_rel) for the means of two related normally distributed samples.
  - Wilcoxon signed-rank test (wilcoxon) for the medians of two related non-normal samples.
- Chi-square tests: These tests are used to assess the goodness-of-fit of a sample to a theoretical distribution or to test for independence between categorical variables.
  - Chi-square goodness-of-fit test (chisquare).
  - Chi-square test for independence between two categorical variables (chi2_contingency).
- Analysis of Variance (ANOVA): These tests are used to compare the means of three or more independent groups.
  - One-way ANOVA (f_oneway) for comparing the means of independent groups; with exactly two groups it is equivalent to the equal-variance two-sample t-test.
These are just a few examples of the hypothesis tests available in scipy.stats. The module also provides functions for calculating critical values, p-values, and other statistical quantities necessary for hypothesis testing.
Here’s an example of conducting a two-sample t-test to compare the means of two independent samples:
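As one illustration from the list above, a paired comparison might look like the following sketch. The before/after measurements are hypothetical values made up for this example, not data from any study.

```python
import numpy as np
from scipy.stats import ttest_rel, wilcoxon

# Hypothetical paired measurements on the same eight subjects
before = np.array([72.1, 75.4, 68.0, 80.3, 77.2, 74.5, 70.8, 79.6])
after = np.array([70.3, 73.9, 68.4, 76.1, 75.0, 71.8, 70.2, 74.9])

# Paired t-test: assumes the differences are roughly normally distributed
t_stat, p_paired_t = ttest_rel(before, after)

# Wilcoxon signed-rank test: non-parametric alternative on the same pairs
w_stat, p_wilcoxon = wilcoxon(before, after)

print(f"Paired t-test: t = {t_stat:.4f}, p = {p_paired_t:.4f}")
print(f"Wilcoxon signed-rank: W = {w_stat:.1f}, p = {p_wilcoxon:.4f}")
```

Running both tests on the same pairs is a quick way to see whether the conclusion depends on the normality assumption.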
```python
from scipy.stats import ttest_ind

# Sample data
sample1 = [22, 25, 19, 28, 21]
sample2 = [18, 24, 20, 23, 22, 19]

# Perform the two-sample t-test
t_stat, p_value = ttest_ind(sample1, sample2)

# Print the results
print(f"t-statistic: {t_stat:.4f}")
print(f"p-value: {p_value:.4f}")
```
In this example, we use the ttest_ind function to perform a two-sample t-test on two independent samples. By default, ttest_ind assumes that the two populations have equal variances. The function returns the t-statistic and the corresponding p-value, which can be used to make a decision about rejecting or failing to reject the null hypothesis.
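When the equal-variance or normality assumptions are in doubt, two common variations are Welch's t-test and the Mann-Whitney U test. The sketch below applies both to the same two samples; whether either is more appropriate depends on the data, so treat it as an illustration rather than a recommendation.

```python
from scipy.stats import ttest_ind, mannwhitneyu

sample1 = [22, 25, 19, 28, 21]
sample2 = [18, 24, 20, 23, 22, 19]

# Welch's t-test: does not assume equal population variances
welch_t, p_welch = ttest_ind(sample1, sample2, equal_var=False)

# Mann-Whitney U test: rank-based, does not assume normality
u_stat, p_mw = mannwhitneyu(sample1, sample2, alternative="two-sided")

print(f"Welch t-test: t = {welch_t:.4f}, p = {p_welch:.4f}")
print(f"Mann-Whitney U: U = {u_stat:.1f}, p = {p_mw:.4f}")
```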
Conducting Hypothesis Tests in Python
Conducting hypothesis tests in Python using the scipy.stats module is straightforward. Here are the general steps to follow:

1. Import the necessary functions from scipy.stats:

```python
from scipy.stats import ttest_1samp, ttest_ind, mannwhitneyu, wilcoxon, f_oneway, chisquare, chi2_contingency
```

2. Organize your sample data into appropriate data structures (e.g., lists, NumPy arrays).

3. Choose the appropriate hypothesis test function based on your research question and data characteristics (e.g., one-sample, two-sample, paired, ANOVA, chi-square).

4. Call the chosen hypothesis test function with your sample data as arguments:

```python
test_statistic, p_value = ttest_1samp(sample_data, popmean)
```

5. Interpret the results:
   - The test statistic (e.g., t-statistic, z-score, F-statistic) provides information about the magnitude (and, for signed statistics such as t or z, the direction) of the difference between the sample and the hypothesized value or between the groups.
   - The p-value represents the probability of observing the test statistic or a more extreme value if the null hypothesis is true. A small p-value (typically less than the chosen significance level, e.g., 0.05) suggests that the observed data is unlikely under the null hypothesis, leading to its rejection.

6. Make a decision based on the p-value and the chosen significance level (α):

```python
alpha = 0.05  # Significance level

if p_value < alpha:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")
```
Here’s an example of conducting a one-sample t-test to determine if the mean of a sample differs significantly from a hypothesized population mean:
```python
from scipy.stats import ttest_1samp
import numpy as np

# Sample data
sample_data = np.array([12.5, 14.2, 13.8, 11.9, 12.7, 13.1])
hypothesized_mean = 13.0

# Perform one-sample t-test
t_stat, p_value = ttest_1samp(sample_data, hypothesized_mean)

# Print results
print(f"t-statistic: {t_stat:.4f}")
print(f"p-value: {p_value:.4f}")

# Make a decision
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")
```
This example demonstrates how to use the ttest_1samp function to compare the sample mean to a hypothesized population mean. The p-value is then compared to the chosen significance level (α = 0.05) to make a decision about rejecting or failing to reject the null hypothesis.
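The same call-then-compare workflow carries over to the other imported functions. The sketch below runs a chi-square goodness-of-fit test, a chi-square test of independence, and a one-way ANOVA. All counts and group values are hypothetical numbers invented for illustration.

```python
import numpy as np
from scipy.stats import chisquare, chi2_contingency, f_oneway

# Goodness of fit: hypothetical counts from 120 rolls of a die.
# With no expected frequencies given, chisquare assumes equal frequencies.
observed_counts = [18, 22, 20, 25, 15, 20]
gof_stat, p_gof = chisquare(observed_counts)

# Independence: hypothetical 2x2 contingency table (rows: group, columns: outcome)
observed_table = np.array([[30, 10],
                           [20, 20]])
chi2_stat, p_indep, dof, expected = chi2_contingency(observed_table)

# One-way ANOVA: hypothetical measurements from three independent groups
group_a = [20, 21, 19, 22, 20]
group_b = [25, 27, 26, 24, 28]
group_c = [20, 19, 21, 20, 22]
f_stat, p_anova = f_oneway(group_a, group_b, group_c)

print(f"Goodness of fit: chi2 = {gof_stat:.4f}, p = {p_gof:.4f}")
print(f"Independence: chi2 = {chi2_stat:.4f}, p = {p_indep:.4f}, dof = {dof}")
print(f"One-way ANOVA: F = {f_stat:.4f}, p = {p_anova:.4f}")
```

Note that chi2_contingency also returns the degrees of freedom and the table of expected counts, which is useful for checking that expected counts are not too small for the chi-square approximation.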
Interpreting Hypothesis Test Results
When interpreting the results of a hypothesis test, there are a few key aspects to consider:
- The test statistic (e.g., t-statistic, z-score, F-statistic) provides information about the magnitude (and, for signed statistics such as t or z, the direction) of the difference between the sample and the hypothesized value or between the groups being compared. A larger absolute value of the test statistic generally indicates stronger evidence against the null hypothesis.
- The p-value represents the probability of observing the test statistic or a more extreme value if the null hypothesis is true. A smaller p-value suggests that the observed data is less likely under the null hypothesis, providing stronger evidence for rejecting it.
- The significance level (α) is a predetermined threshold that determines the maximum acceptable probability of making a Type I error (rejecting the null hypothesis when it is true). Typically, α is set to 0.05 or 0.01, but it can be adjusted based on the specific requirements of the study.
- Decision Rule: Based on the p-value and the chosen significance level, a decision is made to either reject or fail to reject the null hypothesis:
  - If the p-value is less than or equal to the significance level (p-value ≤ α), the null hypothesis is rejected, suggesting that the observed difference or effect is statistically significant. The code examples here use a strict comparison (p_value < alpha), which differs only when the p-value equals α exactly.
  - If the p-value is greater than the significance level (p-value > α), the null hypothesis is not rejected, indicating that there is insufficient evidence to conclude a statistically significant difference or effect.
- The interpretation of the test results should be made in the context of the research question and the practical significance of the findings. A statistically significant result does not necessarily imply practical or substantive importance, and factors such as effect size, confidence intervals, and subject-matter expertise should be considered in drawing conclusions.
It is important to note that failing to reject the null hypothesis does not necessarily mean that the null hypothesis is true; it simply indicates a lack of sufficient evidence to reject it based on the given data and significance level.
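A small simulation can show why a non-significant result is weak evidence for the null hypothesis. In the sketch below the null hypothesis is false by construction (the true mean is 51, the hypothesized mean is 50), yet small samples frequently fail to reject it. The effect size, standard deviation, and sample sizes are arbitrary illustrative assumptions.

```python
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(seed=1)  # fixed seed so the run is repeatable

alpha = 0.05
mu_null = 50.0
true_mean = 51.0      # the null hypothesis is false by construction
sigma = 3.5
n_experiments = 5000

miss_rates = {}
for sample_size in (10, 100):
    # Each row is one simulated experiment drawn from the true distribution
    samples = rng.normal(true_mean, sigma, size=(n_experiments, sample_size))
    p_values = ttest_1samp(samples, popmean=mu_null, axis=1).pvalue
    # Fraction of experiments that fail to reject the false null (Type II errors)
    miss_rates[sample_size] = np.mean(p_values >= alpha)
    print(f"n = {sample_size}: failed to reject in {miss_rates[sample_size]:.1%} of experiments")
```

With the smaller sample size, most experiments fail to reject even though a real difference exists, which is why "fail to reject" is not the same as "accept".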
Here’s an example of interpreting the results of a two-sample t-test:
```python
from scipy.stats import ttest_ind

# Sample data
sample1 = [22, 25, 19, 28, 21]
sample2 = [18, 24, 20, 23, 22, 19]

# Perform the two-sample t-test
t_stat, p_value = ttest_ind(sample1, sample2)

# Set the significance level
alpha = 0.05

# Print the results
print(f"t-statistic: {t_stat:.4f}")
print(f"p-value: {p_value:.4f}")

# Make a decision and interpret the results
if p_value < alpha:
    print("Reject the null hypothesis. There is a statistically significant difference between the means of the two groups.")
else:
    print("Fail to reject the null hypothesis. There is no statistically significant difference between the means of the two groups.")
```
In this example, the t-statistic and p-value are computed for a two-sample t-test comparing the means of two independent samples. The p-value is then compared to the chosen significance level (α = 0.05) to make a decision about rejecting or failing to reject the null hypothesis. The interpretation of the results should consider the practical implications and effect sizes in addition to the statistical significance.
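To complement the p-value with an effect size and a confidence interval, one possible sketch for the same two samples is shown below. It assumes equal population variances (matching the default behavior of ttest_ind) and uses Cohen's d with a pooled standard deviation, which is one common effect-size choice among several.

```python
import numpy as np
from scipy.stats import t

sample1 = np.array([22, 25, 19, 28, 21])
sample2 = np.array([18, 24, 20, 23, 22, 19])

n1, n2 = len(sample1), len(sample2)
df = n1 + n2 - 2
mean_diff = sample1.mean() - sample2.mean()

# Pooled standard deviation under the equal-variance assumption
pooled_var = ((n1 - 1) * sample1.var(ddof=1) + (n2 - 1) * sample2.var(ddof=1)) / df
pooled_std = np.sqrt(pooled_var)

# Cohen's d: difference in means expressed in pooled standard deviations
cohens_d = mean_diff / pooled_std

# 95% confidence interval for the difference in means
se_diff = pooled_std * np.sqrt(1 / n1 + 1 / n2)
t_crit = t.ppf(0.975, df)
ci_low = mean_diff - t_crit * se_diff
ci_high = mean_diff + t_crit * se_diff

print(f"Mean difference: {mean_diff:.2f}")
print(f"Cohen's d: {cohens_d:.3f}")
print(f"95% CI for the difference: ({ci_low:.2f}, {ci_high:.2f})")
```

A confidence interval that contains zero agrees with a non-significant two-sided test at the same α, while its width shows how imprecisely such small samples pin down the difference.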