Introduction
Nonparametric statistics is a branch of statistics that does not make strong assumptions about the underlying population distribution of data.
Unlike parametric statistics, which typically assume data follows a specific distribution (most commonly the normal distribution), nonparametric methods do not have such requirements.
Parametric tests get their name because they rely on parameters that describe a known distribution (like how mean and variance characterize a normal distribution). Using these tests can produce inaccurate results when the distribution is unknown or when dealing with asymmetric data, outliers, or categorical variables.
Therefore, nonparametric methods are particularly useful in cases where:
- the data distribution is unknown or non-normal
- the sample sizes are small
- the data is ordinal or categorical rather than numerical
Common nonparametric methods include:
- ranks
- counts and proportions
- sign tests
- permutation tests
- bootstrap methods
- order statistics
Ranks and ties
Ranks
A rank refers to the position of a number within an ordered list.
To generate ranks from a list, first arrange the numbers in ascending order; the rank of each number is then its position in this ordered list.
Here’s an example with this list:
[3, 5, 2, 8, 4]
In ascending order:
[2, 3, 4, 5, 8]
Now we assign ranks to each value:
2 → rank 1
3 → rank 2
4 → rank 3
5 → rank 4
8 → rank 5
So in our original list [3, 5, 2, 8, 4], the corresponding ranks are [2, 4, 1, 5, 3].
Ties
A challenge occurs when the list contains identical numbers, known as ties.
For example, in the following list, there is a tie:
[3, 5, 2, 5, 4]
When we order it we have:
[2, 3, 4, 5, 5]
In the transformation to ranks:
2 → rank 1
3 → rank 2
4 → rank 3
5 → ?
5 → ?
In these cases, several methods can be used to assign rank values to ties:
- Average: this is the most common method. Ties receive the average of the ranks they span; in this example, each value of 5 receives a rank of 4.5, i.e. (4+5)/2, where 4 and 5 are the rank positions occupied by the tied values. The final ranks are therefore [2, 4.5, 1, 4.5, 3].
- Minimum: ties receive the lowest rank value
- Maximum: ties receive the highest rank value
- Dense: like minimum, but the rank sequence has no gaps between groups of ties
- Ordinal (sequential): ties receive distinct, consecutive ranks in the order they appear in the data
SciPy provides a useful function to transform a list into ranks: rankdata (in scipy.stats).
Below we use this function to illustrate how ranks work.
from scipy.stats import rankdata
import pandas as pd

# Examples
datasets = {
    "Values without ties": [1, 2, 3, 4, 5],
    "Values with simple ties": [1, 2, 2, 3, 4],
    "Values with multiple ties": [4, 4, 4, 2, 2],
    "Values with dense repetition": [5, 5, 6, 6, 6, 7]
}

# Calculate ranks with different methods
methods = ["average", "min", "max", "dense", "ordinal"]
rows = []
for name, values in datasets.items():
    for method in methods:
        ranks = rankdata(values, method=method)
        rows.append({
            "Dataset": name,
            "Method": method,
            "Values": values,
            "Ranks": ranks.tolist()
        })

# Convert to DataFrame for display
rank_df = pd.DataFrame(rows)
print(rank_df)
Usefulness of Ranks
Ranks offer several key advantages:
- They are resistant to outliers
- They don’t require a known distribution, since they work with relative positions rather than raw values
- They enable comparisons between datasets with different scales
- They remain unchanged under any monotonic transformation of the data, as shown in the sketch below
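A minimal sketch of the last two points, assuming arbitrary example values: the ranks of a list are unchanged by a monotonic transformation such as the logarithm, and an extreme outlier does not distort them.
# rank_invariance.py
import numpy as np
from scipy.stats import rankdata

values = np.array([3, 5, 2, 8, 4])

# Ranks are identical before and after a monotonic transformation (log)
print(rankdata(values))          # [2. 4. 1. 5. 3.]
print(rankdata(np.log(values)))  # [2. 4. 1. 5. 3.]

# Replacing 8 with an extreme outlier leaves all the ranks unchanged
with_outlier = np.array([3, 5, 2, 800, 4])
print(rankdata(with_outlier))    # [2. 4. 1. 5. 3.]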
Counts
A count represents how many times a specific event occurs within a dataset.
For example, when evaluating a drug’s effectiveness in a patient group, counts tell us how many patients recovered versus how many did not.
Counts are useful for constructing contingency tables:
Therapy | Recovered | Not Recovered |
---|---|---|
Treatment A | 30 | 10 |
Treatment B | 20 | 40 |
Additionally, counts can be used to construct distributions of categorical variables.
Finally, they are used in the chi-square test to verify if observed frequencies differ from expected ones.
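As an illustrative sketch (with invented per-patient records), a contingency table like the one above can be built directly from raw categorical data with pandas.crosstab:
# contingency_table.py
import pandas as pd

# Hypothetical per-patient records (invented for illustration)
df = pd.DataFrame({
    "therapy": ["Treatment A"] * 40 + ["Treatment B"] * 60,
    "outcome": ["Recovered"] * 30 + ["Not Recovered"] * 10
             + ["Recovered"] * 20 + ["Not Recovered"] * 40
})

# Count each outcome per therapy
table = pd.crosstab(df["therapy"], df["outcome"])
print(table)
The resulting table of counts can then be passed directly to the chi-square test shown later in this article.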
Proportions
A proportion is a value between 0 and 1 that represents what fraction of a sample has a particular characteristic.
For example, if 20 patients out of 100 test positive on a diagnostic test, the proportion is 20/100 = 0.2.
Proportions are useful for comparing percentages between groups, estimating probabilities, and conducting hypothesis tests based on proportion data.
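For example, a minimal sketch of this calculation together with an exact binomial test (scipy.stats.binomtest, available in SciPy 1.7 and later; the null proportion of 0.5 used here is an arbitrary illustrative choice):
# proportion_example.py
from scipy.stats import binomtest

positives = 20
n = 100
proportion = positives / n
print(f"Proportion = {proportion:.2f}")   # 0.20

# Exact binomial test of H0: true proportion = 0.5 (illustrative null value)
result = binomtest(positives, n, p=0.5, alternative='two-sided')
print(f"p-value = {result.pvalue:.4f}")
print(f"95% CI = {result.proportion_ci(confidence_level=0.95)}")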
Sign test
The sign test is simpler than the Wilcoxon signed-rank test, since it only examines whether the differences between pairs are positive or negative, disregarding the size of these differences.
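A minimal sketch of a sign test, implemented here as an exact binomial test on the number of positive differences; the paired values are invented, and zero differences are discarded by convention.
# sign_test.py
import numpy as np
from scipy.stats import binomtest

# Hypothetical paired measurements (e.g. before and after a treatment)
before = np.array([140, 135, 142, 138, 145, 150, 148, 139])
after  = np.array([132, 130, 138, 141, 139, 147, 148, 136])

diffs = after - before
diffs = diffs[diffs != 0]            # discard zero differences
n_positive = np.sum(diffs > 0)

# Under H0 (no systematic change), positive and negative signs are equally likely
result = binomtest(int(n_positive), n=len(diffs), p=0.5, alternative='two-sided')
print(f"Positive differences: {n_positive} out of {len(diffs)}")
print(f"p-value = {result.pvalue:.4f}")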
Permutation test
This test evaluates whether two samples come from the same distribution by comparing the observed test statistic with the statistics obtained after repeatedly shuffling the data between the groups.
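A minimal sketch using scipy.stats.permutation_test (available in SciPy 1.8 and later), with the difference in group means as the test statistic and invented data:
# permutation_test_example.py
import numpy as np
from scipy.stats import permutation_test

group_A = np.array([120, 130, 125, 118, 135])
group_B = np.array([110, 115, 112, 108, 120])

def mean_diff(x, y):
    return np.mean(x) - np.mean(y)

# Randomly reassign the observations to the two groups many times
res = permutation_test((group_A, group_B), mean_diff,
                       permutation_type='independent',
                       n_resamples=10000, alternative='two-sided',
                       random_state=0)
print(f"Observed difference in means = {res.statistic:.2f}")
print(f"p-value = {res.pvalue:.4f}")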
Bootstrap method
This method simulates the sampling process many times by drawing new observations from the original sample with replacement. This enables group comparisons and the calculation of p-values and confidence intervals.
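A minimal sketch of a percentile bootstrap for the mean, written with plain NumPy resampling; the data and the number of resamples are arbitrary.
# bootstrap_example.py
import numpy as np

rng = np.random.default_rng(0)
sample = np.array([120, 130, 125, 118, 135, 128, 122, 131])

# Resample with replacement many times and recompute the statistic
n_resamples = 10000
boot_means = np.array([
    np.mean(rng.choice(sample, size=len(sample), replace=True))
    for _ in range(n_resamples)
])

# Percentile 95% confidence interval for the mean
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])
print(f"Sample mean = {np.mean(sample):.2f}")
print(f"95% bootstrap CI = ({ci_low:.2f}, {ci_high:.2f})")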
Order statistics
These are statistical measures derived from the relative positions of data points when arranged in ascending order. Order statistics include fundamental descriptive measures like the minimum (smallest value), maximum (largest value), median (middle value), and quartiles (values that divide the ordered dataset into four equal parts). These statistics are particularly valuable because they provide insights about the data’s distribution and spread without making assumptions about its underlying probability distribution. For example, the median is more robust to outliers than the mean, making it especially useful when dealing with skewed data or datasets containing extreme values.
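A minimal sketch computing these order statistics with NumPy on invented data:
# order_statistics.py
import numpy as np

data = np.array([7, 3, 9, 1, 5, 11, 2, 8])

print(f"Minimum = {np.min(data)}")
print(f"Maximum = {np.max(data)}")
print(f"Median  = {np.median(data)}")
print(f"Quartiles (Q1, Q2, Q3) = {np.percentile(data, [25, 50, 75])}")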
Applications of nonparametric statistics
The most commonly applied nonparametric tests are:
Kind of data | Test |
---|---|
Two independent groups | Mann-Whitney U test |
Paired measurements | Wilcoxon signed-rank test |
More than two groups | Kruskal-Wallis H test |
Relationship between ordinal variables | Spearman’s rank correlation |
Repeated measurements on more than two conditions | Friedman test |
Relationship between nominal data or frequencies | Chi-square test |
Python for nonparametric statistics
In Python, the main nonparametric tests are implemented in the scipy.stats module.
Mann-Whitney U test for two independent groups
# mann_whitney_test.py
import numpy as np
from scipy.stats import mannwhitneyu
# Simulated data: two independent groups (e.g. two treatments)
group_A = np.array([120, 130, 125, 118, 135])
group_B = np.array([110, 115, 112, 108, 120])
# Mann–Whitney U test (two-sided by default)
statistic, p_value = mannwhitneyu(group_A, group_B, alternative='two-sided')
# Output
print("=== Mann–Whitney U Test ===")
print(f"Group A: {group_A}")
print(f"Group B: {group_B}")
print(f"U statistic = {statistic}")
print(f"p-value = {p_value:.4f}")
# Simple interpretation
alpha = 0.05
if p_value < alpha:
    print("→ Significant difference between groups (reject H0)")
else:
    print("→ No significant difference (fail to reject H0)")
Wilcoxon signed-rank test for paired measurements
# wilcoxon_test.py
import numpy as np
from scipy.stats import wilcoxon
# Simulated paired data (e.g. before and after treatment)
before = np.array([140, 135, 142, 138, 145])
after = np.array([132, 130, 138, 135, 139])
# Perform the Wilcoxon signed-rank test (two-sided by default)
statistic, p_value = wilcoxon(before, after, alternative='two-sided')
# Output results
print("=== Wilcoxon Signed-Rank Test ===")
print(f"Before: {before}")
print(f"After: {after}")
print(f"W statistic = {statistic}")
print(f"p-value = {p_value:.4f}")
# Simple interpretation
alpha = 0.05
if p_value < alpha:
    print("→ Significant difference between paired observations (reject H0)")
else:
    print("→ No significant difference (fail to reject H0)")
Kruskal-Wallis H test for comparing more than two groups
# kruskal_wallis_test.py
import numpy as np
from scipy.stats import kruskal
# Simulated data: three independent groups (e.g. three treatment types)
group_A = np.array([88, 90, 85, 87, 89])
group_B = np.array([78, 75, 80, 77, 76])
group_C = np.array([92, 94, 95, 91, 93])
# Perform the Kruskal–Wallis H test
statistic, p_value = kruskal(group_A, group_B, group_C)
# Output results
print("=== Kruskal–Wallis H Test ===")
print(f"Group A: {group_A}")
print(f"Group B: {group_B}")
print(f"Group C: {group_C}")
print(f"H statistic = {statistic:.4f}")
print(f"p-value = {p_value:.4f}")
# Simple interpretation
alpha = 0.05
if p_value < alpha:
    print("→ At least one group differs significantly (reject H0)")
else:
    print("→ No significant differences between the groups (fail to reject H0)")
Spearman’s rank correlation for relationships between ordinal variables
# spearman_correlation.py
import numpy as np
from scipy.stats import spearmanr
# Simulated data: e.g. clinical scores and biomarker levels
# (can be ordinal, non-normal, or monotonic)
scores = np.array([78, 85, 90, 95, 88, 70, 75, 80, 92, 87])
biomarker = np.array([0.42, 0.50, 0.55, 0.60, 0.52, 0.40, 0.45, 0.48, 0.58, 0.53])
# Perform Spearman's rank correlation test
correlation, p_value = spearmanr(scores, biomarker)
# Output results
print("=== Spearman Rank Correlation ===")
print(f"Scores: {scores}")
print(f"Biomarker: {biomarker}")
print(f"Spearman’s rho = {correlation:.4f}")
print(f"p-value = {p_value:.4f}")
# Simple interpretation
alpha = 0.05
if p_value < alpha:
    print("→ Significant monotonic correlation (reject H0)")
else:
    print("→ No significant monotonic correlation (fail to reject H0)")
Friedman test for repeated measurements on more than two conditions
# friedman_test.py
import numpy as np
from scipy.stats import friedmanchisquare
# Simulated repeated measures data (e.g. VAS pain scores for 3 treatments on same patients)
# Each row = one subject; each column = one treatment condition
scores = np.array([
    [6, 5, 4],
    [7, 6, 5],
    [5, 4, 3],
    [8, 7, 6],
    [6, 6, 5],
    [7, 5, 5],
    [6, 4, 4],
    [5, 5, 4],
    [7, 6, 5],
    [6, 5, 4]
])
# Unpack columns to pass to friedmanchisquare
treatment1 = scores[:, 0]
treatment2 = scores[:, 1]
treatment3 = scores[:, 2]
# Perform Friedman test
statistic, p_value = friedmanchisquare(treatment1, treatment2, treatment3)
# Output results
print("=== Friedman Test ===")
print("Treatment 1:", treatment1)
print("Treatment 2:", treatment2)
print("Treatment 3:", treatment3)
print(f"Chi-squared statistic = {statistic:.4f}")
print(f"p-value = {p_value:.4f}")
# Simple interpretation
alpha = 0.05
if p_value < alpha:
    print("→ At least one treatment differs significantly (reject H0)")
else:
    print("→ No significant differences between treatments (fail to reject H0)")
Chi-square test for relationships between nominal data and frequencies
# chi_squared_test.py
import numpy as np
from scipy.stats import chi2_contingency
# Simulated 2x2 contingency table
# Example: Treatment success vs failure in two groups
# Success Failure
# Group A 30 10
# Group B 20 25
table = np.array([
    [30, 10],
    [20, 25]
])
# Perform Chi-squared test
chi2_stat, p_value, dof, expected = chi2_contingency(table)
# Output results
print("=== Chi-squared Test of Independence ===")
print("Observed frequencies:")
print(table)
print("\nExpected frequencies under H0:")
print(np.round(expected, 2))
print(f"\nChi-squared statistic = {chi2_stat:.4f}")
print(f"Degrees of freedom = {dof}")
print(f"p-value = {p_value:.4f}")
# Simple interpretation
alpha = 0.05
if p_value < alpha:
    print("→ Significant association between variables (reject H0)")
else:
    print("→ No significant association (fail to reject H0)")
Nonparametric Statistics in Medicine
In medical research, we frequently encounter various scenarios and conditions where traditional parametric tests may be unreliable or incorrect. These situations commonly include cases with limited data availability, the presence of significant outliers that can skew results, datasets that deviate from the normal distribution pattern, or when working with ordinal data where the intervals between values cannot be assumed to be equal.
In such challenging circumstances, nonparametric statistical methods prove invaluable by providing more reliable and trustworthy results. Their reliability stems from their distribution-free nature, meaning they don’t assume the data follows any particular probability distribution. Additionally, these methods demonstrate superior robustness compared to their parametric counterparts when analyzing datasets complicated by outliers or limited by small sample sizes. This robustness ensures that the statistical conclusions remain valid even under less-than-ideal conditions.
Moreover, researchers have access to normality tests, particularly the Shapiro-Wilk and Kolmogorov-Smirnov tests, which serve as essential diagnostic tools: they allow researchers to systematically evaluate each variable and make an informed decision about when to apply nonparametric methods, which strengthens the validity of the research findings.
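A minimal sketch of this diagnostic step with scipy.stats; the data are deliberately generated from a non-normal (exponential) distribution, and note that the Kolmogorov-Smirnov test shown here compares the sample against a normal distribution with the sample’s own mean and standard deviation, which is only an approximation.
# normality_check.py
import numpy as np
from scipy.stats import shapiro, kstest

rng = np.random.default_rng(0)
sample = rng.exponential(scale=2.0, size=40)   # deliberately non-normal data

# Shapiro-Wilk test: H0 = the sample comes from a normal distribution
stat_sw, p_sw = shapiro(sample)
print(f"Shapiro-Wilk: W = {stat_sw:.4f}, p-value = {p_sw:.4f}")

# Kolmogorov-Smirnov test against a normal with the sample's mean and SD
# (estimating the parameters from the same sample makes this only approximate)
stat_ks, p_ks = kstest(sample, 'norm', args=(sample.mean(), sample.std(ddof=1)))
print(f"Kolmogorov-Smirnov: D = {stat_ks:.4f}, p-value = {p_ks:.4f}")

if p_sw < 0.05:
    print("→ Normality rejected: consider nonparametric methods")
else:
    print("→ No evidence against normality")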
Limitations of nonparametric statistics
First of all, it should be clarified that nonparametric tests are generally less powerful than their parametric counterparts when the parametric assumptions actually hold. This means that a larger number of observations is necessary to detect the same difference.
Since nonparametric tests are based on the relative positions of data rather than their absolute values, this results in a loss of information.
Often the results are not as immediately readable and interpretable, especially for clinicians for whom the concepts of “mean and standard deviation” are certainly more familiar than those of “rank positions”.
Finally, when making comparisons between more than two variables, it may be necessary to perform post-hoc tests with corrections (for example Bonferroni) to identify differences between pairs, which makes the statistical analysis and its interpretation more complex.
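For example, a minimal sketch of pairwise Mann-Whitney U tests with a Bonferroni correction, reusing the invented groups from the Kruskal-Wallis example above; multiplying each p-value by the number of comparisons (and capping it at 1) is the simplest form of the correction.
# posthoc_bonferroni.py
from itertools import combinations
import numpy as np
from scipy.stats import mannwhitneyu

groups = {
    "A": np.array([88, 90, 85, 87, 89]),
    "B": np.array([78, 75, 80, 77, 76]),
    "C": np.array([92, 94, 95, 91, 93]),
}

pairs = list(combinations(groups, 2))
n_comparisons = len(pairs)

for name1, name2 in pairs:
    stat, p = mannwhitneyu(groups[name1], groups[name2], alternative='two-sided')
    p_adjusted = min(p * n_comparisons, 1.0)   # Bonferroni correction
    print(f"{name1} vs {name2}: U = {stat}, raw p = {p:.4f}, adjusted p = {p_adjusted:.4f}")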
Conclusion
Nonparametric statistical methods have established themselves as fundamental and indispensable tools in medical research, particularly when dealing with real-world data that often fails to meet the strict assumptions required by parametric approaches. Their versatility and reliability make them essential components of the modern medical researcher’s analytical toolkit.