correlation_coefficient

Formula for Pearson's r

$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \sum_{i=1}^{n}(y_i - \bar{y})^2}}$$

Variables:

r: Represents the Pearson correlation coefficient. This value ranges from -1 to +1, indicating the strength and direction of the linear relationship between two variables.

n: Represents the total number of data points or observations in the dataset.

x_i: Represents the individual value of the first variable (x) for the i-th data point.

y_i: Represents the individual value of the second variable (y) for the i-th data point.

x̄: Represents the mean (average) of the values for variable x.

ȳ: Represents the mean (average) of the values for variable y.

Σ: The summation symbol, indicating that the terms are summed over all data points from i = 1 to n.

In simpler terms:

The formula calculates the covariance of x and y divided by the product of their standard deviations. This essentially measures how much the two variables change together relative to how much they change individually.
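To make the "covariance divided by the product of standard deviations" reading concrete, here is a minimal sketch that computes r two ways: directly from the sums in the formula, and as sample covariance over the product of sample standard deviations. The function names `pearson_r` and `pearson_r_from_cov` and the five-point dataset are illustrative assumptions, not part of the original notebook. The two results agree because the 1/(n-1) factors in the covariance and in the standard deviations cancel.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's r computed directly from the sums in the formula (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations of x from its mean
    dy = y - y.mean()  # deviations of y from its mean
    return np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))

def pearson_r_from_cov(x, y):
    """Same quantity as covariance / (std_x * std_y), using ddof=1 throughout (hypothetical helper)."""
    cov_xy = np.cov(x, y, ddof=1)[0, 1]
    return cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))

# Small made-up dataset, chosen so the arithmetic is easy to check by hand
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

print(pearson_r(x, y))           # 6 / sqrt(10 * 6), about 0.7746
print(pearson_r_from_cov(x, y))  # same value up to floating-point rounding
```

For this dataset the numerator is 6 and the two sums of squares are 10 and 6, so r = 6 / sqrt(60). In practice `scipy.stats.pearsonr` or `np.corrcoef` would normally be used; the sketch only spells out what they compute.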


# A demo of correlation coefficients close to 1, -1, and 0, with Pearson's r computed for each

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import pearsonr

# Generate data for correlation close to 1
np.random.seed(0)
x1 = np.linspace(0, 10, 100)
y1 = x1 + np.random.normal(0, 1, 100)  # Add some noise

# Generate data for correlation close to -1
x2 = np.linspace(0, 10, 100)
y2 = -x2 + np.random.normal(0, 1, 100)  # Add some noise

# Generate data for correlation close to 0
x3 = np.random.rand(100)
y3 = np.random.rand(100)

# Calculate Pearson's r for each dataset
r1, p1 = pearsonr(x1, y1)
r2, p2 = pearsonr(x2, y2)
r3, p3 = pearsonr(x3, y3)

# Create the plot
plt.figure(figsize=(15, 5))

# Plot 1: Correlation close to 1
plt.subplot(1, 3, 1)
plt.scatter(x1, y1)
plt.title(f"Correlation ~ 1 (r = {r1:.2f})")
plt.xlabel("X")
plt.ylabel("Y")

# Plot 2: Correlation close to -1
plt.subplot(1, 3, 2)
plt.scatter(x2, y2)
plt.title(f"Correlation ~ -1 (r = {r2:.2f})")
plt.xlabel("X")
plt.ylabel("Y")


# Plot 3: Correlation close to 0
plt.subplot(1, 3, 3)
plt.scatter(x3, y3)
plt.title(f"Correlation ~ 0 (r = {r3:.2f})")
plt.xlabel("X")
plt.ylabel("Y")

plt.tight_layout()
plt.show()

print(f"Pearson's r for dataset 1: {r1:.2f}")
print(f"Pearson's r for dataset 2: {r2:.2f}")
print(f"Pearson's r for dataset 3: {r3:.2f}")

Pearson's r for dataset 1: 0.94
Pearson's r for dataset 2: -0.95
Pearson's r for dataset 3: -0.03
