Formula for Pearson's r value
$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$
Variables:
r: Represents the Pearson correlation coefficient. This value ranges from -1 to +1, indicating the strength and direction of the linear relationship between two variables.
n: Represents the total number of data points or observations in the dataset.
xi: Represents the individual value of the first variable (x) for the i-th data point.
yi: Represents the individual value of the second variable (y) for the i-th data point.
x̄: Represents the mean (average) of the values for variable x.
ȳ: Represents the mean (average) of the values for variable y.
Σ: Represents the summation symbol, indicating that we need to sum the values for all data points from i = 1 to n.
In simpler terms:
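The formula calculates the covariance of x and y divided by the product of their standard deviations. The 1/(n−1) normalizing factors in the covariance and the standard deviations cancel, which is why only plain sums appear in the formula. The result measures how much the two variables change together relative to how much each one varies on its own.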
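To make the formula concrete, here is a minimal sketch that computes r directly from the sums of deviations and compares it with NumPy's built-in result. The helper name `pearson_r` and the five-point dataset are made up for illustration and are not part of the demo that follows.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's r from the definition (hypothetical helper for illustration)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations of x from its mean
    dy = y - y.mean()  # deviations of y from its mean
    # Numerator: sum of products of deviations.
    # Denominator: square root of the product of the summed squared deviations.
    return np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

# Small illustrative dataset (assumed values, chosen so the sums are easy to check by hand)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r_manual = pearson_r(x, y)
r_numpy = np.corrcoef(x, y)[0, 1]  # off-diagonal entry of the 2x2 correlation matrix

print(f"manual: {r_manual:.4f}, numpy: {r_numpy:.4f}")
# manual: 0.7746, numpy: 0.7746
```

By hand: the deviation products sum to 6, and the squared deviations sum to 10 for x and 6 for y, so r = 6 / √60 ≈ 0.7746.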
# A demo of correlation coefficients close to 1, -1, and 0, with Pearson's r for each
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import pearsonr
# Generate data for correlation close to 1
np.random.seed(0)
x1 = np.linspace(0, 10, 100)
y1 = x1 + np.random.normal(0, 1, 100) # Add some noise
# Generate data for correlation close to -1
x2 = np.linspace(0, 10, 100)
y2 = -x2 + np.random.normal(0, 1, 100) # Add some noise
# Generate data for correlation close to 0
x3 = np.random.rand(100)
y3 = np.random.rand(100)
# Calculate Pearson's r for each dataset
r1, p1 = pearsonr(x1, y1)
r2, p2 = pearsonr(x2, y2)
r3, p3 = pearsonr(x3, y3)
# Create the plot
plt.figure(figsize=(15, 5))
# Plot 1: Correlation close to 1
plt.subplot(1, 3, 1)
plt.scatter(x1, y1)
plt.title(f"Correlation ~ 1 (r = {r1:.2f})")
plt.xlabel("X")
plt.ylabel("Y")
# Plot 2: Correlation close to -1
plt.subplot(1, 3, 2)
plt.scatter(x2, y2)
plt.title(f"Correlation ~ -1 (r = {r2:.2f})")
plt.xlabel("X")
plt.ylabel("Y")
# Plot 3: Correlation close to 0
plt.subplot(1, 3, 3)
plt.scatter(x3, y3)
plt.title(f"Correlation ~ 0 (r = {r3:.2f})")
plt.xlabel("X")
plt.ylabel("Y")
plt.tight_layout()
plt.show()
print(f"Pearson's r for dataset 1: {r1:.2f}")
print(f"Pearson's r for dataset 2: {r2:.2f}")
print(f"Pearson's r for dataset 3: {r3:.2f}")
Pearson's r for dataset 1: 0.94
Pearson's r for dataset 2: -0.95
Pearson's r for dataset 3: -0.03