Economics · 4th Semester · GU CBCS

Correlation Analysis

Introductory Quantitative Techniques for Economics — Unit 5

Topics: 4
Marks: 20
Type: Statistics-Based

📚 Table of Contents

  1. What is Correlation?
  2. Types of Correlation
  3. Scatter Diagrams
  4. Karl Pearson's Linear Correlation
  5. Properties of r
  6. Pearson's r — Worked Example
  7. Spearman's Rank Correlation
  8. Rank Correlation — Worked Example
  9. Partial Correlation
  10. Multiple Correlation
  11. Comparison Table
  12. Exam Corner
01 · What is Correlation?

Definition

Correlation is a statistical technique that measures the degree and direction of the linear relationship between two or more variables. It answers: "When one variable changes, does the other variable change too — and by how much?"

🔑 Key Concept
Correlation only measures association, not causation. Two variables can be correlated without one causing the other.

📌 Examples in Economics

  • 💹 Income & Consumption — Positive
  • 📉 Price & Demand — Negative
  • 🏭 Employment & Output — Positive
  • 💰 Interest rate & Investment — Negative
  • 🌡️ Rainfall & Shoe size — Zero

📐 Notation

We study the relationship between two variables:

X = Independent variable
Y = Dependent variable
n = Number of pairs of observations

The correlation coefficient is dimensionless — it has no units.

💡 Memory Trick
"Correlation is the SHADOW of causation" — two things moving together (correlation) doesn't mean one is dragging the other (causation). Always say: "r shows association, not cause-effect."
02 · Types of Correlation

A. Based on Direction

Positive Correlation

Both variables move in the same direction. When X increases, Y increases; when X decreases, Y decreases.

r > 0 — e.g. Income & Expenditure

Negative Correlation

Variables move in opposite directions. When X increases, Y decreases and vice versa.

r < 0 — e.g. Price & Demand

Zero Correlation

No systematic linear relationship between X and Y. Changes in one variable do not affect the other.

r = 0 — e.g. Shoe size & Intelligence

B. Based on Degree

[Scale: r runs from −1 (Perfect Negative) through 0 (No Correlation) to +1 (Perfect Positive)]

Value of r      | Degree            | Interpretation
----------------|-------------------|---------------------------------------
r = +1          | Perfect Positive  | All points exactly on an upward line
+0.75 to +0.99  | High Positive     | Strong positive association
+0.25 to +0.74  | Moderate Positive | Moderate positive association
0 to +0.24      | Low Positive      | Weak positive association
r = 0           | Zero              | No linear relationship
−0.24 to 0      | Low Negative      | Weak negative association
−0.74 to −0.25  | Moderate Negative | Moderate negative association
−0.99 to −0.75  | High Negative     | Strong negative association
r = −1          | Perfect Negative  | All points exactly on a downward line

C. Based on Nature (Number of Variables)

Simple (Linear)

Relationship between two variables. e.g. r between X and Y only.

rXY

Partial

Correlation between two variables while holding others constant.

r₁₂.₃

Multiple

Correlation between one variable and a set of other variables simultaneously.

R₁.₂₃
03 · Scatter Diagrams

What is a Scatter Diagram?

A scatter diagram (scatter plot) is a graphical method of studying correlation. Each pair (X, Y) is plotted as a point. The pattern of dots reveals the type and degree of correlation.

⚠️ Important
Scatter diagrams give only a rough idea of correlation — they cannot give the exact numerical value. For that, we use Karl Pearson's r or Spearman's ρ.
Six Types of Scatter Patterns
[Six scatter patterns, each with X on the horizontal axis: Perfect Positive (r = +1), Perfect Negative (r = −1), No Correlation (r ≈ 0), High Positive (r ≈ +0.9), Moderate Positive (r ≈ +0.6), High Negative (r ≈ −0.9)]
📖 Exam Tip
In exams you may be asked: "Draw scatter diagrams showing (i) perfect positive, (ii) negative and (iii) no correlation." — Practice drawing all six patterns above.
04 · Karl Pearson's Linear Correlation Coefficient

What is r?

Pearson's correlation coefficient r (also called the product-moment correlation coefficient) is the most widely used measure of linear correlation. It gives a precise numerical value of the strength and direction of the linear relationship between two quantitative variables.

Developed by Karl Pearson (1896) · Range: −1 ≤ r ≤ +1 · Dimensionless

Formula (Standard Form)

Karl Pearson's r — Definition Form: r = Σ(X − X̄)(Y − Ȳ) / √[Σ(X − X̄)² · Σ(Y − Ȳ)²]

Where X̄ = mean of X, Ȳ = mean of Y

Shortcut Formula (Calculation Form) — Most Used in Exams

Pearson's r — Shortcut (Direct Method): r = [n·ΣXY − ΣX·ΣY] / √{[n·ΣX² − (ΣX)²] · [n·ΣY² − (ΣY)²]}

Deviation from Mean Form (When X̄, Ȳ are integers)

Using deviations dx = X − X̄, dy = Y − Ȳ: r = Σdx·dy / √(Σdx² · Σdy²)

What each term means

Symbol    | Meaning
----------|--------------------------------------
n         | Number of paired observations
ΣXY       | Sum of products of X and Y
ΣX, ΣY    | Sum of all X values, all Y values
ΣX², ΣY²  | Sum of squares of X, of Y
X̄, Ȳ      | Arithmetic mean of X, of Y

Columns to make in exams

  1. X values and Y values (given)
  2. X² column (square each X)
  3. Y² column (square each Y)
  4. XY column (multiply each pair)
  5. Find ΣX, ΣY, ΣX², ΣY², ΣXY
  6. Apply shortcut formula
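The six-column exam routine above can be sketched as a small Python function (an illustrative aid; the name `pearson_r` is our own, not part of the syllabus):

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r via the shortcut formula:
    r = [n·ΣXY − ΣX·ΣY] / √{[n·ΣX² − (ΣX)²][n·ΣY² − (ΣY)²]}"""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(v * v for v in x)              # ΣX² column
    sum_y2 = sum(v * v for v in y)              # ΣY² column
    sum_xy = sum(a * b for a, b in zip(x, y))   # XY column
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator
```

For perfectly linear data the function returns ±1, in line with the range and perfect-correlation properties discussed in the next section.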
05 · Properties of Karl Pearson's r

7 Key Properties — Must Know for Exam

Property 1 — Range
The value of r always lies between −1 and +1 inclusive:
−1 ≤ r ≤ +1
Property 2 — Direction
If r > 0 → Positive correlation. If r < 0 → Negative correlation. If r = 0 → No linear correlation.
Property 3 — Pure Number
r is a pure number (dimensionless). It is independent of the units of measurement of X and Y. Changing units does not change r.
Property 4 — Symmetry
rXY = rYX. The correlation between X and Y equals the correlation between Y and X.
Property 5 — Change of Origin & Scale
r is unchanged if we substitute u = (X − a)/h and v = (Y − b)/k (change of origin and scale). This is used in the assumed mean shortcut.
Property 6 — Perfect Correlation
r = ±1 only when all points lie exactly on a straight line. r = +1: perfectly upward line. r = −1: perfectly downward line.
Property 7 — Limitation
r measures only linear relationships. Even if r = 0, there may be a strong non-linear (curved) relationship between X and Y.

r and Regression Coefficients

Relationship between r and Regression Coefficients bYX, bXY r = √(bYX × bXY)

Where: bYX = regression coefficient of Y on X bXY = regression coefficient of X on Y
⚠️ Sign Rule
bYX and bXY must have the same sign. If both are positive → r is positive. If both negative → r is negative. They can never have opposite signs.
💡 Memory Trick
"r is the Geometric Mean of the two regression bees (b's)"
r = √(b₁ × b₂) — just like geometric mean of two numbers.
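As a quick numerical check of this property (a sketch, not an exam method), both regression coefficients can be computed from the same sums used for r; their geometric mean should reproduce r. The data here is the income–expenditure series from the worked example in Section 06:

```python
from math import sqrt

x = [10, 14, 18, 22, 26, 30]   # income (X)
y = [8, 11, 14, 16, 20, 24]    # expenditure (Y)

n = len(x)
sum_xy = sum(a * b for a, b in zip(x, y))
num = n * sum_xy - sum(x) * sum(y)                        # shared numerator
b_yx = num / (n * sum(v * v for v in x) - sum(x) ** 2)    # regression of Y on X
b_xy = num / (n * sum(v * v for v in y) - sum(y) ** 2)    # regression of X on Y

r = sqrt(b_yx * b_xy)   # geometric mean of the two b's ≈ 0.995
```

Both b's come out positive here, so r takes the positive root, exactly as the sign rule requires.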
06 · Pearson's r — Full Worked Example

Problem

Calculate Karl Pearson's coefficient of correlation for the following data:

Obs. | X (Income) | Y (Expenditure)
-----|------------|----------------
1    | 10         | 8
2    | 14         | 11
3    | 18         | 14
4    | 22         | 16
5    | 26         | 20
6    | 30         | 24

Step 1 — Build the Calculation Table

X        | Y       | X²         | Y²         | XY
---------|---------|------------|------------|------------
10       | 8       | 100        | 64         | 80
14       | 11      | 196        | 121        | 154
18       | 14      | 324        | 196        | 252
22       | 16      | 484        | 256        | 352
26       | 20      | 676        | 400        | 520
30       | 24      | 900        | 576        | 720
ΣX = 120 | ΣY = 93 | ΣX² = 2680 | ΣY² = 1613 | ΣXY = 2078

n = 6

Step 2 — Apply the Shortcut Formula

  1. n·ΣXY = 6 × 2078 = 12,468
  2. ΣX·ΣY = 120 × 93 = 11,160
  3. Numerator = 12,468 − 11,160 = 1,308
  4. n·ΣX² − (ΣX)² = 6×2680 − 120² = 16,080 − 14,400 = 1,680
  5. n·ΣY² − (ΣY)² = 6×1613 − 93² = 9,678 − 8,649 = 1,029
  6. Denominator = √1680 × √1029 = 40.99 × 32.08 = 1314.96
Final Answer r = 1308 / 1314.96 ≈ 0.995
✅ Interpretation
r ≈ +0.995 → Very high positive correlation between income and expenditure. As income rises, expenditure rises almost proportionally.
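The steps above can be cross-checked in a few lines of Python (an illustrative sketch using the same data):

```python
from math import sqrt

x = [10, 14, 18, 22, 26, 30]   # X = income
y = [8, 11, 14, 16, 20, 24]    # Y = expenditure
n = len(x)

sum_xy = sum(a * b for a, b in zip(x, y))    # ΣXY = 2078
numerator = n * sum_xy - sum(x) * sum(y)     # 12,468 − 11,160 = 1,308
denominator = sqrt(n * sum(v * v for v in x) - sum(x) ** 2) * \
              sqrt(n * sum(v * v for v in y) - sum(y) ** 2)
r = numerator / denominator                  # ≈ 0.995
```

Carrying the square roots exactly (rather than rounding to 40.99 and 32.08) gives the same answer to three decimals.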
07 · Spearman's Rank Correlation (ρ)

Why Rank Correlation?

Spearman's Rank Correlation Coefficient ρ (rho) is used when:

  • ✦ Data is given in qualitative form (e.g. beauty, intelligence, character) that can only be ranked
  • ✦ Variables cannot be measured precisely in numbers
  • ✦ The distribution is heavily skewed (non-normal)
  • ✦ Exact values are not available, only ranks are given
Developed by Charles Spearman (1904) · Range: −1 ≤ ρ ≤ +1 · Non-parametric

Case A — When Ranks Are Not Repeated (No Ties)

Spearman's Rank Correlation Formula: ρ = 1 − 6·Σd² / [n(n² − 1)]

Where: d = R₁ − R₂ (difference of ranks for each pair) n = number of pairs

Step-by-step procedure

  1. Assign ranks to X: rank 1 to highest (or lowest — be consistent)
  2. Assign ranks to Y similarly
  3. Find d = R₁ − R₂ for each pair
  4. Find d² for each pair
  5. Sum all d² to get Σd²
  6. Apply formula: ρ = 1 − 6Σd² / n(n²−1)
⚠️ Ranking Rule

Give rank 1 to the largest value (or smallest — use the same convention for both X and Y). Verify: sum of all ranks = n(n+1)/2.

Check: Sum of Ranks ΣR = n(n+1)/2
e.g., n=6: ΣR = 6×7/2 = 21 ✓
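The six-step procedure, including the sum-of-ranks check, can be sketched in Python (a hedged illustration; `spearman_rho` is our own helper and assumes no tied values):

```python
def spearman_rho(x, y):
    """Spearman's ρ = 1 − 6Σd² / [n(n² − 1)], assuming no ties."""
    def ranks(vals):
        # rank 1 goes to the largest value (same convention for both series)
        order = sorted(vals, reverse=True)
        return [order.index(v) + 1 for v in vals]

    r1, r2 = ranks(x), ranks(y)
    n = len(x)
    # sanity check from the ranking rule: ranks must sum to n(n+1)/2
    assert sum(r1) == sum(r2) == n * (n + 1) // 2
    d_squared = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * d_squared / (n * (n * n - 1))
```

Identically ordered series give ρ = +1, reversed series give ρ = −1, matching the range property.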

Case B — When Ranks Are Repeated (With Ties)

When two or more items have the same value, they are given the average of the ranks they would have received. A correction factor is added to Σd².

Spearman's Formula with Tie Correction: ρ = 1 − 6[Σd² + CF₁ + CF₂ + …] / [n(n² − 1)]

Correction Factor (CF) for each tied group: CF = (m³ − m) / 12

Where m = number of items with the same rank (size of the tied group)
🔑 How to Handle Ties
If 3 items are tied at ranks 4, 5, 6 → each gets rank (4+5+6)/3 = 5. Then CF = (3³−3)/12 = (27−3)/12 = 24/12 = 2. Add one CF for each tied group in either variable.
📖 Example — Tie in X
X values: 8, 5, 5, 3. The two 5's are tied at ranks 2 and 3, so both get rank (2+3)/2 = 2.5. CF = (2³−2)/12 = 6/12 = 0.5.
📖 Example — Three-way Tie
Y values: 10, 7, 7, 7, 4. Three 7's tied at ranks 2, 3, 4 → each gets rank 3. CF = (3³−3)/12 = 24/12 = 2.
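The average-rank convention and the correction factor can be sketched as two small helpers (names are ours), reproducing the two examples above:

```python
from collections import Counter

def average_ranks(vals):
    """Rank 1 = largest value; tied items share the average of their ranks."""
    order = sorted(vals, reverse=True)
    result = []
    for v in vals:
        first = order.index(v) + 1           # first rank the tied group occupies
        m = order.count(v)                   # size of the tied group
        result.append(first + (m - 1) / 2)   # average of ranks first..first+m−1
    return result

def tie_correction(vals):
    """Total correction for one series: Σ(m³ − m)/12 over every tied group."""
    return sum((m ** 3 - m) / 12 for m in Counter(vals).values() if m > 1)
```

For X = 8, 5, 5, 3 this yields ranks 1, 2.5, 2.5, 4 and CF = 0.5; for Y = 10, 7, 7, 7, 4 it yields ranks 1, 3, 3, 3, 5 and CF = 2, matching the worked tie examples.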

Pearson's r vs Spearman's ρ

Aspect       | Pearson's r              | Spearman's ρ
-------------|--------------------------|---------------------------------
Data type    | Quantitative, continuous | Ordinal / ranked data
Assumption   | Normal distribution      | No distributional assumption
Relationship | Linear only              | Monotonic (linear or nonlinear)
Outliers     | Sensitive to outliers    | Less sensitive
Calculation  | More complex             | Simpler formula
Nature       | Parametric               | Non-parametric
💡 Memory Trick
"SPEARMAN for SUBJECTIVE situations" — Beauty contest, teaching quality, management skills → use Spearman's ρ. Exact measurable data → use Pearson's r.
08 · Rank Correlation — Worked Examples

Example A — Ranks Not Given (No Ties)

Ten students were given marks in Economics (X) and Mathematics (Y). Find Spearman's ρ.

Student | X (Eco) | Y (Maths) | R₁ | R₂ | d = R₁−R₂ | d²
--------|---------|-----------|----|----|-----------|--------
1       | 75      | 80        | 4  | 3  | +1        | 1
2       | 88      | 92        | 2  | 1  | +1        | 1
3       | 60      | 55        | 8  | 9  | −1        | 1
4       | 90      | 88        | 1  | 2  | −1        | 1
5       | 55      | 60        | 9  | 8  | +1        | 1
6       | 70      | 72        | 6  | 5  | +1        | 1
7       | 45      | 50        | 10 | 10 | 0         | 0
8       | 72      | 68        | 5  | 6  | −1        | 1
9       | 82      | 74        | 3  | 4  | −1        | 1
10      | 65      | 65        | 7  | 7  | 0         | 0
Total   |         |           |    |    |           | Σd² = 8
Applying the Formula (n = 10, Σd² = 8): ρ = 1 − (6 × 8) / [10(100 − 1)] = 1 − 48/990 = 1 − 0.0485 = 0.9515 ≈ 0.95
✅ Interpretation
ρ ≈ 0.95 → Very high positive rank correlation. Students who score high in Economics also tend to score high in Mathematics.
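The computation can be verified in a few lines (an illustrative sketch; the marks are transcribed from the table above):

```python
x = [75, 88, 60, 90, 55, 70, 45, 72, 82, 65]   # Economics marks
y = [80, 92, 55, 88, 60, 72, 50, 68, 74, 65]   # Mathematics marks

def ranks(vals):
    # rank 1 = highest mark; this data set has no ties
    order = sorted(vals, reverse=True)
    return [order.index(v) + 1 for v in vals]

n = len(x)
d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))  # Σd² = 8
rho = 1 - 6 * d_squared / (n * (n * n - 1))    # 1 − 48/990 ≈ 0.9515
```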

Example B — With Tied Ranks

Find Spearman's ρ for the following data where some values are repeated.

Pair  | X  | Y  | R₁ | R₂  | d    | d²
------|----|----|----|-----|------|-----------
1     | 50 | 40 | 6  | 5   | +1   | 1
2     | 65 | 55 | 4  | 2.5 | +1.5 | 2.25
3     | 75 | 65 | 1  | 1   | 0    | 0
4     | 65 | 50 | 4  | 4   | 0    | 0
5     | 45 | 35 | 7  | 6   | +1   | 1
6     | 65 | 30 | 4  | 7   | −3   | 9
7     | 70 | 55 | 2  | 2.5 | −0.5 | 0.25
Total |    |    |    |     |      | Σd² = 13.5

  1. X = 65 appears 3 times. Since 75 takes rank 1 and 70 takes rank 2, the three 65's occupy ranks 3, 4, 5 → each gets (3+4+5)/3 = 4 → CF_X = (3³−3)/12 = 24/12 = 2
  2. Y = 55 appears 2 times. Since 65 takes rank 1, the two 55's occupy ranks 2, 3 → each gets (2+3)/2 = 2.5 → CF_Y = (2³−2)/12 = 6/12 = 0.5
With Tie Correction (Σd² = 13.5, n = 7, CF_X = 2, CF_Y = 0.5): ρ = 1 − 6[13.5 + 2 + 0.5] / [7(49 − 1)] = 1 − (6 × 16) / (7 × 48) = 1 − 96/336 = 1 − 0.286 = 0.714
✅ Interpretation
ρ ≈ +0.71 → High positive rank correlation between the two series.
09 · Partial Correlation

What is Partial Correlation?

Partial correlation measures the degree of linear relationship between two variables while controlling for (holding constant) the effect of one or more other variables.

🔑 Why it matters
Simple correlation between X₁ and X₂ may be spurious (false) because of a third variable X₃ influencing both. Partial correlation removes this effect to reveal the true relationship.
Notation: r₁₂.₃ — "correlation between 1 and 2, holding 3 constant"

Classic Spurious Correlation Example

Why Partial Correlation is Needed
X₃ (Third Variable) e.g. GDP / National Income X₁ (Variable 1) e.g. Public Expenditure X₂ (Variable 2) e.g. Crime Rate Spurious r₁₂ = 0.82? After controlling for X₃: r₁₂.₃ ≈ 0.12 (very low!)

Formulas for Partial Correlation

Given three variables X₁, X₂, X₃ with simple correlations r₁₂, r₁₃, r₂₃:

First Order Partial Correlation — r₁₂.₃ (controlling X₃): r₁₂.₃ = (r₁₂ − r₁₃·r₂₃) / [√(1 − r₁₃²) × √(1 − r₂₃²)]
First Order Partial Correlation — r₁₃.₂ (controlling X₂): r₁₃.₂ = (r₁₃ − r₁₂·r₂₃) / [√(1 − r₁₂²) × √(1 − r₂₃²)]
First Order Partial Correlation — r₂₃.₁ (controlling X₁): r₂₃.₁ = (r₂₃ − r₁₂·r₁₃) / [√(1 − r₁₂²) × √(1 − r₁₃²)]
⚠️ Note on Order
Zero order = simple correlations (r₁₂, r₁₃, r₂₃) — no variable controlled.
First order = one variable held constant (r₁₂.₃) — the digit after the dot indicates which variable is controlled.
Second order = two variables held constant (r₁₂.₃₄) — rarely tested at UG level.

Worked Example — Partial Correlation

Given: r₁₂ = 0.6, r₁₃ = 0.5, r₂₃ = 0.4. Find r₁₂.₃.

  1. Numerator: r₁₂ − r₁₃ × r₂₃ = 0.6 − (0.5 × 0.4) = 0.6 − 0.20 = 0.40
  2. √(1 − r₁₃²) = √(1 − 0.25) = √0.75 = 0.866
  3. √(1 − r₂₃²) = √(1 − 0.16) = √0.84 = 0.917
  4. Denominator: 0.866 × 0.917 = 0.794
Result r₁₂.₃ = 0.40 / 0.794 = 0.504
✅ Interpretation
While simple r₁₂ = 0.6, after removing the effect of X₃, the partial correlation r₁₂.₃ = 0.504. The correlation reduced — some of r₁₂ was due to the common influence of X₃.
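The same computation as a short Python function (our own sketch of the r₁₂.₃ formula, using the figures from the worked example):

```python
from math import sqrt

def partial_r(r12, r13, r23):
    """First-order partial correlation r₁₂.₃ (holding X₃ constant)."""
    return (r12 - r13 * r23) / (sqrt(1 - r13 ** 2) * sqrt(1 - r23 ** 2))

r12_3 = partial_r(0.6, 0.5, 0.4)   # ≈ 0.504, as in the worked example
```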
💡 Memory Trick
"Partial = Remove the Pest (the third variable)" — Partial correlation purges the influence of the unwanted third variable, leaving the pure relationship between the two variables of interest.
10 · Multiple Correlation

What is Multiple Correlation?

Multiple correlation coefficient R measures the degree of linear relationship between one dependent variable and two or more independent variables jointly.

Unlike partial correlation (which removes effects), multiple correlation measures how well a set of predictors together explain the dependent variable.

Notation: R₁.₂₃ · Range: 0 ≤ R ≤ 1 · R = 1 → Perfect Fit

Key Difference from Simple & Partial Correlation

Feature            | Simple r              | Partial r                      | Multiple R
-------------------|-----------------------|--------------------------------|---------------------------
Variables involved | 2                     | 2 (controlling others)         | 1 + (2 or more)
Direction          | + or −                | + or −                         | Always + (0 to 1)
Purpose            | Association of 2 vars | Pure association after control | Joint predictive power
Notation           | r₁₂                   | r₁₂.₃                          | R₁.₂₃

Formula for Multiple Correlation Coefficient R₁.₂₃

R₁.₂₃ = correlation of X₁ with both X₂ and X₃ simultaneously.

R₁.₂₃ — Formula using simple correlations: R₁.₂₃² = (r₁₂² + r₁₃² − 2·r₁₂·r₁₃·r₂₃) / (1 − r₂₃²)
⚠️ Important
R₁.₂₃ is always positive (it is the positive square root). It ranges from 0 to 1 only — unlike r and ρ which can be negative. R = 0 means no relationship; R = 1 means perfect linear relationship.

Relationship between R and r (key property)

Coefficient of Multiple Determination R₁.₂₃² = Explained variation / Total variation

Also: R₁.₂₃ ≥ |r₁₂| and R₁.₂₃ ≥ |r₁₃| (Adding more predictors never reduces R)

Worked Example — Multiple Correlation

Given: r₁₂ = 0.7, r₁₃ = 0.6, r₂₃ = 0.4. Find R₁.₂₃.

  1. Numerator: r₁₂² + r₁₃² − 2·r₁₂·r₁₃·r₂₃ = (0.49) + (0.36) − 2×0.7×0.6×0.4 = 0.85 − 0.336 = 0.514
  2. Denominator: 1 − r₂₃² = 1 − 0.16 = 0.84
  3. R₁.₂₃² = 0.514 / 0.84 = 0.6119
Result R₁.₂₃ = √0.6119 = 0.782
✅ Interpretation
R₁.₂₃ = 0.782 → X₁ has a high positive multiple correlation with X₂ and X₃ jointly. About 61.19% (= R² × 100) of the variation in X₁ is explained by X₂ and X₃ together.
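A short sketch of the R₁.₂₃ computation (the function name is ours), which also confirms the property R₁.₂₃ ≥ |r₁₂| for this data:

```python
from math import sqrt

def multiple_R(r12, r13, r23):
    """Multiple correlation R₁.₂₃ of X₁ on X₂ and X₃ jointly (always ≥ 0)."""
    R_squared = (r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / (1 - r23 ** 2)
    return sqrt(R_squared)

R = multiple_R(0.7, 0.6, 0.4)   # ≈ 0.782; R² ≈ 0.612 → ~61% of variation explained
```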
Multiple Correlation — Conceptual Diagram
[Diagram: X₂ and X₃ (predictors) each correlate with X₁ (dependent) via r₁₂ and r₁₃, and with each other via r₂₃; R₁.₂₃ captures the combined effect of X₂ and X₃ on X₁.]
💡 Memory Trick
"R is ALWAYS positive — it's the REPORT CARD of predictors" — R² (coefficient of determination) tells you the % of the dependent variable's variation explained. R = 1 is a perfect report card; R = 0 means predictors are useless.
11 · Comparison: All Four Types at a Glance

Feature             | Linear Correlation (r) | Rank Correlation (ρ)  | Partial Correlation              | Multiple Correlation (R)
--------------------|------------------------|-----------------------|----------------------------------|-----------------------------------
Developed by        | Karl Pearson           | Charles Spearman      | Multiple authors                 | Multiple authors
Range               | −1 to +1               | −1 to +1              | −1 to +1                         | 0 to +1 only
Variables           | 2 (X & Y)              | 2 (ranks)             | 2 (controlling 1+)               | 1 dependent + 2+ predictors
Data type           | Quantitative           | Ordinal or ranked     | Quantitative                     | Quantitative
Measures            | Linear association     | Monotonic association | Pure association (after control) | Joint predictive power
Key formula         | [n·ΣXY−ΣX·ΣY]/√[…]     | 1 − 6Σd²/n(n²−1)      | (r₁₂−r₁₃r₂₃)/√[…]                | √[(r₁₂²+r₁₃²−2r₁₂r₁₃r₂₃)/(1−r₂₃²)]
Outlier sensitivity | High                   | Low                   | Moderate                         | Moderate
Can be negative?    | Yes                    | Yes                   | Yes                              | No (always ≥ 0)
Which Correlation to Use? — Decision Guide

  1. Only 2 variables, and the data is ranked or subjective (beauty, grades)? → Spearman's ρ (rank correlation)
  2. Only 2 variables, with exact numeric data (income & spending)? → Pearson's r (linear correlation)
  3. 3+ variables, and you want the relation of two after removing a third variable's spurious effect? → Partial r (r₁₂.₃ formula)
  4. 3+ variables, and you want the joint effect of multiple predictors on one dependent variable? → Multiple R (R₁.₂₃ formula)

🎯 Exam Corner — Likely Questions (20 Marks)

2M Define correlation. Distinguish between positive and negative correlation with examples.
3M What is a scatter diagram? Draw scatter diagrams showing (i) perfect positive, (ii) negative, and (iii) no correlation.
3M State and explain any four properties of Karl Pearson's coefficient of correlation.
5M Calculate Karl Pearson's coefficient of correlation from the following data and interpret the result. [Data given in 5–8 pairs]
2M What is rank correlation? Under what circumstances is Spearman's rank correlation preferred over Pearson's r?
5M Calculate Spearman's rank correlation coefficient from the following data. (Data with equal ranks / ties — apply correction factor.)
3M Distinguish between simple, partial, and multiple correlation with suitable examples.
4M Given r₁₂ = 0.7, r₁₃ = 0.5, r₂₃ = 0.4, calculate (i) partial correlation r₁₂.₃ and (ii) multiple correlation R₁.₂₃.
2M Prove that the multiple correlation coefficient is always non-negative. OR Show that R₁.₂₃ ≥ r₁₂.
3M What is spurious correlation? How does partial correlation help in identifying it?

Last-Minute Formula Sheet

Pearson's r — Shortcut r = [n·ΣXY − ΣX·ΣY] / √{[n·ΣX²−(ΣX)²][n·ΣY²−(ΣY)²]}
Pearson's r — Deviation Form r = Σdxdy / √(Σdx² · Σdy²) ; dx=X−X̄, dy=Y−Ȳ
Spearman's ρ — No Ties ρ = 1 − 6Σd² / n(n²−1)
Spearman's ρ — With Ties ρ = 1 − 6[Σd² + ΣCF] / n(n²−1) ; CF = (m³−m)/12
Partial Correlation r₁₂.₃ r₁₂.₃ = (r₁₂ − r₁₃·r₂₃) / √[(1−r₁₃²)(1−r₂₃²)]
Multiple Correlation R₁.₂₃ R₁.₂₃ = √[(r₁₂²+r₁₃²−2·r₁₂·r₁₃·r₂₃) / (1−r₂₃²)]
r via Regression Coefficients r = √(bYX · bXY) ; sign = sign of b's
Coeff. of Multiple Determination R² = Explained variation / Total variation (0 ≤ R² ≤ 1)
Range & Sign Summary
  Pearson r  : −1 ≤ r ≤ +1 (can be +ve or −ve)
  Spearman ρ : −1 ≤ ρ ≤ +1 (can be +ve or −ve)
  Partial r  : −1 ≤ r₁₂.₃ ≤ +1 (can be +ve or −ve)
  Multiple R : 0 ≤ R ≤ +1 (ALWAYS non-negative)
Tie Correction Factor (CF) — Quick Reference
  2 items tied → CF = (8−2)/12 = 0.5
  3 items tied → CF = (27−3)/12 = 2.0
  4 items tied → CF = (64−4)/12 = 5.0
  Add one CF for each tied group in X, one for each tied group in Y