Introductory Quantitative Techniques for Economics — Unit 5
Correlation is a statistical technique that measures the degree and direction of the linear relationship between two or more variables. It answers: "When one variable changes, does the other variable change too — and by how much?"
We study the relationship between variables.
The correlation coefficient is dimensionless — it has no units.
Positive correlation: both variables move in the same direction. When X increases, Y increases; when X decreases, Y decreases.
Negative correlation: the variables move in opposite directions. When X increases, Y decreases, and vice versa.
Zero correlation: there is no systematic linear relationship between X and Y. Changes in one variable are not accompanied by any consistent change in the other.
| Value of r | Degree | Interpretation |
|---|---|---|
| r = +1 | Perfect Positive | All points exactly on an upward line |
| 0.75 to 0.99 | High Positive | Strong positive association |
| 0.25 to 0.74 | Moderate Positive | Moderate positive association |
| 0 to 0.24 | Low Positive | Weak positive association |
| r = 0 | Zero | No linear relationship |
| −0.24 to 0 | Low Negative | Weak negative association |
| −0.74 to −0.25 | Moderate Negative | Moderate negative association |
| −0.99 to −0.75 | High Negative | Strong negative association |
| r = −1 | Perfect Negative | All points exactly on a downward line |
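As a rough illustration, the bands in the table can be written as a lookup. This is a sketch under stated assumptions: `describe_correlation` is a hypothetical helper name, and values that fall between the printed band edges (such as 0.245) are assigned using cut-offs of 0.25 and 0.75, which the table does not specify.

```python
def describe_correlation(r: float) -> str:
    """Map a correlation coefficient to the degree label used in the table."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "Zero"
    direction = "Positive" if r > 0 else "Negative"
    size = abs(r)
    # Assumed cut-offs: 0.25 and 0.75 separate Low / Moderate / High.
    if size == 1.0:
        degree = "Perfect"
    elif size >= 0.75:
        degree = "High"
    elif size >= 0.25:
        degree = "Moderate"
    else:
        degree = "Low"
    return f"{degree} {direction}"


for value in (1.0, 0.82, 0.40, 0.10, 0.0, -0.60):
    print(value, "->", describe_correlation(value))
```

These labels are conventions for describing a coefficient, not statistical tests; whether a given r is meaningful also depends on the sample size.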
Simple correlation: the relationship between two variables only, e.g. r between X and Y.
Partial correlation: the correlation between two variables while the others are held constant.
Multiple correlation: the correlation between one variable and a set of other variables taken together.
A scatter diagram (scatter plot) is a graphical method of studying correlation. Each pair (X, Y) is plotted as a point. The pattern of dots reveals the type and degree of correlation.
Pearson's correlation coefficient r (also called the product-moment correlation coefficient) is the most widely used measure of linear correlation. It gives a precise numerical value of the strength and direction of the linear relationship between two quantitative variables.
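In deviation form, r = Σ(X − X̄)(Y − Ȳ) / √[Σ(X − X̄)² · Σ(Y − Ȳ)²], where X̄ = mean of X and Ȳ = mean of Y. The equivalent direct form, which avoids computing deviations, is r = (nΣXY − ΣXΣY) / √[(nΣX² − (ΣX)²)(nΣY² − (ΣY)²)].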
| Symbol | Meaning |
|---|---|
| n | Number of paired observations |
| ΣXY | Sum of products of X and Y |
| ΣX, ΣY | Sum of all X values, Y values |
| ΣX², ΣY² | Sum of squares of X, Y |
| X̄, Ȳ | Arithmetic mean of X, Y |
Calculate Karl Pearson's coefficient of correlation for the following data:
| Obs. | X (Income) | Y (Expenditure) |
|---|---|---|
| 1 | 10 | 8 |
| 2 | 14 | 11 |
| 3 | 18 | 14 |
| 4 | 22 | 16 |
| 5 | 26 | 20 |
| 6 | 30 | 24 |
| X | Y | X² | Y² | XY |
|---|---|---|---|---|
| 10 | 8 | 100 | 64 | 80 |
| 14 | 11 | 196 | 121 | 154 |
| 18 | 14 | 324 | 196 | 252 |
| 22 | 16 | 484 | 256 | 352 |
| 26 | 20 | 676 | 400 | 520 |
| 30 | 24 | 900 | 576 | 720 |
| ΣX = 120 | ΣY = 93 | ΣX² = 2680 | ΣY² = 1613 | ΣXY = 2078 |
n = 6
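Substituting the column totals into the direct formula r = (nΣXY − ΣXΣY) / √[(nΣX² − (ΣX)²)(nΣY² − (ΣY)²)] gives r = 1308 / √(1680 × 1029) ≈ 0.995, a high positive correlation between income and expenditure. The Python sketch below recomputes this from the raw data; the function name `pearson_r` is only illustrative.

```python
import math

x = [10, 14, 18, 22, 26, 30]   # income
y = [8, 11, 14, 16, 20, 24]    # expenditure


def pearson_r(xs, ys):
    """Pearson's r from raw sums (the direct form of the formula)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(v * v for v in xs)
    sum_yy = sum(v * v for v in ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2)
    )
    return numerator / denominator


r = pearson_r(x, y)
print(round(r, 4))  # 0.9948
```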
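Spearman's Rank Correlation Coefficient ρ (rho) is used when the data are ordinal or already ranked, when the relationship is monotonic but not necessarily linear, or when outliers make Pearson's r unreliable. It is computed as ρ = 1 − 6Σd² / [n(n² − 1)], where d is the difference between the two ranks of each pair and n is the number of pairs.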
Give rank 1 to the largest value (or smallest — use the same convention for both X and Y). Verify: sum of all ranks = n(n+1)/2.
When two or more items have the same value, each is given the average of the ranks they would otherwise have received. For every group of m tied values, a correction factor of (m³ − m)/12 is added to Σd² before ρ is computed.
| Aspect | Pearson's r | Spearman's ρ |
|---|---|---|
| Data type | Quantitative, continuous | Ordinal / ranked data |
| Assumption | Linear relationship; normality needed only for significance tests | No distributional assumption |
| Relationship | Linear only | Monotonic (linear or nonlinear) |
| Outliers | Sensitive to outliers | Less sensitive |
| Calculation | More complex | Simpler formula |
| Nature | Parametric | Non-parametric |
Ten students were given marks in Economics (X) and Mathematics (Y). Find Spearman's ρ.
| Student | X (Eco) | Y (Maths) | R₁ | R₂ | d = R₁−R₂ | d² |
|---|---|---|---|---|---|---|
| 1 | 75 | 80 | 4 | 3 | 1 | 1 |
| 2 | 88 | 92 | 2 | 1 | 1 | 1 |
| 3 | 60 | 55 | 8 | 9 | −1 | 1 |
| 4 | 90 | 88 | 1 | 2 | −1 | 1 |
| 5 | 55 | 60 | 9 | 8 | 1 | 1 |
| 6 | 70 | 72 | 6 | 5 | 1 | 1 |
| 7 | 45 | 50 | 10 | 10 | 0 | 0 |
| 8 | 72 | 68 | 5 | 6 | −1 | 1 |
| 9 | 82 | 74 | 3 | 4 | −1 | 1 |
| 10 | 65 | 65 | 7 | 7 | 0 | 0 |
| Total | | | | | Σd² = | 8 |
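With Σd² = 8 and n = 10, ρ = 1 − 6Σd² / [n(n² − 1)] = 1 − 48/990 ≈ 0.952, so the two subjects rank the students almost identically. A minimal Python sketch of the same steps follows; `rank_descending` is a hypothetical helper and assumes there are no tied marks.

```python
x = [75, 88, 60, 90, 55, 70, 45, 72, 82, 65]   # Economics marks
y = [80, 92, 55, 88, 60, 72, 50, 68, 74, 65]   # Mathematics marks


def rank_descending(values):
    """Rank 1 goes to the largest value; assumes no repeated values."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


r1, r2 = rank_descending(x), rank_descending(y)
n = len(x)

# Rank-sum check: each set of ranks must total n(n+1)/2.
assert sum(r1) == sum(r2) == n * (n + 1) // 2

sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
rho = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
print(sum_d2, round(rho, 4))  # 8 0.9515
```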
Find Spearman's ρ for the following data where some values are repeated.
| Pair | X | Y | R₁ | R₂ | d | d² |
|---|---|---|---|---|---|---|
| 1 | 50 | 40 | 6 | 5 | 1 | 1 |
| 2 | 65 | 55 | 4 | 2.5 | 1.5 | 2.25 |
| 3 | 75 | 65 | 1 | 1 | 0 | 0 |
| 4 | 65 | 50 | 4 | 4 | 0 | 0 |
| 5 | 45 | 35 | 7 | 6 | 1 | 1 |
| 6 | 65 | 30 | 4 | 7 | −3 | 9 |
| 7 | 70 | 55 | 2 | 2.5 | −0.5 | 0.25 |
| Total | | | | | Σd² = | 13.5 |
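For this data the three values of 65 in X occupy ranks 3, 4 and 5 and so share rank 4, while the two values of 55 in Y occupy ranks 2 and 3 and share rank 2.5. The sketch below, with hypothetical helper names, assigns average ranks, adds a correction of (m³ − m)/12 for each group of m tied values, and then applies ρ = 1 − 6(Σd² + correction) / [n(n² − 1)].

```python
from collections import Counter

x = [50, 65, 75, 65, 45, 65, 70]
y = [40, 55, 65, 50, 35, 30, 55]


def average_ranks(values):
    """Rank 1 = largest; tied values share the mean of the ranks they span."""
    return [
        # (number of strictly larger values) + midpoint of the tied block
        sum(v > value for v in values) + (values.count(value) + 1) / 2
        for value in values
    ]


def tie_correction(values):
    """Sum of (m^3 - m) / 12 over every group of m tied values."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)


r1, r2 = average_ranks(x), average_ranks(y)
n = len(x)
sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
correction = tie_correction(x) + tie_correction(y)
rho = 1 - 6 * (sum_d2 + correction) / (n * (n ** 2 - 1))

print(r1)  # [6.0, 4.0, 1.0, 4.0, 7.0, 4.0, 2.0]
print(r2)  # [5.0, 2.5, 1.0, 4.0, 6.0, 7.0, 2.5]
print(sum_d2, correction, round(rho, 4))  # 13.5 2.5 0.7143
```

Both rank columns sum to n(n+1)/2 = 28, which confirms the average ranks were assigned consistently.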
Partial correlation measures the degree of linear relationship between two variables while controlling for (holding constant) the effect of one or more other variables.
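Given three variables X₁, X₂, X₃ with simple correlations r₁₂, r₁₃, r₂₃, the partial correlation between X₁ and X₂ with X₃ held constant is r₁₂.₃ = (r₁₂ − r₁₃r₂₃) / √[(1 − r₁₃²)(1 − r₂₃²)].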
Given: r₁₂ = 0.6, r₁₃ = 0.5, r₂₃ = 0.4. Find r₁₂.₃.
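Using the standard first-order formula r₁₂.₃ = (r₁₂ − r₁₃r₂₃) / √[(1 − r₁₃²)(1 − r₂₃²)], the answer is (0.6 − 0.2) / √(0.75 × 0.84) ≈ 0.504. Controlling for X₃ therefore lowers the association between X₁ and X₂ from 0.6 to about 0.5. A short Python check, with an illustrative function name:

```python
import math


def partial_r12_3(r12, r13, r23):
    """First-order partial correlation of X1 and X2, holding X3 constant."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))


result = partial_r12_3(r12=0.6, r13=0.5, r23=0.4)
print(round(result, 3))  # 0.504
```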
Multiple correlation coefficient R measures the degree of linear relationship between one dependent variable and two or more independent variables jointly.
Unlike partial correlation, which removes the effect of other variables, multiple correlation measures how well a set of predictors jointly explains the dependent variable.
| Feature | Simple r | Partial r | Multiple R |
|---|---|---|---|
| Variables involved | 2 | 2 (controlling others) | 1 + (2 or more) |
| Direction | + or − | + or − | Always ≥ 0 (0 to 1) |
| Purpose | Association of 2 vars | Pure association after control | Joint predictive power |
| Notation | r₁₂ | r₁₂.₃ | R₁.₂₃ |
R₁.₂₃ is the correlation of X₁ with X₂ and X₃ taken together: R₁.₂₃ = √[(r₁₂² + r₁₃² − 2r₁₂r₁₃r₂₃) / (1 − r₂₃²)].
Given: r₁₂ = 0.7, r₁₃ = 0.6, r₂₃ = 0.4. Find R₁.₂₃.
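Substituting into R₁.₂₃ = √[(r₁₂² + r₁₃² − 2r₁₂r₁₃r₂₃) / (1 − r₂₃²)] gives √[(0.49 + 0.36 − 0.336) / 0.84] = √(0.514 / 0.84) ≈ 0.782. The sketch below repeats the calculation in Python; the function name is illustrative.

```python
import math


def multiple_r1_23(r12, r13, r23):
    """Multiple correlation of X1 with X2 and X3 jointly."""
    r_squared = (r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / (1 - r23 ** 2)
    return math.sqrt(r_squared)


big_r = multiple_r1_23(r12=0.7, r13=0.6, r23=0.4)
print(round(big_r, 3), round(big_r ** 2, 3))  # 0.782 0.612
```

R₁.₂₃ is larger than either simple correlation (0.7 and 0.6), as expected when a second predictor adds information. Its square, about 0.612, is the share of the variation in X₁ accounted for by X₂ and X₃ together.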
| Feature | Linear Correlation (r) | Rank Correlation (ρ) | Partial Correlation | Multiple Correlation (R) |
|---|---|---|---|---|
| Developed by | Karl Pearson | Charles Spearman | Multiple authors | Multiple authors |
| Range | −1 to +1 | −1 to +1 | −1 to +1 | 0 to +1 only |
| Variables | 2 (X & Y) | 2 (ranks) | 2 (controlling 1+) | 1 dependent + 2+ predictors |
| Data type | Quantitative | Ordinal or ranked | Quantitative | Quantitative |
| Measures | Linear association | Monotonic association | Pure association (after control) | Joint predictive power |
| Key formula | (nΣXY − ΣXΣY) / √[…] | 1 − 6Σd² / [n(n² − 1)] | (r₁₂ − r₁₃r₂₃) / √[…] | √[(r₁₂² + r₁₃² − 2r₁₂r₁₃r₂₃) / (1 − r₂₃²)] |
| Outlier sensitivity | High | Low | Moderate | Moderate |
| Can be negative? | Yes | Yes | Yes | No (always ≥ 0) |