
SUBMATH: Correlation


Correlation

When two sets of data are strongly linked together we say they have a High Correlation.

The word Correlation is made of Co- (meaning “together”) and Relation.

  • Correlation is Positive when the values increase together, and
  • Correlation is Negative when one value decreases as the other increases

A correlation is assumed to be linear (following a line).

[Figure: correlation examples]

Correlation can have a value between -1 and 1:

  • 1 is a perfect positive correlation
  • 0 is no correlation (the values don’t seem linked at all)
  • -1 is a perfect negative correlation

The value shows how good the correlation is (not how steep the line is), and if it is positive or negative.
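A few tiny data sets make this easier to see. The sketch below is only an illustration: the `pearson` helper and the sample numbers are made up for this example, and the helper uses the Pearson calculation described at the end of this page.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (illustrative helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_ab = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_a2 = sum((x - mean_x) ** 2 for x in xs)
    sum_b2 = sum((y - mean_y) ** 2 for y in ys)
    return sum_ab / sqrt(sum_a2 * sum_b2)

xs = [1, 2, 3, 4, 5]
gentle = [0.1 * x for x in xs]       # shallow upward line
steep = [100 * x + 7 for x in xs]    # steep upward line
falling = [10 - 2 * x for x in xs]   # downward line

r_gentle = pearson(xs, gentle)
r_steep = pearson(xs, steep)
r_falling = pearson(xs, falling)
print(r_gentle, r_steep, r_falling)
```

Up to floating-point rounding, the shallow and the steep line both give 1, and the downward line gives -1: the value reflects how well the points follow a line and in which direction, not how steep that line is.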

Example: Ice Cream Sales

The local ice cream shop keeps track of how much ice cream they sell versus the temperature on that day. Here are their figures for the last 12 days:

Ice Cream Sales vs Temperature
Temperature (°C)    Ice Cream Sales
14.2°               $215
16.4°               $325
11.9°               $185
15.2°               $332
18.5°               $406
22.1°               $522
19.4°               $412
25.1°               $614
23.4°               $544
18.1°               $421
22.6°               $445
17.2°               $408

And here is the same data as a Scatter Plot:

[Figure: scatter plot of ice cream sales vs temperature]

We can easily see that warmer weather and higher sales go together. The relationship is good but not perfect.

In fact the correlation is 0.9575 … see at the end how I calculated it.

Correlation Is Not Good at Curves

The correlation calculation only works properly for straight line relationships.

Our Ice Cream Example: there has been a heat wave!

It gets so hot that people aren’t going near the shop, and sales start dropping.

Here is the latest graph:

[Figure: scatter plot of ice cream sales vs temperature after the heat wave. Caption: The correlation value is now 0: “No Correlation”]

The calculated correlation value is 0 (I worked it out), which means “no correlation”.

But we can see that the data follows a nice curve that reaches a peak around 25 °C.

The correlation calculation is not “smart” enough to see this.

Moral of the story: make a Scatter Plot, and look at it!
You may see a relationship that the calculation does not.
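The effect is easy to reproduce. The figures below are hypothetical (the shop's heat-wave data is not listed on this page): sales rise to a peak at 25 °C and then fall off symmetrically. The `pearson` helper is an illustrative name for the Pearson calculation described at the end of this page.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (illustrative helper)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    a = [x - mean_x for x in xs]
    b = [y - mean_y for y in ys]
    sum_ab = sum(ai * bi for ai, bi in zip(a, b))
    return sum_ab / sqrt(sum(ai ** 2 for ai in a) * sum(bi ** 2 for bi in b))

# Hypothetical data: a clear curve with its peak at 25 degrees
temps = [15, 20, 25, 30, 35]
sales = [200, 500, 600, 500, 200]

r = pearson(temps, sales)
print(r)  # prints 0.0
```

The rising half of the curve and the falling half cancel each other out exactly, so the calculation reports 0 even though sales clearly depend on temperature.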

“Correlation Is Not Causation”

A common saying is “Correlation Is Not Causation”.

What it really means is that a correlation does not prove one thing causes the other:

  • One thing might cause the other
  • The other might cause the first to happen
  • They may be linked by a different thing
  • Or it could be random chance!

There can be many reasons the data has a good correlation.

Example: Sunglasses vs Ice Cream

Our Ice Cream shop finds how many sunglasses were sold by a big store for each day and compares them to their ice cream sales:

[Figure: scatter plot of sunglasses sales vs ice cream sales]

The correlation between Sunglasses and Ice Cream sales is high.

Does this mean that sunglasses make people want ice cream?
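A small simulation can show how a "different thing" produces a high correlation. Everything here is an assumption made for illustration: the numbers are invented, temperature is made to drive both kinds of sales, and neither kind of sale affects the other.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (illustrative helper)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sum_ab = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_a2 = sum((x - mean_x) ** 2 for x in xs)
    sum_b2 = sum((y - mean_y) ** 2 for y in ys)
    return sum_ab / sqrt(sum_a2 * sum_b2)

rng = random.Random(0)  # fixed seed so the run is repeatable
temps = [rng.uniform(10, 30) for _ in range(500)]

# Independent day-to-day noise for each kind of sale (invented scales)
ice_noise = [rng.gauss(0, 40) for _ in temps]
sun_noise = [rng.gauss(0, 8) for _ in temps]

# Assumed model: temperature drives BOTH series
ice_cream = [20 * t + e for t, e in zip(temps, ice_noise)]
sunglasses = [3 * t + e for t, e in zip(temps, sun_noise)]

r_raw = pearson(sunglasses, ice_cream)          # high: both follow temperature
r_without_temp = pearson(sun_noise, ice_noise)  # near 0: nothing else links them
print(round(r_raw, 2), round(r_without_temp, 2))
```

In this made-up world sunglasses and ice cream sales are strongly correlated, yet once the shared temperature effect is taken away the remaining parts are close to uncorrelated.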

Example: Poor suburbs are more likely to have high pollution.

Why?

  • Do poor people make pollution?
  • Are polluted suburbs the only place poor people can afford?
  • Is it a common link, such as factories with low paying jobs and lots of pollution?

Example: A Real Case!


A few years ago a survey of employees found a strong positive correlation between “Studying an external course” and Sick Days.

Does this mean:

  • Studying makes them sick?
  • Sick people study a lot?
  • Or did they lie about being sick so they can study more?

Without further research we can’t be sure why.

How To Calculate

How did I calculate the value 0.9575 at the top?

I used “Pearson’s Correlation”. There is software that can calculate it, such as the CORREL() function in Excel or LibreOffice Calc …

… but here is how to calculate it yourself:

Let us call the two sets of data “x” and “y” (in our case Temperature is x and Ice Cream Sales is y):

  • Step 1: Find the mean of x, and the mean of y
  • Step 2: Subtract the mean of x from every x value (call them “a“), and subtract the mean of y from every y value (call them “b“)
  • Step 3: Calculate: ab, a² and b² for every value
  • Step 4: Sum up ab, sum up a² and sum up b²
  • Step 5: Divide the sum of ab by the square root of [(sum of a²) × (sum of b²)]

Here is how I calculated the first Ice Cream example (values rounded to 1 or 0 decimal places):

[Figure: table of the calculations for the ice cream example]
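The five steps above can also be sketched in a few lines of Python, using the 12 days of ice cream figures from the example. The variable names are chosen for this sketch only.

```python
from math import sqrt

temps = [14.2, 16.4, 11.9, 15.2, 18.5, 22.1, 19.4, 25.1, 23.4, 18.1, 22.6, 17.2]
sales = [215, 325, 185, 332, 406, 522, 412, 614, 544, 421, 445, 408]

# Step 1: find the mean of x and the mean of y
mean_x = sum(temps) / len(temps)
mean_y = sum(sales) / len(sales)

# Step 2: subtract the means to get the "a" and "b" values
a = [x - mean_x for x in temps]
b = [y - mean_y for y in sales]

# Steps 3 and 4: calculate ab, a squared and b squared, and sum them up
sum_ab = sum(ai * bi for ai, bi in zip(a, b))
sum_a2 = sum(ai ** 2 for ai in a)
sum_b2 = sum(bi ** 2 for bi in b)

# Step 5: divide the sum of ab by the square root of (sum a^2 * sum b^2)
r = sum_ab / sqrt(sum_a2 * sum_b2)
print(round(r, 4))  # 0.9575
```

Printing `sum_ab`, `sum_a2` and `sum_b2` as well is a handy way to check a hand calculation against the program.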

As a formula it is:

$$ r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \, \sum (y_i - \bar{y})^2}} $$

Where:

  • Σ is Sigma, the symbol for “sum up”
  • $(x_i - \bar{x})$ is each x-value minus the mean of x (called “a” above)
  • $(y_i - \bar{y})$ is each y-value minus the mean of y (called “b” above)

You probably won’t have to calculate it like that, but at least you know it is not “magic”, but simply a routine set of calculations.

Note for Programmers

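You can calculate it in one pass through the data. Just sum up x, y, x², y² and xy (no need for the a or b calculations above), then use the formula below, where n is the number of (x, y) pairs:

$$ r = \frac{n \sum xy - \sum x \sum y}{\sqrt{\left(n \sum x^2 - \left(\sum x\right)^2\right)\left(n \sum y^2 - \left(\sum y\right)^2\right)}} $$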
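As a rough sketch of the one-pass idea, the function below keeps five running sums and a count while looping over the data once. The function name `pearson_one_pass` is made up for this example; the data is the ice cream table from earlier on this page.

```python
from math import sqrt

def pearson_one_pass(pairs):
    """Pearson correlation from running sums, reading each (x, y) pair once."""
    n = 0
    sum_x = sum_y = sum_x2 = sum_y2 = sum_xy = 0.0
    for x, y in pairs:
        n += 1
        sum_x += x
        sum_y += y
        sum_x2 += x * x
        sum_y2 += y * y
        sum_xy += x * y
    top = n * sum_xy - sum_x * sum_y
    bottom = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return top / bottom

temps = [14.2, 16.4, 11.9, 15.2, 18.5, 22.1, 19.4, 25.1, 23.4, 18.1, 22.6, 17.2]
sales = [215, 325, 185, 332, 406, 522, 412, 614, 544, 421, 445, 408]

r = pearson_one_pass(zip(temps, sales))
print(round(r, 4))  # 0.9575
```

Because the function only needs the running sums, it works on data that arrives as a stream. One caution: when the values are very large compared with their spread, subtracting big, nearly equal sums can lose floating-point precision, so the two-pass method with means may be the safer choice there.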

scatter diagram

A scatter diagram is one of the seven basic tools of quality, but many professionals find it to be a difficult concept.

Other charts use lines or bars to show data, while a scatter diagram uses dots. This may be confusing, but it is often easier to understand than lines and bars.

In this blog post, I will explain the scatter diagram.

Scatter Diagram

A scatter diagram is also known as a scatter plot, scatter graph, or correlation chart.

We draw this graph with two variables. The first variable is independent and the second variable depends on the first.

[Figure: scatter diagram]

This diagram is used to find the correlation between these two variables, how they are related. After determining the correlation, you can then predict the behavior of the dependent variable based on the measure of the independent variable.
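The text does not say how such a prediction is made. One common approach, assumed here, is to fit a least-squares straight line through the points and read the dependent variable off that line. The sketch reuses the ice cream figures from the correlation example earlier on this page; the `predict_sales` helper is a hypothetical name.

```python
temps = [14.2, 16.4, 11.9, 15.2, 18.5, 22.1, 19.4, 25.1, 23.4, 18.1, 22.6, 17.2]
sales = [215, 325, 185, 332, 406, 522, 412, 614, 544, 421, 445, 408]

mean_x = sum(temps) / len(temps)
mean_y = sum(sales) / len(sales)

# Least-squares slope: sum of (x - mean_x)(y - mean_y) over sum of (x - mean_x)^2
sum_ab = sum((x - mean_x) * (y - mean_y) for x, y in zip(temps, sales))
sum_a2 = sum((x - mean_x) ** 2 for x in temps)
slope = sum_ab / sum_a2

# The fitted line passes through the point (mean_x, mean_y)
intercept = mean_y - slope * mean_x

def predict_sales(temperature):
    """Estimated sales at a given temperature, read off the fitted line."""
    return slope * temperature + intercept

predicted = predict_sales(20.0)
print(round(slope, 1), round(predicted))
```

A prediction like this is only reasonable inside the range of the plotted data and only while the relationship really does look like a straight line on the diagram.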

A scatter chart is useful when one variable is measurable and the other is not.

According to the PMBOK Guide 6th edition, a scatter diagram is, “a graph that shows the relationship between two variables. Scatter diagrams can show a relationship between any element of a process, environment, or activity on one axis and a quality defect on the other axis.”

Example

You are analyzing accident patterns on a highway. You select two variables, vehicle speed and the number of accidents, and draw the diagram.

Once the diagram is complete, you notice that as the speed of vehicles increases, the number of accidents goes up. This shows the relationship between the two.

Since this diagram shows you the correlation between the variables, many experts call it a correlation chart.

In most cases, the independent variable is plotted along the horizontal axis (x-axis) and the dependent variable is plotted on the vertical axis (y-axis). The independent variable is the control parameter because it influences the behavior of the dependent variable.

It is not necessary to have a controlling parameter to draw a scatter diagram. It can have two independent variables. In that case, you can use any axis for any variable.

I have seen many professionals think that a scatter diagram is like a fishbone diagram because the latter has two parameters: cause and effect.

Please note that these two diagrams are different. The fishbone diagram shows you the effect of a cause, but it does not show the relationship between these two. The scatter diagram helps you analyze the relationship between the two variables.

However, the Ishikawa diagram can help you draw the scatter diagram; for example, you can find the two variables (cause and effect), and then draw the scatter diagram to analyze the relationship between them.

Types of Scatter Diagram

You can classify scatter diagrams in many ways; I will discuss the two most popular based on correlation and slope of the trend. They cover almost all types of scatter diagrams used in project management.

According to the correlation, you can divide scatter diagrams into the following categories:

  • Scatter Diagram with No Correlation
  • Scatter Diagram with Moderate Correlation
  • Scatter Diagram with Strong Correlation

Scatter Diagram with No Correlation

This diagram is also known as “Scatter Diagram with Zero Degree of Correlation”.

Here, the data point spread is so random that you cannot draw a line through them.

Therefore, you can say that these variables have no correlation.

Scatter Diagram with Moderate Correlation

This diagram is also known as “Scatter Diagram with a Low Degree of Correlation”.

Here, the data points are a little closer and you can see that some kind of relationship exists between these variables.

Scatter Diagram with Strong Correlation

This diagram is also known as “Scatter Diagram with a High Degree of Correlation”.

In this diagram, data points are close to each other and you can draw a line by following their pattern.

In this case, you say that these variables are closely related.

As discussed earlier, you can categorize the scatter diagram according to the slope, or trend, of the data points:

  • Scatter Diagram with Strong Positive Correlation
  • Scatter Diagram with Weak Positive Correlation
  • Scatter Diagram with Strong Negative Correlation
  • Scatter Diagram with Weak Negative Correlation
  • Scatter Diagram with Weakest (or no) Correlation

A strong positive correlation means a visible upward trend from left to right; a strong negative correlation means a visible downward trend from left to right. A weak correlation means the trend is less clear. A flat line, from left to right, is the weakest correlation, as it is neither positive nor negative. A scatter diagram with no correlation shows that the independent variable does not affect the dependent variable.

Scatter Diagram with Strong Positive Correlation

This diagram is also known as a Scatter Diagram with Positive Slant.

In a positive slant, the correlation is positive, i.e. as the value of X increases, the value of Y will increase. You can say that the slope of a straight line drawn along the data points will go up. The pattern resembles a straight line.

For example, if the weather gets hotter, cold drink sales will go up.

Scatter Diagram with Weak Positive Correlation

As the value of X increases, the value of Y also increases, but the pattern does not resemble a straight line.

Scatter Diagram with Strong Negative Correlation

This diagram is also known as a Scatter Diagram with a Negative Slant.

In the negative slant, the correlation is negative, i.e. as the value of X increases, the value of Y will decrease. The slope of a straight line drawn along the data points will go down.

For example, if the temperature goes up, sales of winter coats go down.

Scatter Diagram with Weak Negative Correlation

[Figure: scatter diagram with weak negative correlation]

As the value of X increases, the value of Y will decrease, but the pattern is not clear.

Scatter Diagram with Weakest (or No) Correlation

There isn’t any relationship between the two variables to be seen. It might just be a series of points with no visible trend, or it might be a straight, flat row of points. In either case, the independent variable has no effect on the second variable; it is not dependent.

Limitations of a Scatter Diagram

The following are a few limitations of a scatter diagram:

  • Scatter diagrams cannot give you the exact extent of correlation.
  • A scatter diagram does not show you a quantitative measurement of the relationship between the variables. It only gives a qualitative picture of how one quantity changes with the other.
  • This chart does not show you the relationship for more than two variables.

Benefits of a Scatter Diagram

The following are a few advantages of a scatter diagram:

  • It shows the relationship between two variables.
  • It is the best method to show you a non-linear pattern.
  • The range of data flow, i.e. maximum and minimum value, can be determined.
  • Observation and reading are straightforward.
  • Plotting the diagram is easy.

Assignment

SUBMATH: Correlation Assignment

ASSIGNMENT : SUBMATH: Correlation Assignment MARKS : 50  DURATION : 1 week, 3 days
