Correlation
When two sets of data are strongly linked together we say they have a High Correlation.
The word Correlation is made of Co- (meaning “together”) and Relation.
A correlation is assumed to be linear (following a line).
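Correlation can have a value between -1 and 1: 1 is a perfect positive correlation, 0 is no correlation (the values don't seem linked at all), and -1 is a perfect negative correlation.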
The value shows how good the correlation is (not how steep the line is), and if it is positive or negative.
Example: Ice Cream Sales
The local ice cream shop keeps track of how much ice cream they sell versus the temperature on that day. Here are their figures for the last 12 days:
And here is the same data as a Scatter Plot:
We can easily see that warmer weather and higher sales go together. The relationship is good but not perfect.
In fact the correlation is 0.9575 … see at the end how I calculated it.
Correlation Is Not Good at Curves
The correlation calculation only works properly for straight line relationships.
Our Ice Cream Example: there has been a heat wave!
It gets so hot that people aren’t going near the shop, and sales start dropping.
Here is the latest graph:
The calculated correlation value is now 0 (I worked it out), which means “no correlation”.
But we can see the data follows a nice curve that reaches a peak around 25° C.
The correlation calculation is not “smart” enough to see this.
Moral of the story: make a Scatter Plot, and look at it!
You may see a relationship that the calculation does not.
“Correlation Is Not Causation”
A common saying is “Correlation Is Not Causation”.
What it really means is that a correlation does not prove one thing causes the other:
There can be many reasons the data has a good correlation.
Example: Sunglasses vs Ice Cream
Our Ice Cream shop finds how many sunglasses were sold by a big store for each day and compares them to their ice cream sales:
The correlation between Sunglasses and Ice Cream sales is high.
Does this mean that sunglasses make people want ice cream?
Example: Poor suburbs are more likely to have high pollution.
Why?
Example: A Real Case!
A few years ago a survey of employees found a strong positive correlation between “Studying an external course” and Sick Days.
Does this mean:
Without further research we can’t be sure why.
How To Calculate
How did I calculate the value 0.9575 at the top?
I used “Pearson’s Correlation”. There is software that can calculate it, such as the CORREL() function in Excel or LibreOffice Calc …
… but here is how to calculate it yourself:
Let us call the two sets of data “x” and “y” (in our case Temperature is x and Ice Cream Sales is y):
Here is how I calculated the first Ice Cream example (values rounded to 1 or 0 decimal places):
As a formula it is:
Where:
You probably won’t have to calculate it like that, but at least you know it is not “magic”, but simply a routine set of calculations.
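The routine can be sketched in Python. This is an illustration, not the original worksheet: it assumes "a" and "b" denote each x and y value minus its mean (the usual form of Pearson's correlation), and the function name and sample data are hypothetical.

```python
from math import sqrt

def pearson_by_steps(xs, ys):
    """Pearson's correlation via differences from the mean (a sketch)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")

    # Find the mean of x and the mean of y.
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Subtract the mean from every value: call the results a and b.
    a = [x - mean_x for x in xs]
    b = [y - mean_y for y in ys]

    # Sum up a*b, a squared and b squared over all values.
    sum_ab = sum(ai * bi for ai, bi in zip(a, b))
    sum_a2 = sum(ai * ai for ai in a)
    sum_b2 = sum(bi * bi for bi in b)

    # Divide the sum of a*b by the square root of (sum a^2) * (sum b^2).
    # Note: this divides by zero if either list is constant.
    return sum_ab / sqrt(sum_a2 * sum_b2)

# Hypothetical data, chosen only to keep the arithmetic easy to follow.
temps = [10, 20, 30, 40]
sales = [100, 250, 300, 450]
r = pearson_by_steps(temps, sales)
print(round(r, 4))
```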
Note for Programmers
You can calculate it in one pass through the data. Just sum up x, y, x², y² and xy (no need for a or b calculations above) then use the formula:
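In its usual one-pass form, the formula is r = (n Σxy − Σx Σy) / sqrt[(n Σx² − (Σx)²) × (n Σy² − (Σy)²)], where n is the number of data pairs. A minimal Python sketch follows; the function name and sample data are hypothetical.

```python
from math import sqrt

def pearson_one_pass(pairs):
    """Pearson's correlation from running sums, one pass over the data."""
    n = 0
    sum_x = sum_y = sum_x2 = sum_y2 = sum_xy = 0.0
    for x, y in pairs:
        n += 1
        sum_x += x
        sum_y += y
        sum_x2 += x * x
        sum_y2 += y * y
        sum_xy += x * y

    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("correlation is undefined when x or y is constant")
    return numerator / denominator

# Hypothetical data; any iterable of (x, y) pairs works, including a stream.
data = [(10, 100), (20, 250), (30, 300), (40, 450)]
r_one_pass = pearson_one_pass(data)
print(round(r_one_pass, 4))
```

One caveat: with very large values or very long data sets, the subtractions in this form can lose floating-point precision, so the mean-based calculation may be the safer choice there.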
A scatter diagram is one of the seven basic tools of quality, but many professionals find it to be a difficult concept.
Other charts use lines or bars to show data, while a scatter diagram uses dots. This may be confusing, but it is often easier to understand than lines and bars.
In this blog post, I will explain the scatter diagram.
Scatter Diagram
A scatter plot, scatter graph, and correlation chart are other names for a scatter diagram.
We draw this graph with two variables. The first variable is independent and the second variable depends on the first.
This diagram is used to find the correlation between these two variables, how they are related. After determining the correlation, you can then predict the behavior of the dependent variable based on the measure of the independent variable.
A scatter chart is useful when one variable is measurable and the other is not.
According to the PMBOK Guide 6th edition, a scatter diagram is, “a graph that shows the relationship between two variables. Scatter diagrams can show a relationship between any element of a process, environment, or activity on one axis and a quality defect on the other axis.”
Example
You are analyzing accident patterns on a highway. You select the two variables, vehicle speed and the number of accidents, and draw the diagram.
Once the diagram is complete, you notice that as the speed of vehicles increases, the number of accidents goes up. This shows the relationship between the two.
Since this diagram shows you the correlation between the variables, many experts call it a correlation chart.
In most cases, the independent variable is plotted along the horizontal axis (x-axis) and the dependent variable is plotted on the vertical axis (y-axis). The independent variable is the control parameter because it influences the behavior of the dependent variable.
It is not necessary to have a controlling parameter to draw a scatter diagram. It can have two independent variables. In that case, you can use any axis for any variable.
I have seen many professionals think that a scatter diagram is like a fishbone diagram because the latter has two parameters: cause and effect.
Please note that these two diagrams are different. The fishbone diagram shows you the effect of a cause, but it does not show the relationship between these two. The scatter diagram helps you analyze the relationship between the two variables.
However, the Ishikawa diagram can help you draw the scatter diagram; for example, you can find the two variables (cause and effect), and then draw the scatter diagram to analyze the relationship between them.
Types of Scatter Diagram
You can classify scatter diagrams in many ways; I will discuss the two most popular based on correlation and slope of the trend. They cover almost all types of scatter diagrams used in project management.
According to the correlation, you can divide scatter diagrams into the following categories:
Scatter Diagram with No Correlation
This diagram is also known as “Scatter Diagram with Zero Degree of Correlation”.
Here, the data points are spread so randomly that you cannot draw a line through them.
Therefore, you can say that these variables have no correlation.
Scatter Diagram with Moderate Correlation
This diagram is also known as “Scatter Diagram with a Low Degree of Correlation”.
Here, the data points are a little closer and you can see that some kind of relationship exists between these variables.
Scatter Diagram with Strong Correlation
This diagram is also known as “Scatter Diagram with a High Degree of Correlation”.
In this diagram, data points are close to each other and you can draw a line by following their pattern.
In this case, you say that these variables are closely related.
As discussed earlier, you can categorize the scatter diagram according to the slope, or trend, of the data points:
A strong positive correlation means a visible upward trend from left to right; a strong negative correlation means a visible downward trend from left to right. A weak correlation means the trend is less clear. A flat line, from left to right, is the weakest correlation, as it is neither positive nor negative. A scatter diagram with no correlation shows that the independent variable does not affect the dependent variable.
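If you have a calculated correlation value rather than just a picture, these categories can be roughly mapped to it. The sketch below is only an illustration: the function name is hypothetical, and the cut-off values 0.7 and 0.3 are arbitrary assumptions, not a standard from this post or from the PMBOK Guide.

```python
def describe_correlation(r, strong=0.7, weak=0.3):
    """Label a correlation value by strength and direction.

    The thresholds are illustrative assumptions; pick values that suit
    your own data and field.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation value must lie between -1 and 1")

    strength = abs(r)
    if strength < weak:
        return "no correlation"

    direction = "positive" if r > 0 else "negative"
    level = "strong" if strength >= strong else "weak"
    return f"{level} {direction} correlation"

print(describe_correlation(0.95))   # upward trend, points close to a line
print(describe_correlation(-0.5))   # downward trend, less clear
print(describe_correlation(0.1))    # no visible trend
```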
Scatter Diagram with Strong Positive Correlation
This diagram is also known as a Scatter Diagram with Positive Slant.
In a positive slant, the correlation is positive, i.e. as the value of X increases, the value of Y will increase. You can say that the slope of a straight line drawn along the data points will go up. The pattern resembles a straight line.
For example, if the weather gets hotter, cold drink sales will go up.
Scatter Diagram with Weak Positive Correlation
As the value of X increases, the value of Y also increases, but the pattern does not resemble a straight line.
Scatter Diagram with Strong Negative Correlation
This diagram is also known as a Scatter Diagram with a Negative Slant.
In the negative slant, the correlation is negative, i.e. as the value of X increases, the value of Y will decrease. The slope of a straight line drawn along the data points will go down.
For example, if the temperature goes up, sales of winter coats go down.
Scatter Diagram with Weak Negative Correlation
As the value of X increases, the value of Y will decrease, but the pattern is not clear.
Scatter Diagram with No Correlation
There isn’t any relationship between the two variables to be seen. It might just be a series of points with no visible trend, or it might be a straight, flat row of points. In either case, the independent variable has no effect on the second variable; it is not dependent.
Limitations of a Scatter Diagram
The following are a few limitations of a scatter diagram:
Benefits of a Scatter Diagram
The following are a few advantages of a scatter diagram:
Assignment
Assignment: SUBMATH: Correlation Assignment
Marks: 50
Duration: 1 week, 3 days