Correlation Analysis

Correlation analysis:
Correlation analysis refers to the techniques used to measure the closeness of the relationship between variables. The measure of correlation, called the correlation coefficient or correlation index, summarizes in one figure the direction and degree of correlation.
Thus correlation is a statistical device which helps us in analyzing the covariation of two or more variables.

The problem of analyzing the relation between different series should be broken down into three steps:
* Determining whether a relation exists and, if it does, measuring it
* Testing whether it is significant
* Establishing the cause and effect relation, if any.

A real life example:
An extremely high and significant correlation between the increase in smoking and increase in lung cancer would not prove that smoking causes lung cancer. The proof of a cause and effect relation can be developed only by means of an exhaustive study of the operating elements themselves.

Correlation and causation:
Correlation analysis helps us determine the degree of relationship between two or more variables, but it does not tell us anything about a cause and effect relationship. A significant degree of correlation may be explained by any one, or a combination, of the following reasons:
* Correlation may be due to pure chance, especially in a small sample. The following example illustrates the point:
Income (Rs) | 5000 | 6000 | 7000 | 8000 | 9000
Weight (lb) | 120 | 140 | 160 | 180 | 200
The above data show a perfect positive relationship between income and weight: as income increases, weight also increases, and the rate of change between the two variables is constant. Yet the relationship may be nothing more than chance.
* Both the correlated variables may be influenced by one or more other variables.
* Both the variables may be mutually influencing each other, so that neither can be designated as the cause and the other as the effect.
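The income and weight figures above can be checked numerically. The following is a minimal sketch, assuming the usual deviation form of Pearson's coefficient; the function name `pearson_r` is only illustrative.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the two means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

income = [5000, 6000, 7000, 8000, 9000]  # rupees, from the example above
weight = [120, 140, 160, 180, 200]       # pounds, from the example above

r = pearson_r(income, weight)
print(r)  # a perfect positive correlation, even though income does not cause weight
```

A coefficient of +1 here says only that the five points lie on a straight line; it says nothing about why.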

Types of correlation:
Correlation is described in several different ways. Three of the most important ways of classifying correlation are:
i. Positive or negative
ii. Simple, partial and multiple
iii. Linear and nonlinear

Positive and negative correlation:
Whether correlation is positive or negative depends on the direction of change of the variables.
* If both the variables vary in the same direction, the correlation is positive.
* If the variables vary in opposite directions, the correlation is negative.
Examples:
Positive correlation:
X | 10 | 12 | 15 | 18 | 20
Y | 15 | 20 | 22 | 25 | 37
Negative correlation:
X | 20 | 30 | 40 | 60 | 80
Y | 40 | 30 | 22 | 15 | 10
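The direction of each example can be read off from the sign of the summed products of deviations, which is the numerator of the correlation coefficient. This is a small sketch under that assumption; the helper name `co_deviation` is hypothetical.

```python
def co_deviation(xs, ys):
    """Sum of products of deviations from the means; its sign gives the direction."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

# The two example series from the text.
positive = co_deviation([10, 12, 15, 18, 20], [15, 20, 22, 25, 37])
negative = co_deviation([20, 30, 40, 60, 80], [40, 30, 22, 15, 10])

print("first example:", "positive" if positive > 0 else "negative")
print("second example:", "positive" if negative > 0 else "negative")
```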

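Simple, partial and multiple correlation:
* When only two variables are studied, it is a problem of simple correlation.
* When three or more variables are studied, it is a problem of either partial or multiple correlation.
* In multiple correlation, three or more variables are studied simultaneously.
* In partial correlation, more than two variables are recognized, but only two are considered to be influencing each other, the effect of the other variables being held constant.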

Linear and nonlinear correlation:
The distinction between linear and nonlinear correlation is based upon the constancy of the ratio of change between the variables.
* If the amount of change in one variable tends to bear a constant ratio to the amount of change in the other variable, the correlation is said to be linear.
* If the amount of change in one variable does not bear a constant ratio to the amount of change in the other variable, the correlation is said to be nonlinear.
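The constant-ratio test can be sketched in a few lines. The two data series below are made up purely for illustration (they do not come from the essay), and the function names are hypothetical.

```python
def change_ratios(xs, ys):
    """Ratio of the change in y to the change in x between consecutive points."""
    return [
        (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
        for i in range(len(xs) - 1)
    ]

def is_linear(xs, ys, tol=1e-9):
    """True when every consecutive change ratio equals the first one."""
    ratios = change_ratios(xs, ys)
    return all(abs(r - ratios[0]) <= tol for r in ratios)

xs = [1, 2, 3, 4, 5]
straight = [3, 5, 7, 9, 11]   # illustrative data: y rises by 2 for each unit of x
curved = [1, 4, 9, 16, 25]    # illustrative data: y = x squared

print(is_linear(xs, straight), change_ratios(xs, straight))
print(is_linear(xs, curved), change_ratios(xs, curved))
```

Real data rarely give exactly equal ratios, which is why the definition says the change "tends to" bear a constant ratio; the tolerance argument is a crude stand-in for that idea.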

The following two diagrams illustrate the difference between linear and curvilinear correlation:

[Figure: two diagrams, one showing a linear curve and one showing a nonlinear curve.]

Correlation example:
Let's assume that we want to look at the relationship between two variables, height (in inches) and self-esteem. Perhaps we have a hypothesis that how tall you are affects your self-esteem (incidentally, I don't think we have to worry about the direction of causality here -- it's not likely that self-esteem causes your height!). Let's say we collect some information on twenty individuals (all male -- we know that the average height differs for males and females so, to keep this example simple, we'll just use males). Height is measured in inches. Self-esteem is measured based on the average of 10 1-to-5 rating items (where higher scores mean higher self-esteem). Here's the data for the 20 cases (don't take this too seriously -- I made this data up to illustrate what a correlation is):
Person | Height | Self Esteem
1 | 68 | 4.1
2 | 71 | 4.6
3 | 62 | 3.8
4 | 75 | 4.4
5 | 58 | 3.2
6 | 60 | 3.1
7 | 67 | 3.8
8 | 68 | 4.1
9 | 71 | 4.3
10 | 69 | 3.7
11 | 68 | 3.5
12 | 67 | 3.2
13 | 63 | 3.7
14 | 62 | 3.3
15 | 60 | 3.4
16 | 63 | 4.0
17 | 65 | 4.1
18 | 67 | 3.8
19 | 63 | 3.4
20 | 61 | 3.6
And here are the descriptive statistics:
Variable | Mean | StDev | Variance | Sum | Minimum | Maximum | Range
Height | 65.4 | 4.40574 | 19.4105 | 1308 | 58 | 75 | 17
Self Esteem | 3.755 | 0.426090 | 0.181553 | 75.1 | 3.1 | 4.6 | 1.5
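The descriptive statistics can be reproduced with Python's standard `statistics` module. This is a sketch that assumes the table uses the sample (n - 1) formulas for variance and standard deviation, which is what the reported figures are consistent with; the helper name `describe` is illustrative.

```python
import statistics

heights = [68, 71, 62, 75, 58, 60, 67, 68, 71, 69,
           68, 67, 63, 62, 60, 63, 65, 67, 63, 61]
esteem = [4.1, 4.6, 3.8, 4.4, 3.2, 3.1, 3.8, 4.1, 4.3, 3.7,
          3.5, 3.2, 3.7, 3.3, 3.4, 4.0, 4.1, 3.8, 3.4, 3.6]

def describe(values):
    """Summary figures as in the table, using sample (n - 1) variance and stdev."""
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
        "variance": statistics.variance(values),
        "sum": sum(values),
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
    }

height_stats = describe(heights)
esteem_stats = describe(esteem)
print(height_stats)
print(esteem_stats)
```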
Finally, we'll look at the simple bivariate (i.e., two-variable) plot:

[Figure: bivariate plot of height and self-esteem for the 20 cases.]

Calculating the Correlation
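Now we're ready to compute the correlation value. The formula for the correlation is:

$$ r = \frac{N\sum xy - (\sum x)(\sum y)}{\sqrt{\left[N\sum x^2 - (\sum x)^2\right]\left[N\sum y^2 - (\sum y)^2\right]}} $$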

We use the symbol r to stand for the correlation. Through the magic of mathematics it turns out that r will always be between -1.0 and +1.0. If the correlation is negative, we have a negative relationship; if it's positive, the relationship is positive. You don't need to know how we came up with this formula unless you want to be a statistician. But you probably will need to know how the formula relates to real data -- how you can use the formula to compute the correlation. Let's look at the data we need for the formula. Here's the original data with the other necessary columns:
Person | Height (x) | Self Esteem (y) | x*y | x*x | y*y
1 | 68 | 4.1 | 278.8 | 4624 | 16.81
2 | 71 | 4.6 | 326.6 | 5041 | 21.16
3 | 62 | 3.8 | 235.6 | 3844 | 14.44
4 | 75 | 4.4 | 330 | 5625 | 19.36
5 | 58 | 3.2 | 185.6 | 3364 | 10.24
6 | 60 | 3.1 | 186 | 3600 | 9.61
7 | 67 | 3.8 | 254.6 | 4489 | 14.44
8 | 68 | 4.1 | 278.8 | 4624 | 16.81
9 | 71 | 4.3 | 305.3 | 5041 | 18.49
10 | 69 | 3.7 | 255.3 | 4761 | 13.69
11 | 68 | 3.5 | 238 | 4624 | 12.25
12 | 67 | 3.2 | 214.4 | 4489 | 10.24
13 | 63 | 3.7 | 233.1 | 3969 | 13.69
14 | 62 | 3.3 | 204.6 | 3844 | 10.89
15 | 60 | 3.4 | 204 | 3600 | 11.56
16 | 63 | 4 | 252 | 3969 | 16
17 | 65 | 4.1 | 266.5 | 4225 | 16.81
18 | 67 | 3.8 | 254.6 | 4489 | 14.44
19 | 63 | 3.4 | 214.2 | 3969 | 11.56
20 | 61 | 3.6 | 219.6 | 3721 | 12.96
Sum = | 1308 | 75.1 | 4937.6 | 85912 | 285.45
The first three columns are the same as in the table above. The next three columns are simple computations based on the height and self-esteem data. The bottom row consists of the sum of each column. This is all the information we need to compute the correlation. Here are the values from the bottom row of the table (where N is 20 people) as they are related to the symbols in the formula:

N = 20, Σxy = 4937.6, Σx = 1308, Σy = 75.1, Σx² = 85912, Σy² = 285.45

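Now, when we plug these values into the formula given above, we get the following (I show it here tediously, one step at a time):

$$ r = \frac{20(4937.6) - (1308)(75.1)}{\sqrt{\left[20(85912) - 1308^2\right]\left[20(285.45) - 75.1^2\right]}} $$

$$ r = \frac{98752 - 98230.8}{\sqrt{\left[1718240 - 1710864\right]\left[5709 - 5640.01\right]}} $$

$$ r = \frac{521.2}{\sqrt{7376 \times 68.99}} = \frac{521.2}{\sqrt{508870.24}} = \frac{521.2}{713.35} \approx 0.73 $$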

So, the correlation for our twenty cases is 0.73, which is a fairly strong positive relationship. I guess there is a relationship between height and self-esteem, at least in this made-up data!
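The arithmetic can be confirmed from the column totals alone. This sketch assumes the standard computational form of the correlation, r = (NΣxy - ΣxΣy) / sqrt([NΣx² - (Σx)²][NΣy² - (Σy)²]); the variable names are illustrative.

```python
import math

# Column totals from the bottom row of the table above.
n = 20
sum_x = 1308
sum_y = 75.1
sum_xy = 4937.6
sum_x2 = 85912
sum_y2 = 285.45

numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
r = numerator / denominator

print(round(numerator, 1), round(denominator, 2), round(r, 2))
```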
Testing the Significance of a Correlation
Once you've computed a correlation, you can determine the probability that the observed correlation occurred by chance. That is, you can conduct a significance test. Most often you are interested in determining the probability that the correlation is a real one and not a chance occurrence. In this case, you are testing the mutually exclusive hypotheses:
Null Hypothesis: r = 0
Alternative Hypothesis: r ≠ 0
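The text does not say how the test is carried out. One common approach, shown here only as a sketch and not as the author's method, converts r into a t statistic with N - 2 degrees of freedom, t = r·sqrt((N - 2) / (1 - r²)), and compares it with a tabulated critical value. The critical value 2.101 used below is the standard two-tailed 5% value for 18 degrees of freedom and is an assumption brought in from ordinary t tables.

```python
import math

r = 0.73   # correlation computed for the height and self-esteem data
n = 20     # number of cases

df = n - 2
t_stat = r * math.sqrt(df / (1 - r ** 2))

# Assumed from a standard t table: two-tailed 5% critical value for df = 18.
T_CRITICAL_5PCT_DF18 = 2.101

reject_null = abs(t_stat) > T_CRITICAL_5PCT_DF18
print(round(t_stat, 2), reject_null)
```

Under these assumptions the null hypothesis r = 0 would be rejected at the 5% level for the made-up data.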

Methods of studying correlation:
i. Scatter diagram method
ii. Graphic method
iii. Karl Pearson’s coefficient of correlation
iv. Rank method
v. Concurrent deviation method
vi. Method of least squares

Only two of these methods are covered in our course, so the following discussion is limited to Karl Pearson’s method and the rank method.

Karl Pearson Correlation Coefficient

* Karl Pearson’s Product-Moment Correlation Coefficient, or simply Pearson’s Correlation Coefficient, is one of the most important methods used in statistics to measure the correlation between two variables.
* A few words about Karl Pearson: he was a British mathematician, statistician, lawyer and eugenicist. He established the discipline of mathematical statistics and founded the world’s first statistics department at the University of London in 1911. Along with his colleagues Weldon and Galton, he founded the journal “Biometrika”, whose object was the development of statistical theory.
* The correlation between two variables X and Y, measured using Pearson’s coefficient, takes values between +1 and -1. When measured in a population, Pearson’s coefficient is denoted by the Greek letter rho (ρ); when measured in a sample, it is denoted by the letter r, and is therefore sometimes called Pearson’s r. Pearson’s coefficient reflects the linear relationship between two variables. If the correlation coefficient is +1, there is a perfect positive linear relationship between the variables; if it is -1, there is a perfect negative linear relationship; and 0 denotes that there is no linear relationship between the two variables.
* The values -1, +1 and 0 are theoretical results and are not generally found in practice. The values -1 and +1 are the lower and upper limits: the coefficient can never fall below -1 or rise above +1.
Pearson’s Coefficient computational formula

Sample question: compute the value of the correlation coefficient from the following table:

Subject | Age x | Weight Level y
1 | 43 | 99
2 | 21 | 65
3 | 25 | 79
4 | 42 | 75
5 | 57 | 87
6 | 59 | 81

Step 1: Make a chart. Use the given data, and add three more columns: xy, x², and y².
Subject | Age x | Weight Level y | xy | x² | y²
1 | 43 | 99 | | |
2 | 21 | 65 | | |
3 | 25 | 79 | | |
4 | 42 | 75 | | |
5 | 57 | 87 | | |
6 | 59 | 81 | | |

Step 2: Multiply x and y together to fill the xy column. For example, row 1 would be 43 × 99 = 4,257

Step 3: Take the square of the numbers in the x column, and put the result in the x² column.
Subject | Age x | Weight Level y | xy | x² | y²
1 | 43 | 99 | 4257 | 1849 |
2 | 21 | 65 | 1365 | 441 |
3 | 25 | 79 | 1975 | 625 |
4 | 42 | 75 | 3150 | 1764 |
5 | 57 | 87 | 4959 | 3249 |
6 | 59 | 81 | 4779 | 3481 |

Step 4: Take the square of the numbers in the y column, and put the result in the y² column.

Step 5: Add up all of the numbers in each column and put the result at the bottom of the column. The Greek letter sigma (Σ) is a short way of saying “sum of.”

Subject | Age x | Weight Level y | xy | x² | y²
1 | 43 | 99 | 4257 | 1849 | 9801
2 | 21 | 65 | 1365 | 441 | 4225
3 | 25 | 79 | 1975 | 625 | 6241
4 | 42 | 75 | 3150 | 1764 | 5625
5 | 57 | 87 | 4959 | 3249 | 7569
6 | 59 | 81 | 4779 | 3481 | 6561
Σ | 247 | 486 | 20485 | 11409 | 40022

Step 6: Use the following formula to work out the correlation coefficient, where n is the number of subjects (here n = 6):

$$ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} $$
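Steps 1 to 6 can be followed end to end in a short script. This is a sketch assuming the standard computational formula for Pearson's r; the variable names are illustrative.

```python
import math

ages = [43, 21, 25, 42, 57, 59]      # x column
weights = [99, 65, 79, 75, 87, 81]   # y column

# Steps 2 to 5: build the xy, x squared and y squared columns and total them.
n = len(ages)
sum_x = sum(ages)
sum_y = sum(weights)
sum_xy = sum(x * y for x, y in zip(ages, weights))
sum_x2 = sum(x * x for x in ages)
sum_y2 = sum(y * y for y in weights)

# Step 6: substitute the totals into the computational formula.
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)
print(round(r, 4))
```

The totals printed on the first line should match the Σ row of the table, which is a quick way to catch an arithmetic slip before the formula is applied.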

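Substituting the totals into the computational formula gives

$$ r = \frac{6(20485) - (247)(486)}{\sqrt{\left[6(11409) - 247^2\right]\left[6(40022) - 486^2\right]}} = \frac{2868}{\sqrt{7445 \times 3936}} \approx \frac{2868}{5413.27} $$

The answer is: r ≈ 0.5298
The range of the correlation coefficient is from -1 to 1. Our result of about 0.53 indicates a moderate positive relationship between age and weight level, although with only six subjects no firm conclusion can be drawn.

Assumptions of the Pearsonian coefficient: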
Karl Pearson’s coefficient of correlation is based on the following assumptions:

* There is a linear relationship between the variables.
* The two variables under study are affected by a large number of independent causes so as to form a normal distribution.
* There is a cause and effect relationship between the forces affecting the distribution of the items in the two series.

Merits and limitations of the Pearsonian coefficient:
* The correlation coefficient always assumes a linear relationship, regardless of whether that assumption is correct.
* Great care must be exercised in interpreting the value of this coefficient, as it is very often misinterpreted.
* The value of the coefficient is unduly affected by extreme items.
* Compared with other methods, this method takes more time to compute the value of the correlation coefficient.
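The sensitivity to extreme items is easy to demonstrate. The numbers below are invented purely for illustration (they are not from the essay), and `pearson_r` is an illustrative helper using the deviation form of the coefficient.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the two means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Illustrative data: five points with a clear upward trend.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
r_without = pearson_r(xs, ys)

# The same data with one extreme item added.
r_with = pearson_r(xs + [6], ys + [-10])

print(round(r_without, 2), round(r_with, 2))
```

A single extreme pair is enough to turn a strong positive coefficient into a negative one in this made-up case, which is why the data should be inspected before r is interpreted.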

Interpreting the coefficient of correlation:
The following general rules help in interpreting the value of r, the coefficient of correlation:
* When r = +1, there is a perfect positive relationship between the variables.
* When r = -1, there is a perfect negative relationship between the variables.
* When r = 0, there is no relationship between the variables.
* The closer r is to +1 or -1, the closer the relationship between the variables; the closer r is to 0, the less close the relationship.
* The closeness of the relationship is not proportional to r.

Spearman's Rank Correlation Coefficient
Spearman's Rank Correlation Coefficient is used to discover the strength of a link between two sets of data. This example looks at the strength of the link between the price of a convenience item (a 50cl bottle of water) and distance from the Contemporary Art Museum in El Raval, Barcelona.
Spearman’s Rank correlation coefficient
A correlation can easily be drawn as a scatter graph, but the most precise way to compare several pairs of data is to use a statistical test - this establishes whether the correlation is really significant or if it could have been the result of chance alone.
Spearman’s Rank correlation coefficient is a technique which can be used to summarize the strength and direction (negative or positive) of a relationship between two variables.
The result will always be between 1 and minus 1.
Method - calculating the coefficient
* Create a table from your data.
* Rank the two data sets. Ranking is achieved by giving the ranking '1' to the biggest number in a column, '2' to the second biggest value and so on. The smallest value in the column will get the lowest ranking. This should be done for both sets of measurements.
* Tied scores are given the mean (average) rank. For example, the three tied scores of 1 euro in the example below are ranked fifth in order of price, but occupy three positions (fifth, sixth and seventh) in a ranking hierarchy of ten. The mean rank in this case is calculated as (5+6+7) ÷ 3 = 6.
* Find the difference in the ranks (d): this is the difference between the ranks of the two values on each row of the table. The rank of the second value (price) is subtracted from the rank of the first (distance from the museum).
* Square the differences (d²) to remove negative values, and then sum them (Σd²).
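The ranking rule, including the treatment of tied scores, can be sketched as follows. The function name `rank_descending` is hypothetical, and the prices are those of the ten convenience stores in the example.

```python
def rank_descending(values):
    """Rank values so the largest gets rank 1; tied values share the mean rank."""
    ordered = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1   # first position occupied by this value
        count = ordered.count(v)       # how many values are tied with it
        # Mean of the positions first, first + 1, ..., first + count - 1.
        ranks.append(first + (count - 1) / 2)
    return ranks

prices = [1.80, 1.20, 2.00, 1.00, 1.00, 1.20, 0.80, 0.60, 1.00, 0.85]
price_ranks = rank_descending(prices)
print(price_ranks)
```

The three prices of 1.00 occupy positions five, six and seven, so each receives the mean rank of 6, as described in the text.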

Convenience Store | Distance from CAM (m) | Rank distance | Price of 50cl bottle (€) | Rank price | Difference between ranks (d) | d²
1 | 50 | 10 | 1.80 | 2 | 8 | 64
2 | 175 | 9 | 1.20 | 3.5 | 5.5 | 30.25
3 | 270 | 8 | 2.00 | 1 | 7 | 49
4 | 375 | 7 | 1.00 | 6 | 1 | 1
5 | 425 | 6 | 1.00 | 6 | 0 | 0
6 | 580 | 5 | 1.20 | 3.5 | 1.5 | 2.25
7 | 710 | 4 | 0.80 | 9 | -5 | 25
8 | 790 | 3 | 0.60 | 10 | -7 | 49
9 | 890 | 2 | 1.00 | 6 | -4 | 16
10 | 980 | 1 | 0.85 | 8 | -7 | 49
 | | | | | Σd² = | 285.5
Data Table: Spearman's Rank Correlation
* Calculate the coefficient (R) using the formula below. The answer will always be between 1.0 (a perfect positive correlation) and -1.0 (a perfect negative correlation).
When written in mathematical notation the Spearman Rank formula looks like this:

$$ R = 1 - \frac{6\sum d^2}{n^3 - n} $$

Now to put all these values into the formula.
* Find Σd² by adding up all the values in the d² column. In our example this is 285.5. Multiplying this by 6 gives 1713.
* Now for the bottom line of the equation. The value n is the number of sites at which you took measurements. This, in our example, is 10. Substituting this value into n³ - n we get 1000 - 10 = 990.
* We now have the formula R = 1 - (1713/990), which gives a value for R:
1 - 1.73 = -0.73
What does this R value of -0.73 mean?
The closer R is to +1 or -1, the stronger the likely correlation. A perfect positive correlation is +1 and a perfect negative correlation is -1. The R value of -0.73 suggests a fairly strong negative relationship.
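The whole calculation can be checked from the two rank columns of the data table. This is a sketch using the formula R = 1 - 6Σd² / (n³ - n) described above; the variable names are illustrative.

```python
# Rank columns copied from the data table.
rank_distance = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
rank_price = [2, 3.5, 1, 6, 6, 3.5, 9, 10, 6, 8]

n = len(rank_distance)

# d is the distance rank minus the price rank on each row.
d = [rd - rp for rd, rp in zip(rank_distance, rank_price)]
d_squared = [diff ** 2 for diff in d]
sum_d2 = sum(d_squared)

R = 1 - (6 * sum_d2) / (n ** 3 - n)

print(sum_d2, round(R, 2))
```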

A further technique is now required to test the significance of the relationship.
The R value of -0.73 must be looked up on the Spearman Rank significance table below as follows:
* Work out the 'degrees of freedom' you need to use. This is the number of pairs in your sample minus 2 (n-2). In the example it is 8 (10 - 2).
* Now plot your result on the table.
* If it is below the line marked 5%, then it is possible your result was the product of chance and you must reject the hypothesis.
* If it is above the 0.1% significance level, then we can be 99.9% confident the correlation has not occurred by chance.
* If it is above 1%, but below 0.1%, you can say you are 99% confident.
* If it is above 5%, but below 1%, you can say you are 95% confident (i.e. statistically there is a 5% likelihood the result occurred by chance).
In the example, the value 0.73 gives a significance level of slightly less than 5%. That means that, if there were really no relationship, a correlation this strong would be expected to arise by chance in fewer than 5 samples out of 100, so you can be about 95% confident that the relationship you have found is not a chance event. Put another way, if 100 researchers repeated the study where no real relationship existed, only about 5 would be expected to find a correlation this strong by chance.

* The fact that two variables correlate cannot prove anything; only further research can actually prove that one thing affects the other.
* Data reliability is related to the size of the sample. The more data you collect, the more reliable your result.
Merits and limitations of rank method:
* This method is simpler to understand and easier to apply than Karl Pearson's method. Applied to the ranks, this method and Karl Pearson's method give the same answer provided no value is repeated.
* Where the data are of a qualitative nature, like honesty, efficiency, intelligence, etc., this method can be used with great advantage.
* This is the only method that can be used where we are given the ranks and not the actual data.
* Even where actual data are given, the rank method can be applied for ascertaining correlation.
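The agreement between the two methods when no value is repeated can be checked on the age and weight data from the Pearson sample question, which contain no ties. This is a sketch: the helper names are hypothetical, and the Pearson calculation is applied to the ranks rather than to the raw values.

```python
import math

def rank_descending(values):
    """Rank values so the largest gets rank 1 (assumes no repeated values)."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the two means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def spearman_R(rank_x, rank_y):
    """Rank formula R = 1 - 6 * sum(d^2) / (n^3 - n)."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - (6 * sum_d2) / (n ** 3 - n)

ages = [43, 21, 25, 42, 57, 59]
weights = [99, 65, 79, 75, 87, 81]

age_ranks = rank_descending(ages)
weight_ranks = rank_descending(weights)

by_rank_formula = spearman_R(age_ranks, weight_ranks)
by_pearson_on_ranks = pearson_r(age_ranks, weight_ranks)
print(round(by_rank_formula, 4), round(by_pearson_on_ranks, 4))
```

Note that this value differs from the Pearson coefficient of the raw ages and weights; the equality holds only when Pearson's formula is applied to the ranks.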
Limitations:
* This method cannot be used for finding out correlation in a grouped frequency distribution.
* Where the number of items exceeds 30, the calculations become quite tedious and require a lot of time. Therefore, this method should not be applied where N exceeds 30 unless we are given the ranks and not the actual values of the variables.

SIGNIFICANCE OF THE STUDY OF CORRELATION:
* Correlation analysis contributes to the understanding of economic behavior, aids in locating the critically important variables on which others depend, may reveal to the economist the connections by which disturbances spread, and may suggest to him the paths through which stabilizing forces may become effective.
* In business, correlation analysis enables the executive to estimate costs, sales, prices and other variables on the basis of some other series with which they may be functionally related. Some of the guesswork can be removed from decisions when the relationship between a variable to be estimated and the one or more other variables on which it depends is close and reasonably invariant.
* However, it should be noted that the coefficient of correlation is one of the most widely used and also one of the most widely abused statistical measures. It is abused in the sense that one sometimes overlooks the fact that correlation measures nothing but the strength of a linear relationship and does not necessarily imply a cause and effect relationship.
