Hi all, I am analysing a salary survey. The regression gives an R-squared of 0.8 and an adjusted R-squared of 0.5. How should I trim the database to get a better R-squared? Thanks for any help.
From China, Chongqing
Dear Yovi,
Please give more details about the nature of the data you collected. What do you mean by "trimming the database"? How many variables does your database have? R-squared is normally quoted for a single explanatory variable; adjusted R-squared, which penalises each additional explanatory variable, is the one to use with multiple explanatory variables.
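To make the gap between the two figures concrete, here is a minimal sketch of the usual adjusted R-squared formula, 1 - (1 - R²)(n - 1)/(n - p - 1). The sample size n = 6 and predictor count p = 3 are assumptions chosen for illustration only; the original question does not give them.

```python
def adjusted_r_squared(r2: float, n: int, p: int) -> float:
    """Adjusted R-squared for p explanatory variables fitted on n observations."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Hypothetical sizes: with few observations and several predictors,
# an R-squared of 0.8 can shrink to roughly 0.5 after adjustment.
small_sample = adjusted_r_squared(0.8, n=6, p=3)
large_sample = adjusted_r_squared(0.8, n=100, p=3)
print(round(small_sample, 3), round(large_sample, 3))
```

A large gap between the two values usually points to a small sample relative to the number of variables, which is why the number of variables matters here.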
Have a nice day.
Simhan
From United Kingdom
Dear Yovi
Common variable pairs in salary regression include:
1. Salary & Job Points (if jobs are evaluated)
2. Salary & Age
3. Salary & Tenure, etc.
Whichever variables are used, you are trying to establish the correlation between the two.
In conducting a salary survey, "trimming the database" refers to identifying "data outliers" or "extreme data points" and excluding them from the analysis, because including them will skew the results upwards or downwards and make it hard to establish trends or norms. E.g. suppose I have 5 Production Workers: 4 of them receive a salary within the range of $1000 to $2000, but the fifth receives $5000. The fifth is considered an "outlier" because including this data point will skew the analysis.
To identify "outliers", run a standard deviation analysis (Excel will do): set the desired deviation step, e.g. 1, 2 or 3 standard deviations from the mean, and flag the data points that fall outside it. The appropriate deviation step depends on the size of your data sample.
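As a rough illustration of the standard deviation check described above, here is a small Python sketch (rather than Excel). The salary figures and the function name `flag_outliers` are made up for the example; only the 5-worker scenario comes from the post.

```python
from statistics import mean, stdev

def flag_outliers(values, step):
    """Return the values lying more than `step` sample standard deviations from the mean."""
    centre = mean(values)
    spread = stdev(values)
    if spread == 0:
        return []  # all values identical, nothing to flag
    return [v for v in values if abs(v - centre) > step * spread]

# Made-up salaries for 5 Production Workers, one of them on $5000.
salaries = [1000, 1300, 1600, 2000, 5000]

print(flag_outliers(salaries, 1.5))  # the $5000 salary is flagged
print(flag_outliers(salaries, 2.0))  # nothing is flagged
```

With only 5 data points no value can sit 2 sample standard deviations from the mean, so a step of 2 flags nothing here. That is one way to see why the step has to be chosen with the sample size in mind.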
It is good practice to run two salary regressions: one before the "trimming" to depict the current situation, and another after the "trimming" to depict the desired situation.
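A minimal sketch of that before/after comparison, assuming a single-variable regression of salary on job points. The data, the deviation step of 2 and the helper name `r_squared` are all assumptions for illustration, not figures from the attachment.

```python
from statistics import mean, stdev

def r_squared(xs, ys):
    """R-squared of a one-variable least-squares line (equal to Pearson r squared)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy)

# Made-up (job points, salary) pairs; the last one is deliberately extreme.
data = [(100, 1400), (150, 1650), (200, 1780), (250, 2050),
        (300, 2180), (350, 2400), (200, 5000)]

# Trim on salary using a hypothetical deviation step of 2.
pay = [s for _, s in data]
centre, spread = mean(pay), stdev(pay)
step = 2
trimmed = [(p, s) for p, s in data if abs(s - centre) <= step * spread]

r2_before = r_squared(*zip(*data))     # current situation
r2_after = r_squared(*zip(*trimmed))   # desired situation
print(f"before trimming: {r2_before:.3f}, after trimming: {r2_after:.3f}")
```

Keeping both figures side by side shows how much of the poor fit comes from the excluded points, which are the ones the company would eventually need to look at.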
"Trimming" is not just for show or presentation, it indicates an area of concern for the company which must eventually be addressed.
Please see sample attachment.
When R=1, you have a "perfect" correlation, but this is rarely the case in real life. To conclude that two variables are "relatively correlated", the minimum is R=0.8 (but it also depends very much on your desired standard).
Regards
Autumn Jane
From Singapore, Singapore
Thank you, Autumn Jane, for a clear explanation of what "data trimming" is in this context. As the R-squared value shown was 0.8 (quite high), I did not think about outliers. There is a good explanation of this at http://www.statisticaloutsourcingser...m/Outlier2.pdf However, it is not about pay analysis.
Have a nice day.
Simhan
From United Kingdom
Dear Simhan
You are quite right that an R-squared of 0.8 is in fact quite high. But in salary analysis, it takes only 1 outlier (whether over or under) to cause serious morale issues across the organization. Therefore, the tighter the control over the data spread, the more valid the analysis.
Have a nice day.
Regards
Autumn Jane
From Singapore, Singapore
Dear Yoyi - an R-squared of 0.5 is not worthless. It is telling you that the factors you have chosen as independent variables are not the only ones determining salary, and that you are missing some important factor.
Are you doing only a single-factor correlation, or are you doing multivariate analysis? If you are doing multivariate analysis, then you should also worry about the interdependence between the selected independent variables. If I remember my statistics correctly, it is to do with Pearson's correlation coefficient.
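A small sketch of that interdependence check, assuming age and tenure are two candidate independent variables. The figures and the helper name `pearson_r` are invented for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up employee data: older staff also tend to have longer tenure.
age = [25, 30, 35, 40, 45, 50]
tenure = [1, 4, 8, 12, 15, 22]

r = pearson_r(age, tenure)
print(f"correlation between age and tenure: {r:.2f}")
```

A coefficient close to 1 or -1 between two independent variables suggests they carry largely the same information, so their separate effects on salary are hard to tell apart.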
Regards
From India, Delhi
Thanks guys! Your replies are really meaningful to me!
First of all, I am doing a market survey analysis. All I have are the P25, P50 and P75. I remember my teacher telling me that the R-squared is acceptable if it is >= 0.95, which means the market data has high validity. If the R-squared is low, we should "trim" the original data. In this case, I use only annual salary and job grade to derive the regression line.
Dear Autumn Jane, your explanation is terrific, but I don't understand how the Internal Equity Analysis can determine the number of pay structures an organization should have. Could you explain further?
From China, Chongqing
You are using univariate analysis; however, as people have varying experience and qualifications, you should be using multivariate analysis, because pay does not depend on the grade alone.
From United Kingdom
An R-squared of 0.8 means your independent variable explains 80% of the variation in the dependent variable. An R-squared of 0.5 or greater is acceptable. Remember that in practice R-squared will never equal 1: a value of 1 means a 100% relationship, which belongs to an exact mathematical expression rather than a statistical model.
If you have more than one independent variable, you should also check the tolerance and the variance inflation factor (VIF).
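For reference, a minimal sketch of how the two figures relate, using the standard definitions: tolerance is 1 minus the R-squared obtained when one independent variable is regressed on all the others, and VIF is the reciprocal of the tolerance. The input value 0.9 and the function name are assumptions for illustration.

```python
def tolerance_and_vif(r2_on_other_predictors: float):
    """Tolerance and VIF of one predictor, given the R-squared from
    regressing that predictor on the remaining predictors."""
    if not 0.0 <= r2_on_other_predictors < 1.0:
        raise ValueError("R-squared must be in [0, 1)")
    tolerance = 1.0 - r2_on_other_predictors
    return tolerance, 1.0 / tolerance

# Hypothetical case: 90% of one predictor's variation is explained by the others.
tol, vif = tolerance_and_vif(0.9)
print(f"tolerance = {tol:.2f}, VIF = {vif:.1f}")
```

A low tolerance (equivalently a high VIF) signals that the variable overlaps heavily with the other independent variables; which cut-off to use is a judgment call and is not specified in this thread.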
Basically, in technical terms, R-squared measures how much of the variance (variation) in the dependent variable is explained by the independent variable.
Suppose you check the relationship of salary with age.
Salary is your dependent variable and age is your independent variable.
If R-squared is 0.8, then we can interpret it as: 80% of the variation in salary is explained by age.
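The "variation explained" reading can be sketched directly from a fitted line: R-squared is 1 minus the ratio of the unexplained (residual) variation to the total variation in salary. The age and salary figures below are made up, and the helper names are only for this example.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def explained_share(xs, ys):
    """R-squared computed as 1 - residual variation / total variation."""
    slope, intercept = fit_line(xs, ys)
    my = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

# Made-up data: salary (dependent) against age (independent).
age = [25, 30, 35, 40, 45, 50]
salary = [2000, 2600, 2900, 3600, 3800, 4500]

r2 = explained_share(age, salary)
print(f"{r2:.0%} of the variation in salary is explained by age")
```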
From Pakistan
Dear Yoyi
Let me come back to you on using Internal Equity Analysis to determine the number of Pay Structure(s) an organization should have; I am looking through my projects to extract relevant material for a pictorial explanation.
In the meantime, would you be able to clarify how job grades are classified in your organization, e.g. job grade 1, 2, 3....?
Regards
Autumn Jane
From Singapore, Singapore