Hi all, I am doing an analysis of a salary survey. The regression gives an R-squared of 0.8 and an adjusted R-squared of 0.5. How should I trim the database to get a better R-squared? Thanks for any help.
From China, Chongqing
Dear Yovi,
Please provide more details about the nature of the data you have collected. What do you mean by "Trimming the database..."? How many variables does your database contain? The R-Squared value is reported for any regression, while the Adjusted R-Squared penalizes it for the number of explanatory variables, so it is the figure to look at when there are multiple explanatory variables.
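To see why the number of variables and the number of records both matter here, a minimal sketch of the standard adjusted R-squared formula may help. The sample sizes and predictor counts below are hypothetical, chosen only for illustration; they are not taken from the original poster's data.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = n_obs - n_predictors - 1  # residual degrees of freedom
    if dof <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1 - (1 - r_squared) * (n_obs - 1) / dof

# Hypothetical cases: the same R-squared of 0.8 with three predictors,
# once on very few records and once on a larger sample.
small_sample = adjusted_r_squared(0.8, n_obs=6, n_predictors=3)
large_sample = adjusted_r_squared(0.8, n_obs=60, n_predictors=3)

print(round(small_sample, 3), round(large_sample, 3))
```

With only a handful of records per predictor, the adjusted value drops far below the raw R-squared; with more records the two stay close. A large gap between the two is therefore often a sign of too few data points for the number of variables rather than of outliers.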
Have a nice day.
Simhan
From United Kingdom
Dear Yovi,
Common pairs of variables in salary regression include:
1. Salary & Job Points (if jobs are evaluated)
2. Salary & Age
3. Salary & Tenure...etc
Whichever variables are used, you are trying to establish the correlation between the two.
In conducting a salary survey, "trimming the database" refers to identifying "data outliers" or "extreme data points" and excluding them from the analysis, because including them skews the results either upwards or downwards and makes it hard to establish trends/norms. For example, suppose I have five Production Workers, four of them receiving a salary within the range of $1000 to $2000 and the fifth receiving $5000. The fifth is considered an "outlier" because including this data point will skew the analysis.
To identify "outliers," you need to perform a Standard Deviation Analysis (use Excel): set the desired Deviation step (e.g., 1, 2, or 3 standard deviations from the mean), run the analysis, and flag the records that fall outside that band. The Deviation step is anchored on the size of your data sample.
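The same standard deviation check can be sketched outside Excel. The snippet below is only an illustration: the function name `flag_outliers` and the five salaries are hypothetical, and it uses the sample standard deviation from Python's standard library.

```python
from statistics import mean, stdev

def flag_outliers(salaries, step=2.0):
    """Return salaries more than `step` sample standard deviations from the mean."""
    centre = mean(salaries)
    spread = stdev(salaries)
    if spread == 0:
        return []  # all values identical, nothing to flag
    return [s for s in salaries if abs(s - centre) > step * spread]

# Hypothetical data: five Production Workers, one paid far above the rest.
salaries = [1000, 1400, 1700, 2000, 5000]

flagged_1 = flag_outliers(salaries, step=1.0)
flagged_2 = flag_outliers(salaries, step=2.0)
print(flagged_1, flagged_2)
```

On this tiny sample the $5000 record is flagged at a step of 1 but not at a step of 2, because a single extreme value inflates the standard deviation itself. That is one reason the step has to be chosen with the sample size in mind.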
It is good practice to run the salary regression twice: once before the "trimming" to depict the current situation and once after the "trimming" to depict the desired situation.
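A rough sketch of this before-and-after comparison, using a hand-written least-squares fit of salary on job points. The data, the helper name `fit_line`, and the choice of which record to exclude are all hypothetical.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b, r_squared)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return a, b, 1 - ss_res / ss_tot

# Hypothetical job points and salaries; the last record is the extreme one.
points = [100, 150, 200, 250, 300, 180]
salary = [1000, 1300, 1450, 1800, 2000, 5000]

_, _, r2_before = fit_line(points, salary)            # current situation
_, _, r2_after = fit_line(points[:-1], salary[:-1])   # after trimming the extreme record

print(f"R-squared before trimming: {r2_before:.3f}")
print(f"R-squared after trimming:  {r2_after:.3f}")
```

Keeping both results side by side makes it visible how much of the improved fit comes from the excluded record alone.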
"Trimming" is not just for show or presentation; it indicates an area of concern for the company that must eventually be addressed.
Please see the sample attachment.
When R=1, you have a "perfect" correlation, but this is rarely the case in real life. To conclude whether two variables are "relatively correlated," the minimum is at R=0.8 (but it also depends very much on your desired standard).
Regards,
Autumn Jane
From Singapore, Singapore
Thank you, Autumn Jane, for a clear explanation of what "Data Trimming" is in this context. As the R-Squared value shown was 0.8 (quite high), I did not think about the outliers. There is a good explanation of this at http://www.statisticaloutsourcingser...m/Outlier2.pdf. However, it is not with respect to pay analysis.
Have a nice day.
Simhan
From United Kingdom
Dear Simhan,
You are absolutely correct in stating that a 0.8 R-squared value is indeed quite high. However, for salary analysis, just one outlier, whether it is above or below the expected range, can lead to significant morale issues throughout the organization. Therefore, the tighter the control over the spread of data, the more reliable and valid the analysis becomes.
Have a nice day.
Regards,
Autumn Jane
From Singapore, Singapore
Dear Yoyi,
Getting a 0.5 R-square is not worthless. It is telling you that the factors you have chosen as the independent variables are not the sole ones determining salary, and you are missing some important factors. Are you doing only a single-factor correlation, or are you doing multivariate analysis? If you are doing multivariate analysis, then you should also worry about the interdependence between the selected independent variables (multicollinearity). If I remember my statistics correctly, you can check this with Pearson's correlation coefficient between each pair of independent variables.
Regards
From India, Delhi
Thanks, guys! Your replies are really meaningful to me!
First of all, I am doing a market survey analysis. All I have are the P25, P50, and P75. I remember my teacher telling me that an R-squared is acceptable if it is >= 0.95, which means the market data has high validity. If the R-squared is low, we should "trim" the original data. In this case, I use only annual salary and job grade to derive the regression line.
Dear Autumn Jane, your explanation is terrific, but I don't understand why the Internal Equity Analysis can determine the number of pay structures an organization should have. Could you give any further explanation?
From China, Chongqing
You are using univariate analysis; however, as people differ in experience and qualifications, you should be using multivariate analysis, because pay does not depend solely on the grade.
From United Kingdom
An R-squared value of 0.8 means your independent variable explains 80% of the variance in the dependent variable. As a rule of thumb, an R-squared value of 0.5 or greater is often treated as acceptable. Remember one thing: with real data, R-squared is almost never equal to 1. A value of 1 means every data point lies exactly on the fitted line, a 100% relationship, which is a mathematical identity rather than something a statistical model of real data normally produces.
If you have more than one independent variable, you should also check tolerance and variance inflation factor (VIF). Basically, in technical terms, R-squared is the proportion of the variance (variation) in the dependent variable that is explained by the independent variable(s).
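As a rough illustration of the tolerance and VIF check, the sketch below handles the simplest case of exactly two independent variables, where tolerance is 1 minus the squared Pearson correlation between them and VIF is its reciprocal. The age and tenure figures and the function names are hypothetical; with more than two predictors, each VIF comes from regressing one predictor on all the others.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

def tolerance_and_vif(x1, x2):
    """Two-predictor case: tolerance = 1 - r^2, VIF = 1 / tolerance."""
    r = pearson_r(x1, x2)
    tolerance = 1 - r ** 2
    vif = float("inf") if tolerance == 0 else 1 / tolerance
    return tolerance, vif

# Hypothetical predictors: age and tenure (years) rise almost together.
age = [25, 30, 35, 40, 45, 50]
tenure = [1, 6, 9, 16, 19, 26]

tolerance, vif = tolerance_and_vif(age, tenure)
print(f"tolerance = {tolerance:.3f}, VIF = {vif:.1f}")
```

A VIF well above commonly quoted cut-offs such as 5 or 10 suggests the two variables carry nearly the same information, so including both adds little and makes the individual coefficients unstable.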
Just suppose you are checking the relationship between salary and age. Salary is your dependent variable, and age is the independent variable. If the R-squared value is 0.8, then we can interpret that 80% of the variation in salary is explained by age.
From Pakistan
Dear Yoyi,
Let me come back to you regarding using internal equity analysis to determine the number of pay structures an organization should have as I am trying to look through my projects to extract the relevant materials for pictorial explanation.
In the meantime, would you be able to clarify how job grades are classified in your organization, e.g., job grade 1, 2, 3...?
Regards,
Autumn Jane
From Singapore, Singapore