The Reliability & Validity Of Pre-Employment Assessments
Pre-employment assessments are widely regarded by professionals as a good indicator of job fit because of their reliability, validity, and fairness. When hiring managers place their trust in our assessments to make the best hiring decisions, it is our duty to ensure that the instruments we use are dependable and sound.
A test is considered ‘good’ if it meets three criteria: it must be reliable, valid, and fair. In other words, a hiring assessment should produce consistent results, measure what it claims to measure, and provide equal opportunity to all test takers. Since we’ve touched on fairness in a previous post about Adverse Impact Analysis, this piece will focus on explaining reliability and validity.
Reliability
Whenever we assess reliability, the main questions we keep asking ourselves are: “Do today’s results resemble yesterday’s? Will they remain the same tomorrow? What about next week? A year from now?”
A reliable test should produce similar results no matter when it is taken. Minor differences are expected if it is taken at different points in time, and reliability can be estimated statistically; the internal consistency of a test, for example, is commonly measured with Cronbach’s alpha, also known as the coefficient alpha. By scientific convention, an alpha of 0.7 or above is considered acceptable for most psychological measures.
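To make the statistic less abstract, here is a minimal sketch of how Cronbach’s alpha is computed from item-level scores. The data below are entirely hypothetical (rows are test takers, columns are items on the same scale), and the formula used is the standard one: alpha = k/(k−1) × (1 − Σ item variances / variance of total scores).

```python
import numpy as np

def cronbach_alpha(item_scores):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals)."""
    item_scores = np.asarray(item_scores, dtype=float)
    k = item_scores.shape[1]                          # number of items
    item_vars = item_scores.var(axis=0, ddof=1)       # sample variance of each item
    total_var = item_scores.sum(axis=1).var(ddof=1)   # variance of each person's total score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical responses: 5 test takers answering 4 items on a 1-5 scale.
scores = [
    [4, 5, 4, 5],
    [2, 3, 2, 3],
    [5, 5, 4, 4],
    [1, 2, 2, 1],
    [3, 3, 4, 3],
]
alpha = cronbach_alpha(scores)
print(round(alpha, 2))  # well above the 0.7 benchmark for this toy data
```

Because the five respondents answer all four items in a similar way, the items hang together well and alpha comes out high; items that disagreed with each other would drag it down.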
Validity
Simply put, a test is valid if it satisfies its objectives. For instance, an intelligence test should measure intelligence; a personality test should measure personality; a Pottermore Sorting Hat Quiz should measure exactly which Hogwarts House you belong to.
Validity can be further dissected into several subsets – or, as we like to picture it, a multi-headed creature from mythological tales. But contrary to the myths, every subset plays a key role in justifying the test’s legitimacy; cutting one (head) off can directly impair its whole existence.
Face Validity
In research, face validity is considered the easiest form of validity to establish. We can call an assessment face valid if the items on the test seem effective in achieving its stated aims. Note, the key word here is ‘seem’. Think of it as face value, where you scan just the surface to form an opinion. In the case of a pre-employment assessment, face validity relies on a subjective impression of the test – both lexically (the words or vocabulary used) and semantically (the meaning of those words).
Content Validity
Content validity ensures that only items relevant to job-related behaviours are used in the assessment. A personality test, for example, should cover areas such as openness, conscientiousness, extraversion, agreeableness, and neuroticism to determine a candidate’s suitability for the role. If some of these areas are left out, the results may not be an accurate indication of a person’s temperament. Likewise, if content unrelated to personality is included in the test, the results will no longer be deemed valid.
Criterion-Related Validity
The main question we need to answer when dealing with criterion-related validity is whether the test results correlate with external criteria such as job performance. In other words, individuals who score high on the assessment should tend to perform better on the job than those who score lower. Our job is to make sure the correlation between the two is high. By psychological standards, a correlation coefficient of 0.7 or above is considered strong, and a coefficient of 1 indicates a perfect correlation.
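As a concrete illustration of what “correlating test results with an external criterion” means, the sketch below computes Pearson’s r between a set of assessment scores and job-performance ratings. Both data sets are invented for the example; a real validation study would use actual candidate scores and supervisor ratings collected over time.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Hypothetical data: 8 employees' assessment scores and their later
# job-performance ratings on a 1-5 scale.
assessment_scores = [62, 71, 55, 80, 90, 65, 74, 58]
performance_ratings = [3.1, 3.6, 2.8, 4.2, 4.6, 3.2, 3.9, 2.9]

r = pearson_r(assessment_scores, performance_ratings)
print(round(r, 2))  # a value near 1 signals a strong positive relationship
```

In this toy example the two variables rise and fall together, so r comes out high; a test with no relationship to job performance would produce an r near zero, which is exactly the outcome criterion-related validation is designed to catch.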
Construct Validity
All the other types of validity described above can be seen as forms of evidence for construct validity. Think of it as building a case – the more evidence you have, the more confident you can be that your test is worthy. Of course, this evidence needs to be updated regularly as well to ensure that our test remains fresh, modern, and, most of all, valid.
Summing It All Up
Now that we’ve looked into the nature of the two major criteria, the next step is to understand how they apply to our tests. Picture yourself shooting at a target, as in archery or darts. As we all know, the goal of any shooting-type exercise is to hit the target, and the same goes for creating a good pre-employment assessment. Let’s look at the following three configurations:
Not reliable, not valid.
The shooter is way off the mark here, and none of the shots land near each other – failing both at hitting the target and at grouping consistently. An assessment that is neither reliable nor valid is considered a ‘bad’ test.
Reliable, but not valid.
Better than the previous configuration: the shooter is consistently hitting the same area, but it is still way off target. A pattern like this means our test is consistent but isn’t measuring what it’s supposed to measure. It is important to note that a test can be reliable without being valid, but not the other way around.
Reliable and valid.
You’ve hit the bullseye! If our test behaves this way, with all shots landing in the same place and at the center, it means we have achieved our goal – an assessment with consistent results that provides insights into job performance. This is the ideal we are always aiming for.
Hiring can be risky if your decisions rely mainly on gut feeling. At Prevue, we aim to provide hiring solutions that will give you a reliable, valid, and fair outlook on your candidates. To learn more about our Prevue Assessment Suite, click here.