Norms Development Details You Should Know
Can you believe that it’s almost September? Where did the summer go? A hearty “welcome back” to all our school-based colleagues!
One of the first things that I know you’ll be doing upon returning to school is making decisions about students on your caseload. You need to determine what testing, if any, you’ll need to complete in your assessment process. You might be picking up a freshly sharpened pencil, a cup of coffee, and the new test that you ordered months ago, trying to recall, “What is this one all about?”
When you make the decision to use a test to evaluate a student’s performance or progress, it’s wise to understand the background of the test and ensure that you are able to translate that background into your theory of speech/language development and the goals of testing. Knowing more about test development also helps you analyze and interpret the results you get from testing and understand why results different from your expectations might have occurred.
So, to help you in this regard, we offer another “look under the hood” of test development. The information below is adapted from an article by Dr. Mark H. Daniel, AGS Publishing’s Executive Director/Senior Scientist. For a quick review of test terminology, you might want to sharpen your teeth on Pearson’s Assessment group Glossary of Common Test Terms.
Norms can be constructed once a test has been administered to a nationally representative sample. This process involves judgment as well as analysis. For example, decisions need to be made about how to group the data and whether or not to normalize standard scores. In most cases, the process begins with estimating percentiles.
How are percentile norms computed?
A primary objective of norms development is to smooth out irregularities while preserving the true shapes of the within-year distributions and the age trends. [Remember the bell curve? Like many things in life, it's not always perfect!] These irregularities are introduced by sampling error, which occurs even though the norm sample has been carefully selected so the demographics of each subgroup (e.g., age or grade) match those of the U.S. population.
In the most common norming method, the overall sample is first divided into subgroups. For example, groups for grade norms might correspond to the spring and fall of each grade, whereas those for age norms might be formed from 3-, 6-, or 12-month ranges. Within each subgroup, the raw scores corresponding to selected percentile points are identified. Then, for each percentile point, the across-age or across-grade progression is smoothed to remove random deviations from a regular pattern of growth.
Next, each within-subgroup distribution of scores (as adjusted by the previous step) is smoothed to eliminate lumps or gaps. These two smoothing operations (across and within subgroups) are alternated by computer until the results stabilize. The result is a set of percentiles that progress steadily across age or grade and are smoothly distributed within each age or grade. Percentiles for any point on the age or grade range can then be read from the smoothed growth curves.
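For readers who like to see the moving parts, here is a minimal Python sketch of the first two steps described above: finding the raw scores at selected percentile points within each subgroup, then smoothing one percentile point across subgroups. The function names, the sample data, and the 3-point moving average are all assumptions made for illustration; actual norming programs use more sophisticated curve fitting and alternate this step with within-subgroup smoothing.

```python
from statistics import quantiles

def percentile_points(scores, points=(10, 25, 50, 75, 90)):
    """Raw scores at the selected percentile points for one subgroup."""
    # quantiles(..., n=100) returns 99 cut points; cuts[p - 1] is percentile p.
    cuts = quantiles(scores, n=100, method="inclusive")
    return {p: cuts[p - 1] for p in points}

def smooth_across_groups(values):
    """Centered 3-point moving average; the first and last groups are kept as-is.

    This is a simple stand-in for the smoothing step, not the method
    any particular test publisher uses.
    """
    smoothed = list(values)
    for i in range(1, len(values) - 1):
        smoothed[i] = (values[i - 1] + values[i] + values[i + 1]) / 3
    return smoothed

# Hypothetical raw scores for four consecutive age groups (made-up data).
# The third group's median dips below the second's, the kind of
# irregularity that sampling error can introduce.
groups = [
    [8, 9, 10, 11, 12, 13, 15],
    [11, 13, 14, 15, 16, 18, 20],
    [10, 12, 13, 14, 15, 16, 19],
    [14, 15, 16, 17, 18, 20, 22],
]
medians = [percentile_points(g)[50] for g in groups]
smoothed_medians = smooth_across_groups(medians)
```

After smoothing, the medians rise steadily from one age group to the next instead of dipping at the third group.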
Continuous norming. In a continuous norming approach, a large number of overlapping subgroups, each centered on an individual month of the age or grade range, are created instead. For example, one subgroup might be centered on age 6:4, the next on age 6:5, and so on. Each subgroup has enough cases to permit calculation of percentile points for each month. These values can be smoothed across and within subgroups as described above.
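The overlapping-subgroup idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names, the 3-month half-width of each window, and the choice to report only the median (the 50th percentile) are ours, not details from the original article.

```python
from statistics import median

def to_months(age):
    """Convert a 'years:months' age such as '6:4' to total months."""
    years, months = age.split(":")
    return int(years) * 12 + int(months)

def windowed_median(cases, center, half_width=3):
    """Median raw score of the cases aged within half_width months of center.

    cases is a list of (age_in_months, raw_score) pairs.
    Returns None when no case falls inside the window.
    """
    window = [score for age, score in cases if abs(age - center) <= half_width]
    return median(window) if window else None

def monthly_medians(cases, first, last, half_width=3):
    """One overlapping subgroup per month, from first to last (inclusive)."""
    return {m: windowed_median(cases, m, half_width) for m in range(first, last + 1)}
```

Because neighboring windows share most of their cases, a call such as `monthly_medians(cases, to_months("6:4"), to_months("6:8"))` changes gradually from month to month, which is the point of the approach.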
How are standard scores computed?
Usually, raw scores at any specific age or grade are not normally distributed, even after smoothing. The distribution may be skewed (stretched out farther in one direction than another). Or it may be flatter or taller than a normal distribution.
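If you would like to put numbers on "skewed" and "flatter or taller," the sketch below computes two common descriptive statistics from a list of raw scores. The function names are our own, and these simple moment-based formulas are just one of several conventions in use.

```python
from statistics import fmean

def skewness(scores):
    """Moment-based skewness: 0 for a symmetric distribution,
    positive when the upper tail is stretched out, negative when the lower is."""
    m = fmean(scores)
    m2 = fmean((x - m) ** 2 for x in scores)
    m3 = fmean((x - m) ** 3 for x in scores)
    return m3 / m2 ** 1.5

def excess_kurtosis(scores):
    """Moment-based kurtosis minus 3: about 0 for a normal distribution,
    negative for a flatter shape, positive for a more peaked one."""
    m = fmean(scores)
    m2 = fmean((x - m) ** 2 for x in scores)
    m4 = fmean((x - m) ** 4 for x in scores)
    return m4 / m2 ** 2 - 3
```

Both functions need at least two different score values; a list of identical scores has zero variance and would cause a division by zero.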
There is often a theoretical reason (particularly with ability tests) to expect the true distribution to be normal and, thus, to assume that any non-normality is an artifact. If this assumption is made, then normalized standard scores are constructed. Each percentile point is converted into the standard score that would correspond to that percentile in a normal distribution. Because normalized standard scores are derived from percentiles, all tests using such scores show the same relationship between standard scores and percentiles.
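The percentile-to-standard-score conversion described above can be sketched with Python's standard library. The scale used here (mean of 100, standard deviation of 15) is an assumption chosen because it is common; a given test may use a different one, and the function name is ours.

```python
from statistics import NormalDist

def normalized_standard_score(percentile, mean=100.0, sd=15.0):
    """Standard score that a percentile would have in a normal distribution.

    percentile must be strictly between 0 and 100; the mean/sd defaults
    are an assumed scale, not one taken from any specific test.
    """
    z = NormalDist().inv_cdf(percentile / 100.0)  # z-score for this percentile
    return round(mean + sd * z)
```

On this assumed scale, the 50th percentile maps to a standard score of 100, and the 16th and 84th percentiles fall about one standard deviation below and above it. Because the conversion depends only on the percentile, it is the same for every test that uses normalized scores on that scale.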
If the underlying distribution is not assumed to be normal (e.g., with GFTA-2 and KLPA-2), then standard scores generally are constructed directly from raw scores. This might be the case with a behavior-problems scale on which most individuals score in a normal range and a few have extreme scores in one direction. Here, one would likely compute linear standard scores, which reflect the distance of each raw-score value from the mean in standard-deviation units. The relationship between linear standard scores and percentiles will vary from test to test.
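By contrast, a linear standard score works straight from the raw scores. In the hedged sketch below, the function name, the mean-100/SD-15 scale, and the use of the population standard deviation are our assumptions for illustration.

```python
from statistics import fmean, pstdev

def linear_standard_score(raw, norm_scores, mean=100.0, sd=15.0):
    """Distance of a raw score from the norm-group mean, in SD units,
    rescaled to an assumed mean-100, SD-15 scale."""
    z = (raw - fmean(norm_scores)) / pstdev(norm_scores)
    return mean + sd * z

# Hypothetical norm-group raw scores (mean 5, standard deviation 2).
norm_scores = [2, 4, 4, 4, 5, 5, 7, 9]
score = linear_standard_score(9, norm_scores)
```

Here a raw score of 9 sits two standard deviations above the mean. Since no normal shape is imposed, the percentile that goes with a given linear standard score depends on the shape of that test's raw-score distribution.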
Send us your “What I’d like to learn about tests this year” list
As your partner in testing, we’d like you to know what we do, how we do it, and why. In turn, we’d like to know what other information we could provide to help you in your jobs. So send your “What I’d like to learn about tests this year” list to firstname.lastname@example.org, and we’ll try to fulfill your wishes.
Fall is back-to-the-books time. For you, this means test manuals. Happy reading! There’s more good stuff to discover in our manuals…really, there is.
Seize your September!