Clinical Café by Tina Radichel, M.S., CCC-SLP
Lazy, Hazy, and Crazy Summer Days
Want to keep the wheels turning during the summer months? Want to simplify your life? Look no further! To ease the burden of repeatedly clarifying terminology about types of tests, this month’s Clinical Café focuses on some often-confused terms. Spread the word! Place copies of this issue by the napkin holders in the staff lounge. Slip a copy under the door of your scanning department or your technology office. Give a prize to the person who uses the terms correctly in a report or a presentation. You have all summer to brainstorm.
Norm- vs. Criterion-Referenced Tests
Norm-referenced tests compare an examinee’s performance against a representative group. This group is gathered carefully and tested in a standardized way so that it represents the entire population for which the test is intended.
Criterion-referenced tests use a set of benchmarks, or criteria, that carry specific expectations of mastery. An examinee’s performance is then compared to these expectations of content mastery or performance—that is, to the criteria themselves, not to any reference group.
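For the quantitatively inclined, the distinction can be sketched in a few lines of Python. The normative mean, standard deviation, mastery cutoff, and raw score below are invented for illustration and do not come from any published test:

```python
# Hypothetical illustration: the same raw score interpreted two ways.
# All numbers here are made up for the example, not from a real norms table.

def norm_referenced_score(raw, norm_mean, norm_sd):
    """Compare the examinee to a reference group: convert a raw score
    to a standard score (mean 100, SD 15), as norms tables do."""
    z = (raw - norm_mean) / norm_sd
    return round(100 + 15 * z)

def criterion_referenced_result(raw, mastery_cutoff):
    """Compare the examinee to a fixed benchmark, not to peers."""
    return "mastery" if raw >= mastery_cutoff else "not yet mastered"

raw_score = 42
# Norm-referenced: where does this examinee stand relative to the group?
print(norm_referenced_score(raw_score, norm_mean=38, norm_sd=8))
# Criterion-referenced: did this examinee meet the benchmark?
print(criterion_referenced_result(raw_score, mastery_cutoff=45))
```

Notice that the same raw score can look "above average" against the norm group yet still fall short of the mastery criterion; the two interpretations answer different questions.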
Diagnostic vs. Formative vs. Summative Tests
Pre-learning: Diagnostic tests are the ones with which we as SLPs are usually most familiar. These tests measure an examinee’s knowledge and skill areas when he/she is “left to his/her own devices.” We complete diagnostic tests first so that we can accurately place students in the intervention program best suited to their needs. The OWLS and the CASL would be considered diagnostic tests because they point to a specific direction for intervention.
During learning: Formative tests offer information about learning in the middle or throughout the learning process. Formative testing can take the shape of learning self-assessment, quizzes, practice tests, or observations.
Post-learning: Summative tests make a final, end-of-course judgment about the learning or intervention and its outcome, the relative success or failure of the examinee. National certification exams, like our NESPA exam, are summative tests. The ACT and SAT also fit into this category. Not to confuse anyone, but SLPs obviously use our diagnostic tests post-learning, too. Still, they are better classified as diagnostic tests, since they are given repeatedly to track progress (for medium stakes purposes—oops, I’m getting ahead of myself. Read on!).
Low/Medium/High Stakes Tests
We’ve been hearing quite a bit about these different types of tests in recent months. Dividing tests along this continuum is useful because “low, medium, or high stakes” accurately describes the level of impact a test can have on examinees. The category a test falls into reflects several components: how carefully the examinee’s identity is verified, the rigor of item and test creation and review, the need for item and test administration security, and the number and scope of consequences, the “stakes,” of decisions made based on the results. For example, the ACT is a high stakes test. It may impact college entrance. The items and test forms are kept tightly secure. And I’m sure you all remember the “what you need to bring to this test” form, which includes proofs of identity and signatures of the stoic proctor. Most of our speech and language tests are medium stakes tests. The items are secure because of the investment in norms development, the test is administered by an examiner or proctor, and it generates reports that inform key placement decisions and intervention programs. Low stakes test examples include student self-assessments, like our Career Decision Making (CDM) system, which offers direction and planning for examinees and an opportunity to develop motivation and thinking skills.
Speed vs. Power Tests
Timed tests usually assess how fast examinees can go as much as how much they really know. Certainly, there are elements of both speed and power in timed tests. However, when you remove the speed demand on the examinee, the test can truly become one of power—that is, a test that measures an examinee’s ability and knowledge (remember, “Knowledge is Power!”). Examples of speed tests are the ACT/SAT tests. Tests of power include the PPVT-III and other untimed tests. At Pearson, all of our speech and language tests are built and standardized for untimed administration—we want to know about speech and language power!
Putting It All Together
Here are three examples of how these test types might all work together:
- Your school develops an end-of-year (summative) curriculum-based assessment program (criterion-referenced) for measuring progress and determining success of the students against the curriculum content (medium stakes). The tests are completed over the course of the last week of school in the classroom and are untimed (power).
- The local school district has purchased a set of items from an outside vendor. These items have been developed by teachers or content experts but do not have supporting, nationally developed norms and, therefore, cannot be used to compare students to their peers (which would be norm-referenced). The items sit on the school district’s network and are open to all teachers who want to create and deliver a test to their students throughout the year via self-directed computer time (low stakes) to check their learning against the state standards (formative). The tests give the students 10 minutes to answer 20 questions (speed), and provide feedback to the students on their strong and weak skill areas.
- You give the PPVT-III to five students on your caseload at the beginning of the year (diagnostic). You compare their results to the norms tables in the published manual (norm-referenced) and, with the rest of your assessment, make decisions about curriculum planning and/or interventions for the year (medium stakes).