Our Blogs

Share in practical tips and insights, inside information, stories and recollections, and expert advice.

Ask The Expert

Practice Effects in Testing


A recent question came to us from a colleague in Pennsylvania.

Q: What is current thinking on practice effects with standardized testing? How often is it ok to repeat tests like the PPVT-4, the CASL, or the CELF-4?

A: It depends (sorry, we know it's easier to have black-and-white answers!). Most test manuals should describe a test-retest reliability study conducted during test development, that is, a study of how consistent an individual's performance is over time. To help determine the risk of "practice effects" (i.e., score gains on retest that come from familiarity with the test rather than real change in skill), you need to consider the domain being measured, the research on practice effects between administrations, and the circumstances of your original administration.

As an example, pages 55-57 of the PPVT-4 manual describe the test-retest study completed during standardization. The interval between administrations was a minimum of 14 days and averaged four weeks. In this study of 300+ examinees, the reliability coefficients averaged a very high .93, which suggests that the PPVT-4 is quite resistant to practice effects at that interval. Other tests use different retest windows, and their manuals should give guidance on a recommended "wait time" between administrations. The CELF-4 and CASL manuals include this information as well.
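A quick aside on what a test-retest coefficient does and does not tell you. A test-retest study yields a correlation between the two administrations, and manuals often report the average change as well. The minimal Python sketch below uses made-up scores (not PPVT-4 data) in which every examinee gains exactly 5 points on retest: the correlation is perfect because everyone keeps their rank order, yet every retest score is inflated. That is why it can be worth checking the mean difference alongside the reliability coefficient when practice effects are the concern.

```python
def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return cov / (ssx * ssy) ** 0.5

def mean_gain(first, second):
    """Average change from the first administration to the second."""
    return sum(b - a for a, b in zip(first, second)) / len(first)

# Hypothetical standard scores for six examinees, tested twice.
first = [85, 92, 100, 104, 110, 118]
second = [s + 5 for s in first]  # every examinee gains exactly 5 points on retest

print(round(pearson_r(first, second), 2))  # 1.0
print(mean_gain(first, second))            # 5.0
```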

Another option to consider is the use of parallel forms, where available. The PPVT-4 and EVT-2, for example, each have a second, parallel form, so you may choose to use the alternate form (i.e., a completely different item set of similar difficulty) for your next administration. This is one of the benefits of having two (or more) forms of a test.

Finally, for tests without parallel forms for children, one set of guidelines is to allow enough time to elapse so that:

1) the examinee is now in the next norm group (e.g., a 3-, 6-, or 12-month interval, depending on the content and the norms),

2) the examinee no longer remembers the test items, OR

3) the examinee appears to have made progress (otherwise, why test?).
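The first guideline (moving into the next norm group) comes down to date arithmetic. Here is a rough Python sketch, assuming hypothetical fixed-width norm bands counted from the child's birth date and a hypothetical minimum retest interval. Real tests define their own band boundaries (which may change width with age), so check the norms tables before relying on a date like this. The other two guidelines are clinical judgments and are not modeled.

```python
import calendar
from datetime import date, timedelta

def add_months(d: date, months: int) -> date:
    """Move d forward by whole months, clamping the day (e.g., Jan 31 + 1 month -> Feb 28 or 29)."""
    total = d.month - 1 + months
    year, month = d.year + total // 12, total % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

def earliest_retest(birth: date, first_test: date, band_months: int, min_interval_days: int) -> date:
    """Earliest date that is both in a later (hypothetical) norm band and past the minimum interval.

    Assumes norm bands are band_months wide and start at birth; real tests define their own bands.
    """
    k = 1
    # Find the first band boundary that falls after the first administration.
    while add_months(birth, k * band_months) <= first_test:
        k += 1
    next_band_start = add_months(birth, k * band_months)
    return max(next_band_start, first_test + timedelta(days=min_interval_days))

print(earliest_retest(date(2015, 3, 10), date(2024, 1, 15), band_months=6, min_interval_days=14))
# 2024-03-10
```

For a child born March 10, 2015 and first tested January 15, 2024, with 6-month bands and a 14-day minimum interval, the sketch returns March 10, 2024, the day the child turns 9:0. With a 60-day minimum interval instead, the interval is the binding constraint and the date moves to March 15, 2024.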

A final consideration: if the examinee was sick or had other reasons for not participating fully in the original administration, you can probably retest as soon as the individual feels better.

Hope this helps! As always, it's a somewhat nuanced answer that depends on the situation, the examinee, and the test. The best advice is to consult the test manual for direction. Feel free to continue the conversation with your comments below!

Using PLS-4 With Families Who Speak Multiple Languages


Question, from Terri W.:
Our SLPs have been discussing the use of the PLS-4 with families who speak more than one language. If a family reports that they speak both English and Arabic at home and that the child understands both languages, can you report the standard score and percentile, or should you report only the raw score? We have reviewed the manual and cannot find this information.
Answer:
Of the PLS-4 standardization sample, 3.5% spoke a language in addition to English (see Table 6.13 in the Examiner's Manual). The standardization sample included children who "could speak and understand English and were able to take the [test] in the standard fashion without modification" (Examiner's Manual, p. 175).
If the child you are testing responds well in the test environment and understands and speaks English well enough to take the test in the standard fashion without modification, you can use the standard scores.
If the child you are testing is unfamiliar or uncomfortable with participating in test tasks with an unfamiliar adult, or lacks proficiency in English, you should try alternative assessment strategies (e.g., dynamic assessment, language sampling) or describe the skills the child was and was not able to demonstrate in the PLS-4 test session. Raw scores on their own carry no interpretable information and should not be reported.

Evidence-Based Practice: Clinician’s Tutorial for What Works


Dr. Chad Nye, editor of Evidence-Based Practice Briefs, presented a one-hour webinar, "Evidence-Based Practice: A Clinician's Tutorial for What Works!", on Thursday, December 16, 2010.

This session will help you make clinical decisions informed by evidence-based practice. The information presented will help you follow systematic review procedures and interpret the quality of the evidence.

Download a PDF of Dr. Nye's slides.

If you attended the webinar, you will receive an email with all the information you need to claim ASHA CEU credit for the live event. The recording is not offered for ASHA CEU credit.

When are test norms “outdated?”


PLS Picture Book, Revised Edition (from 1979!)

When are test norms "outdated"? Ah, the age-old question. There is no "number," or length of time, that determines when norms are out of date. Every day that passes makes any data set (whether in test norms or a journal article) one day older. The Standards for Educational and Psychological Testing, the guiding document for most test publishers, uses the word "periodically" in its section on test revisions (here is a quick excerpt to save everyone time):

“Tests and their supporting documents…are reviewed periodically to determine whether revisions are needed. Revisions or amendments are necessary when new research data, significant changes in the domain, or new conditions of test use and interpretation would either improve the validity of interpretations…or suggest that the test is no longer fully appropriate for its intended use” (p. 42).

The paragraph goes on to distinguish outdated norms from outdated item content, which are, of course, two different things. This is not a black-and-white issue, and just as with language, the nuances are critical. Some content domains, like vocabulary, change more often than more "stable" domains, like the acquisition of basic syntactic structures or of phonemes (although definitions of "mastery" of a phoneme vary widely). It's true that the general rule of thumb for test revision is 8-10 years, but that's as much a practical matter as a data-based one.

The many factors involved in deciding whether to use any assessment tool (normed or not) make our roles as professionals all the more important. Certainly, the older the norms, the more critically we should examine the validity of a test instrument (i.e., whether the norms still support the uses stated in the manual). Yes, some states have turned a gray issue into a black-and-white one by setting a maximum age for any set of norms. But the "story" of any test is much richer than a single number, and while one use of an assessment tool may be inappropriate in a given context, other valid uses of that instrument may remain.

As a final aside, ASHA echoes and supports the use of the Standards as a guiding document for test use in our profession:
Code of Fair Testing Practices in Education. (2004). Washington, DC: Joint Committee on Testing Practices.

Perhaps a gift for your colleague(s) this holiday season? It's not "warm and fuzzy," but it would sit nicely next to your APA style guide.

(Note: The author of this blog post is not affiliated with the publisher of the Standards and receives no benefit from promoting the book!)

Dr. Zimmerman Interview


Irla Lee Zimmerman, Ph.D., one of the authors of the Preschool Language Scale (PLS), sat down with us last month at the American Psychological Association's annual convention. We asked her how she got into the field of speech-language pathology and for insight into additional ways you can use the information you get from the PLS-4.

Dr. Zimmerman would love to hear why and how you use the PLS; she says so at the end of this video clip! Let her know in the comments section below.

Vocabulary Testing: Diagnose, Track Growth, and Intervene


Kathleen T. Williams, Ph.D., NCSP

Kathleen T. Williams, PhD, NCSP (pictured at left), discussed vocabulary assessment, intervention, and how to measure progress, and gave some examples from the PPVT-4 and EVT-2. You can watch the recording below. You can download the slides right here.

Did you attend the webinar? Follow it on Twitter? What did you think? Let us know in the comments or maybe even email Dr. Williams directly!

If you attended the webinar but did not receive an email with instructions for getting CE credit, please let us know.

Nancy Helm-Estabrooks Talks about Aphasia


At the end of June (National Aphasia Awareness Month), we caught up with Nancy Helm-Estabrooks, Sc.D., author of the Cognitive Linguistic Quick Test (CLQT). We talked about aphasia, the CLQT, and how she got into the field.

CELF-4: Question About the Following Directions Subtest


Question:

(via Tanya Coyle, M.Sc., S-LP(C), Reg. CASLPO)

“I have a question about the Following Directions subtest of the CELF-4. Last year I was reassessing a student, and during the FD subtest my gut told me he was doing MUCH better than the year before and had made great improvements. Based on his performance, I also felt that he was probably age-appropriate or possibly mildly delayed. When I scored him and checked the norms, he came out with a SS of only 4, and I was shocked that he could have done that poorly. I rarely look up age equivalents, since they are problematic, but I checked, and his age equivalent was 8:2. He was 9:0 with a raw score of 41. This did not follow, as performing similarly to an 8-year-old didn't seem all that bad for a just-turned-9-year-old (certainly not severely delayed).

I did some more checking and have concerns about the ‘age leap’ in the Following Directions norms at exactly the 9-year-old level. I realize that 9-year-olds are suddenly given a 23-point credit that 8-year-olds don't get, but even if my student had made errors on 9 of those 23 items a week before I tested him, when he was still 8:11, he would have come out with a SS of 7, a rather large difference from a SS of 4! The difference between low average/mild and severely delayed is rather stark. I did give him the first part, for goal-writing purposes, and he made errors on 4 of the first 23 items.

I am wondering if there is a normative data mistake or problem in the jump between the 5-8 and 9-21 age ranges for FD. Is there an explanation for what happened with my student?”

Elisabeth H. Wiig, PhD

Dr. Wiig’s Answer:

You are correct that there is a large bump in scores at age 9. The same raw score that yields a standard score of 9-10 at age 8 yields a standard score of 4 at age 9. To perform in the average range at age 9, the student would have needed a raw score of 46 or higher (5-7 additional raw score points). When you look at the raw score means in Table 6.12 in the Examiner's Manual, you'll see big jumps in the mean performance of the standardization sample on this subtest between ages 5:0 and 5:6 (6 standard score points), 6:0 and 6:6 (5 standard score points), 6:6 and 7:0 (5 standard score points), and 8 and 9 (a 7-point jump). Improvement in these skills levels off after that. The norms for age 9 include children from 9 years, 0 months, 0 days to 9 years, 11 months, 30 days. When you test a student who is at the very bottom of an age range, you are comparing that child to children who are mostly older than he or she is, and there is a great deal of growth at this pivotal age.
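The start-point arithmetic in the question can be made explicit. This is a minimal sketch that takes the 23-item credit for 9-year-olds from the question above and assumes the items after the credited block would be scored identically regardless of where testing started; `raw_if_started_at_item_1` is a hypothetical helper name. Converting these raw scores to standard scores still requires the norms tables in the Examiner's Manual, so the sketch stops at raw scores.

```python
# Number of early items credited to 9-year-olds, as described in the question.
CREDITED_ITEMS = 23

def raw_if_started_at_item_1(raw_with_credit, errors_on_credited_items):
    """Rescore a raw score earned with the start-point credit as if testing had begun at item 1.

    Assumes the items after the credited block would be answered the same way.
    """
    earned_after_credit = raw_with_credit - CREDITED_ITEMS
    return (CREDITED_ITEMS - errors_on_credited_items) + earned_after_credit

# The student's raw score of 41 at age 9:0 under three scenarios for the first 23 items.
for errors in (0, 4, 9):
    print(errors, raw_if_started_at_item_1(41, errors))
# 0 41   (no errors: the same raw score compared across ages in the answer)
# 4 37   (the 4 errors actually observed on the first part)
# 9 32   (the questioner's hypothetical 9 errors at 8:11)
```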

J. Scott Yaruss, PhD talks about the OASES


We caught up with Dr. Yaruss at the 2009 ASHA Convention and asked him a few questions about the Overall Assessment of the Speaker’s Experience of Stuttering (OASES).

CELF-4: 31-Point Difference Between Language Content and Working Memory


Question:

I have a student who received the following index scores:

Core: 112
Receptive: 105
Expressive: 120
Language Content: 125
Language Memory: 106
Working Memory: 94

There is a 31-point difference between Language Content and Working Memory. I used Tables 3.5 and 3.6 to find the critical value and prevalence, but I am not sure I understand them well enough to explain this to someone else. Could [you] describe this in a way that will help me better understand the importance of the 31-point difference?

Dr. Wiig’s Answer:

My first question would be, "How old is the student?" Working memory deficits become more apparent as the student moves into adolescence. In the case you describe, the intrapersonal weakness in working memory (index score 94) relative to the level of language content may not carry much significance if the semantic aspects of language and communication are strong or exceptional, as they are here. The acquisition of vocabulary, word meanings, and concepts is not as dependent on working memory as other aspects of language and communication, such as creating meaningful communication when several aspects need to be integrated.
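For readers who want to see where a critical value comes from: a common psychometric approach takes the standard error of the difference between two scores, SD × √(2 − r_a − r_b), and multiplies it by the z value for the chosen significance level. The Python sketch below assumes the index-score SD of 15 and uses hypothetical reliabilities of .90 and .85, not the CELF-4 values; whether the CELF-4 tables were built exactly this way is also an assumption, so use the manual's tables for actual decisions. Prevalence answers a different question (how often a difference this large occurred in the standardization sample) and can only be read from the manual's data.

```python
from statistics import NormalDist

def critical_difference(r_a, r_b, sd=15.0, alpha=0.05):
    """Smallest score difference unlikely (two-tailed, level alpha) to arise from measurement error alone."""
    se_diff = sd * (2 - r_a - r_b) ** 0.5    # standard error of the difference
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = .05
    return z * se_diff

language_content, working_memory = 125, 94   # index scores from the question
difference = language_content - working_memory
cv = critical_difference(0.90, 0.85)         # hypothetical reliabilities, not CELF-4 values
print(difference, round(cv, 1), difference >= cv)  # 31 14.7 True
```

With these made-up reliabilities, the critical value at the .05 level is about 14.7 points, so a 31-point difference would exceed it comfortably. Statistical significance still has to be weighed against the student's age and overall language profile.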