Our Blogs

Share in practical tips and insights, inside information, stories and recollections, and expert advice.

Entries Tagged With: PPVT-4

Monitoring Progress…the Easy (or Easier) Way!

These days, making statements about progress is increasingly important as we seek to document our efforts in each and every practice setting where SLPs and audiologists serve individuals with communication disorders. To that end, using scores that are sensitive to smaller changes in performance over time is critical. A number of Pearson products currently offer growth scores.

But what exactly is a growth score, and how is it used? Using the PPVT-4 test as an example, you can read a brief excerpt from the test manual for a definition below. In the case of the PPVT-4 and EVT-2, the growth score is titled “Growth Scale Value” or GSV:

The GSV score is useful for measuring change in…performance over time. The GSV is not a normative score, because it does not involve comparison with a normative group. Rather, it is a transformation of the raw score and is superior to raw scores for making statistical comparisons (p.18).

For a little more background on growth scores, you can read another set of comments in the PPVT-4 test manual regarding the GSV:

The GSV scale was developed so that vocabulary growth could be followed over a period of years on a single continuous scale. Standard scores, percentiles, stanines, and NCEs compare an examinee’s vocabulary knowledge with that of a reference group representing all individuals of the same age or grade. In contrast, the GSV measures an examinee’s vocabulary with respect to an absolute scale of knowledge. The test performance of any examinee…can be placed on a [single] GSV scale. As an examinee’s vocabulary grows, the GSV will increase.

The GSV is an equal-interval scale. Therefore, GSV scores can be added, subtracted, or averaged. Furthermore, the fact that GSVs can be averaged makes this scale a useful one for tracking the progress of groups.

Standard scores and percentiles are less useful than GSVs for measuring growth, because the reference norm group changes as the examinee moves into a higher age or grade level. If a person’s vocabulary increases at the average rate, his or her standard score and percentile would stay the same, whereas the GSV score would increase (p.21).

In addition, each test manual should give you the number of growth score points needed to show statistically significant change at a particular age level. For example, a change of 8 GSV points from one test administration to another is statistically significant on the PPVT-4 for individuals ages 2:6 to 12. If a child in this age range gains 8 points on the GSV scale, you can be confident that his or her vocabulary has truly increased.
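Because the GSV is an equal-interval scale, progress checks like this come down to simple arithmetic. Here is a minimal Python sketch; the 8-point threshold is the manual's value for ages 2:6 to 12, while the function names and the scores themselves are invented for illustration:

```python
# Hypothetical GSV progress check. The 8-point threshold is the
# PPVT-4 value for ages 2:6 to 12; the scores below are invented.

def gsv_change_is_significant(gsv_before, gsv_after, threshold=8):
    """True if the GSV gain meets the significance threshold."""
    return (gsv_after - gsv_before) >= threshold

def group_mean_gsv(gsvs):
    """GSVs are equal-interval, so a simple mean is meaningful."""
    return sum(gsvs) / len(gsvs)

# One child's fall and spring GSVs:
print(gsv_change_is_significant(145, 155))  # True: a gain of 10 >= 8

# Because GSVs can be averaged, a whole caseload can be tracked too:
caseload_fall = [140, 145, 150, 138]
caseload_spring = [149, 151, 158, 147]
print(group_mean_gsv(caseload_spring) - group_mean_gsv(caseload_fall))
```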

A caveat: Using growth scores for measuring progress doesn’t mean standard scores are not important. Standard scores serve a very clear purpose and can be used reliably with growth scores. You can think of a growth score as a complementary tool to a standard score; each score tells you something different about the individual’s performance and creates a clearer picture of change over time. The growth score indicates whether there has been improvement, and the standard score indicates whether the rate of improvement has been above or below the average rate for the child’s peers.

So, as you consider the need to demonstrate growth in an individual you serve, do consider using the growth scores available in the above tests as well—and make your work easier!


Dunn, L. M. & Dunn, D. M. (2007). PPVT-4 Manual. Bloomington, MN: NCS Pearson, Inc.

What is your (sample) size?

Why do some tests have small samples while others have huge samples, and why is that OK? When talking about sample sizes in test development, the conversation could quickly turn statistical. That won't happen here. Rather, this explanation is general in nature and touches on some of the basic considerations that go into determining sample sizes for norm-referenced tests.

Several considerations go into determining sample size. These are:

  • The type of scores to be reported, whether
    • in broad ranges (e.g., below average, average, above average), or
    • on a continuum (e.g., age and grade equivalents)
  • The breadth of the scale, determined by the degree of behavioral variation seen in the population (i.e., low variation, high variation)
  • How the scores will be used (screener vs. placement; high stakes vs. low stakes)

In general, it's preferable to have at least 100 persons contributing to each characteristic that defines the norm group (such as age, grade level, gender, race, SES, etc.), particularly when measuring constructs that are developmentally sensitive. As age increases, variation in the population becomes more stable, so fewer persons are required to obtain a statistically stable sample.

Let’s use two recently introduced Pearson tests as examples: the PPVT™-4 test and the OASES™ test. These products are quite different in their uses of scores and designs. The PPVT-4 is a vocabulary test, normed by age, and by grade in the spring and fall. It is intended primarily for use with children in the early grades, but it is also normed through age 90+ years. Scores are reported in the standard score metric. The OASES is a self-rated survey for adults who stutter. It has a criterion-referenced norm basis, meaning that scores fall into several (5) broad ranges of severity.

The PPVT-4 has a total sample size of 3,540. The sample is broken out into many demographic variables, all of which closely mirror the U.S. population as a whole. There are 26 groupings by grade and 28 groupings by age. The age groups average 126 persons. A sample of this size was preferable because of the many combinations of age, grade, and demographic variables being considered, and because the PPVT-4 measures abilities in children, which vary considerably, particularly in the early grades. In addition, the PPVT-4 is often used to determine programming for children in schools.

In contrast, the OASES test has a total sample size of 173, with ages ranging more or less equally between 18 and 73. Why is this OK? Isn't a bigger sample always better?

The OASES reports scores within broad ranges (Mild, Mild/Moderate, Moderate, Moderate/Severe, Severe). It measures a relatively narrow band of human behavior and reactions to those behaviors (stuttering), although variations among people who stutter can be large. The test is intended to be informative rather than diagnostic. The stakes associated with receiving one score or another on the OASES do not lead directly to a classification or impose life changes on the person. Additionally, stuttering is not an age-based or developmentally sensitive disorder. It affects adults of all races, genders, religious beliefs, geographic areas, and socio-economic backgrounds more or less equally, so these differences did not need to be accounted for in the sample. The OASES sample can be characterized as a clinical sample of adults who stutter, in which age and clinical membership were the only variables that really mattered. So a sample of 173 was sufficient: the OASES is a “low stakes” kind of test, it provides scores within broad ranges, and its scores are relatively low in variation.

What does all of this mean for your practice? The sample sizes of quality testing tools can vary depending upon how the test’s scores will be used, and the variations in the population it measures. When the range of scores is expected to be wide within the population being measured, especially for children, a larger sample size is necessary. A test that is more focused, where people are expected to score within a narrower range, such as a screening test, may show very good validity and reliability results with a relatively small sample size. In all cases, the publisher should explain the reasons for their sample selection clearly, and with simple tables, in the test manual.

I hope this brief introduction helps you understand some basics of how norm sample sizes are determined for the tests you use. You should be able to find detailed information about sample size in the tables of your test manual (see http://www.speechandlanguage.com/cafe/june2007.asp).

The next level of detail about sample size in tests includes concepts like normal distributions, confidence intervals, confidence levels, statistical significance, standard deviation, and mean scores. But let's save that for another day!

Parallel Forms: What They Can Do For You

Do you ever wonder about test forms when you pick up a test? Why does it matter if a test has two (or more) forms? And is it really worth the effort to figure out what to do with Form A and Form B?

Short answer: Yes, it matters, and yes, it’s worth the effort.

We’ll help you through the basics of test forms in this Clinical Cafe. We’ll explain the differences between forms, how to use Forms A and B, and what they can mean for your clinical practice.

Parallel forms of a test are statistically equal (or as equal as they possibly can be) in their ability to measure the target content area. “Alternate forms reliability” refers to the correlation between Form A and Form B; that is, how closely your results from each form would match if you gave them to the same person. The correlation (written as a decimal between .00 and 1.00) should be as close to 1.00—a perfect correlation—as possible; for high quality tests like the PPVT-4™ test and EVT-2, the correlation is 0.89 across all ages. Therefore, you should be able to give a student Form A or Form B from the same test and get very nearly the same score using either form.

So what is the difference whether you have Form A or Form B or both? Your options for timely and accurate evaluation expand when you own both forms. All you need to decide is which form to use first. If they are truly equivalent, or “parallel,” it won’t matter which one you choose. If you want to test the same student again in the same content area within a short timeframe (several weeks), as in the case of progress monitoring, use the alternate form. If you wait longer (months) before testing the same student again, you may be safe to use the same form again.

We’ve covered the development and appropriate use of parallel forms. But what is the value of parallel forms in your clinical work? You get more flexibility, and you can get more frequent use out of the quality test you purchased. Consider just two scenarios: special client needs and progress monitoring. You may have a student you know to be bright and precocious, with a good memory. If you need to test this child again, you could expect him or her to recall items, particularly pictures, and possibly even perseverate on a particular picture. This is the time to use your alternate form.

Use of alternate forms for progress monitoring opens new horizons for your practice. Now you can use a test such as the PPVT-4™ or the EVT-2 for progress monitoring, without concern that your student will learn the test and invalidate the scores. Form B contains a different set of items, with scores on the same scale as Form A, and the scores can be compared equally. You can monitor the progress of your student’s ability more often, and be accurate and reliable in your measurement, by using Form A and Form B. This can enhance your use of response-to-intervention (RTI) procedures. With its alternate forms, a test such as the PPVT-4™ becomes both a Tier 2 and a Tier 3 RTI tool.

So, is it worth the effort to figure out what to do with Form A and Form B? And does it really matter if a test has two forms? It certainly is worth the effort, because it allows you to work smarter, and be more accurate in your assessment of children.

Testing 101

This month’s Clinical Café is a “back to the basics” discussion of common and often-discussed test types as well as the important concepts of reliability and validity. For new and veteran test users alike, you may find easy ways below to describe these ideas to others…and to refresh your own memory!

To begin, a standardized test is a test administered and scored in a standard manner: the questions, conditions for administering, scoring procedures, and interpretations are consistent and predetermined. “Standardized” may also refer to the reference of the score that a test-taker receives (i.e., a standard score).

Generally, there are two types of standardized tests: norm-referenced tests and criterion-referenced tests, resulting in a norm-referenced score or a criterion-referenced score, respectively. Norm-referenced scores compare test-takers to a group of same-age or same-grade peers. Criterion-referenced scores compare test-takers to a content performance level (i.e., a criterion), and may also be described as standards-based or curriculum-based assessment. Norm-referenced tests measure success by rank ordering students, while standards-based assessments allow that all students may score highly if they meet stated standards. Let’s look at each in more depth.

Norm-Referenced Tests (NRTs)

A norm-referenced test (NRT) compares an individual to a sample of his or her peers, referred to as a “normative sample.” NRTs are designed to “rank-order” test takers—that is, to compare students’ scores with each other. A norm-referenced test does not compare all the students who take the test in a given year. Instead, test developers select a subset of individuals (e.g., 50 ninth graders in 30 different states) from the target population (i.e., all ninth graders in the nation). The test is “normed” on this subset so that it fairly represents the entire target population—that is, the full range of “normal students.” The scores that you generate from individuals you test (e.g., ninth graders at your local high school) are then reported in relation to the scores of this “norming” group.

To make comparing scores easier, test developers often want results that look somewhat like a bell-shaped curve (i.e., the “normal” curve, shown in the diagram below). Most students will score near the middle, and some will score low (the left side of the curve) or high (the right side of the curve). Scores are usually reported as percentile ranks or standard scores. Percentile ranks range from the 1st to the 99th percentile, with the average student score set at the 50th percentile. For example, if Steve scored at the 63rd percentile, he scored higher than 63% of the test takers in the norming group. Steve’s 63rd percentile rank equals a standard score of about 105; with standard scores, the average (the 50th percentile) always equals 100. Scores also can be reported as “grade equivalents,” “stanines,” or “normal curve equivalents.” Some scores are derived from raw scores, and others are derived from standard scores.
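The percentile-to-standard-score conversion can be checked against the normal curve itself. Here is a sketch assuming the common standard-score metric of mean 100 and standard deviation 15 (the SD is an assumption; the text above only states the mean), using Python's statistics.NormalDist:

```python
# Converting between percentile ranks and standard scores on the
# normal curve, assuming the usual mean-100, SD-15 metric.
from statistics import NormalDist

scores = NormalDist(mu=100, sigma=15)

# Steve's 63rd percentile corresponds to a standard score of about 105:
print(round(scores.inv_cdf(0.63)))  # 105

# And a standard score of 100 sits at the 50th percentile:
print(round(scores.cdf(100) * 100))  # 50
```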

The “bell curve” assumes a normal distribution of scores. A perfect curve never occurs, but if you sample enough people during norms development the whole group may give a result that is very close to this graphical profile.

Source: Dunn, L. M., Dunn, D. M. (2007). Manual: Peabody Picture Vocabulary Test, fourth edition. Bloomington, MN: Pearson Assessments.

Criterion-referenced tests (CRTs)

A criterion-referenced test is intended to measure how well a person has learned a specific body of knowledge and associated skills. The multiple-choice tests most people take to get a driver’s license and on-the-road driving tests are both examples of criterion-referenced tests. As on most other CRTs, it is possible for everyone to earn a passing score (e.g., 90% or better) if they know the driving rules and drive reasonably well. Educators are concerned with students achieving passing scores on statewide standards. In these kinds of tests there is an agreed-upon set of criteria, and students are expected to score at a specified minimum level to pass. Curriculum performance goals are another kind of CRT. To advance to the next learning packet, for example, the student must achieve 70% or better on the post-test.
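The pass/fail logic of a criterion-referenced score is simple enough to sketch directly. The 70% cutoff below echoes the learning-packet example above; the item counts and function name are invented:

```python
# Criterion-referenced scoring sketch: a score is judged against a
# fixed standard, not against other test takers.

def meets_criterion(items_correct, items_total, criterion=0.70):
    """True if the proportion correct meets the (hypothetical) cutoff."""
    return items_correct / items_total >= criterion

print(meets_criterion(18, 25))  # 72% correct -> True, advances
print(meets_criterion(16, 25))  # 64% correct -> False, does not pass
```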

Testing with Reliability and Validity

Test reliability refers to the degree to which a test is consistent and stable in measuring what it is intended to measure. Reliability is estimated statistically. Most simply put, a test is reliable if it is consistent within itself and across time. To understand the basics of test reliability, think of a bathroom scale that gave you drastically different readings every time you stepped on it, regardless of whether you had gained or lost weight. Such a scale would not be considered reliable.

Test validity refers to the degree to which the test actually measures what it claims to measure. Test validity is also the extent to which inferences, conclusions, and decisions made on the basis of test scores are appropriate and meaningful.

See the September 2006 Clinical Café article “The Validities” for some great examples of test validity.

The relationship of reliability and validity is straightforward: reliability is required for a test to be valid. If a test is not reliable, then it cannot be valid. The converse, however, is not true; a test can be perfectly consistent (reliable) and still fail to measure what it claims to measure (invalid).

All This Science…

Those of us who entered a profession that favors highly verbal characteristics in people sometimes struggle with the quantitative aspects of research science. We desire success in the clinical setting, and over time we realize that art and science seem to meld together in practice. We are partly scientists! So, boring as they sometimes seem, stats and test theory are good for us to review; hopefully this refresher is useful for your practice.


Dunn, L. M., Dunn, D. M. (2007). Manual: Peabody Picture Vocabulary Test, fourth edition. Bloomington, MN: Pearson Assessments.

A Table Is Worth…

Being seated at a good table in a cafe is important to some people. To them, the ambiance is an important part of the meal. Other cafe patrons couldn’t care less about their table; they’re there for the food.

How about when you open the manual of your test? Are the tables important? Absolutely. The purpose of tables is to present test-related data to you in a clear and unambiguous manner. The presence and quality of tables offer you clues to the quality of the data and of the entire test product. Obscured or missing data can and should raise red flags.

Standardization Sample Tables

The normative sample is the part of the population used to standardize a test. The normative sample should consist of a sufficiently large and random sample taken from the target population. If a test publisher has done an excellent job gathering a normative sample, the publisher will want you to know that. The simplest way to bring this information to your attention is a well-designed table. A table should highlight what is relevant and present the data in a format that is easy to understand. After all, the purpose of a table is to present complex information in a tidy format so you can further analyze what the author is saying.

What if there is no table in your test manual that shows the information you need to know? Maybe the statistical information you need is in the text, but chances are, if the data are not represented clearly in a table, they simply are not there. In this case, what is absent may speak as loudly as what is present.

One-Way versus Two-Way Tables

Here is a simple example of a one-way table. This is the simplest method for analyzing categorical (or nominal) data. It shows one kind of information about one variable. This kind of table is often used to explore data, for an initial look at what has been found. It is sometimes called a frequency table.

Category                          Count   Percent
ALWAYS: Always interested           39      39
USUALLY: Usually interested         16      16
SOMETIMES: Sometimes interested     26      26
NEVER: Never interested             19      19
Missing                              0       0
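A frequency table like this is easy to produce from raw data. A sketch with collections.Counter, using invented survey responses that reproduce the counts above:

```python
# Building a one-way frequency table from (invented) raw responses.
from collections import Counter

responses = (["ALWAYS"] * 39 + ["USUALLY"] * 16 +
             ["SOMETIMES"] * 26 + ["NEVER"] * 19)

counts = Counter(responses)
total = len(responses)
for category in ["ALWAYS", "USUALLY", "SOMETIMES", "NEVER"]:
    n = counts[category]
    print(f"{category:<10} {n:>3} {100 * n // total:>3}%")
```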

Here is a simple example of a two-way table. This is a combination of two (or more) one-way tables arranged such that each cell in the table represents a unique combination of information derived from the variables. Two-way tables allow examination of categories by frequencies of observations on more than one variable. By examining these frequencies, we can identify relationships between the variables.

GENDER: MALE      20 (40%)    30 (60%)    50 (50%)
GENDER: FEMALE    30 (60%)    20 (40%)    50 (50%)
Total             50 (50%)    50 (50%)   100 (100%)
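The same idea extends to two variables: tally each unique (row, column) combination. In the sketch below, the response labels “A” and “B” are placeholders, since the table above does not name its columns; the counts match the table:

```python
# Building a two-way (gender x response) table from invented pairs.
# "A" and "B" are placeholder column labels.
pairs = ([("MALE", "A")] * 20 + [("MALE", "B")] * 30 +
         [("FEMALE", "A")] * 30 + [("FEMALE", "B")] * 20)

table = {}
for gender, response in pairs:
    row = table.setdefault(gender, {})
    row[response] = row.get(response, 0) + 1

for gender, row in table.items():
    total = sum(row.values())
    cells = "  ".join(f"{row[r]} ({100 * row[r] // total}%)"
                      for r in ("A", "B"))
    print(f"GENDER: {gender:<7} {cells}  {total}")
```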

Two Examples

Here is an example of a two-way table from the PPVT-4 manual. It is actually a composite of four one-way tables, and it presents a wealth of information about the ethnic breakdown of the PPVT-4 norming sample. The sample population is described by four broad ethnic groups, and the table clearly indicates where each portion of the sample was drawn from by region of the USA, as well as how each portion of the sample compares to U.S. demographics overall.


Here is another example of a two-way table, from the Comprehensive Assessment of Spoken Language (CASL). The table clearly reports the number (N) and percentage of females and males in the sample at each age range, with a comparison between the sample and the U.S. population as a whole at each age range. You can see clearly that the sample size was sufficiently large to be statistically reliable. Also, the percentage of subjects in each age and gender group closely matched the percentage of persons in that category in the U.S. population. For example, 50.7 percent of the 14-15 year olds in the sample were female, which closely matches the Bureau of the Census figure of 50.9 percent for 14-15 year olds in the U.S. population. So this table shows the closeness of the normative sample to the population on age and sex simultaneously, rather than just one or the other.


In Closing

Next time you shop for a test, look at the tables in the manual. The strength of the product should be easily seen in them. Look for the data in plain view. If they aren't easy to find in a simple, clear format, you should ask why.

Good tables may not be important to everyone in a restaurant, but they are paramount for everyone who evaluates, chooses, and uses tests, and paramount in serving your clients.


Carrow-Woolfolk, E. (1999). Comprehensive Assessment of Spoken Language, Manual. Minneapolis, MN: Pearson Assessments.

Dunn, L.M., Dunn, D. M. (2007). Peabody Picture Vocabulary Test fourth edition, Manual. Minneapolis, MN: Pearson Assessments.

How to Measure Expressive Vocabulary in 20 Minutes or Less

Ask author Dr. Kathleen Williams to name one of her main goals in developing the Expressive Vocabulary Test, Second Edition (EVT-2) and she’ll tell you, “Getting people to understand how important it is to measure vocabulary. Test results can predict success in learning to read and comprehend text, help clinicians understand why students have difficulty with oral instructions, and provide a screen for other potential academic difficulties.”

“Test results can predict success in learning to read and comprehend text, help clinicians understand why students have difficulty with oral instructions, and provide a screen for other potential academic difficulties.”
—Dr. Kathleen Williams

Five years in the making, the second edition of the highly respected EVT offers valuable contemporary enhancements, yet it remains as quick, easy, and reliable as ever. As a result, clinicians can spend less time in testing and more time in intervention. The EVT-2 measures expressive vocabulary and word retrieval for those aged 2½ to 90+ years and requires no reading or writing. It is conormed with the PPVT™-4 test, offering an unbeatable system for comparing receptive and expressive vocabulary, pinpointing students’ strengths and weaknesses, and identifying potential word retrieval concerns. Like the PPVT-4 assessment, the EVT-2 is an individually administered, norm-referenced test. It meets the needs of both general and special education professionals for vocabulary and language screening and assessment, progress measurement, and Reading First goals, and it fits all three tiers of the RTI model. In the same way that the PPVT-4 test can be used to support reading goals, the EVT-2 assesses oral expression as a foundation of writing skills.

Dr. Williams says, “With this new edition, we wanted clinicians to be able to test and retest. Therefore, the EVT-2 now includes two parallel forms.”

The EVT-2 also offers an updated and expanded item pool that incorporates recommendations from users and expert reviewers. New items use the statistical information that had been gathered on the previous edition and reflect changes in language usage.

“There are now 190 items in each form, giving us the chance to do even more with an assessment,” notes Dr. Williams. “For instance, there is a study showing that when a child is exposed to a lot of dialogue in the home, we can observe a ‘language bridge,’ which indicates the child is having expanded experiences with words. If the parent says ‘put on your red shirt instead of your blue shirt;’ or ‘I’ve peeled two potatoes and now I need to peel two more,’ the child is learning colors, numbers, and verbs, not just naming nouns. Since the EVT-2 now includes a wider variety of words, the clinician can get a broader sample of the child’s oral language experiences and vocabulary knowledge.”

Dr. Williams believes clinicians will also appreciate the five methods of qualitative analysis. “I hope this breadth of data motivates the clinician to look beyond the score. As an example, if the examiner administers both the PPVT-4 and the EVT-2, and the expressive score is higher than the receptive score, the results could indicate that rather than being delayed, the child simply has different labels for words and concepts. By allowing the child to speak, the child can demonstrate his or her unique vocabulary knowledge. The clinician may discover, for instance, that the child has never heard of a sofa, but perhaps knows the item as a couch or divan. When looking at the different ways a child labels things, I’d also encourage the clinician to also consider background or ethnicity. In this way, we can build on differences, instead of viewing the differences as a deficit.”

“. . . we can build on differences, instead of viewing the differences as a deficit.”
—Dr. Kathleen Williams

What else is new in the EVT-2? Stimulus questions are now included in the record form for more precise administration. Growth scale values (GSVs)—a new metric for easily measuring progress over time—have been added. There is a larger easel format (8½ inches by 11 inches) and core vocabulary has been modernized. Additionally, the artwork is realistic and more up to date, offering an exceptional balance of gender and race/ethnicity.

“The art is so appealing that I think a child might like to sit and look through the EVT-2 easel like a picture book,” says Dr. Williams. “The illustrations are all by the same artist, and I think this helps. Best of all, the art really holds a child’s attention.”

“The art is so appealing that I think a child might like to sit and look through the EVT-2 easel like a picture book.”
—Dr. Kathleen Williams

There is another outstanding new feature that Dr. Williams is especially pleased with. “It might not be as obvious to clinicians as other benefits, but AGS/Pearson did a great job in sampling. Age and grade norms are from the same pool. In essence, this means the grade norms are a subset of the age norms. I’ve been in this field for a long time and I’ve never seen this happen.

“Also, true to the way we build tests, there is a two-way control. Besides a match of ethnicity, and so forth, there is a cross-match with SES. Other publishers do not always do this when building tests.”

In the future, Dr. Williams envisions the possible creation of components to complement the EVT-2. “Maybe we could build an adjunct piece that takes the form of an easel with criterion-referenced items. For instance—if you got an indication that a child has a problem with colors, there could be a tab on the easel that assesses whether he or she knows other color words or how many are known. The easel could include other categories like food items, clothing, or body parts. If you are testing to see how a 15-year-old child with a disability is functioning—body parts are so important. Or, look at the other end of the spectrum—knowing adjectives and adverbs is vital. Otherwise, how are you going to read or write effectively in college or at work?”

The EVT-2 is built on the strength of the EVT and is filled with new enhancements. However, Dr. Williams says there is more to this new edition than first meets the eye. “If we just stop and think, we discover all the many, many benefits of vocabulary assessment.”

Dr. Doug Dunn Discusses The PPVT-4

The fourth version of the Peabody Picture Vocabulary Test (PPVT–4 instrument) will be released in November 2006. The most widely used norm-referenced test of receptive vocabulary, this edition continues a nearly half-century tradition of providing unparalleled vocabulary assessment.

The renowned PPVT assessment was created by Dr. Lloyd Dunn, who died in April 2006. His son Dr. Doug Dunn is the coauthor of the fourth edition and is eager to continue his father’s well-known practice of providing reliable, valid, and comprehensive assessment tools and educational programs for children.

“What makes this version of the test especially noteworthy is that it is Dad’s last,” said Dr. Doug Dunn. “It was his overwhelming desire to see that everything was done as well as possible—the art, standardization, norming, even the manual. Dad’s legacy to the future is a true focus on building a quality product. He wanted each version to be better and more useful.”

“Dad’s legacy to the future is a true focus on building a quality product. He wanted each version to be better and more useful.”
—Dr. Doug Dunn

The PPVT–4 instrument includes several new features. The format of the portable easel is a larger 8½ in. by 11 in. Core vocabulary has been updated with more stimulus words (now 228 per form). Users will find better representation of word types across all levels of difficulty, including very easy items to strengthen the floor of the test. What’s more, the PPVT–4 scale features updated, realistic art, now in full color.

“Dad always wanted a version in color,” says Dr. Dunn. “The color was carefully done to be very appealing. The balance of pictures—the ethnicity and gender role models—is greatly enhanced. Nothing was overlooked. For instance, to accommodate those with color blindness, the colors are vivid, and there are dark outlines around the art.

“Today, subjects are more engaged in media than ever before, as are teachers and other administrators. I believe the color in the PPVT–4 will go a long way toward holding interest and attention, which in turn will help provide a better assessment of vocabulary achievement.”

“I believe the color in the PPVT–4 will go a long way toward holding interest, which in turn will help provide a better assessment of vocabulary achievement.”
—Dr. Doug Dunn

As with previous editions, the PPVT–4 instrument is well suited for a wide range of users. Designed for ages 2 years 6 months through 90+, and to meet the needs of general and special education professionals, this versatile instrument is appropriate for screening and diagnostic purposes, progress monitoring, and meeting Reading First goals. The test is also applicable to all three tiers of the response to intervention (RTI) model. And like its predecessors, the PPVT–4 scale is quick and easy to use. It can be given in just 10 to 15 minutes and requires no reading or writing.

Dr. Dunn says, “In this version, virtually everything is improved, including the packaging. For users who are itinerant, the newly designed carrying case makes it easy for them to travel with everything they need. And for those who work in one location, the packaging also works well on a bookcase.”

A trained statistician, Dr. Dunn says, “It has been great to work with Dr. Mark Daniel and look at all aspects of the data analysis, including measures of bias and performance.”

A dedication to meeting cultural sensitivity goals has always guided the development of the test. Building on this commitment, special attention was paid to ensuring a balanced representation of sex and race/ethnicity in the illustrations. All items were reviewed by a panel of 15 professionals, and any items showing evidence of bias were eliminated.

Dr. Dunn adds, “The breadth of data in the norming sample matches the U.S. Census (with less than a 1% variance between the norms and the general population). As it was, we had more cases than we needed in some sections of the sample—we didn’t have to extrapolate information in any cross section of the population. This makes the test very current. Besides increasing items at the lower end, we also added items at certain critical levels to match vocabulary growth patterns. The basal and ceiling rules remain the same so that there will be no increase in the time to administer the test.”

Other new features make it easier to measure progress, a practice that has gained increased emphasis in professional and government guidelines. As in previous editions, two parallel forms facilitate retesting. In addition, this version of the test, as well as the PPVT-III instrument, now includes growth scale values (GSVs), a new metric for measuring progress over time.

The ASSIST™ scoring software has been rebuilt, allowing the user to include an unlimited number of examinees in one database. Significantly enhanced ASSIST reporting options let users run reports by age and grade, as well as by a wide variety of subclassifications.

Dr. Dunn says he looks forward to personally adding to the growth and quality of the test. “In the years to come, I hope to contribute enhancements in the area of ‘mechanization of process’—in other words, technology solutions that focus on the administration, scoring and reporting of the test. Think speed. Think of making the test available to all teachers and other administrators using the best of mechanization.

“Dad built the foundation on which the test has been developed, and he was involved in the PPVT–4 all the way to the end. He looked at everything, literally up to the last week of his life. His vision was truly ongoing.”

Lloyd Dunn, Author of Peabody Picture Vocabulary Test, Dies at 89

Bloomington, Minnesota, April 18 – Pearson Assessments today announced that well-known author Dr. Lloyd Dunn died April 6, at age 89, in Las Vegas, Nev. Dunn created some of today’s most respected and widely used vocabulary assessments and instructional programs. His Peabody Picture Vocabulary Test (PPVT) has been a leader in educational testing for nearly 50 years. Born in Canada in 1917, Dunn obtained bachelor’s and master’s degrees from the University of Saskatchewan. He later taught grades 1 through 12 and served as a principal in the Saskatchewan public schools. He completed a Ph.D. program in special education and psychology at the University of Illinois. After joining the faculty at Peabody College in Nashville, Tenn., he developed the first Peabody Picture Vocabulary Test (PPVT). The test was published in 1959 by AGS Publishing, known then as American Guidance Service, Inc. AGS Publishing was acquired by educational publisher Pearson in 2005.

Co-founder and past president of AGS Publishing John Yackel said, “I took a prototype of the PPVT to the 1960 American Speech and Hearing Association (ASHA) convention. People were swarming to our booth to see it. The test could be given quickly and easily and correlated very highly with vocabulary on the Stanford-Binet [intelligence scale]. This meant you could get a measure of vocabulary ability without administering a time-consuming test. You knew it was the start of something very important, and indeed later this year, the test will be released in its fourth edition.” Another article about Dr. Lloyd Dunn and his extensive career in educational measurement can be found on the ADVANCE Newsmagazine Web site.

Dunn served on an education advisory panel for President John F. Kennedy in the early 1960s, which identified a need for programs that could enhance communication skills. His newly published Peabody Language Development Kits (PLDK) were ideal for meeting this goal. Yackel said, “The Elementary and Secondary Education Act (ESEA) was passed in 1965 and the PLDK Level 1 was released just months later. The demand for this new language program was phenomenal.”

Yackel credited the early success of AGS Publishing to its association with Dunn. “Lloyd and I always said, ‘AGS made Lloyd Dunn and Lloyd Dunn made AGS.’ Lloyd himself was special, one of the most inquisitive and insightful people I’ve ever worked with.”

Over the course of his career, Dunn received many honors. He was senior past president of the Council for Exceptional Children and a fellow of the American Psychological Association. He was director of Peabody’s Mental Retardation Research Training Program, the first doctoral program in the nation for training researchers in this field. He conceived of the Institute on Mental Retardation and Intellectual Development, which was founded at Peabody in 1965, and was its first director. IMRID made major contributions to behavioral research in mental retardation.

Besides the Peabody Picture Vocabulary Test (PPVT) and the Peabody Language Development Kits (PLDK), Dunn also co-authored the Test de Vocabulario en Imágenes Peabody (TVIP), the Peabody Early Experiences Kit (PEEK), the Peabody Articulation Decks (PAD), and The Picture File, as well as college textbooks, including Exceptional Children in the Schools.

Dr. Ronald Goldman, author of several tests for speech and language and special education, said, “Lloyd Dunn was a pioneer in the area of special education. His creativity provided the profession with some of the most acclaimed language assessment tools and language intervention programs. While directing the IMRID project at Peabody, he served as a mentor to many professionals who have become current leaders and major contributors in special education. His foresight, productivity, and leadership have left an indelible mark on the provision of services for children with disabilities.”

Pearson Assessments will release the fourth edition of the Peabody Picture Vocabulary Test (PPVT) at the end of the year. Dunn’s son, Dr. Doug Dunn, is the new co-author of the PPVT-4, thereby continuing his father’s tradition of providing reliable and valid comprehensive assessments and educational programs for children.

About Pearson Assessments

Pearson Assessments provides assessment instruments and data capture tools and technologies for use in education, business and health care settings. Backed by a half century of knowledge and expertise, Pearson Assessments – integrating Pearson NCS and the assessment division of AGS Publishing with the original Pearson Assessments business – offers products and services to deliver the accurate, reliable and usable information that professionals seek. Pearson Assessments is a business of Pearson Education, the world’s largest integrated education company, which in turn is part of Pearson, the international media company. Pearson’s other primary operations include the Financial Times Group and the Penguin Group.

PPVT-4 / EVT-2 standardization opportunities

Help make the tests you know and trust even better!

As many of you know, AGS Publishing is developing the Peabody Picture Vocabulary Test, Fourth Edition (PPVT-4) and the Expressive Vocabulary Test, Second Edition (EVT-2). PPVT-III is the leading measure of receptive vocabulary for Standard English. EVT is conormed with PPVT—the tests are standardized on the same population of examinees. This lets you make direct comparisons of receptive and expressive vocabulary. In addition, the national sample is stratified to match the most recent U.S. Census data on gender, race/ethnicity, region, and parent or self-education level as a measure of socioeconomic status. When complete, PPVT-4 and EVT-2 will continue to offer the ease, quality, and reliability you’ve come to expect—including fully updated conorms and other new advantages. Stay tuned!

There are still opportunities available for participating in the standardization of these products. The testing takes about 15 to 30 minutes. AGS Publishing will provide all testing materials, and will compensate the examiner, examinee, and school/organization. Currently, we are looking for examiners who have access to and would be willing to test individuals who have any of the following characteristics:

Examinees who have not been diagnosed with a disability:

  • Ages 2 1/2 through 5 – African American, Hispanic, Asian, or Native American children from low socioeconomic backgrounds.
  • Ages 19 through 24 – Individuals who have an education level of high school or less and are not presently in school.
  • Ages 25 through 90+ – African American, Hispanic, Asian, or Native American adult males who have not graduated from high school.

Examinees who have been diagnosed with one of the following disabilities:

  • Mild Mental Retardation – Ages 6:0 through 17:11.
  • Emotional Disturbance/Behavioral Disturbance/Serious Emotional Disturbance – Ages 7:0 through 12:11.
  • Hearing Impairment without Cochlear Implant – Ages 4:0 through 12:11. Individuals must have mild to moderate hearing loss (40 to 55 dB) and the ability to function within a mainstream environment for most or all of the educational day.
  • Hearing Impairment with Cochlear Implant – Ages 4:0 through 12:11.
  • Language Disorder (Adult Aphasia without Head Injury) – Ages 50 through 90+.
  • Speech Impairment (Adult Dysarthria without Head Injury) – Ages 50 through 90+.

GRADE correlation study

Group Reading Assessment and Diagnostic Evaluation (GRADE) is the assessment that provides key formative and summative reading information, helping you monitor adequate yearly progress. GRADE gives you accurate, easy-to-read results at the individual, class, and school levels.

We are looking for sites that can help with a correlation study of PPVT-4/EVT-2 with GRADE. Consider partnering with your general education colleagues to complete this study with us.

Each examinee must complete both of the following within 4 weeks of each other:

  • One form of PPVT-4 followed by one form of EVT-2. Administration time for this testing is estimated at 15 to 30 minutes, depending on the examinee’s age.
  • The appropriate level GRADE test. Administration time for this testing is estimated at 60 to 90 minutes, depending on the examinee’s age.

If you are interested in participating in any of these studies, please contact Renee Vraa at reneev@agsnet.com or 800-328-2560, ext. 7311. Thank you for your consideration!

A Brief History of Sound Reading Solutions

Sound Reading Solutions was founded and is directed by Bruce Howlett. For over a decade, Howlett worked with a member of the National Academy of Sciences at Cornell University in biochemical and molecular research. In preparation for teaching graduate students, Howlett took science education courses, which led to a teaching position at a school for emotionally disturbed teens. None of the students read well enough to use high school level texts. Hoping to understand how these students might learn to read, Howlett became certified as a special education teacher.

As he studied, Howlett was shocked by the flimsy research foundation underlying reading instruction. After working as a middle school special education teacher, it became apparent to him that students who didn’t learn to read in elementary school were treated as second-class students. Upon transferring to an elementary school in the fall of 1997, Howlett was promptly handed 24 third and fourth graders whose reading skills were low at best—despite instruction in some of the best-known reading methods in the field.

Nothing in Howlett’s training had prepared him for these students, and when a bout with the flu left him with time away from work, he spent two days reading materials he had gathered from scientific journals, including the Shaywitz papers and Adams’ Beginning to Read. The disconnect between research and practice was frightening. Howlett spent the next year reading over 100 papers and developing working relationships with speech-language pathologists (SLPs). The following school year, Howlett shared 18 students with SLP Nancy Williams, who helped him apply methods from the speech-language-hearing field to reading remediation. Howlett and Williams placed their information on a Web site and were flooded with requests and suggestions for new methods. They hand-assembled these materials and gave them away, until the copy machine at Howlett’s school broke down and he paid a printer to make the copies. This was the beginning of Sound Reading Solutions in 1998.

Sound Reading Solutions evolved into a network of SLPs and reading teachers. Currently, Sound Reading Solutions functions as a center for twenty SLPs; reading, ESL, and special education teachers; and programmers who create highly effective, easy-to-use materials that have a measurable impact on the reading abilities of students. Sound Reading Solutions’ mission is to provide advanced programs, software, and reading practice for literacy development, improvement, and intervention that meet the needs of our diverse, multilingual population. Sound Reading programs, readers, and software are currently being used by thousands of students in public schools and many home schools throughout the United States and Canada.

For more information about Sound Reading Solutions and their products, visit www.soundreading.com.