Pearson



















What is your (sample) size?

February 2008 Clinical Café
By Jeff Evans, MS, CCC-SLP

Why do some tests have small norm samples while others have huge ones, and why is that OK? A conversation about sample size in test development can quickly turn statistical. That won’t happen here. Instead, this explanation is meant to be general and to touch on some of the basic considerations in determining sample sizes for norm-referenced tests.

Several considerations go into selecting a sample size:

  • The type of scores to be reported, whether
    • in broad ranges (e.g., below average, average, above average), or
    • on a continuum (e.g., age and grade equivalents)
  • The breadth of the scale, determined by the degree of behavioral variation seen in the population (low or high variation)
  • How the scores will be used (screener vs. placement; high stakes vs. low stakes)

In general, it’s preferable to have at least 100 persons contributing to each characteristic that defines the norm group (such as age, grade level, gender, race, SES, etc.), particularly when measuring constructs that are developmentally sensitive. As age increases, performance across the population becomes more stable, so fewer persons are required to obtain a statistically stable sample.
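If you want to check a manual’s norm tables against the 100-per-group rule of thumb, a few lines of Python can flag the small groups. This is a minimal sketch: the group names and counts below are made up for illustration, and `groups_below_minimum` is a hypothetical helper, not part of any Pearson tool.

```python
# Hypothetical per-group counts, as you might copy them from a test manual's
# norm-sample table. These numbers are illustrative, not from any real test.
norm_table = {
    "Age 3": 140,
    "Age 4": 132,
    "Age 5": 118,
    "Age 6": 96,
    "Ages 25-34": 72,
}

def groups_below_minimum(counts, minimum=100):
    """Return the names of norm groups with fewer than `minimum` persons."""
    return [group for group, n in counts.items() if n < minimum]

print(groups_below_minimum(norm_table))  # ['Age 6', 'Ages 25-34']
```

A flagged adult group is not necessarily a problem: because adult performance tends to be more stable, fewer persons may be needed at older ages.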

Let’s use two recently introduced Pearson tests as examples: the PPVT™-4 test and the OASES™ test. These products differ considerably in design and in how their scores are used. The PPVT-4 is a vocabulary test, normed by age and by grade, with separate fall and spring grade norms. It is intended primarily for use with children in the early grades, but it is also normed through age 81+ years. Scores are reported as standard scores. The OASES is a self-rated survey for adults who stutter. It has a criterion-referenced norm basis, meaning that scores fall into five broad ranges of severity.

The PPVT-4 has a total sample size of 3,540. The sample is broken out by many demographic variables, all of which closely mirror the U.S. population as a whole. There are 26 groupings by grade and 28 groupings by age, and the age groups average 126 persons each. A sample this large was preferable because of the many combinations of age, grade, and demographic variables being considered, and because the PPVT-4 measures abilities in children, which vary considerably, particularly in the early grades. In addition, the PPVT-4 is often used to determine programming for children in schools.
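The PPVT-4 figures above can be checked with simple arithmetic. This sketch uses only the numbers reported in this article; the 100-per-group floor is the general rule of thumb described earlier, not a published PPVT-4 requirement.

```python
total_sample = 3540   # PPVT-4 total norm sample, as reported in this article
age_groups = 28       # number of PPVT-4 age groupings, as reported in this article

# Average number of persons per age group
average_per_group = total_sample / age_groups
print(round(average_per_group, 1))  # 126.4

# Rough floor implied by the "at least 100 per group" rule of thumb
minimum_total = age_groups * 100
print(minimum_total)  # 2800
```

The reported total comfortably exceeds that rough floor, which is consistent with the extra persons needed to cover the many age, grade, and demographic combinations.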

In contrast, the OASES test has a total sample size of 173 adults, distributed more or less evenly in age from 18 to 73. Why is this OK? Isn’t a bigger sample always better?

The OASES reports scores within broad ranges (Mild, Mild/Moderate, Moderate, Moderate/Severe, Severe). It measures a relatively narrow band of human behavior (stuttering) and people’s reactions to it, although variation among people who stutter can be large. The test is intended to be informative rather than diagnostic: receiving one OASES score rather than another does not lead directly to a classification or impose life changes on the person. Additionally, stuttering is not an age-based or developmentally sensitive disorder. It affects adults of all races, genders, religious beliefs, geographic areas, and socioeconomic backgrounds more or less equally, so these differences did not need to be accounted for in the sample. The OASES sample can be characterized as a clinical sample of adults who stutter, in which age and clinical membership were the only variables that really mattered. A sample of 173 was therefore sufficient: the OASES is a “low stakes” test, it reports scores within broad ranges, and its scores are relatively low in variation.
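One way to see why broad score ranges tolerate a smaller sample: if a slightly different norm sample shifted every score a little, only scores sitting near a cut point would land in a different range. The sketch below assumes a hypothetical 1 to 5 scale with evenly spaced cut points; these are NOT the published OASES cut points, and `severity_band` is an illustrative helper.

```python
from bisect import bisect_right

# Hypothetical, evenly spaced cut points on a 1-5 scale (NOT the published
# OASES cut points). A score at or above a cut point moves to the next band.
CUT_POINTS = [1.8, 2.6, 3.4, 4.2]
BANDS = ["Mild", "Mild/Moderate", "Moderate", "Moderate/Severe", "Severe"]

def severity_band(score):
    """Map a score on the hypothetical 1-5 scale to one of five broad bands."""
    return BANDS[bisect_right(CUT_POINTS, score)]

# Shift every score by a small amount, as a slightly different norm sample might,
# and list the scores whose reported band changes.
scores = [1.3, 2.0, 2.55, 3.0, 3.9, 4.6]
shift = 0.1
changed = [s for s in scores if severity_band(s) != severity_band(s + shift)]
print(changed)  # [2.55]
```

Only the score just below a cut point changes band. On a continuous scale such as standard scores, every one of those shifts would change the reported result.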

What does all of this mean for your practice? The sample sizes of quality testing tools can vary depending on how a test’s scores will be used and on how much variation exists in the population it measures. When scores are expected to range widely within the population being measured, especially for children, a larger sample size is necessary. A more focused test, where people are expected to score within a narrower range (such as a screening test), may show very good validity and reliability results with a relatively small sample. In all cases, the publisher should explain the reasons for its sample selection clearly, with simple tables, in the test manual.
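The link between score spread and sample size can be made concrete with the textbook formula for estimating a group mean within a chosen margin of error, n = (z × SD / E)². This is a simplified sketch: it assumes simple random sampling and roughly normal scores, real norming projects use stratified samples and go well beyond it, and it is not how Pearson sized either sample. `required_sample_size` is a hypothetical helper.

```python
import math

def required_sample_size(sd, margin, z=1.96):
    """Persons needed so a group mean is estimated within +/- `margin`
    with about 95% confidence (z = 1.96), using n = (z * sd / margin) ** 2,
    rounded up to a whole person."""
    return math.ceil((z * sd / margin) ** 2)

# Wide spread: standard-score metric (SD = 15), mean wanted within 3 points.
print(required_sample_size(sd=15, margin=3))   # 97
# Narrower spread (SD = 7.5), same precision.
print(required_sample_size(sd=7.5, margin=3))  # 25
```

At the same precision, halving the spread cuts the required sample to about a quarter, which is the pattern described above for focused tests versus tests of widely varying abilities.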

I hope this brief introduction helps you understand some basics of how norm sample sizes are determined for the tests you use. You should be able to find detailed information about sample size in the tables of your test manual (see http://www.speechandlanguage.com/cafe/june2007.asp).

The next level of detail about sample size in tests includes concepts such as mean scores, standard deviations, normal distributions, confidence intervals, confidence levels, and statistical significance. But let’s save that for another day!


SLP Discussion Center

As always, we'd like to thank you for your ongoing service to people with communication needs and to remind you that we are here to support you in that effort. If you'd like to discuss this topic further, please feel free to use the SLP Discussion Center as the vehicle for an ongoing discussion with your colleagues. Should you have questions regarding these or other Pearson Speech and Language products, we welcome your phone calls at 800-627-7271, or visit our website at http://ags.pearsonassessments.com.





Pearson AGS Assessments are now part of Pearson's Assessment group.
Phone: 800.627.7271    |    Inquiries: pearsonassessments@pearson.com

© 2005-2007, Pearson Education or its affiliates. All rights reserved.